Okay, provocative title, and in fact misleading. Play-testing works, but you have to understand what value of 'works' is being used. First, you need to understand why a designer might be motivated to have people test prototypes rather than just creating a game from whole cloth. Then you need to understand the concept of usability. Finally, you need to understand the motivations of play-testers in the context of designing a game.
So why might a designer be motivated to have people test prototypes? One motivation is that the designer does not have the resources available to make sure that the game system is free of bugs, to check that the game works well enough. This is particularly important when designing software, because the point of computers is to allow you to build and implement relatively complex systems. Video games, for example, used to be tested to make sure that all the work of coding paid off somehow. They still are, since play-testing is a relatively cheap heuristic compared to going through code line by line.
Some people still consider this to be the beginning and end of play-testing, even when it comes to much simpler games like board games, with the question being whether the rules (the 'code') can be implemented by the players. However, this attitude is not only wrong, but tends to produce lots of wasted 'work' trying to write rules that are unambiguous and clear, and that accurately describe the game the designers intend the players to play. Basically this is the perennial problem of legal writing writ large: you either have to assume that people will try to misuse the documentation, or you have to assume that it will be used by people of good intentions and good faith.
However, the salient difference between writing legal documents and writing games is that elaborate legal documentation generates more income, whereas vague, unclear, and downright obtuse game documentation is often good enough, and more effort is a waste of resources. This is particularly true when a game has to be translated into other languages, and you only have so much space in which to print the rules. Which means that treating a game's rulebook like a computer program's code is not only pointless, but wasteful as well.
Yes, the rules of a game should be written so that the intentions of the design can be implemented by whomsoever reads them; but thanks to the diversity of the audience and the constraints of the product (translation, printing costs, etc.), the cost of doing so is usually prohibitive.
So play-testing doesn't work for rules-writing. There's a reason an equivalent isn't used in technical writing, and that's because it's a waste of fucking time. Technical writing, if budgeted properly, instead uses a technical edit where subject matter experts are asked to review a document or section of a document and to either give their approval, or provide feedback on content.
Usability testing is like play-testing, not in the sense of play-testing to check the rules for errors, but in the sense of play-testing to see how the testers use the product, whether they enjoy it, and whether they find it useful. So yes, play-testing works if the value of 'works' that you're asking for is how people feel about the game, rather than whether the game 'works', or whether the game rules are 'clear'.
Which means that you have to understand the people giving the feedback, which in the game development process means that you need to understand how people give feedback. Some testers, for example, will believe that they should be helping to write the game, others that they would prefer a different game, and still others that they are part of the design team.
For example, suppose a game involves a dice roll at some point. The tester who believes they are helping you write the game will give feedback about how you should use consistent language for indicating that the players need to make that roll. The tester who believes they would rather play a different game will suggest that the dice roll be swapped out for another mechanic, or that the whole game be re-jigged to avoid the necessity of that roll, or some other inane suggestion. The tester who believes they are on the design team will tell you about how the roll doesn't 'work' and how you should 'fix' it. And then all of these testers will be offended when you don't change the game according to their suggestions, because more people were either neutral or positive about that dice roll and gave useful feedback about whether they enjoyed it, rather than trying to tell you what you should do.
Once you understand that play-testers aren't privy to the design brief, that they're not designing the game, and that they're basically there to tell you about their experience with the game rather than their opinion of the design, then you can understand why play-testing doesn't work if you are using it to check for bugs. You can also understand why play-testing works if you are using it to gauge how an audience will react to your game: it determines whether players will actually enjoy playing the game, an aspect of games curiously ignored by many game designers.
The implication being that if you are a Warhammer player 'play-testing' an army, you're basically wasting your time, just as much as if you were a dice player rolling dice to test whether they will be lucky or not. You can't 'test' a game to see if it works or will work for you, particularly if you don't have a design brief or a set of metrics indicating what constitutes 'working'. Some people say "practice makes perfect", which is false: only perfect practice makes perfect, meaning that you need to understand what perfection entails before you can practice for it.
Of course, the nice thing about games is that once you have an appropriate metric to measure success, you can work backwards to establish your strategy, and you can skip the testing part. Essentially, the notion is that instead of wandering around looking for something, you figure out what it is, factor in that it will be in the last place you'll look, figure out where it is, and go there first without all the dicking around. That said, the dicking around can be pretty fun, and while testing is pointless when you can just deduce the answer, training is useful.
On the other hand, I've always felt that there's something charming about watching a small child test whether 2+2 always equals 4 (protip kid, it doesn't!).