This is the best source of information I could find so far:
http://www.behardware.com/articles/497-7/5-1-headsets-comparative-test.html
From that review, I get the impression that what is holding these headphones back right now is that they have to be (or want to be) cheap enough for mass-market appeal, and fitting three to four speakers on each side makes that hard. They also seem to have trouble with transitions in particular. Personally, I think that is because more research has gone into the Dolby Headphone technology, which 'knows' more about how the human brain interprets surround (see the wiki article you linked to). Transitions are more likely to be a product of interpretation than of the basic, static physical location of a sound, just as the human visual process is a complex combination of interpretation, interpolation, prediction and perception.
The review's point that the two-speaker simulations are so far unsuccessful at placing a static sound behind the head reinforces this impression a little. Then again, common sense does seem to suggest that you should be able to recreate a full surround experience using stereo speakers, because after all we have just two eardrums. It's an interesting subject, and I admit I haven't given it sufficient thought.
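To make the "just two eardrums" point a bit more concrete, here is a rough sketch of one reason rear placement might be hard with only two channels. It assumes a heavily simplified model of my own choosing (two point-like ears, a distant source, no head or outer-ear filtering), and the constants and the function name `itd_seconds` are made up for illustration. In that model, the arrival-time difference between the ears is identical for a source in front and its mirror image behind, so timing alone cannot tell the two apart.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at room temperature
EAR_DISTANCE = 0.18     # m, assumed ear-to-ear spacing (illustrative only)


def itd_seconds(azimuth_deg: float) -> float:
    """Interaural time difference in a simplified two-point-ear model.

    azimuth_deg: 0 = straight ahead, 90 = directly to the right,
    180 = directly behind. Head shadowing and outer-ear filtering
    are deliberately ignored here.
    """
    return EAR_DISTANCE / SPEED_OF_SOUND * math.sin(math.radians(azimuth_deg))


# A source 30 degrees right of front, and its mirror image behind the head.
front = itd_seconds(30)
rear = itd_seconds(150)

print(f"front-right ITD: {front * 1e6:.0f} microseconds")
print(f"rear-right ITD:  {rear * 1e6:.0f} microseconds")
print("indistinguishable by timing alone:", math.isclose(front, rear))
```

In real hearing the shape of the outer ear and small head movements help resolve this ambiguity, which is presumably part of what the simulation algorithms try to model.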
Still, so far I get the impression from these reviews that in actual positional audio reproduction, the two best surround headphones (the Zalman ZM-RS6F among them) have an edge over simulated surround when it comes to static positioning, especially for rear-speaker sound. However, overcoming the problems of balancing the three speakers and then enhancing the perception of movement with an algorithm comparable to the one the algorithm-based headphones use could be very hard to pull off. In that respect, I'd almost think that in-ear headphones should be able to produce the best results, better than covering headphones, because they offer the most predictable output-to-eardrum transmission for the various algorithms to work with. It seems cleaner that way, and it should make individual differences easier to overcome. But it would still be interesting to know more about why multi-speaker sets are so much better at producing sounds coming from the rear, and whether that is the result of an as yet poorly understood interpretation aspect or of an overlooked physical aspect.
Certainly, hearing stuff coming from the rear probably has at least some kind of advantage, but then there is also probably a reason why cats can turn their ears independently.
Sound reproduction in games:
Thanks to the taut and present mediums, the ZM-RS6F gave a good impression in games. The horizontal positioning of the speakers seemed to work with 3D sound. The front or rear sound position worked, even if transitions still have a lack of subtlety due to the close positioning of the speakers.
Sound reproduction with video DVD:
The ZM-RS6F had good results with 5.1 sound track movies. The restitution of effects on sound positions is good and the positions clearly identifiable. However, sound transitions between two positions (an object going from the front to the back) is still abrupt and lacked of subtlety. This doesn't happen with speaker sets as sounds progressively fade away in the front reappear in the back with surround speakers.
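As an illustration of the progressive front-to-back fade the review describes for speaker sets, here is a minimal sketch of a constant-power crossfade between a front and a rear channel. This is a generic panning technique that I am assuming for the sake of the example, not something the review or any of the headsets is confirmed to use, and the function name `crossfade_gains` is hypothetical.

```python
import math


def crossfade_gains(position: float) -> tuple[float, float]:
    """Return (front_gain, rear_gain) for a constant-power crossfade.

    position: 0.0 = fully in front, 1.0 = fully behind.
    The squared gains always sum to 1, so the total acoustic power
    stays roughly constant while the sound moves.
    """
    position = min(max(position, 0.0), 1.0)  # clamp to the valid range
    angle = position * math.pi / 2
    return math.cos(angle), math.sin(angle)


# Move an object from front to back in five steps.
for step in range(5):
    pos = step / 4
    front_gain, rear_gain = crossfade_gains(pos)
    print(f"pos={pos:.2f}  front={front_gain:.3f}  rear={rear_gain:.3f}")
```

With speakers spaced around a room, intermediate gains like these place the sound somewhere between front and back. With drivers packed a few centimetres apart inside an ear cup, the same gains may not produce much of an in-between position, which would fit the abrupt transitions the review reports.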