Not sure where you guys are coming from, but 100 petaflops is perfectly possible even with this CELL. Human simulation is a massively parallel problem by nature (cf. the brain's parallelism between modules, and the memory/influx systems) - although also a horribly inefficient one to simulate.
I think the key reason it's safe to say a computer hundreds of times smarter than any human only requires 10x more FLOPS (and I disagree with measuring this in FLOPS anyway; it isn't FLOPS-limited imo) is that the vast majority of human capabilities are special-purpose.
Yet the chances of a chip simulating a normal human neuron-for-neuron are absolutely nil (it differs from person to person and from second to second, there are several orders of magnitude more neurons than transistors, and neurons are more complicated than transistors...). Such a scheme would burn general-purpose processing power on highly special-purpose things (example: the vision part of the brain and the related influxes).
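Just to put rough numbers on the "orders of magnitude" point - these are ballpark, commonly cited estimates, not exact figures:

[code]
# Rough back-of-envelope comparison; all numbers are approximate estimates.
neurons          = 1e11    # ~100 billion neurons in a human brain
synapses         = 1e14    # ~100 trillion synapses
cell_transistors = 2.3e8   # Cell BE is roughly in the 230-million-transistor range

print(f"neurons per transistor:  {neurons / cell_transistors:,.0f}")    # ~435
print(f"synapses per transistor: {synapses / cell_transistors:,.0f}")   # ~435,000
# Even if one transistor could stand in for one synapse (it can't - synapses
# are analog, plastic, and stateful), you'd still be ~5-6 orders of magnitude short.
[/code]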
So, if you create something based solely on "intelligence", taking in text and sending text and/or commands, you can achieve significantly more impressive results for a given amount of power. The problem remains that unlike some of the special-purpose systems of the brain you'd need to simulate, the general-purpose ones are much more memory/HD-hungry - so much so, in fact, that I doubt current technology would be sufficient, even scaled 10000x in a cluster.
IMO, the only thing preventing proper human-like simulation TODAY is the low speed of HDs. You either need a ludicrous amount of RAM, or, well, you're screwed. That's why I'm personally especially interested in Colossal-Storage-like technologies, since those promise a unified scheme for all read/write systems in the long term, so you'd get at least near-RAM speed on a HD with 100 terabytes. As soon as you've got that, human-brain-like preprocessing schemes can become a reality, significantly simplifying the simulation problem.
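To illustrate why HD speed is the wall, here's a toy comparison for a hypothetical 100 TB store - the figures are rough ballparks, not measurements:

[code]
# Toy bandwidth/latency comparison; the figures are rough ballparks, not measurements.
capacity_bytes = 100e12   # the hypothetical 100 TB store
hd_bandwidth   = 60e6     # ~60 MB/s sustained for a typical HD
hd_seek        = 8e-3     # ~8 ms per random seek
ram_bandwidth  = 6e9      # ~6 GB/s main-memory bandwidth
ram_latency    = 60e-9    # ~60 ns random access

print(f"one full sweep via HD:  {capacity_bytes / hd_bandwidth / 3600:6.0f} hours")   # ~463 h
print(f"one full sweep via RAM: {capacity_bytes / ram_bandwidth / 3600:6.1f} hours")  # ~4.6 h
print(f"random access penalty on HD: ~{hd_seek / ram_latency:,.0f}x")                 # ~133,000x
[/code]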
From my POV, it also seems a bit naive to look at research that tries to simulate the brain's response scheme etc. - it tends to be rather inefficient, a bad fit for current computation models, and so on - which is why I'd be more interested to see a scheme with self-recompiling code that is capable of module creation and logic-linking. IMO, that can scale much better, especially since specific modules can be coded by actual humans.
And it also gives better control over the entire architecture after it starts "running", on the (significant) condition that it manages to properly separate parts of its functions.
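For what I mean by "module creation and logic-linking", here's a hypothetical toy sketch - the names (make_module, registry) are made up, it's just to illustrate code that compiles new modules at runtime and wires them together:

[code]
# Hypothetical toy sketch of module creation and logic-linking; names are made up.
registry = {}

def make_module(name, source):
    """Compile a new module at runtime and link it into the shared registry."""
    namespace = {"registry": registry}   # new modules can call existing ones
    exec(compile(source, name, "exec"), namespace)
    registry[name] = namespace["run"]    # convention: each module exposes run()

# A special-purpose module "coded by actual humans":
make_module("double", "def run(x):\n    return 2 * x")

# A machine-generated module that links itself to the previous one via the registry:
make_module("quadruple",
            "def run(x):\n    return registry['double'](registry['double'](x))")

print(registry["quadruple"](10))   # -> 40
[/code]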
Another big limitation, obviously, is branching performance, but I tend to believe there has to be an amazingly efficient solution for it in a system with hundreds of thousands of parallel threads, since the brain manages it; if you think about it at the neuron level, the amount of parallelism the brain handles is downright insane, beyond what any computer has ever done, even per-thread.
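As a rough illustration of how wide-parallel systems often sidestep branching - this is just one common trick (predication/selection), not a claim about how the brain does it:

[code]
import numpy as np

# Branch-per-element view of the problem:
#   y = sin(x) if x > 0 else cos(x)
x = np.random.randn(1_000_000)

# Data-parallel, branch-free version: compute both sides for every element
# and select per element, so there is no per-"thread" control flow at all.
y = np.where(x > 0, np.sin(x), np.cos(x))
print(y[:5])
[/code]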
Anyhow, enough gibberish about things that don't interest you all - can I join the mass and ask "Where's our RSX information, goddamnit?!"
Uttar