Some of these comments are very disturbing, especially the ones saying that dynamic flow control lets them use one shader where they previously needed several, and hence fewer state changes, thus improving performance.
It's that kind of flawed thinking that annoys the hell out of me about a lot of game programmers out there.
What they're doing is taking a very small per-object cost (state changes) and turning it into a very big per-pixel cost. Instead of figuring out on the CPU (which excels at this sort of thing) exactly which texture and shader path a chunk of geometry needs, once per object, they're submitting everything to the GPU and letting it figure that out on a per-pixel basis. It's doing the same selection work as before, only now it happens at every single pixel rather than just once per object.
Great "optimization" there. Now, static flow control on the other hand...
The biggest benefit I can see coming from PS3.0 is in doing normal CPU work on the GPU, especially now that FP32 is required for PS3.0 (so you can almost swap CPU and GPU algorithms and expect 'pretty much' the same results, except where the FP32 implementation doesn't exactly correspond to the IEEE standard).
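As a sketch of what the GPU-side scratch space for that looks like under D3D9 (the function name is mine and error handling is stripped; this assumes PS3.0-class hardware that exposes FP32 render targets), you'd render your "computation" into a 128-bit floating-point target and let the pixel shader do the math at full FP32:

    #include <d3d9.h>

    // A minimal sketch: treat an FP32 texture as a big array of floats, and
    // a full-screen quad + pixel shader as a parallel loop over that array.
    IDirect3DTexture9* CreateComputeTarget(IDirect3DDevice9* dev,
                                           UINT width, UINT height)
    {
        IDirect3DTexture9* tex = NULL;
        HRESULT hr = dev->CreateTexture(
            width, height,
            1,                        // one mip level -- it's data, not imagery
            D3DUSAGE_RENDERTARGET,
            D3DFMT_A32B32G32R32F,     // four FP32 channels per "element"
            D3DPOOL_DEFAULT,
            &tex, NULL);
        // Caveat from above: the math written into this target is FP32 but
        // not strictly IEEE 754 -- denormals, rounding and NaN handling may
        // differ from what the same algorithm produces on the CPU.
        return SUCCEEDED(hr) ? tex : NULL;
    }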
With the way things are progressing on both the CPU and GPU end, within a few years of release pretty much any game ends up CPU-bound on the highest-end hardware with settings maxed. The solution is either A) have the IHVs support higher resolutions with more anti-aliasing, or B) start moving a good amount of that CPU work onto the GPU... *even if* it's slightly faster to do it on the CPU today. With dynamic flow control and all the extra instructions PS3.0 provides, you can express a hell of a lot more CPU algorithms on the GPU now. Some may run a good deal slower there, since dynamic flow control is extremely expensive on current hardware, but by the time you ship your product, GPUs will often have gotten fast enough to overcome that difference. There's a line to be drawn here, of course... you certainly wouldn't want to swap a CPU algorithm out for a GPU-based one that's 3x slower, but if you're not quite sure which is actually faster, erring towards the GPU side is probably a safe bet.
Now, there's still the problem of getting the results of these calculations back to the CPU, but PCI Express should help there.
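For the readback itself, the D3D9 route is GetRenderTargetData() into a system-memory surface, then a LockRect() to read the floats on the CPU. A rough sketch (error handling omitted; 'dev' and 'rtSurface' are assumed to exist already):

    #include <d3d9.h>

    void ReadBackResults(IDirect3DDevice9* dev, IDirect3DSurface9* rtSurface,
                         UINT width, UINT height)
    {
        // Create a lockable system-memory surface matching the render target.
        IDirect3DSurface9* sysmem = NULL;
        dev->CreateOffscreenPlainSurface(width, height, D3DFMT_A32B32G32R32F,
                                         D3DPOOL_SYSTEMMEM, &sysmem, NULL);

        // Copy GPU -> system memory. This transfer is the part that rides
        // the bus, and the part PCI Express (vs. AGP's weak upstream path)
        // should speed up.
        dev->GetRenderTargetData(rtSurface, sysmem);

        // Map it and read the results like any other float array.
        D3DLOCKED_RECT lr;
        if (SUCCEEDED(sysmem->LockRect(&lr, NULL, D3DLOCK_READONLY)))
        {
            const float* row0 = static_cast<const float*>(lr.pBits);
            // ... consume results; rows are lr.Pitch bytes apart ...
            (void)row0;
            sysmem->UnlockRect();
        }
        sysmem->Release();
    }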