Vince said:
What happens when you don't accelerate one entity with N processors, but rather N entities with N processors concurrently? How is the speedup influenced?
What changes is the value of B (the fraction of the program that must run serially). If the N entities are independent and each takes the same amount of time, then B is 0 and therefore S == N.
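To make the arithmetic concrete, here is a quick sketch in Python (the function and the example values are just illustrative, not anything from a real measurement):

    # Amdahl's law: S = 1 / (B + (1 - B) / N)
    # B = serial fraction, N = number of processors.
    def speedup(B, N):
        return 1.0 / (B + (1.0 - B) / N)

    # N independent entities of equal cost: nothing is serial, so B = 0 and S == N.
    print(speedup(0.0, 4))   # 4.0
    # A 10% serial portion caps the gain well below N.
    print(speedup(0.1, 4))   # ~3.08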
All multiprocessing systems follow Amdahl's law at some point. Consider a vector system (SSE, early Cray, etc.): it does exactly the same operation on four separate items of data. Assuming the FPUs have constant latency, B is 0 (applying Amdahl's law to the FPUs as the parallel processing units). If the different logical units have non-constant latency, you become bottlenecked by the slowest unit, and you can calculate B (in a 2-processor system, from the time that P0 kept running after P1 had finished, or vice versa).
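A back-of-the-envelope version of that 2-processor calculation, assuming all we know is how long each processor was busy (the timings here are made up for illustration):

    t_p0, t_p1 = 10.0, 6.0                  # time each processor spends on its share
    serial_total = t_p0 + t_p1              # what one processor would have taken alone
    straggler = abs(t_p0 - t_p1)            # time one processor ran on after the other finished
    B = straggler / serial_total            # effectively-serial fraction -> 0.25
    S = 1.0 / (B + (1.0 - B) / 2)           # Amdahl with N = 2 -> 1.6
    print(B, S)                             # same as serial_total / max(t_p0, t_p1)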
This is what makes general-purpose multiprocessing tricky and the architecture of the system so important. If the system allows any process to be allocated to any processor, that implies a complex communications architecture (because it is hard to ensure the process's data is in the right place for that processor). If instead there are units with specific purposes, then the entire system will tend to run at the speed of its slowest unit. Most existing VPUs are pretty much exactly like this latter case (hence my mention of bottleneck analysis above).
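A minimal sketch of that "slowest unit" effect for a chain of specialised units (the stage names and per-batch timings are hypothetical):

    # Steady-state throughput of a pipeline of specialised units is set by the slowest stage.
    stage_times = {"transform": 2.0, "lighting": 3.5, "rasterise": 5.0}  # ms per batch
    bottleneck = max(stage_times, key=stage_times.get)
    throughput = 1000.0 / stage_times[bottleneck]   # batches per second
    print(bottleneck, throughput)                   # rasterise, 200.0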
nondescript said:
I don't think parallel processing is as strongly limited as Amdahl's law suggests. The theoretical development of the argument is flawless, but I think the assumption that there is no possible way to speed up serial code is questionable.
Quite true, but the 'law' still holds. You are talking about reducing B, rather than invalidating the law.
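Put another way, shrinking B raises the ceiling (S can never exceed 1/B) but the law still applies; a small sketch with arbitrary B values:

    def speedup(B, N):
        return 1.0 / (B + (1.0 - B) / N)

    # Speedup on 16 processors versus the 1/B limit, for a few illustrative serial fractions.
    for B in (0.2, 0.1, 0.01):
        print(B, speedup(B, 16), 1.0 / B)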
nondescript said:
There's usually some way to speed up the code, even if it's "serial". It's very rare to find a substantial section of code where each instruction depends on the instruction immediately before it - and if it did, I would question the author's coding methods.
"Usually" is perhaps a bit strong. There are many systems which are naturally suited to multiprocessing, many that aren't, and many that are somewhere in the middle.
3D graphics ARE naturally suited to massively parallel processing - what's a VPU after all? - and games are probably in the middle there, but it's likely to depend heavily on the type of game.
The problem is that there aren't many ways to create parallelism out of nowhere. Attempts at building vectorising and parallelising compilers haven't been particularly successful - they don't reduce B by the factors necessary to provide big gains. I'm of the belief that parallel processing (particularly coarse-grained parallelism) has to be designed into the architecture of an application from the start, and that is not easy. Hence my assertion that we need good programmers - in fact, we need more than that: we need good software engineers, which is a subtly different skill.
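To show what "designed in from the start" might look like at the coarse-grained level, here is a sketch where the work is partitioned into independent entities so a pool can fan them out (update_entity and the entity list are hypothetical stand-ins, not anything from a real engine):

    from concurrent.futures import ProcessPoolExecutor

    def update_entity(entity):
        # Stand-in for per-entity work that shares no state with other entities.
        return entity * entity

    if __name__ == "__main__":
        entities = list(range(8))
        with ProcessPoolExecutor() as pool:
            results = list(pool.map(update_entity, entities))
        print(results)

The point isn't the pool itself; it's that the decomposition into independent entities has to exist in the design before any of this machinery helps.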
(Spot the frustrated lecturer inside me trying to get out)