Vince said:
What happens when you don't accelerate one entity with N processors, but rather N entities with N processors concurrently? How is the speedup influenced?
What changes is the value of B (the fraction of the program that must run serially). If the N entities are independent and each takes the same amount of time, then B is 0 and therefore S == N.
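To make the arithmetic concrete, here is a quick sketch in Python (the function and the example values are just illustrative, not anything from a real measurement):

    # Amdahl's law: S = 1 / (B + (1 - B) / N)
    # B = serial fraction, N = number of processors.
    def speedup(B, N):
        return 1.0 / (B + (1.0 - B) / N)

    # N independent entities of equal cost: nothing is serial, so B = 0 and S == N.
    print(speedup(0.0, 4))   # 4.0
    # A 10% serial portion caps the gain well below N.
    print(speedup(0.1, 4))   # ~3.08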
All multiprocessing systems follow Amdahl's law at some point. Consider a vector system (SSE, early Cray, etc.): it does exactly the same operation on four separate items of data. Assuming the FPUs have constant latency, B is 0 (applying Amdahl's law to the FPUs as the parallel processing units). If the different logical units have non-constant latency, you become bottlenecked by the slowest unit, and you can calculate B (in a 2-processor system, from the time that P0 kept running after P1 had finished, or vice versa).
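A back-of-the-envelope version of that 2-processor calculation, assuming all we know is how long each processor was busy (the timings here are made up for illustration):

    t_p0, t_p1 = 10.0, 6.0                  # time each processor spends on its share
    serial_total = t_p0 + t_p1              # what one processor would have taken alone
    straggler = abs(t_p0 - t_p1)            # time one processor ran on after the other finished
    B = straggler / serial_total            # effectively-serial fraction -> 0.25
    S = 1.0 / (B + (1.0 - B) / 2)           # Amdahl with N = 2 -> 1.6
    print(B, S)                             # same as serial_total / max(t_p0, t_p1)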
This is what makes general-purpose multiprocessing tricky and the architecture of the system so important. If the system allows any process to be allocated to any processor, that implies a complex communications architecture (because it is hard to ensure the process's data is in the right place for that processor). If instead there are units with specific purposes, then the entire system will tend to run at the speed of its slowest unit. Most existing VPUs are pretty much exactly like this latter case (hence my mention of bottleneck analysis above).
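A minimal sketch of that "slowest unit" effect for a chain of specialised units (the stage names and per-batch timings are hypothetical):

    # Steady-state throughput of a pipeline of specialised units is set by the slowest stage.
    stage_times = {"transform": 2.0, "lighting": 3.5, "rasterise": 5.0}  # ms per batch
    bottleneck = max(stage_times, key=stage_times.get)
    throughput = 1000.0 / stage_times[bottleneck]   # batches per second
    print(bottleneck, throughput)                   # rasterise, 200.0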
nondescript said:
I don't think parallel processing is as strongly limited as Amdahl's law suggests. The theoretical development of the argument is flawless, but I think the assumption that there is no possible way to speed up serial code is questionable.
Quite true, but the 'law' still holds. You are talking about reducing B, rather than invalidating the law.
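Put another way, shrinking B raises the ceiling (S can never exceed 1/B) but the law still applies; a small sketch with arbitrary B values:

    def speedup(B, N):
        return 1.0 / (B + (1.0 - B) / N)

    # Speedup on 16 processors versus the 1/B limit, for a few illustrative serial fractions.
    for B in (0.2, 0.1, 0.01):
        print(B, speedup(B, 16), 1.0 / B)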
nondescript said:
There's usually some way to speed up the code, even if it's "serial". It's very rare to find a substantial section of code where each instruction depends on the instruction immediately before it - and if it did, I would question the author's coding methods.
"Usually" is perhaps a bit strong. There are many systems which are naturally suited to multiprocessing, many that aren't, and many that are somewhere in the middle.
3D graphics ARE naturally suited to massively parallel processing - what's a VPU after all? - and games are probably in the middle there, but it's likely to depend heavily on the type of game.
The problem is that there aren't many ways to create parallelism out of nowhere. Attempts at building vectorising and parallelising compilers haven't been particularly successful - they don't reduce B by the factors necessary to provide big gains. I'm of the belief that parallel processing (particularly coarse-grained parallelism) has to be designed into the architecture of an application from the start, and that is not easy. Hence my assertion that we need good programmers - in fact, we need more than that: we need good software engineers, which is a subtly different skill.
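To show what "designed in from the start" might look like at the coarse-grained level, here is a sketch where the work is partitioned into independent entities so a pool can fan them out (update_entity and the entity list are hypothetical stand-ins, not anything from a real engine):

    from concurrent.futures import ProcessPoolExecutor

    def update_entity(entity):
        # Stand-in for per-entity work that shares no state with other entities.
        return entity * entity

    if __name__ == "__main__":
        entities = list(range(8))
        with ProcessPoolExecutor() as pool:
            results = list(pool.map(update_entity, entities))
        print(results)

The point isn't the pool itself; it's that the decomposition into independent entities has to exist in the design before any of this machinery helps.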
(Spot the frustrated lecturer inside me trying to get out)