Sweeney presentation on programming

Wow, this is fascinating:

The C++/Java/C# Model:
“Shared State Concurrency”


This is hard!
How we cope in Unreal Engine 3:
1 main thread responsible for doing all work we can’t hope to safely multithread
1 heavyweight rendering thread
A pool of 4-6 helper threads
Dynamically allocate them to simple tasks.
“Program Very Carefully!”
Huge productivity burden
Scales poorly to thread counts


There must be a better way!

“Proof”: Xbox 360 games are running with 48-wide data shader programs utilizing half a Teraflop of compute power...

Isn't that a lot more flops than stated? Like double?
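
For a rough back-of-the-envelope check of the "like double?" question, here is a small sketch. It assumes the commonly quoted Xenos figures (48 ALUs, each a vec4 + scalar unit doing one multiply-add per lane per clock, at 500 MHz). None of those numbers come from the presentation itself, so treat them as assumptions.

```python
# Assumed, commonly quoted Xenos figures (NOT taken from the slides).
ALUS = 48             # unified shader ALUs ("48-wide")
LANES_PER_ALU = 5     # vec4 + scalar
FLOPS_PER_LANE = 2    # one multiply-add counts as two flops
CLOCK_HZ = 500e6      # 500 MHz


def programmable_gflops(alus=ALUS, lanes=LANES_PER_ALU,
                        flops_per_lane=FLOPS_PER_LANE, clock_hz=CLOCK_HZ):
    """Peak programmable shader throughput in GFLOPS under the assumptions above."""
    return alus * lanes * flops_per_lane * clock_hz / 1e9


stated = programmable_gflops()
quoted = 500.0  # "half a Teraflop" from the slide, in GFLOPS
print(f"programmable: {stated:.0f} GFLOPS, slide: {quoted:.0f} GFLOPS, "
      f"ratio: {quoted / stated:.2f}")
```

Under those assumptions the programmable figure works out to 240 GFLOPS, so the slide's half a teraflop is indeed a little over double.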
 
Daryl said:
Wow, this is fascinating:

Isn't that a lot more flops than stated? Like double?

He's clearly not just talking about the programmable fp power of the chip, but possibly the overall power of Xenos. But for that, the figure seems a little low, since MS talked previously about 1Tflop total - which would require Xenos to make a larger contribution than 0.5Tflops.

Anyway, it's an interesting presentation! I thought the breakdown of CPU budget for different types of code was quite interesting, and fleshes out some comments Sweeney had made about Cell previously.
 
He's clearly not just talking about the programmable fp power of the chip, but possibly the overall power of Xenos. But for that, the figure seems a little low, since MS talked previously about 1Tflop total

Yeah I remember that, but I thought those numbers were said to be ridiculously inflated using toy code. So I guess I just thought he was talking about programmable. It doesn't fit with either figure though. But I think for "usable" flops, that sounds very good.

I also found it interesting he called it 48 wide. I mean I know it is but I wonder if any advantages are found from this, rather than a pipelined system.

And yes, I knew you would like the apparently Cell-friendly part.
 
I looked at it, thanks. Unless it was an undertone in the sections on general parallelism, it doesn't directly mention Sweeney's thoughts on SLI/Crossfire solutions in the PC space. There was an interview a mod over on www.nvnews.net had with Sweeney @ E3 last year and he briefly talked about multi-GPU being more effectively utilised in the U3 engine than it (was) with then-current engines... Don't know what the current situation is on these solutions with devs in general, but I'd love to know something :).
 
27mc1.jpg


3roxor said:
Doesn't seem like the UE3 is good for multithreaded solutions like the 360 but mainly the Ps3

I think that he is referring to X360 as a 6-threaded console and PS3 as an 8-threaded console.
15rc1.jpg


Also about the future of the games...
33su.jpg
 
3roxor said:
Doesn't seem like the UE3 is good for multithreaded solutions like the 360 but mainly the Ps3.

Huh?

From what I can tell, he basically said "multithreaded is a bitch in general...software/languages/compilers have not caught up to hardware." So if anything, the more your system relies on multiple threads to get the most performance, the worse off you are.
 
That was a very interesting read! I thought his musings on programming languages, garbage collection, and exceptions were especially interesting. Is there a transcript of his talk around anywhere?

Nite_Hawk
 
Joe DeFuria said:
Huh?

From what I can tell, he basically said "multithreaded is a bitch in general...software/languages/compilers have not caught up to hardware." So if anything, the more your system relies on multiple threads to get the most performance, the worse off you are.
Well, it depends on how much work the main thread is doing versus the heavyweight rendering thread. It appears that the ideal hardware model for Unreal Engine 3 would be 1 or 2 heavyweight processors with a number (4-6) of lesser processors hanging around to handle the helper threads. This arguably looks more like the Cell configuration than the Xenon. Still, there may be advantages either way. Cell should have a slightly beefier PPE than any of the three cores in Xenon, but the SPEs aren't as flexible or powerful. I assume Xenon would run the main thread on one processor, the rendering thread on another, and the rest of the helper threads on the remaining processor and the hyperthreading "processors".

On Cell, you'd have to figure out how you are going to split the rendering thread and the main thread. Which one runs on the PPE, and which one runs on one of the SPEs? The helper threads I'd assume would run on the SPEs.
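
To make the thread layout from the slide concrete (one main thread, one heavyweight rendering thread, and a pool of helpers handed simple tasks), here is a minimal sketch. It is only an illustration in Python, not UE3 code: `simple_task`, `render_thread`, `run_frames`, and the pool size of 4 are made-up names and values.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

HELPER_THREADS = 4  # assumption: the slide says "a pool of 4-6 helper threads"


def simple_task(n):
    """Hypothetical self-contained task that touches no shared state."""
    return n * n


def render_thread(frames, rendered):
    """Hypothetical rendering thread: consumes frames until it sees None."""
    while True:
        frame = frames.get()
        if frame is None:
            break
        rendered.append(f"drawn {frame}")


def run_frames(frame_count):
    """Main thread: farms simple tasks out to helpers, then hands frames to the renderer."""
    frames = queue.Queue()
    rendered = []  # written only by the rendering thread, read after join()
    renderer = threading.Thread(target=render_thread, args=(frames, rendered))
    renderer.start()

    results = []
    with ThreadPoolExecutor(max_workers=HELPER_THREADS) as helpers:
        for frame in range(frame_count):
            # Dynamically allocate helpers to simple, independent tasks.
            futures = [helpers.submit(simple_task, frame * 10 + i) for i in range(4)]
            # Everything that is not safe to parallelize stays on the main thread.
            results.append(sum(f.result() for f in futures))
            frames.put(frame)

    frames.put(None)  # tell the rendering thread to stop
    renderer.join()
    return results, rendered
```

The only shared state here is the queue between the main and rendering threads. Anything beyond that is where the "program very carefully" part starts.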

liverkick said:
Shouldn't that be "6-9 hardware threads"?
Cell in PS3 has one of the SPEs disabled, so you have the PPE + 7 SPEs, which is probably how they got that number. I don't remember if the PPE has any kind of hyperthreading-like functionality, but it appears Epic isn't counting it if it does...

Nite_Hawk
 
Nite_Hawk said:
Cell in PS3 has one of the SPEs disabled, so you have the PPE + 7 SPEs, which is probably how they got that number. I don't remember if the PPE has any kind of hyperthreading-like functionality, but it appears Epic isn't counting it if it does...

Nite_Hawk

The PPE can handle 2 threads by itself, IIRC.
 
london-boy said:
The PPE can handle 2 threads by itself, IIRC.

Now that you mention it, I remember some discussion about that. In fact, wasn't there some rumor that the PPE may have multiple execution units?

Nite_Hawk
 
That was interesting. I was struck by the use of C++ -- I would have thought that issues of data locality would arise. Items and data that are "OO-logically" grouped together would seem unlikely to be needed at the same time, and thus use up valuable cache space.

While the insights on language constructs are interesting, I'd be more convinced if the starting point was "this is how you organize your data for efficiency" followed by "these are the constructs necessary to make that ordering work" followed by the more generic issues of null, array-bounds, and type-safety/casting. Nevertheless, I'd have to agree that a generic framework to capture a majority of these errors (which are usually off-by-one errors of one kind or another in my experience) would be a welcome addition -- certainly in the non-gaming world.
 
Nite_Hawk said:
Now that you mention it, I remember some discussion about that. In fact, wasn't there some rumor that the PPE may have multiple execution units?

Nite_Hawk

That's what I was referring to.
 
Nite_Hawk said:
Now that you mention it, I remember some discussion about that. In fact, wasn't there some rumor that the PPE may have multiple execution units?

Nite_Hawk

Well, last time I checked, the DD2 version of Cell (which now is DD3 anyway) made the PPE the same - or very very similar - as one of the X360 cores, therefore capable of 2 threads.
Could be wrong..
 
london-boy said:
Well, last time I checked, the DD2 version of Cell (which now is DD3 anyway) made the PPE the same - or very very similar - as one of the X360 cores, therefore capable of 2 threads.
Could be wrong..

That's a bit different. Xenon allows for two hardware threads per core to feed a single execution unit similar to what happens in hyperthreading.

The rumor that I remember was that the PPE actually has multiple execution units.

Nite_Hawk
 
Nite_Hawk said:
That's a bit different. Xenon allows for two hardware threads per core to feed a single execution unit similar to what happens in hyperthreading.

The rumor that I remember was that the PPE actually has multiple execution units.

Nite_Hawk

Well, I can only remember that it can handle 2 threads, and I also seem to remember that it's the "hyperthreading way". But again, I could be completely wrong.
 