IBM's Ashwini Nanda on Cell blades, raycasting, and more

jvd said:
How would the G5 fare if it were dual-core and had one core dedicated to JPEG compression?

If it were dual-core, you would dedicate the second core to raycasting too. Why on earth would you want to do JPEG encoding on it?

The G5 workstation doesn't need JPEG encoding.
 
hi :)
jvd said:
We don't know what the JPEG overhead is.

It seems to me like the test is biased in favor of the Cell, since the Cell is doing something that the G5 can't do.

Adding one SPE to the raycasting may not increase its performance. I don't know what the performance of that part is like without a dedicated SPE for the JPEG.

Which would be faster: 1x7 with no dedicated SPE for the JPEG, 1x8 with one dedicated SPE for the JPEG, or 1x7 with one dedicated SPE for the JPEG?

The quote I was replying to claimed that the performance of the Cell in the PS3 will be the same as the Cell in this test, which I don't agree with.

I believe the JPEG compression is definitely affecting performance, and it seems important enough that a full SPE is dedicated to it.

How would the G5 fare if it were dual-core and had one core dedicated to JPEG compression?

I believe the performance would go up. Not by a factor of 50x, but I believe it would go up. Without knowing the JPEG overhead it's hard to say.

Look, let's forget the benchmark scenario. The only lesson we gamers can take from that benchmark is that we can see graphics output at that speed in a 1x7-SPE configuration, because the PS3 (and any other console or PC game) deals with raw images and does not have to compress them to JPEG. We will have a framebuffer to output the raw graphics directly to the screen.

That's why I said before that this benchmark is the same as realtime minus the JPEG compression.
When you say the JPEG is speeding up the benchmark, that's not true: the graphics are being rendered at raw quality, and anything you do to reduce the quality after that raw image is rendered only takes CPU time, ergo, slowing it down.
The natural output is raw; everything else done to that image just consumes cycles.
If the natural output were JPEG, then the benchmark would not need an extra SPE to do it.

But if you just want to look at the benchmark scenario, forgetting that we won't need the JPEG stuff, then yes, that benchmark would run slower in 1x7 because it involves JPEG compression. How much slower, we don't know; my A64 compresses JPEGs in a blink.
How useful is this information to me? Not at all: I play raw images just like every other console in the world.
 
jvd said:
I disagree, from reading the paper. It clearly states what is used in the benchmarking and that this is a feature unique to Cell.

It also clearly points out that the G5 doesn't have image encoding, yet it is not mentioned for the Cell.


Pretty big assumption. Why would they devote a whole SPE to it if it weren't important to performance?

By isolating JPEG compression + PPE flow control (to prevent memory buffer overflow if the SPEs are too fast) on a single SPE, all the overhead is centralized. The other SPEs just run with minimal/no overhead (x7). Essentially the dedicated SPE is there to prevent the (overhead x 7) factor. This is not uncommon when programming supercomputers and other parallel systems.
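As a toy illustration (my sketch, not the actual IBM code), that stage isolation plus back-pressure can be modeled with a bounded queue: several "render" workers run overhead-free, while a single dedicated worker absorbs the serial stage, and the small queue blocks producers if they get too far ahead.

```python
import queue
import threading

# Toy sketch (not the actual IBM code): 7 "render" workers push finished
# frames into a small bounded queue, and a single dedicated worker runs
# the serial stage (a stand-in for JPEG encoding + flow control). The
# bounded queue provides the back-pressure described above: if the
# renderers get too far ahead, q.put() blocks instead of overflowing
# a memory buffer.
q = queue.Queue(maxsize=4)
encoded = []

def render_worker(wid, frames=3):
    for f in range(frames):
        q.put((wid, f))          # blocks when the encoder falls behind

def encode_worker(total):
    for _ in range(total):
        encoded.append(q.get())  # the only place "encoding" overhead lives

workers = [threading.Thread(target=render_worker, args=(i,)) for i in range(7)]
enc = threading.Thread(target=encode_worker, args=(21,))
enc.start()
for w in workers:
    w.start()
for w in workers:
    w.join()
enc.join()
print(len(encoded))              # all 21 frames funneled through one stage
```

The design point is that the overhead lives in exactly one worker rather than being paid (x 7) by every renderer.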

If there is no JPEG compression, then the third stage goes away (freeing up the dedicated 8th SPE). People assumed that the SPEs at the 2nd stage would then send the images to the PPE (or RSX?) directly. Assuming the PPE can keep up (in space and time), no major rework is necessary when porting to a Cell with 7 SPEs.

Since they saw super-linear improvements with more Cells / higher clock speeds, the problem is still CPU-bound. So the developers will want to keep all 7 SPEs working as much as possible (wouldn't you?). This means that in the worst case, the 7th SPE will carry additional code partly to handle the overhead (if any!).

The performance number should be pretty close to 50x (47 and above, assuming the 7th SPE is at least 50% utilized for raycasting). Still respectable.
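A rough back-of-envelope behind that 47+ figure (my own numbers, not from the paper): treat the reported ~50x as coming from 7 fully-free rendering SPEs, then scale by how much of the 7th SPE remains free once it absorbs the overhead.

```python
baseline_spes = 7        # rendering SPEs in the benchmarked 1x8 setup
baseline_speedup = 50.0  # reported speedup vs. the G5

def scaled_speedup(seventh_spe_utilization):
    # assume 6 SPEs fully on raycasting, the 7th only partially free
    effective = 6 + seventh_spe_utilization
    return baseline_speedup * effective / baseline_spes

for util in (0.5, 0.75, 1.0):
    print(f"{util:.0%} free -> {scaled_speedup(util):.1f}x")
```

At 50% utilization this lands around 46-47x, consistent with the estimate above; at 100% it recovers the full 50x.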
 
ralexand said:
Still don't understand the point of the JPEG; why not just render it to the screen and display it?

The Cell blade is rendering it, and the G5 is displaying it and handling interaction, etc. The blade and the G5 are connected over the network, so in order for the blade to feed frames to the G5 quickly enough, at realtime rates, compression is required.

You might ask why they didn't just hook the blade up to a monitor and have it render directly to that. But I think much of the point of this demo was that it was being done over the network: not everyone can have a Cell blade on their desk, or with them, wherever this kind of application may be required.
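To see why compression matters for streaming frames at all, here is a rough estimate; the resolution, frame rate, and compression ratio are my assumptions for illustration, not figures from the demo.

```python
# Assumed parameters -- illustrative only, not from the demo
width, height = 1024, 768   # frame size
bytes_per_pixel = 3         # 24-bit RGB, uncompressed
fps = 30                    # realtime target
jpeg_ratio = 10             # ~10:1 is a typical JPEG compression ratio

raw_bits_per_sec = width * height * bytes_per_pixel * 8 * fps
print(f"raw stream:  {raw_bits_per_sec / 1e6:.0f} Mbit/s")
print(f"jpeg stream: {raw_bits_per_sec / jpeg_ratio / 1e6:.0f} Mbit/s")
```

Under these assumptions the raw stream is over 500 Mbit/s, which saturates most of a gigabit link, while the compressed stream fits comfortably; hence the dedicated encoding stage on the server side.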
 
jvd,

are you trolling?

It is hard to take you seriously with the circletimessquare formatting.

One last time: the JPEG compression is there to put the frame over the network. The G5 equivalent of the JPEG compression is transferring the finished frame to the video card's framebuffer, that's it. I hope you can agree it won't need a second core to transfer a frame to a framebuffer?
 
Here is the citeseer link.

You might be interested in this paper too, someone else rediscovered the same method in the context of displacement mapping (and generalized it a bit).

Oops, the second link is dead... I'll search for the paper later.
 
MfA said:
jvd,

are you trolling?

It is hard to take you seriously with the circletimessquare formatting.

One last time: the JPEG compression is there to put the frame over the network. The G5 equivalent of the JPEG compression is transferring the finished frame to the video card's framebuffer, that's it. I hope you can agree it won't need a second core to transfer a frame to a framebuffer?

Ahhhhh OK, this post is a good one, because I was confused about how the G5 would handle that same setup (even though it's not exactly the same, since it's being compared to a G5, it's good to know how the G5 goes about it). So it's pretty obvious that the G5 doesn't need a second core devoted to that to make it a fair comparison (as jvd stated).

So is that demo supposed to show that Cell is self-sufficient at that task (not needing to transfer to a different device, since it can do it itself) and is much quicker than a G5 to boot?
 
MfA said:
Here is the citeseer link.

You might be interested in this paper too, someone else rediscovered the same method in the context of displacement mapping (and generalized it a bit).

Oops, the second link is dead... I'll search for the paper later.

Thanks! It'd be great if you manage to find that second one too :cool:
 
BlueTsunami said:
So is that demo supposed to show that Cell is self-sufficient at that task (not needing to transfer to a different device, since it can do it itself) and is much quicker than a G5 to boot?

Hmm? The point is that Cell does need to transfer the frame to another client, since that's the client-server setup IBM designed the app around. In this case the G5 is the client, so if it were rendering locally it wouldn't need to transfer the frame to itself over the network.

If you wanted to make it fair, you would have the G5 do image encoding too ;) If the Cell blade were the client, it could do the rendering locally too, not have to transfer the image over the network, and save that SPE for more raycasting/rendering.
 
All this JPEG compression stuff is a moot point. They're just trying to isolate the raycasting aspect (i.e., if you take the network bandwidth out of the equation, you get closer to the real number).

Quaz51 said:
And how can a dual-Cell at 2.4 GHz give a +108% performance boost?

Quaz51 brought up a more important question...so what's going on here?
 
From the nature of the test, one Cell has 7 SPEs devoted to raycasting. A dual-Cell system doesn't need 2 SPEs doing JPEG compression, so it would likely have 15 SPEs doing raycasting, for a 2.14x theoretical improvement (+114%), so +108% isn't out of the question. Just highly efficient.
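The arithmetic behind that, as a quick check:

```python
single = 7   # raycasting SPEs on one Cell (the 8th does JPEG)
dual = 15    # 16 SPEs total on two Cells, minus the single JPEG SPE

theoretical = dual / single   # ~2.14x, i.e. +114%
observed = 2.08               # the quoted +108% boost
print(f"theoretical gain: +{(theoretical - 1) * 100:.0f}%")
print(f"scaling efficiency: {observed / theoretical:.0%}")
```

So the observed +108% is about 97% of the ideal 15/7 scaling, which is why it reads as highly efficient rather than impossible.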
 