IBM's Ashwini Nanda on Cell blades, raycasting, and more

one said:
:???: Why do you ignore what I wrote here?
http://www.beyond3d.com/forum/showthread.php?p=538638#post538596
A 3.2GHz 8-SPE Cell gets just 50 even when you don't do JPEG encoding and the 8th SPE is idle.
Because you have yet to show me where you get these numbers from. The paper clearly states that in all tests they are using all 8. The numbers posted make no mention of the 8th one being idle, nor does the paper.

It says on page 4

TRE Server
The TRE server, implemented on the STIDC Bring-up boards, runs in both a uni-processor (UP) and symmetric multi-processor (SMP) mode [figure 5]. The server scales across as many SPEs as there are available in the system. One SPE is reserved for image compression and the remaining available SPEs run ray kernels. Three threads execute on the available Multi-Threading (MT) slots of the PPE. The three threads' responsibilities break down as follows:

Thus a 1x8 will have 1 SPE reserved for image compression and the other 7 used to run the ray kernels. The Cell in the PS3 is 1x7; with 1 reserved for image compression you now have 6 left for the ray kernels.

So how do you get the same performance when you take away a whole SPE?
 
jvd said:
Thus the numbers aren't comparable unless you think the PS3 suddenly has an 8th SPE.
The point is that the G5 was not doing image encoding at all - the performance index given would be for raycasting alone - 7SPEs.
 
jvd said:
Because you have yet to show me where you get these numbers from. The paper clearly states that in all tests they are using all 8. The numbers posted make no mention of the 8th one being idle, nor does the paper.

Not to put words into One's mouth, but I think it's fairly easy to see where he is coming from.

If you leave out the 1xSPE Image Compression Kernel, then you might be able to assume* that the PS3 will be able to attain the power 'rating' of 50.

That rests on the safe assumption that taking Image Compression out of the picture does not cost you any performance.
 
Fafalada said:
The point is that the G5 was not doing image encoding at all - the performance index given would be for raycasting alone - 7SPEs.


I disagree, from reading the paper. It clearly states what is used in the benchmarking and that this is a feature unique to Cell.

It also clearly points out that the G5 doesn't have image encoding. Yet no such note is made for the Cell.


Not to put words into One's mouth, but I think it's fairly easy to see where he is coming from.

If you leave out the 1xSPE Image Compression Kernel, then you might be able to assume* that the PS3 will be able to attain the power 'rating' of 50.

That rests on the safe assumption that taking Image Compression out of the picture does not cost you any performance.
Pretty big assumption. Why would they devote a whole SPE to it if it wasn't important to performance?
 
jvd said:
Because you have yet to show me where you get these numbers from. The paper clearly states that in all tests they are using all 8. The numbers posted make no mention of the 8th one being idle, nor does the paper.
The paper states that they use 7 SPEs for the ray kernels. Since the PS3 doesn't need network transmission over narrow GbE (Cell goes straight to the framebuffer), you get the same quality (50) flyby images on PS3 with all 7 SPEs assigned to ray kernels. That's all I'd like to say :rolleyes:
 
I think he's making the case that if JPEG compression were not needed due to the higher bandwidth of the gigabit port, then all 7 of the PS3's SPEs could be used for ray casting kernels, which would result in a score of 50...whatever that is. I'd assume the resulting image would look better as well if this were possible.

To clarify, I think he's saying the compression was necessary only due to the limited bandwidth of the network, whereas with the PS3 you would not need to do this with a gigabit port...or possibly since you could output straight to screen (I add that last part).

If this can't be done...I don't know why not, but things should still scale comparably, and even so it would still be impressive in any case.

It would be helpful if the figure of 50 were a bit more descriptive....50 what...FPS...times the performance of the G5....apple pies...50 what?

edit:

Oops he clarified while I was typing :)
 
jvd said:
I disagree, from reading the paper. It clearly states what is used in the benchmarking and that this is a feature unique to Cell.
The point of interest here was raycasting performance, not the JPEG encode overhead.
The paper explicitly states one SPE encodes raycasting output generated by the other 7. So according to their performance index, 7 SPEs @ 3.2GHz raycast ~50x faster than a 2GHz G5.
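
As a rough sanity check on the scaling argument, here is a minimal sketch of what the index would look like for a different number of ray-kernel SPEs. It assumes throughput scales perfectly linearly with the number of SPEs running ray kernels. The quoted paper text only says the server "scales across as many SPEs as there are available", so the linearity is an assumption of this sketch, and `scaled_index` is a made-up helper name.

```python
def scaled_index(measured_index, measured_spes, target_spes):
    """Estimate the relative image rate for a different ray-kernel SPE count.

    Assumes perfectly linear scaling with SPE count. That is an assumption
    of this sketch, not a result reported in the paper.
    """
    return measured_index * target_spes / measured_spes


# Index of ~50 with 7 SPEs running ray kernels (the figure under discussion).
all_seven = scaled_index(50, 7, 7)   # PS3 case with no JPEG SPE: 7 ray-kernel SPEs
six_left = scaled_index(50, 7, 6)    # PS3 case with 1 SPE kept for JPEG: 6 left

print(f"7 ray-kernel SPEs: {all_seven:.1f}")
print(f"6 ray-kernel SPEs: {six_left:.1f}")
```

Under that assumption the two readings of the PS3 case differ by one seventh of the index, which is really all the disagreement comes down to.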
 
So is the 50 number to be interpreted as 50 times the G5, with the G5 rated at 1? Or could it be FPS?

The E3 vids are all 30FPS so I can't use them to figure out anything unless the framerate noticeably dips below this. The speaker does note the terrain vid is realtime, and it looks smooth enough to be 30FPS to me...could be.

edit:

guess I'll go with Faf's interpretation of the 50 number. I type too slow for you guys...
 
Fafalada said:
The point of interest here was raycasting performance, not the JPEG encode overhead.
The paper explicitly states one SPE encodes raycasting output generated by the other 7. So according to their performance index, 7 SPEs @ 3.2GHz raycast ~50x faster than a 2GHz G5.


Sorry Fafalada. I see nothing here to agree with. We do not know how effective the Cell is when the JPEG encode overhead is not taken care of by an SPE.

It could be 50x faster than a 2GHz G5. Then again it might not be. Perhaps if they gave us a reference for a Cell using only 7 SPEs and leaving the 8th idle with no JPEG encode, we could discuss it. But it's apparently important enough for them to dedicate a whole SPE to it.
 
jvd said:
Sorry Fafalada. I see nothing here to agree with. We do not know how effective the Cell is when the JPEG encode overhead is not taken care of by an SPE.

It could be 50x faster than a 2GHz G5. Then again it might not be. Perhaps if they gave us a reference for a Cell using only 7 SPEs and leaving the 8th idle with no JPEG encode, we could discuss it. But it's apparently important enough for them to dedicate a whole SPE to it.

But the other side of the argument is that, if running on PS3, it wouldn't need to compress to JPEG (I believe the only reason they do compress to JPEG is for transmission over the network), hence the SPE that was needed for JPEG compression is now free to be added to the ray-casting kernels.
 
So is the 50 number to be interpreted as 50 times the G5, with the G5 rated at 1? Or could it be FPS?
Unfortunately they state "relative image rate" so if G5 was running at 1fps, then it's fps, otherwise it would be faster/slower.

jvd said:
Sorry Fafalada. I see nothing here to agree with. We do not know how effective the Cell is when the JPEG encode overhead is not taken care of by an SPE.
That's fine, but I don't care what JPEG overhead is, I was talking about raycasting performance only.
If one ever used Cell for rendering in a game you can rest assured they wouldn't compress output into JPEG (unless you find some perverse pleasure in compressing a rendered frame only to uncompress it right after on the GPU to display it).
 
jvd said:
Sorry Fafalada. I see nothing here to agree with. We do not know how effective the Cell is when the JPEG encode overhead is not taken care of by an SPE.

It could be 50x faster than a 2GHz G5. Then again it might not be. Perhaps if they gave us a reference for a Cell using only 7 SPEs and leaving the 8th idle with no JPEG encode, we could discuss it. But it's apparently important enough for them to dedicate a whole SPE to it.

Sorry, but I don't see why you are so obsessed about the JPEG compression in this study? Surely the terrain creation is the relevant part here?
 
Faf:
Thanks, I think I got it.

If I understand correctly, the thread on the PPE that delivers the compressed JPEGs to the network layer could be eliminated completely. If the uncompressed image is already sitting in a buffer, would you really need a whole thread just to get it on screen?

Even if this could be...it may not speed things up as far as rendering as the real work is still done on the SPEs, but you could still use the remaining PPE resources for some other task.

Interesting stuff.

Still in a game you can't allocate all your SPEs to ray casting. I'd hope that G5 could manage 1 or 2 FPS so you could use fewer SPEs and still retain a decent overall FPS. 1FPS normally doesn't sound like wishful thinking...but it is ray casting we're talking about here. Perhaps this could still be used on a smaller scale even if the G5 couldn't get a frame done in one second.
 
You're going to need to get the JPEG from somewhere, and if not from another SPE on the ring bus then from the FlexIO, which would be slower and increase latency.
 
jvd, I don't think the jpeg compression (positively) affects the raycasting performance. It's simply the final image being output to the client. Judging by the performance comparison in the paper, where the G5 is doing no image encoding and only raycasting, 7 SPEs are performing 50x as fast as the G5. How do you think the image encoding is affecting performance?

To put it another way, do you not think that 8 SPEs working on raycasting would yield >50x performance increase?

The G5 comparison, if anything, is biased against Cell, since one of the SPEs is doing something that the G5 isn't. Unless there's another way in which that image encoding is optimising performance that I can't see?
 
We don't know what the JPEG overhead is.

It seems to me like the test is biased in favor of the Cell, since the Cell is doing something that the G5 can't do.

Adding one SPE to the raycasting may not increase its performance. I do not know what the performance is like for that part without a dedicated SPE for the JPEG.

Which would be faster: 1x7 with no dedicated SPE for the JPEG, 1x8 with 1 dedicated SPE for the JPEG, or 1x7 with 1 dedicated SPE for the JPEG?

The claim I was replying to is that the performance of the Cell in the PS3 will be the same as that of the Cell in this test, which I don't agree with.

I believe the JPEG compression is positively affecting performance, and it seems to me important enough that it has a full SPE dedicated to it.

How would the G5 fare if it were dual core and had one core dedicated to JPEG compression?

I believe the performance would go up. Not by a factor of 50x, but I believe it would go up. Without knowing the JPEG overhead it's hard to say.
 
A G5 can't compress/encode a jpeg?

In any case, the paper states no image encoding was done on the G5...not that it only didn't use compression. This suggests to me the G5 built up the data in the buffer with the results from ray casting, but did not go any further.

It appears the Cell did, as there was no special note by the Cell numbers to indicate that the Cell did no image encoding. There would be no need to note specifically that the G5 did not do image encoding if that were true in all cases; a note at the top or elsewhere would suffice and make more sense.

It would appear that the test is biased against Cell from the start. This bias could be acceptable because Cell is reported to be able to decode/encode several HD video streams simultaneously; given that, encoding a single..err...stream of "probably" less-than-HD-res JPEGs costs the Cell acceptably little performance. Also, since it was the Cell that was meant to be demoed, not the G5, and ray casting is expensive enough in itself, it makes sense that image encoding was left out on the G5.
 
They used image compression because over gigabit you can only transfer 45 fps worth of frames at 100% usage, and you are unlikely to get anywhere near that ... so the network would bottleneck.
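
For anyone who wants to check that figure, here is a minimal sketch of the arithmetic. It assumes 1280x720 frames at 3 bytes per pixel and a link that carries exactly 1,000,000,000 bits per second of payload with no protocol overhead. The post doesn't state a resolution or pixel format, so those are my assumptions; they are just one combination that lands on about 45 fps. `max_fps` is a made-up helper name.

```python
def max_fps(link_bits_per_s, width, height, bytes_per_pixel):
    """Upper bound on uncompressed frames per second over a link.

    Ignores Ethernet/IP/TCP overhead, so real throughput would be lower.
    """
    frame_bits = width * height * bytes_per_pixel * 8
    return link_bits_per_s / frame_bits


# Assumed: 720p, 24-bit colour, ideal gigabit link at 100% usage.
fps = max_fps(1_000_000_000, 1280, 720, 3)
print(f"{fps:.1f} fps")  # 45.2 fps
```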
 
jvd said:
We don't know what the JPEG overhead is.

Any overhead is only going to count against Cell, since it's taking it on, and the G5 is not.


jvd said:
It seems to me like the test is biased in favor of the Cell, since the Cell is doing something that the G5 can't do.

No it's not, it's just doing something the G5 isn't doing. And it makes sense, because doing the raycasting on the G5 is doing it locally on the client..there's no point in encoding the image and putting it over the network to another machine, since you're doing it on the machine that's displaying/controlling the demo.

jvd said:
Adding one SPE to the raycasting may not increase its performance. I do not know what the performance is like for that part without a dedicated SPE for the JPEG.

The JPEG/image encoding does not affect the raycasting performance (AFAIK?). It's simply done to send the image resulting from the raycasting out over the network, because if they left it uncompressed, it'd be difficult to push x frames per second over the network to the client. I mean, an uncompressed 720p frame = ~3.5MB. Multiply that by even 30fps and you're hitting ~850Mbit/s. Considering you don't get near the theoretical 1Gbit/s, it's easy to see why compression is needed.
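
Here is a minimal sketch of those numbers, assuming a 1280x720 frame at 4 bytes per pixel (the pixel format is my assumption; it is the one that gives roughly 3.5MB per frame). The helper names are made up for illustration. The exact Mbit/s figure depends on whether "mega" is read as 10^6 or 2^20, which is why the result brackets the ~850 quoted above.

```python
def frame_bytes(width, height, bytes_per_pixel):
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


def required_bits_per_s(width, height, bytes_per_pixel, fps):
    """Raw bit rate needed to stream uncompressed frames at the given fps."""
    return frame_bytes(width, height, bytes_per_pixel) * 8 * fps


# Assumed: 720p at 4 bytes per pixel, 30 frames per second.
size = frame_bytes(1280, 720, 4)
rate = required_bits_per_s(1280, 720, 4, 30)

print(f"frame size: {size / 2**20:.2f} MiB")            # 3.52 MiB
print(f"bit rate:   {rate / 1e6:.1f} Mbit/s (10^6)")    # 884.7
print(f"bit rate:   {rate / 2**20:.2f} Mbit/s (2^20)")  # 843.75
```

Either way it is most of a gigabit link before any protocol overhead, at only 30fps.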

Someone else want to confirm that it's simply output manipulation and has nothing to do with the raycasting?

jvd said:
Which would be faster: 1x7 with no dedicated SPE for the JPEG, 1x8 with 1 dedicated SPE for the JPEG

As far as raycasting is concerned, these would be the same as far as I can see.

jvd said:
1x7 with 1 dedicated SPE for the JPEG?

This is a reference to the PS3 CPU, I assume, but you wouldn't be doing the "jpeg bit" if doing similar work on PS3. You wouldn't be feeding the frames over the network to someone else, at least not in a game scenario. You would not need jpeg/image encoding, like the G5, and thus performance would be the same as in the 8 SPE setup (with one SPE taken for image encoding).

jvd said:
I believe the JPEG compression is positively affecting performance, and it seems to me important enough that it has a full SPE dedicated to it.

The only reason it's being done, as said many times before, is to accommodate passing frames over the network at realtime rates. The Cell blade is doing the rendering and passing the frames to the client. How does this positively affect the raycasting performance? If anything it's negative, since it's taking an SPE away from raycasting/rendering.

jvd said:
How would the G5 fare if it were dual core and had one core dedicated to JPEG compression?

I believe the performance would go up. Not by a factor of 50x, but I believe it would go up. Without knowing the JPEG overhead it's hard to say.

jvd - the G5 doesn't have to worry about image encoding! It isn't doing any! It doesn't have to pass the image out to another machine. I thus don't think it would help its performance at all - only hurt it, in fact - unless you can specifically point out how image encoding is helping the raycasting/rendering performance.
 