Acert93 said:
Maybe you can explain how counting the SPE LS (256K * 7) as main memory bandwidth is any different from Major Nelson counting the eDRAM as system memory?
This isn't about counting the BW on SPE LS (if it were, the bandwidth between the SPUs and LS is astronomical). If I understand Jaws correctly, he's not talking about using the combined LS of Cell's SPEs as a BackBuffer cache in the same way the eDRAM is. He's talking about bypassing main memory BW by outputting 3D data straight from CPU to GPU through a separate set of pipes.
The chief distinction between this and the infamous Major Nelson paper on Xenos is that the paper presented the internal BW figure as additional bandwidth for the entire render process, when in truth it serves only part of the render process. Here's my attempt at knocking up a table, though don't count on it to be accurate, only representative...
For PS3...
Code:
Render Phase          Where Occurs   Bandwidth Consumed
Create Geom           Cell           Internal (xxx GB/s)
Pass geom to GPU      EIB            EIB (35 GB/s)
V and P shading       RSX            Internal (xxx GB/s)
Write to backbuffer   RAM            DDR/+XDR (22/47 GB/s)
Process Backbuffer    RSX/+Cell?     DDR/+XDR (22/47 GB/s)
Write FrontBuffer     RSX            DDR/+XDR (22/47 GB/s)
For XB360...
Code:
Render Phase          Where Occurs   Bandwidth Consumed
Create Geom           XeCPU          Internal (xxx GB/s)
Pass geom to GPU      RAM            DDR (22 GB/s)
V and P shading       XeGPU          Internal (xxx GB/s)
Write to backbuffer   Xenos          Smart-eDRAM interconnect (35 GB/s)
Process Backbuffer    Smart-eDRAM    Internal (xxx GB/s)
Write FrontBuffer     RAM            DDR (22 GB/s)
In the case of PS3, main RAM BW is thrashed in 3 of the steps (actually I missed texture fetches, so V & P shading also accesses RAM in both systems), including BackBuffer work, which is frequent and costly from my limited understanding.
In the case of XB360, main RAM BW is thrashed in 2 steps, getting data to the GPU and output (ignoring textures).
If the EIB didn't exist, PS3 would also have to intrude on RAM BW for passing data to the GPU, so there is a saving made of up to 35 GB/s.
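To make the step counting explicit, here is a minimal sketch that encodes the two tables above and counts the phases whose traffic goes over main RAM. The rows are copied from the tables; the names `PS3`, `XB360`, `MAIN_RAM_LINKS` and `main_ram_phases` are made up for illustration, and texture fetches are left out just as they are in the tables.

```python
# Each row: (render phase, where it occurs, link carrying the traffic).
# Rows are copied from the two tables in this post; nothing here is measured.
PS3 = [
    ("Create Geom", "Cell", "Internal"),
    ("Pass geom to GPU", "EIB", "EIB"),
    ("V and P shading", "RSX", "Internal"),
    ("Write to backbuffer", "RAM", "DDR/+XDR"),
    ("Process Backbuffer", "RSX/+Cell?", "DDR/+XDR"),
    ("Write FrontBuffer", "RSX", "DDR/+XDR"),
]

XB360 = [
    ("Create Geom", "XeCPU", "Internal"),
    ("Pass geom to GPU", "RAM", "DDR"),
    ("V and P shading", "XeGPU", "Internal"),
    ("Write to backbuffer", "Xenos", "Smart-eDRAM interconnect"),
    ("Process Backbuffer", "Smart-eDRAM", "Internal"),
    ("Write FrontBuffer", "RAM", "DDR"),
]

# Assumption: these link labels are the ones that mean "main RAM" in the tables.
MAIN_RAM_LINKS = {"DDR", "DDR/+XDR"}


def main_ram_phases(table):
    """Return the render phases whose traffic goes over main RAM."""
    return [phase for phase, _where, link in table if link in MAIN_RAM_LINKS]


for name, table in (("PS3", PS3), ("XB360", XB360)):
    phases = main_ram_phases(table)
    print(name, len(phases), phases)
```

With the rows as written, this gives 3 main-RAM phases for PS3 and 2 for XB360, matching the counts above.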
The 256 GB/s BW on the eDRAM is bandwidth to internal storage for local processing, which covers only a subset of the rendering process. Figures for the other phases of the rendering process that run on local storage, like on the GPU or CPU, were not included in the MN article, i.e. geometry creation occurs between CPU logic circuits and local storage, and shading occurs between GPU shader logic and local storage. The bandwidth saved by moving the backbuffer processing onto a chip with fast local store is important, but no more than the bandwidth saved from the GPU working on internal registers and local stores instead of working directly on the system RAM!
What is off with the MN article is that it highlighted an internal processor bandwidth, between logic and local storage. That's not a figure that's generally quoted (never heard of it before now!) and it wasn't applied uniformly across both entire systems. If they want to do this they should include ALL bandwidths between logic and local storage.
Whereas Jaws is talking about transfer of data from one processor to another, which is a non-local storage BW figure that can be counted along with all the others.
That's the chief distinction, at least that I make. Don't count BW between logic and local storage on the same chip, only count BW where data is passed from one processor to another.
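As a rough sketch of that rule: tag each link with the chip at either end, and only count it when the two ends are different chips. The `Link` class and the chip labels below are hypothetical, picked from the names used in this thread, and reading "one processor to another" as "crosses a chip boundary" is my assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A data path between two endpoints, labelled by the chip each sits on."""
    name: str
    src_chip: str
    dst_chip: str

    def is_countable(self) -> bool:
        # Assumed rule: a link counts only if it leaves the chip it starts on.
        return self.src_chip != self.dst_chip


# Illustrative links only; the chip labels are assumptions, not specs.
LINKS = [
    Link("SPU <-> LS", "Cell", "Cell"),
    Link("Cell -> RSX pipes", "Cell", "RSX"),
    Link("eDRAM logic <-> eDRAM store", "Smart-eDRAM", "Smart-eDRAM"),
    Link("Xenos -> Smart-eDRAM interconnect", "Xenos", "Smart-eDRAM"),
]

countable = [link.name for link in LINKS if link.is_countable()]
print(countable)
```

Under this rule the SPU to LS path and the eDRAM's internal path drop out, while the Cell to RSX pipes and the Xenos to Smart-eDRAM interconnect stay in.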