There's a lot of confusion about Xenos, its eDRAM and the 256 GB/s figure, especially amongst the less technical of us (me at least!). As best I can tell, it's because the structure of the graphics system hasn't been well explained. I present my understanding here so that the technical bods can dissect it and agree or disagree. The central idea to consider is that a separate processing entity, the BackBuffer Processing Unit (BBPU), has been introduced to the graphics system.
-----
The Xenos system consists of two processing parts. One has unified shaders and performs the usual graphics work of assembling polygons, texturing and shading (rasterization). The other performs frequent, memory-intensive tasks such as Z/stencil rendering, alpha-blended polygon rendering, overdraw, etc.
There is a bandwidth of 22.4 GB/s between the GPU's shader unit and system RAM.
There is a bandwidth of 32 GB/s write and 16 GB/s read between the Xenos shader unit and the BackBuffer Processor.
The BackBuffer Processor (BBPU) has 10 megabytes of fast local storage (the eDRAM).
The logic on the BBPU can access this data directly, just as any processor can access its local storage, at sufficient bandwidth that it never has to wait. MS have given a figure of 256 GB/s, but to all intents and purposes it can be considered limitless bandwidth.
If this is right, it raises the question of why the 256 GB/s bandwidth is talked about at all (apart from blind marketing speak). Bandwidth between logic and its local storage never gets mentioned, as far as I know. No-one lists the bandwidth between the Level 1 cache and the logic on their CPUs! I guess it might be because eDRAM is (or has been) slower than normal cache memory (SRAM?), so its speed was a limiting factor. By saying the eDRAM's bandwidth is that fast, MS are indicating that the eDRAM is effectively as fast as SRAM: 10 MB of eDRAM = 10 MB of Level 1 cache. However, quoting this figure creates a confusing situation, because it gets mixed up with the conventional understanding of bandwidth as data transfer between separate processing units and storage. Applied in that way, it doesn't appear to be a fair figure.
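For what it's worth, here's a rough sanity check on how a number like 256 GB/s could fall out of simple arithmetic. Every input below is my assumption rather than anything stated in this post: a 500 MHz clock, 8 pixels per clock, 4 samples per pixel, 32-bit colour plus 32-bit Z/stencil per sample, and each sample being both read and written. Treat it as a sketch, not an official breakdown.

```python
# Back-of-envelope check of the 256 GB/s figure.
# ALL inputs are assumptions for illustration; none come from the post.
CLOCK_HZ = 500_000_000      # assumed clock of the BBPU logic (500 MHz)
PIXELS_PER_CLOCK = 8        # assumed pixel rate
SAMPLES_PER_PIXEL = 4       # assumed 4x multisampling
BYTES_PER_SAMPLE = 4 + 4    # assumed 32-bit colour + 32-bit Z/stencil
ACCESSES_PER_SAMPLE = 2     # assumed read-modify-write: one read, one write


def edram_bandwidth_gb_s() -> float:
    """Return the logic-to-eDRAM bandwidth in GB/s implied by the assumptions."""
    bytes_per_clock = (
        PIXELS_PER_CLOCK
        * SAMPLES_PER_PIXEL
        * BYTES_PER_SAMPLE
        * ACCESSES_PER_SAMPLE
    )
    return bytes_per_clock * CLOCK_HZ / 1e9


if __name__ == "__main__":
    print(f"{edram_bandwidth_gb_s():.0f} GB/s")
```

Under those assumptions the logic touches 512 bytes per clock, which works out to 256 GB/s. If that's roughly where the figure comes from, it describes how fast the logic can chew through its own local storage, not a link between separate units.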
Furthermore, although this bandwidth isn't bandwidth between separate processing units, it does have a beneficial effect: it doesn't interfere with RAM -> GPU bandwidth. If the BBPU were not present, its functions (Z/stencil rendering, alpha-blended polygon rendering, overdraw, etc.) would have to be performed in the only available storage large enough, which would be main RAM, and this would cut into the 22.4 GB/s bandwidth.
Therefore the benefits of the BBPU should not be listed as 'bandwidth' but as 'bandwidth saved' or 'bandwidth freed'. By not having to work on system RAM, the intensive tasks performed on the BBPU free the system bandwidth for other tasks like textures, models etc.
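To put a rough number on 'bandwidth saved', here's a small sketch that estimates how much colour and Z traffic a back buffer would generate if it lived in system RAM. Only the 22.4 GB/s figure comes from above. The function name, the scene (1280x720, 60 fps, 4x overdraw, 4 samples per pixel, a quarter of fragments blended) and the per-fragment cost model are all made up by me for illustration.

```python
SYSTEM_RAM_GB_S = 22.4  # shader unit <-> system RAM figure quoted above


def backbuffer_traffic_gb_s(width, height, fps, overdraw, samples=1,
                            blended_fraction=0.0, colour_bytes=4, z_bytes=4):
    """Estimate back-buffer traffic in GB/s (hypothetical cost model).

    Assumes every drawn sample does a Z read, a Z write and a colour
    write, and that blended samples also read the existing colour.
    """
    bytes_per_sample = 2 * z_bytes + colour_bytes
    bytes_per_sample += blended_fraction * colour_bytes
    samples_per_second = width * height * samples * overdraw * fps
    return samples_per_second * bytes_per_sample / 1e9


if __name__ == "__main__":
    # Hypothetical scene, not measured data.
    traffic = backbuffer_traffic_gb_s(1280, 720, fps=60, overdraw=4,
                                      samples=4, blended_fraction=0.25)
    share = traffic / SYSTEM_RAM_GB_S
    print(f"{traffic:.1f} GB/s, {share:.0%} of system RAM bandwidth")
```

With these made-up inputs the back-buffer work alone comes to about 11.5 GB/s, roughly half of the 22.4 GB/s. That's the traffic the BBPU would keep off the system RAM bus, leaving it for textures, models etc.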
-----
Does this sound right? Does this paint an accurate picture of where these numbers come from and what they actually mean?
-----
Code:
       System RAM
           DDR
            |
            |
        22.4 GB/s
            |
            |
+-----------+-----------+--------+---------------------+
|           |           |        |                     |
|     SHADER:UNIT       |        |     BACKBUFFER      |
|           |           |        |      PROCESSOR      |
|   +-------+-------+   |32 GB/s | +-----------------+ |
|   |               | ------------>|                 | |
|   |    Unified    |<------------ |   Processing    | |
|   |    shaders    |   |16 GB/s | |  Logic + 10 MB  | |
|   |               |   |        | |  local storage  | |
|   +---------------+   |        | +-----------------+ |
|                       |        |                     |
+-----------------------+--------+---------------------+