Complete Details on Xenos from E3 private showing!

In the interview with FiringSquad ATI said the 1024-bit bus is internal to the eDRAM logic. The actual bus between GPU and eDRAM module is running at 256Gbit/s at 2GHz. So the bus is actually 128-bit at 2GHz.
 
256 Gbit/sec (gigabit, not gigabyte) is 32 GB/sec, is it not?
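
A quick check of that conversion, as a minimal sketch. The helper name `bus_bandwidth_gbytes` is made up for illustration; the 128-bit width and 2 GHz rate are the figures quoted from the FiringSquad interview above.

```python
def bus_bandwidth_gbytes(width_bits: int, rate_ghz: float) -> float:
    """Peak bandwidth in GB/s for a bus of the given width and transfer rate."""
    gbits_per_s = width_bits * rate_ghz  # 128 bits * 2 GHz = 256 Gbit/s
    return gbits_per_s / 8               # 8 bits per byte


print(bus_bandwidth_gbytes(128, 2.0))  # 32.0
```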


bandwidth between parent die and edram daughter die is 32 GB/sec (48 GB/sec total) and the bandwidth between edram on daughter die and the logic (192 processors, etc) on the daughter die is 256 GB/sec

Correct me if I'm mistaken.
 
This seems to be the most detailed info to date... B3D experts, analyze it and let us know what you think about it! :p
 
All I can say is that it seems ATI has done a tremendous job on this GPU. It really looks like it's shaping up to be a novel, efficient and well-engineered part.


Cheers

Imran
 
Megadrive1988 said:
and the bandwidth between edram on daughter die and the logic (192 processors, etc) on the daughter die is 256 GB/sec
What is this bandwidth between eDRAM and on chip logic? Isn't that like the bandwidth between cache and logic of a CPU? How can this be considered 'bandwidth' in the conventional sense, referring to passing data between processing components? :?
 
The Xbox GPU seems to be much more interesting than the RSX as far as the architecture goes. That doesn't necessarily mean that it's going to be faster, but I can't help thinking that the ATI GPU is more forward-looking than the RSX, which I'm guessing will pay off in the next-gen GPUs (after the R520/G70 generation). This is of course based purely on speculation on my part :)

Though to be fair, a TBDR seems like an even more interesting architecture, and that hasn't proved to be a huge success in the PC area, at least not yet :)
 
Shifty Geezer said:
Megadrive1988 said:
and the bandwidth between edram on daughter die and the logic (192 processors, etc) on the daughter die is 256 GB/sec
What is this bandwidth between eDRAM and on chip logic? Isn't that like the bandwidth between cache and logic of a CPU? How can this be considered 'bandwidth' in the conventional sense, referring to passing data between processing components? :?
That bus is where all the data relating to blending, z-compare, AA sample filtering, etc is passed back and forth. In a traditional architecture, these can take up a pretty significant amount of the regular framebuffer bandwidth.

I think.
 
Shifty Geezer said:
Megadrive1988 said:
and the bandwidth between edram on daughter die and the logic (192 processors, etc) on the daughter die is 256 GB/sec
What is this bandwidth between eDRAM and on chip logic? Isn't that like the bandwidth between cache and logic of a CPU? How can this be considered 'bandwidth' in the conventional sense, referring to passing data between processing components? :?

Yep, you're right, it's not bandwidth in the normal sense we're used to when talking about frame buffers.

Back in the bad old days, caches couldn't fit onto the CPU die. All that's happened is that we've forgotten about cache bandwidth and tend to think of the latency and size of a cache.

The upshot is this is like having a compute-free frame buffer, which can keep up with the GPU even when running at 4GP/s. Well, actually, I don't know what the peak fill-rate is. But the idea that the GPU will never have to queue output fragments is pretty compelling.

Jawed
 
PC-Engine said:
If we can count the 48GB/s bandwidth of GS in PS2 then why can't we count this 256GB/s of bandwidth of the eDRAM?

We'd better not say Xbox was more powerful than PS2 regardless of the eDRAM...
 
Now that I know that Dave was actually at E3... I think he can give us a much better article and understanding than the one presented in this thread.
 
DaveBaumann said:
This appears to just be from the press conference they gave.

So beyond the bad translation... is there anything about this device that hasn't been speculated, or beaten to death, that we don't yet know about or understand? 8)

Usually you have a handle on these things AND a way of explaining them that readers at all levels can grasp... which to me is the most important quality of being the lead editor of a site like this. *shrug*

My only real question regarding Xenos is this: we know it is capable of FP32 blending, but with only 10 MB of eDRAM, how is it capable of HDR?
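
One rough way to frame the question is to add up what a 720p render target would occupy. This is only a back-of-the-envelope sketch: the per-pixel byte counts (4 bytes for RGBA8, 8 for an FP16 RGBA target, 16 for FP32 RGBA, 4 for Z/stencil), the absence of AA and the function names are my assumptions, not confirmed Xenos formats.

```python
import math

EDRAM_BYTES = 10 * 1024 * 1024  # the 10 MB of eDRAM mentioned above


def target_bytes(width, height, color_bytes, depth_bytes=4, samples=1):
    """Size of a colour + Z/stencil render target in bytes."""
    return width * height * samples * (color_bytes + depth_bytes)


def tiles_needed(size_bytes, capacity=EDRAM_BYTES):
    """How many eDRAM-sized passes it would take to cover the target."""
    return math.ceil(size_bytes / capacity)


# Assumed colour formats; none of these are confirmed for Xenos.
for name, color_bytes in [("RGBA8", 4), ("FP16 RGBA", 8), ("FP32 RGBA", 16)]:
    size = target_bytes(1280, 720, color_bytes)
    print(f"{name}: {size / 2**20:.2f} MiB, tiles: {tiles_needed(size)}")
```

Under those assumptions an 8-bit target fits in one go, while the wider HDR formats would overflow 10 MB and need to be rendered in more than one pass.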
 
PC-Engine said:
If we can count the 48GB/s bandwidth of GS in PS2 then why can't we count this 256GB/s of bandwidth of the eDRAM?
Because you should count the bandwidth where the bottleneck is.
It's not fully clear at this moment, but it seems plausible that the main GPU core is attached to the eDRAM with a 48 GB/s bus (32 + 16).
PS2 eDRAM (and it was eDRAM for real :) ) was connected via a 2560-bit bus running at 150 MHz.
Even with tremendous bandwidth like that (in 1999!), there are texture and pixel caches sitting between the GPU core and the eDRAM banks.
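
To put numbers on the "count the bottleneck" argument, here is a small sketch. The function and stage names are invented for illustration, and the Xenos figures are just the ones floated in this thread (48 GB/s for the link, 256 GB/s inside the daughter die), not confirmed specs.

```python
def peak_gb_per_s(width_bits, clock_mhz):
    """Peak bandwidth in GB/s from bus width (bits) and clock (MHz)."""
    return width_bits * clock_mhz / 8 / 1000


def bottleneck(stages):
    """Return (name, GB/s) of the slowest stage on a data path."""
    return min(stages.items(), key=lambda kv: kv[1])


# PS2 GS: 2560-bit bus at 150 MHz, as stated above.
ps2_gs = peak_gb_per_s(2560, 150)

# Xenos path, using the numbers discussed in this thread (assumptions).
xenos_path = {
    "GPU core to eDRAM die link": 48.0,
    "eDRAM to on-die logic": 256.0,
}

print(ps2_gs)                  # 48.0
print(bottleneck(xenos_path))
```

By that measure the figure comparable to the PS2's 48 GB/s would be the link between the two dies, not the 256 GB/s inside the daughter die.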
 
blakjedi said:
DaveBaumann said:
This appears to just be from the press conference they gave.

So beyond the bad translation... is there anything about this device that hasn't been speculated, or beaten to death, that we don't yet know about or understand? 8)

I think there's a lot more to learn...

- HOS tessellator?
- Just how many pixels is the output per clock?
- What are those 192 units on the EDRAM chip?
- Memory subsystem is closer to that of a CPU.. but how?
 
PC-Engine said:
If we can count the 48GB/s bandwidth of GS in PS2 then why can't we count this 256GB/s of bandwidth of the eDRAM?
If we can count this 'on chip' bandwidth as part of a 'system aggregate' bandwidth like MS did, does that not mean that the PS3's 'system aggregate' should also include bandwidth between 7 SPEs logic and local storage + 1 PPE and cache + Level 2 cache on Cell?

Maybe they should come out with a new measure - electrons in motion/square mm/second? :rolleyes:

Seriously, why is this talked of as bandwidth (unless it's just marketing)? The true bandwidth between GPU and eDRAM modules is 256 Gbits/s, right? 32 GB/s? Anything with a 256 GB/s number is phoney?
 
The 256 GB/sec internal bandwidth is important because it means that the system will probably never be backbuffer bandwidth limited.
AFAIK the bandwidth needed to copy the 720p 32-bit backbuffer into the framebuffer is only about 220 MB/sec, even with tiling. This thing is very similar to the deferred shading that PowerVR architectures do...
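
The ~220 MB/sec figure can be reproduced roughly like this. It's a sketch that assumes 60 frames per second and decimal megabytes, neither of which is stated above; the function name is just for illustration.

```python
def copy_bandwidth_mb_per_s(width, height, bytes_per_pixel, fps):
    """Bandwidth needed to copy one finished frame out per refresh, in MB/s (10^6 bytes)."""
    return width * height * bytes_per_pixel * fps / 1_000_000


# 1280x720, 32-bit (4 bytes) per pixel, assumed 60 fps
print(copy_bandwidth_mb_per_s(1280, 720, 4, 60))  # 221.184
```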

RSX, on the other hand, has to fit all its backbuffer operations (Z/stencil rendering, alpha-blended polygon rendering, overdraw, etc.) into a far smaller amount of bandwidth, so it is likely to become bottlenecked by it.


I'd also emphasize Xenos's ability to read/write the CPU L2 cache. This should allow games to stream content like this:

Compressed stuff in main memory -> uncompressing on CPU into L2 cache -> process on Xenos into rendered image in EDRAM -> output to the framebuffer

This method uses a very small amount of the main system memory bandwidth, especially compared to doing the same on PS3. For example, you could do a lot of procedural textures and geometry this way.
 
Laa-Yosh said:
I think there's a lot more to learn...

- HOS tessellator?

Yes.

- Just how many pixels is the output per clock?

8 Colour / 16 Z+Stencil

- What are those 192 units on the EDRAM chip?

Extrapolation of the number of units - Z, Stencil, Colour, Blend, etc. Probably (confirmation pending) multiplied by 4, as it probably does everything single-cycle with 4x FSAA (apart from the extra edge processing required in the shader).
 