ATI MSAA/ eDRAM module patent for R500/ Xenon?

Megadrive1988 said:
Jaws - Faf - Panajev - DemoCoder - DeanoC and others:

would it be correct to say that Xenon / Xbox360 sort of
has 2 graphics chips?

I do not mean two full GPUs / VPUs, but 1 main GPU / VPU
(R500 or R5xx) and then this separate module with embedded DRAM
and some circuitry for final processing, FSAA, etc. ?

This doesn't make sense, because the whole point of a typical EDRAM design is that you can build the chip with a very wide on-die bus (think of the 2560-bit bus in the PlayStation 2, which delivers 48 GB/s).
With the approach you suggest, you would have a typical external link of 256-512 bits between the control logic and the frame buffer memory, and you lose everything EDRAM offers, because you can't raise the clock speed very much (I think 450-650 MHz is the maximum for this generation).
You need bandwidth but the clock is fixed, so you need EDRAM: the more polygons you render and the more effects you apply, the more bandwidth you need.
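
A rough sketch of the arithmetic behind this point, where peak bandwidth is just bus width times clock. It assumes one transfer per clock, and the ~150 MHz clock for the PS2's embedded DRAM is an assumption that is not stated above; the function name is made up for illustration.

```python
def bus_bandwidth_gb_s(bus_width_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth in GB/s (10^9 bytes/s), assuming one transfer per clock."""
    bytes_per_clock = bus_width_bits / 8
    return bytes_per_clock * clock_mhz * 1e6 / 1e9

# On-die bus, PS2 style: 2560 bits wide; the 150 MHz clock is an assumption.
ps2_on_die = bus_bandwidth_gb_s(2560, 150)

# External links at the two ends of the ranges quoted in the post.
external_low = bus_bandwidth_gb_s(256, 450)
external_high = bus_bandwidth_gb_s(512, 650)

print(f"on-die 2560-bit @ 150 MHz : {ps2_on_die:.1f} GB/s")
print(f"external 256-bit @ 450 MHz: {external_low:.1f} GB/s")
print(f"external 512-bit @ 650 MHz: {external_high:.1f} GB/s")
```

Under these assumptions even the widest, fastest external link in the quoted range stays below the on-die figure, which is the trade-off being argued here.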

Megadrive1988 said:
also, what is the likelihood that there will be more than 10 MB of eDRAM
(or EDRAM) ?


The problem is, how many transistors can you afford?
I think the best approach will be the PS3's: if you do vertex processing on the CPU, you can put the pixel pipelines and EDRAM in the graphics processor. But if the graphics processor does everything (pixels and vertices), you can't put in a lot of EDRAM or you will lose too much money per unit.

Clearly imho.

vliw
 
Megadrive1988 said:
Jaws - Faf - Panajev - DemoCoder - DeanoC and others:

would it be correct to say that Xenon / Xbox360 sort of
has 2 graphics chips?

I do not mean two full GPUs / VPUs, but 1 main GPU / VPU
(R500 or R5xx) and then this separate module with embedded DRAM
and some circuitry for final processing, FSAA, etc. ?


also, what is the likelihood that there will be more than 10 MB of eDRAM
(or EDRAM) ?

Check the 'leak' figure again,

http://www.beyond3d.com/forum/viewtopic.php?p=495971#495971

They use the term 'EDRAM' module and it is quite clearly a separate chip. Confusingly, they write 'E'DRAM rather than 'e'DRAM, even though the 'leak' text says the 'E' stands for 'Embedded' DRAM, though some call it 'Enhanced' DRAM. It's really irrelevant, as it is still a separate custom chip with custom logic and eDRAM. In this case there's far less logic than usual...

So yeah...it's like you describe, and I wouldn't call them two graphics chips.

The R500 seems designed around AA and this eDRAM module, so that you essentially get 720p with 4*MSAA for free, with compression saving bandwidth to the framebuffer. Out of 10 MB, ~7.4 MB is used for the framebuffer and the rest to make this scheme work, I guess. So the next logical step would be a 20 MB eDRAM module, so that you get 8*MSAA at 720p or 4*MSAA at 1080p.
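
A quick sketch of where a figure like ~7.4 MB could come from. It assumes 8 bytes per pixel (colour plus z, as in the leak text) and that the figure corresponds to one such entry per pixel at 1280x720; both are assumptions, and the function name is made up for illustration.

```python
def framebuffer_mb(width: int, height: int,
                   bytes_per_pixel: int = 8, samples: int = 1) -> float:
    """Framebuffer size in MB (10^6 bytes); 8 bytes = colour + z (assumed)."""
    return width * height * bytes_per_pixel * samples / 1e6

one_entry_720p = framebuffer_mb(1280, 720)               # one entry per pixel
raw_4x_720p = framebuffer_mb(1280, 720, samples=4)       # no compression at all
one_entry_1080p = framebuffer_mb(1920, 1080)

print(f"720p, 1 entry/pixel   : {one_entry_720p:.2f} MB")
print(f"720p, 4 samples, raw  : {raw_4x_720p:.2f} MB")
print(f"1080p, 1 entry/pixel  : {one_entry_1080p:.2f} MB")
```

On these numbers the 720p case fits in 10 MB only at one entry per pixel, and the same layout at 1080p would need more than 10 MB but less than 20 MB.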

Don't really see this happening, because they already have a huge chip in the R500, a still beefy PPC CPU and a third custom eDRAM chip, not forgetting the rumoured 512MB of system RAM...and you'll have to double the off-chip bandwidth to the eDRAM module too...so heat and cost issues will prevail...

EDIT:

Also, if you want a 20 MB eDRAM module you'll also have to double the R500 ALUs / unified shader units, i.e. fillrate. This is because it internally renders 32 pixels per cycle and downsamples them to 8 pixels per cycle before writing to the frame buffer.
 
The GameMaster said:
...
Assuming that the R500 is roughly 300 million transistors, having 10MB of eDRAM that would add another ~300 million transistors...

10 MB of eDRAM is ~ 80 Million transistors...not sure where you get 300 million from... :?
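
As a back-of-envelope check on that figure: a DRAM cell uses one transistor (plus a capacitor) per bit, so the count follows directly from the capacity. This sketch assumes 1 MB = 10^6 bytes and ignores sense amplifiers, decoders, redundancy and any logic on the module; the 6-transistor SRAM case is included only for comparison and the function name is made up.

```python
def memory_transistors(megabytes: int, transistors_per_bit: int = 1) -> int:
    """Cell-array transistor count, assuming 1 MB = 10^6 bytes."""
    bits = megabytes * 10**6 * 8
    return bits * transistors_per_bit

edram_cells = memory_transistors(10)        # 1 transistor per DRAM bit
sram_cells = memory_transistors(10, 6)      # 6 transistors per SRAM bit

print(f"10 MB as DRAM cells: {edram_cells / 1e6:.0f} million transistors")
print(f"10 MB as SRAM cells: {sram_cells / 1e6:.0f} million transistors")
```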

neliz said:
I thought nv's PS3 gpu also had a "L2" cache and that the size was also considerable.
But it was meant more for instruction and texture caching...

Whatever gave you that idea? There's nothing confirmed for the GPU yet...
 
Is there really any point in having an external eDRAM module? The benefit of eDRAM is internal bandwidth, but once you go external it'll be limited by any chip to chip interconnect... :? It's a neat idea though.
 
The EDRAM is a dedicated framebuffer, anti-aliasing, blending, hyper-Z device. It has a dedicated bus running at around 18GB/s.

The beauty is, this bus is totally separate from the bus that connects the R500 to the CPU and system RAM (where the textures, vertices, shader code, etc. come from), which runs at 33GB/s.

So it seems there's a vast amount of bandwidth in XBox 360, with the frame buffer's bandwidth "isolated" from the bandwidth involved in texturing, for example.

Where's that drool smilie?!

Jawed
 
PC-Engine said:
Damn that's pretty sweet, but why not just have GDDR3 for the framebuffer instead of eDRAM? :?

If I read Jawed's explanation correctly, it seems it's a special kind of memory device, so you can't use GDDR3?
 
V3 said:
PC-Engine said:
Damn that's pretty sweet, but why not just have GDDR3 for the framebuffer instead of eDRAM? :?

If I read Jawed's explanation correctly, it seems it's a special kind of memory device, so you can't use GDDR3?

So it's basically a chip with a lot of eDRAM and a little logic? Or basically like Flipper with the eDRAM moved offchip but connected with a very fast chip interconnect? What kind of chip interconnect offers 18GB/s bandwidth?
 
You know, you guys read far too much into some of the microscopic details on some of these diagrams.

It's not like they're engineering drawings or design documents.
 
ERP said:
You know, you guys read far too much into some of the microscopic details on some of these diagrams.

It's not like they're engineering drawings or design documents.
I bet there's a hidden meaning in this. Quick, everyone! Try to decode the secret message that will unlock all information about next-gen hardware!
 
Jawed said:
The EDRAM is a dedicated framebuffer, anti-aliasing, blending, hyper-Z device. It has a dedicated bus running at around 18GB/s.
...

Not sure where you're getting 18 GB/s?

It's 16+32 GB/s read/write bandwidth to the eDRAM module according to the 'leak'.

PC-Engine said:
V3 said:
PC-Engine said:
Damn that's pretty sweet, but why not just have GDDR3 for the framebuffer instead of eDRAM? :?

If I read Jawed's explanation correctly, it seems it's a special kind of memory device, so you can't use GDDR3?

So it's basically a chip with a lot of eDRAM and a little logic? Or basically like Flipper with the eDRAM moved offchip but connected with a very fast chip interconnect? What kind of chip interconnect offers 18GB/s bandwidth?

Rambus FlexIO, the chip interconnect for CELL, is ~77 GB/s aggregate bandwidth, i.e. ~45 GB/s outbound and ~32 GB/s inbound.

The Xenon leak also has high bandwidth for the R500, ~33 GB/s read and ~22 GB/s write bandwidth to the north bridge.
 
Jaws said:
Jawed said:
The EDRAM is a dedicated framebuffer, anti-aliasing, blending, hyper-Z device. It has a dedicated bus running at around 18GB/s.
...

Not sure where you're getting 18 GB/s?

It's 16+32 GB/s read/write bandwidth to the eDRAM module according to the 'leak'.

I think that will be "underselling" somewhat as well.
 
DaveBaumann said:
Jaws said:
Jawed said:
The EDRAM is a dedicated framebuffer, anti-aliasing, blending, hyper-Z device. It has a dedicated bus running at around 18GB/s.
...

Not sure where you're getting 18 GB/s?

It's 16+32 GB/s read/write bandwidth to the eDRAM module according to the 'leak'.

I think that will be "underselling" somewhat as well.

*Sees Dave sneakily remove his laughing smiley!* ;)

What, you mean taking compression into account, so that's 'effectively' 4x that bandwidth? Well, that's a given; after all, that's what the patent's about...compression and bandwidth saving.

This neatly brings me onto 'G' for Gigapixel in the G70 *cough* and its bandwidth saving features for nVidia's next gen offering? It would be a suitable match, no? :p
 
Jaws said:
Jawed said:
The EDRAM is a dedicated framebuffer, anti-aliasing, blending, hyper-Z device. It has a dedicated bus running at around 18GB/s.
...

Not sure where you're getting 18 GB/s?

It's 16+32 GB/s read/write bandwidth to the eDRAM module according to the 'leak'.

The leak describes the link twixt R500 and EDRAM as capable of supporting two quads plus z/stencil per clock. That's two quads (8 pixels) at 4 bytes per pixel, plus 4 bytes of z/stencil = 36 bytes per clock as far as I can tell. So assuming 500MHz you get 18GB/s.

Jawed
 
What, kinda like NV2A was advertised as having 4GPixel fillrate when it really was 1GPixel? All because 2xAA was supposed to be "free"? We saw how that worked out in the end huh... Guess this time around, Sony will be victim of the NVIDIA math.
 
Jawed said:
Jaws said:
Jawed said:
The EDRAM is a dedicated framebuffer, anti-aliasing, blending, hyper-Z device. It has a dedicated bus running at around 18GB/s.
...

Not sure where you're getting 18 GB/s?

It's 16+32 GB/s read/write bandwidth to the eDRAM module according to the 'leak'.

The leak describes the link twixt R500 and EDRAM as capable of supporting two quads plus z/stencil per clock. That's two quads (8 pixels) at 4 bytes per pixel, plus 4 bytes of z/stencil = 36 bytes per clock as far as I can tell. So assuming 500MHz you get 18GB/s.

Jawed

http://www.beyond3d.com/forum/viewtopic.php?p=495971#495971

Leak said:
Eight pixels (where each pixel is color plus z = 8 bytes) can be sent to the EDRAM every GPU clock cycle, for an EDRAM write bandwidth of 32 GB/sec. Each of these pixels can be expanded through multisampling to 4 samples, for up to 32 multisampled pixel samples per clock cycle.

8 pixels per cycle * 8 Bytes per pixel * 0.5 GHz ~ 32 GB/s write bandwidth.
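
To put the two readings side by side, here is a small sketch of both calculations. The 500 MHz GPU clock is an assumption used throughout this thread, not a confirmed figure, and the names are made up for illustration.

```python
GPU_CLOCK_GHZ = 0.5  # assumed 500 MHz GPU clock

def link_gb_s(bytes_per_clock: int, clock_ghz: float = GPU_CLOCK_GHZ) -> float:
    """Write bandwidth in GB/s for a given number of bytes sent per clock."""
    return bytes_per_clock * clock_ghz

# Diagram reading: 8 pixels of 4-byte colour plus a single 4-byte z/stencil.
diagram_bytes = 8 * 4 + 4

# Leak text: 8 pixels, each colour plus z = 8 bytes.
leak_text_bytes = 8 * 8

print(f"diagram reading: {diagram_bytes} B/clk -> {link_gb_s(diagram_bytes):.0f} GB/s")
print(f"leak text      : {leak_text_bytes} B/clk -> {link_gb_s(leak_text_bytes):.0f} GB/s")
```

The whole difference between 18 GB/s and 32 GB/s comes from whether z/stencil is counted once per clock or once per pixel.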
 
As a matter of interest, two quads per clock coming out of R500 is half (ish) the theoretical rate of R420, 4GP/s versus 8.3GP/s.

Am I reading the leak wrong?

Jawed
 
Jawed said:
As a matter of interest, two quads per clock coming out of R500 is half (ish) the theoretical rate of R420, 4GP/s versus 8.3GP/s.

Am I reading the leak wrong?

Jawed

No, you do get 4 GPixels/sec, but that can't be compared directly, in the same way that TBDR/PowerVR fillrates can't be compared with IMR GPU fillrates. What you get in exchange for the lower fillrate is essentially 'free' 4*MSAA.

As L-B pointed out above, in marketing speak you could call that 16 Gpixels/sec 'effective' fillrate.
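
For reference, a sketch of the fillrate numbers being thrown around. The 500 MHz R500 clock and the 520 MHz R420 clock are assumptions, as is the 16 pixels per clock for R420; the function name is made up.

```python
def fillrate_gpixels(pixels_per_clock: int, clock_mhz: int) -> float:
    """Theoretical fillrate in Gpixels/s."""
    return pixels_per_clock * clock_mhz / 1000

r500 = fillrate_gpixels(8, 500)     # two quads per clock, assumed 500 MHz
r420 = fillrate_gpixels(16, 520)    # assumed 16 pixels per clock at 520 MHz

# 'Effective' marketing figure: each pixel expanded to 4 multisamples.
r500_effective = r500 * 4

print(f"R500 raw        : {r500:.2f} GP/s")
print(f"R420 raw        : {r420:.2f} GP/s")
print(f"R500 'effective': {r500_effective:.2f} Gsamples/s")
```

The 'effective' figure counts samples rather than shaded pixels, which is why it shouldn't be set directly against the raw numbers.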
 
Sigh, I was wondering why the Z was "asymmetric", and trying to reconcile that with the "4 z/stencil" per clock on the diagram. Doh, that's 4 "quads" of Z per clock, which is 64 bytes, too.

That makes much more sense. Apologies for trying to decode the diagram, and not referring to the text as well.

Thanks,
Jawed
 
Right, well, now that's sorted, I'm wondering about the relatively weak "pixel pipeline" capability of R500.

Going back to the implied 4GPixel/s fill-rate of R500 at 500MHz: that's 8 pixels per clock, compared with R420 which is 16 pixels per clock.

That seems to imply to me that the pixel shader core in R500 is smaller scale than R420. Sure it's SM3, 32-bit etc., but in terms of equivalent pipelines/TMUs/ALUs, R500 has a lower capacity than R420.

We're getting conflicting ideas about whether the EDRAM is actually on-die or a separate device.

I'm wondering, now, if ATI has traded shading power for the EDRAM module.

A while back I asserted that in ATI's counting, R420 has 76 ALUs (excluding texture address calculation ALUs) and that R500's equivalent is only 48 ALUs for the same functionality (vertex and pixel shader ALUs).

http://www.beyond3d.com/forum/viewtopic.php?p=496088#496088

So, compared with R420's 8GP/s coming from a pool of 64 ALUs arranged in 16 pipelines, it seems likely that R500 is getting 4GP/s from a pool of 48 ALUs, not all of which are pixel shading for 100% of the time...

So, by cutting back the transistors expended on the vertex and pixel shader engines, R500 has space for the on-die EDRAM.

Presumably to generate anti-alias samples, the pixel pipeline needs to do no work beyond the interpolators. All the work to generate these samples is done by the interpolators - there's no texturing, blending, shading etc. required. Is that correct?

Jawed
 