ATI MSAA/ eDRAM module patent for R500/ Xenon?

one in another thread said:
Jaws said:
The pros and cons have been discussed in the other thread, which is not only still on the first page, only about 5 lines down, but has also been linked in this thread TWICE!
Well I just reiterated that 2-chip GPU = BS as vliw wrote in the other thread, in the light of this news about NEC ;) It's a SoC.

Why do we get ONLY 48 GB/s on the R500 eDRAM module when we got 48 GB/s on the PS2's GS 5 YEARS AGO?
 
Jawed said:
Well, it would appear the EDRAM in Xbox 360 is going to be on die :D

http://www.beyond3d.com/forum/viewtopic.php?t=22333

This page:

http://www.necelam.com/edram90/index.php?Subject=edramoptions

says that 256Mb (32MB) will consume half of a 15x15mm die at 90nm. So 10MB will consume about 1/3 of that, i.e. about 5x7.5mm. Less than 40mm squared, not very much.

So, if you have to fit EDRAM on die, you might have to make do with less ALUs ;)

Jawed

That's the high-density, low-power, lower-performance version of NEC's eDRAM technology. The high-performance version, which runs at higher clock speeds, takes up more space, i.e. it is less dense.
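
For anyone who wants to check the area arithmetic quoted above, here is a minimal Python sketch. It assumes area scales linearly with capacity from the NEC figure (256Mb in half of a 15x15mm die at 90nm). The `density_penalty` parameter is a hypothetical knob for a less dense high-speed option; no actual ratio is given in this thread, so it defaults to 1.0.

```python
# Rough eDRAM area estimate from the figures quoted in this thread.
# density_penalty is a hypothetical parameter, not a number from NEC.

def edram_area_mm2(capacity_mb, ref_capacity_mbit=256,
                   ref_area_mm2=0.5 * 15 * 15, density_penalty=1.0):
    """Scale the reference macro area linearly with capacity.

    density_penalty > 1.0 models a less dense (e.g. high-speed) option.
    """
    ref_capacity_mb = ref_capacity_mbit / 8  # megabits -> megabytes
    return ref_area_mm2 * (capacity_mb / ref_capacity_mb) * density_penalty


area_10mb = edram_area_mm2(10)
print(f"10 MB at the high-density figures: {area_10mb:.1f} mm^2")
```

With the quoted figures this stays under the "less than 40mm squared" mentioned above; any penalty for the high-speed option would scale the result up proportionally.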
 
Jaws said:
one in another thread said:
Jaws said:
The pros and cons have been discussed in the other thread, which is not only still on the first page, only about 5 lines down, but has also been linked in this thread TWICE!
Well I just reiterated that 2-chip GPU = BS as vliw wrote in the other thread, in the light of this news about NEC ;) It's a SoC.

Why do we get ONLY 48 GB/s on the R500 eDRAM module when we got 48 GB/s on the PS2's GS 5 YEARS AGO?
Cost rules :p I've heard most of the die-space in GS was filled by the wiring for the wide bus.
 
one said:
Jaws said:
one in another thread said:
Jaws said:
The pros and cons have been discussed in the other thread, which is not only still on the first page, only about 5 lines down, but has also been linked in this thread TWICE!
Well I just reiterated that 2-chip GPU = BS as vliw wrote in the other thread, in the light of this news about NEC ;) It's a SoC.

Why do we get ONLY 48 GB/s on the R500 eDRAM module when we got 48 GB/s on the PS2's GS 5 YEARS AGO?
Cost rules :p I've heard most of the die-space in GS was filled by the wiring for the wide bus.

And the die area of an R500 without eDRAM in your estimation?
 
Jaws said:
one said:
Jaws said:
one in another thread said:
Jaws said:
The pros and cons have been discussed in the other thread, which is not only still on the first page, only about 5 lines down, but has also been linked in this thread TWICE!
Well I just reiterated that 2-chip GPU = BS as vliw wrote in the other thread, in the light of this news about NEC ;) It's a SoC.

Why do we get ONLY 48 GB/s on the R500 eDRAM module when we got 48 GB/s on the PS2's GS 5 YEARS AGO?
Cost rules :p I've heard most of the die-space in GS was filled by the wiring for the wide bus.
And the die area of an R500 without eDRAM in your estimation?
I don't know about the R500 but Flipper's texture read bandwidth is 10.4GB/sec so... :)
 
Jaws said:
Why do we get ONLY 48 GB/s on the R500 eDRAM module when we got 48 GB/s on the PS2's GS 5 YEARS AGO?

I think the X360 is not a top-notch design the way, for example, the PS3 will be; it aims for maximum performance at relatively low cost, like the NGC.
Sure, it will be great, but I think not for long.
This is my impression, but I could be mistaken... we'll see.

vliw
 
PC-Engine said:
That's the high-density, low-power, lower-performance version of NEC's eDRAM technology. The high-performance version, which runs at higher clock speeds, takes up more space, i.e. it is less dense.
That's definitely worth noting.

Looking here:

http://www.necelam.com/edram90/index.php?Subject=edramaccess

Shows the performance of the high-speed option. 250MHz random access, as opposed to 100MHz in the high-density option.

The only beef I have with these numbers is that the patent describes accesses to the frame buffer in blocks of data - say 8 pixels (64 bytes) at a time. So the performance profile isn't entirely random as far as I can tell.

So who wants to work out the memory bandwidth for blending/AA? Is 256GB/s, as per the leak, a good starting point?

Presumably there's nothing to stop the blending/memory architecture from consisting of memory split up into banks, one per channel, so four banks in total. That would help bandwidth.

Jawed
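
A rough starting point for that bandwidth question, as a Python sketch. It assumes (and this is only an assumption) that every random access delivers one full 64-byte block, and it uses the 250MHz and 100MHz random-access figures from the NEC page. The four-bank case simply multiplies by the bank count, which ignores any bank conflicts.

```python
# Back-of-envelope eDRAM bandwidth from block size and random-access rate.
# Assumption: one full block is transferred per random access, per bank.

def block_bandwidth_gb_s(block_bytes, access_mhz, banks=1):
    """Bandwidth in GB/s (10^9 bytes/s) for block-sized random accesses."""
    return block_bytes * access_mhz * 1e6 * banks / 1e9


high_speed_one_bank = block_bandwidth_gb_s(64, 250)
high_speed_four_banks = block_bandwidth_gb_s(64, 250, banks=4)
high_density_one_bank = block_bandwidth_gb_s(64, 100)

# How many such high-speed banks the leaked 256GB/s figure would imply.
banks_for_leak = 256 / high_speed_one_bank

print(high_speed_one_bank, high_speed_four_banks,
      high_density_one_bank, banks_for_leak)
```

Under these assumptions four banks fall well short of 256GB/s, so the leaked figure would need wider blocks, more banks, or has to be read as an 'effective' number.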
 
If the eDRAM is a second chip, I would hope it would be at least 16 to 24 MB instead of 10 or so.
 
one said:
Jaws said:
one said:
Jaws said:
one in another thread said:
Jaws said:
The pros and cons have been discussed in the other thread, which is not only still on the first page, only about 5 lines down, but has also been linked in this thread TWICE!
Well I just reiterated that 2-chip GPU = BS as vliw wrote in the other thread, in the light of this news about NEC ;) It's a SoC.

Why do we get ONLY 48 GB/s on the R500 eDRAM module when we got 48 GB/s on the PS2's GS 5 YEARS AGO?
Cost rules :p I've heard most of the die-space in GS was filled by the wiring for the wide bus.
And the die area of an R500 without eDRAM in your estimation?
I don't know about the R500 but Flipper's texture read bandwidth is 10.4GB/sec so... :)

What about framebuffer bandwidth of Flipper?
 
one said:
Jaws said:
one said:
Jaws said:
one in another thread said:
Jaws said:
The pros and cons have been discussed in the other thread, which is not only still on the first page, only about 5 lines down, but has also been linked in this thread TWICE!
Well I just reiterated that 2-chip GPU = BS as vliw wrote in the other thread, in the light of this news about NEC ;) It's a SoC.

Why do we get ONLY 48 GB/s on the R500 eDRAM module when we got 48 GB/s on the PS2's GS 5 YEARS AGO?
Cost rules :p I've heard most of the die-space in GS was filled by the wiring for the wide bus.
And the die area of an R500 without eDRAM in your estimation?
I don't know about the R500 but Flipper's texture read bandwidth is 10.4GB/sec so... :)

Well I suggest you read the entire thread and what this patent is supposed to mean, if you haven't done so already, and bring something new to the discussion.

I'll just repeat myself again. Earlier estimation of the R500 WITHOUT eDRAM was 240-320 mm2. Considering the R420 die at 130nm and estimating that the R500 and all its logic would have twice as many transistors,

R500 ~ 280 mm2 at 90nm without eDRAM

Considering that the PS2's GS was 188 mm2 at 180 nm at launch, that's a freakin' HUGE chip already.

Earlier in the thread, the R500 eDRAM module with custom logic was estimated at ~ 50 mm2 at 90nm. Considering a PPC G5 is ~ 60-70 mm2 at 90nm, that's already significant.

The R500 + eDRAM ~ 280 + 50 ~ 330 mm2

FREAKIN HUGE compared to the GS.

They could keep everything on die with a monster cost/bad yield and it's still possible. Or still get the benefits of this patent with ~ 256 GB/s 'equivalent' from 'leak' and split into two chips like the 'leak' diagram shows. And don't be so quick before you start calling BS on the idea... :rolleyes:

And if you read the R500 patent/this patent/leak diagram etc... it's quite clear that the R500 is built around AA and a fixed downsampled pixel output of 8 pixels per cycle, so it doesn't need the ultra-high bandwidth that an on-chip 'GPU' eDRAM could potentially provide. Instead, more shader ALUs can be used. And it's not just eDRAM: it has 'attached' blending/z/stencil/MSAA/compression/decompression logic on board... which people cannot seem to get their heads around... and how much die space would you need to create 256 GB/s without the compression/decompression methods etc. of this patent...

There are pros and cons to both...
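
The die-size reasoning above can be sketched in Python. The scaling function assumes an ideal area shrink with the square of the feature size, which real processes do not achieve exactly. The 280, 50 and 188 mm2 figures are the estimates from this post, not measured values.

```python
# Sketch of the die-area estimate: ideal process shrink plus a simple sum.
# Assumption: area scales with (feature size)^2 times the transistor ratio.

def ideal_area_scale(old_nm, new_nm, transistor_ratio):
    """Relative die area after a shrink with a changed transistor count."""
    return transistor_ratio * (new_nm / old_nm) ** 2


# Twice the R420's transistors, moved from 130nm to 90nm.
scale_vs_r420 = ideal_area_scale(130, 90, 2.0)

r500_logic_mm2 = 280   # estimate from the post, without eDRAM
edram_module_mm2 = 50  # estimate from the post, with custom logic
gs_mm2 = 188           # PS2 GS at 180nm at launch, as quoted in the post

single_die_mm2 = r500_logic_mm2 + edram_module_mm2
ratio_vs_gs = single_die_mm2 / gs_mm2

print(f"area relative to R420: {scale_vs_r420:.2f}x")
print(f"single die: {single_die_mm2} mm^2, {ratio_vs_gs:.2f}x the GS")
```

The scale factor comes out just under 1, which is why doubling the transistor count at 90nm lands at roughly the same area as the 130nm part.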
 
Jaws said:
Well I suggest you read the entire thread and what this patent is supposed to mean, if you haven't done so already, and bring something new to the discussion.
Eh, I should've quoted
Why do we get ONLY 48 GB/s on the R500 eDRAM module when we got 48 GB/s on the PS2's GS 5 YEARS AGO?
instead as the thing to answer. 48GB/s is not too shabby with all the compression, even today. I think they estimated that's enough.
Jaws said:
I'll just repeat myself again. Earlier estimation of the R500 WITHOUT eDRAM was 240-320 mm2. Considering the R420 die at 130nm and estimating that the R500 and all its logic would have twice as many transistors,
R500 ~ 280 mm2 at 90nm without eDRAM
You think Unified Shader Architecture doesn't affect the die size efficiency?

PC-Engine said:
Read my post above.
frame-buffer = 9.6GB/sec (before the spec downgrade)
 
So what's the total internal R/W bandwidth of Flipper's eDRAM? Never mind, I found the answer... 18.2GB/s. Sounds very similar to this 18GB/s external bus bandwidth for the EDRAM module. ;)
 
I haven't read the whole thread, so I apologise in advance if this has been asked and answered before, but isn't the point of eDRAM the fact that it's on the same die, thus allowing much higher speeds and bandwidth than external memory?
If there's an external module, doesn't that defeat the whole idea behind eDRAM? Or at least, will it not be much more expensive to get the same performance out of it than it would be if it were on die?
I mean, PS2 has a huge 2048bit bus in the GS because the eDRAM is on the same die. I can't imagine how expensive that would be if it were an external module.

Or am i missing something?
 
PC-Engine said:
So what's the total internal R/W bandwidth of Flipper's eDRAM?
At least 10.4GB/sec. Do you think R500 has 2 separate eDRAM blocks like in Flipper?
 
london-boy said:
I haven't read the whole thread, so I apologise in advance if this has been asked and answered before, but isn't the point of eDRAM the fact that it's on the same die, thus allowing much higher speeds and bandwidth than external memory?
If there's an external module, doesn't that defeat the whole idea behind eDRAM? Or at least, will it not be much more expensive to get the same performance out of it than it would be if it were on die?
I mean, PS2 has a huge 2048bit bus in the GS because the eDRAM is on the same die. I can't imagine how expensive that would be if it were an external module.

Or am i missing something?

With a clever design, the external bus could have effective bandwidth of 256GB/s so it wouldn't be the bottleneck and at the same time you have this eDRAM module with VERY fast internal memory bandwidth. Logically it's the same as having the eDRAM on the same die.

one said:
PC-Engine said:
So what's the total internal R/W bandwidth of Flipper's eDRAM?
At least 10.4GB/sec. Do you think R500 has 2 separate eDRAM blocks like in Flipper?

If I were to guess, yes, since the texture cache needs more bandwidth than the framebuffer.
 
PC-Engine said:
With a clever design, the external bus could have effective bandwidth of 256GB/s so it wouldn't be the bottleneck and at the same time you have this eDRAM module with VERY fast internal memory bandwidth. Logically it's the same as having the eDRAM on the same die.

Right, and I guess, if they're thinking about it, it must be relatively cost effective too...
 
one said:
Jaws said:
...
Why do we get ONLY 48 GB/s on the R500 eDRAM module when we got 48 GB/s on the PS2's GS 5 YEARS AGO?
instead as the thing to answer. 48GB/s is not too shabby with all the compression, even today. I think they estimated that's enough.
...

The main gist of the patent is a clever way to get ultra-high 'effective' bandwidth without the main disadvantage of bringing it on board the GPU, i.e. losing other logic die area at its expense. One could argue how much more logic/features the PS2's GS could've had if it also had a separate eDRAM module... The bottom line is that total system transistor count has increased without having ultra-bad yields from ultra-large die sizes...

one said:
Jaws said:
I'll just repeat myself again. Earlier estimation of the R500 WITHOUT eDRAM was 240-320 mm2. Considering the R420 die at 130nm and estimating that the R500 and all its logic would have twice as many transistors,
R500 ~ 280 mm2 at 90nm without eDRAM
You think Unified Shader Architecture doesn't affect the die size efficiency?

The R420 was SM2.0. The R500 is SM3.0 + ...

Any potential die size saving from unified shaders (arguable) will be cancelled out by being SM3.0 +: extra control/branching/storage logic etc... this was already discussed earlier in this thread btw...
 
london-boy said:
I haven't read the whole thread, so I apologise in advance if this has been asked and answered before, but isn't the point of eDRAM the fact that it's on the same die, thus allowing much higher speeds and bandwidth than external memory?
If there's an external module, doesn't that defeat the whole idea behind eDRAM? Or at least, will it not be much more expensive to get the same performance out of it than it would be if it were on die?
I mean, PS2 has a huge 2048bit bus in the GS because the eDRAM is on the same die. I can't imagine how expensive that would be if it were an external module.

Or am i missing something?

You should bloody well read the thread.

The current state of the argument is that two chips are more likely.

The patent describes a blending and anti-aliasing frame buffer device. The frame buffer is what chews up most of the 10MB. The blending functionality is located, according to the patent, on the same die as the memory.

The anti-aliasing fetching and coordinating with pixel-fragments is, according to the patent, on a separate device (the GPU, if interpreted in a certain way).

The patent shows a bus between the GPU and the memory. The patent describes how fragment data packing can be used to efficiently communicate fragment data twixt GPU and blender/frame-buffer, in other words, use less bandwidth. Separately, the patent describes various data packing scenarios for the anti-alias sample data.

The point of the patent is not data packing or compression. It is primarily a method of using a combination of a fragment FIFO, an AA multi-sample memory, a blender and a frame buffer to produce a two-step blend and anti-alias, in response to each fragment delivered by the GPU.

The patent delineates the frame buffer memory device with on-chip blending from the fragment FIFO and AA multi-sample memory. It does this in recognition of the fact that you want to minimise the amount of circuitry you put in a DRAM memory system.

What's unclear is how this all relates to EDRAM and R500. Things have changed in the years since the patent was filed :!: In my view, the bus that's indicated in the patent is nothing more than an on-die bus. The constraint of limited circuitry with memory is no longer pressing.

Who knows.

Jawed
 