R500 + eDRAM... what are the benefits?

We have heard it is 150M transistors, is this correct?

I also read in an interview with Todd H that they have over 1M gates on the eDRAM.
 
The ATI architecture is essentially a client-server distributed architecture. Kinda cool way to increase fab/yields when your process doesn't allow you to fit as much on a single core as you'd like.
 
DemoCoder said:
The ATI architecture is essentially a client-server distributed architecture. Kinda cool way to increase fab/yields when your process doesn't allow you to fit as much on a single core as you'd like.
Would it be fair to say that if the overhead is low that it would scale very well?
 
It should probably scale better than current solutions as it makes savings all round - power is saved because you are always using the ALUs for one thing or another, never waiting for either the VS or PS to finish something (and it's not really possible to clock either down while they are doing nothing), and you'll also make savings on things like buffering between the VS and PS.
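To put a toy model behind that, here's a quick Python sketch comparing a fixed VS/PS split against a unified pool. All the workload numbers are invented - this is purely to illustrate the utilisation argument, not a model of the real chip.

Code:
# Toy utilisation model: fixed VS/PS split vs a unified ALU pool.
# The workload numbers are invented purely to illustrate the argument.
workload = [(30, 2), (2, 30), (16, 16), (4, 28)]  # (vertex, pixel) ops per "cycle"

TOTAL = 32       # ALUs available either way
VS, PS = 8, 24   # an arbitrary fixed split

fixed = unified = 0
for v, p in workload:
    fixed += min(v, VS) + min(p, PS)   # split pools can starve independently
    unified += min(v + p, TOTAL)       # unified: any ALU takes any work
peak = TOTAL * len(workload)
print(f"fixed split: {fixed / peak:.0%} busy, unified: {unified / peak:.0%} busy")

With those made-up numbers the fixed split sits at 69% busy while the unified pool stays at 100% - the gap is simply whatever your workload imbalance happens to be.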
 
DaveBaumann said:
It should probably scale better than current solutions as it makes savings all round - power is saved because you are always using the ALUs for one thing or another, never waiting for either the VS or PS to finish something (and it's not really possible to clock either down while they are doing nothing), and you'll also make savings on things like buffering between the VS and PS.
If performance per transistor is roughly on par with non-unified architectures then it seems like a great design to populate a product line with. If the memory buses and the arbiter are the same, and the only difference is the number of ALUs, then Sireric might need to find something more exciting to work on. :LOL: Would such a design be similar in that, if a number of ALUs were bad, they could just be disabled like pipelines are today?
 
One thing I wondered about: what happens when a frame is finished and the backbuffer needs to be transferred to external memory? Does the chip simply stall for that short period, having no backbuffer to write to, or maybe do some vertex-only work?
I mean, 3.5 MiB/frame (color data for 720p) at 60 fps seems to require such an insignificant period of time that it may not be worth any kind of double-buffering effort. But with render-to-texture, that requirement may only go up.
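Quick back-of-the-envelope in Python - the external bus figure is my guess (GDDR3 in the ~22 GB/s range), not anything confirmed:

Code:
# Rough numbers for the backbuffer-copy question above.
frame_bytes = 1280 * 720 * 4   # 32bpp 720p colour buffer, ~3.5 MiB
fps = 60
copy_rate = frame_bytes * fps  # bytes/second leaving the eDRAM
bus_bw = 22e9                  # assumed external memory bandwidth, bytes/s
print(f"{copy_rate / 1e6:.0f} MB/s, i.e. {copy_rate / bus_bw:.2%} of the bus")

That's about 221 MB/s, roughly 1% of the bus - which is why a straight copy-out looks like a non-issue, with render-to-texture being the case that could change the picture.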
 
Isn't the figure of 64 concurrent threads derived from 48 that can be assigned to the ALUs and 16 others that are assigned to the TMUs?

How many components can be operated on in the ALUs - 5? And can operations use any mix of components? 4+1 is obvious, but are other combinations supported?

What kind of flexibility is there with the TMUs? Is there filtering for FP32? Can the TMUs be stacked to feed fewer ALUs to achieve single-cycle trilinear or FP samples? What filtering levels are supported?

How expensive is branching in this architecture? Is the initial Z pass a more automated, sophisticated form of UltraShadow? What type of Hyper-Z is being employed?
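For anyone wondering, here's what 4+1 co-issue would mean in practice - a hypothetical Python illustration of the term, not confirmed R500 behaviour:

Code:
# Hypothetical illustration of 4+1 co-issue: a 5-wide ALU retires one
# 4-component vector op and one independent scalar op per cycle.
import operator

def coissue(vec_op, vec_a, vec_b, scalar_op, s_a, s_b):
    """One 'instruction slot': a vec4 op and a scalar op issued together."""
    vec_result = [vec_op(a, b) for a, b in zip(vec_a, vec_b)]  # lanes 0-3
    scalar_result = scalar_op(s_a, s_b)                        # lane 4
    return vec_result, scalar_result

# e.g. a vec4 colour multiply with a reciprocal running alongside it
rgba, inv = coissue(operator.mul,
                    [1.0, 0.5, 0.25, 1.0], [0.9, 0.9, 0.9, 1.0],
                    lambda x, _: 1.0 / x, 2.0, None)
print(rgba, inv)  # [0.9, 0.45, 0.225, 1.0] 0.5

Whether the hardware also allows splits like 3+2 is exactly the open question.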
 
I'm not gonna say anything... Just read:

http://www.hardocp.com/article.html?art=Nzcx

Especially, this part:

Inside the Smart 3D Memory is what is referred to as a 3D Logic Unit. This is literally 192 Floating Point Unit processors inside our 10MB of RAM. This logic unit will be able to exchange data with the 10MB of RAM at an incredible rate of 2 Terabits per second. So while we do not have a lot of RAM, we have a memory unit that is extremely capable in terms of handling mass amounts of data extremely quickly. The most incredible feature that this Smart 3D Memory will deliver is “antialiasing for free” done inside the Smart 3D RAM at High Definition levels of resolution. (For more of just what HiDef specs are, you can read here.) Yes, the 10MB of Smart 3D Memory can do 4X Multisampling Antialiasing at or above 1280x720 resolution without impacting the GPU. So all of your games on Xbox 360 are not only going to be in High Definition, but all will have 4XAA applied as well.

WHATTA HELL ???!!

:oops: :oops:
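Running the numbers from that quote (the bytes-per-sample figures are my assumption, so treat this as back-of-envelope):

Code:
# Back-of-envelope check on the quoted figures. Byte counts per sample
# (4B colour + 4B Z/stencil) are assumed, not confirmed.
width, height, samples = 1280, 720, 4
bytes_per_sample = 4 + 4   # assumed RGBA8 colour + 24/8 Z/stencil
fb = width * height * samples * bytes_per_sample
print(f"4xAA 720p framebuffer: {fb / 2**20:.1f} MiB vs 10 MB of eDRAM")

terabits = 2e12            # the quoted 2 Tb/s
print(f"2 Tb/s = {terabits / 8 / 2**30:.0f} GiB/s of eDRAM bandwidth")

That works out to ~28.1 MiB of 4xAA colour+Z against 10 MB of eDRAM, so presumably something cleverer than one flat buffer is going on - pure speculation on my part. The 2 Tb/s figure is roughly 233 GiB/s.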
 
Rockster said:
Isn't the figure of 64 concurrent threads derived from 48 that can be assigned to the ALUs and 16 others that are assigned to the TMUs?
Excellent idea!

Jawed
 
eSa said:
I'm not gonna say anything... Just read:

http://www.hardocp.com/article.html?art=Nzcx

Especially, this part:

Inside the Smart 3D Memory is what is referred to as a 3D Logic Unit. This is literally 192 Floating Point Unit processors inside our 10MB of RAM. This logic unit will be able to exchange data with the 10MB of RAM at an incredible rate of 2 Terabits per second. So while we do not have a lot of RAM, we have a memory unit that is extremely capable in terms of handling mass amounts of data extremely quickly. The most incredible feature that this Smart 3D Memory will deliver is “antialiasing for free” done inside the Smart 3D RAM at High Definition levels of resolution. (For more of just what HiDef specs are, you can read here.) Yes, the 10MB of Smart 3D Memory can do 4X Multisampling Antialiasing at or above 1280x720 resolution without impacting the GPU. So all of your games on Xbox 360 are not only going to be in High Definition, but all will have 4XAA applied as well.

WHATTA HELL ???!!

:oops: :oops:

That sounds good.
 
That gives me a little more optimism for how the real games, especially 2nd gen games designed for the new architecture, will perform.

Idea to toss about: does Sony have an early advantage, given that the RSX looks to be an evolution of the current design, plus access to SLI 6800Us with ballpark-comparable performance - versus MS, who is stuck with an SM 2.0 card at 50% of the performance, no eDRAM, and a totally different architecture?

It would seem development for the PS3 would be easier at this point, yes? Or should I say, PS3 game development should be more of a known factor.
 
So they rasterize all transformed triangles to the eDRAM module, where they do Z/stencil, and thus have zero overdraw for pixel shading?
 
The GPU can write pixel fragments or just Z/stencil data. It's not rocket science. You can get twice as many un-coloured Z/stencil pixels per clock as you can get normally coloured fragments.

Jawed
 
That's what I meant: they do a Z-only pass, rasterizing their vertex-shader output to the eDRAM, and only pixel-shade visible fragments.
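In code form that's the classic two-pass scheme - a minimal software sketch of the idea, nothing R500-specific:

Code:
# Minimal sketch of a Z-prepass: pass 1 writes depth only, pass 2 runs
# the (expensive) shader only on fragments that survived. Each tuple below
# stands in for a rasterised fragment: (x, y, depth, material).
W, H = 4, 4
zbuf = [[float("inf")] * W for _ in range(H)]
frags = [(1, 1, 0.8, "far"), (1, 1, 0.3, "near"), (2, 2, 0.5, "solo")]

# Pass 1: Z only - cheap, no shading (and, per the post above, hardware
# can often write Z-only pixels at double rate).
for x, y, z, _ in frags:
    zbuf[y][x] = min(zbuf[y][x], z)

# Pass 2: shade with a depth test of EQUAL, so only the visible fragment
# of each pixel pays for the pixel shader - zero overdraw in shading.
shaded = 0
for x, y, z, mat in frags:
    if z == zbuf[y][x]:
        shaded += 1   # imagine an expensive pixel shader running here
print(f"shaded {shaded} of {len(frags)} fragments")  # -> 2 of 3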
 