R500 + eDRAM... what are the benefits?

We have heard it is 150M transistors, is this correct?

I also read in an interview with Todd H that they have over 1M gates on the eDRAM.
 
The ATI architecture is essentially a client-server distributed architecture. Kinda cool way to increase fab/yields when your process doesn't allow you to fit as much on a single core as you'd like.
 
DemoCoder said:
The ATI architecture is essentially a client-server distributed architecture. Kinda cool way to increase fab/yields when your process doesn't allow you to fit as much on a single core as you'd like.
Would it be fair to say that if the overhead is low that it would scale very well?
 
It should probably scale better than current solutions as it makes savings all round - power is saved because you are always using the ALUs for one thing or another, never waiting for either the VS or PS to finish something (and it's not really possible to clock either down while they are doing nothing), and you'll also make savings on things like buffering between the VS and PS.
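To put a toy model behind that, here's a quick Python sketch comparing a fixed VS/PS split against a unified pool. All the workload numbers are invented - this is purely to illustrate the utilisation argument, not a model of the real chip.

Code:
# Toy utilisation model: fixed VS/PS split vs a unified ALU pool.
# The workload numbers are invented purely to illustrate the argument.
workload = [(30, 2), (2, 30), (16, 16), (4, 28)]  # (vertex, pixel) ops per "cycle"

TOTAL = 32       # ALUs available either way
VS, PS = 8, 24   # an arbitrary fixed split

fixed = unified = 0
for v, p in workload:
    fixed += min(v, VS) + min(p, PS)   # split pools can starve independently
    unified += min(v + p, TOTAL)       # unified: any ALU takes any work
peak = TOTAL * len(workload)
print(f"fixed split: {fixed / peak:.0%} busy, unified: {unified / peak:.0%} busy")

With those made-up numbers the fixed split sits at 69% busy while the unified pool stays at 100% - the gap is simply whatever your workload imbalance happens to be.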
 
DaveBaumann said:
It should probably scale better than current solutions as it makes savings all round - power is saved because you are always using the ALUs for one thing or another, never waiting for either the VS or PS to finish something (and it's not really possible to clock either down while they are doing nothing), and you'll also make savings on things like buffering between the VS and PS.
If performance per transistor is roughly on par with non-unified architectures then it seems like a great design to populate a product line with. If the memory buses and the arbiter are the same, and the only difference is the number of ALUs, then Sireric might need to find something more exciting to work on. :LOL: Would such a design be similar in that, if a number of ALUs were bad, they could just be disabled like pipelines are today?
 
One thing I wondered about: what happens when a frame is finished and the backbuffer needs to be transferred to external memory? Does the chip simply stall for that short period, having no backbuffer to write to, or maybe do some vertex-only work?
I mean, 3.5 MiB/frame (color data for 720p) at 60 fps seems to require such an insignificant period of time that it may not be worth any kind of double-buffering effort. But with render-to-texture, that requirement may only go up.
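Quick back-of-the-envelope in Python - the external bus figure is my guess (GDDR3 in the ~22 GB/s range), not anything confirmed:

Code:
# Rough numbers for the backbuffer-copy question above.
frame_bytes = 1280 * 720 * 4   # 32bpp 720p colour buffer, ~3.5 MiB
fps = 60
copy_rate = frame_bytes * fps  # bytes/second leaving the eDRAM
bus_bw = 22e9                  # assumed external memory bandwidth, bytes/s
print(f"{copy_rate / 1e6:.0f} MB/s, i.e. {copy_rate / bus_bw:.2%} of the bus")

That's about 221 MB/s, roughly 1% of the bus - which is why a straight copy-out looks like a non-issue, with render-to-texture being the case that could change the picture.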
 
Isn't the figure of 64 concurrent threads derived from 48 that can be assigned to the ALUs and 16 others that are assigned to the TMUs?

How many components can be operated on in the ALUs - 5? And can operations use any mix of components? 4+1 is obvious, but are other combinations supported?

What kind of flexibility is there with the TMUs? Is there filtering for FP32? Can the TMUs be stacked to feed fewer ALUs to achieve single-cycle trilinear or FP samples? What filtering levels are supported?

How expensive is branching in this architecture? Is the initial Z pass a more automated, sophisticated form of UltraShadow? What type of Hyper-Z is being employed?
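For anyone wondering, here's what 4+1 co-issue would mean in practice - a hypothetical Python illustration of the term, not confirmed R500 behaviour:

Code:
# Hypothetical illustration of 4+1 co-issue: a 5-wide ALU retires one
# 4-component vector op and one independent scalar op per cycle.
import operator

def coissue(vec_op, vec_a, vec_b, scalar_op, s_a, s_b):
    """One 'instruction slot': a vec4 op and a scalar op issued together."""
    vec_result = [vec_op(a, b) for a, b in zip(vec_a, vec_b)]  # lanes 0-3
    scalar_result = scalar_op(s_a, s_b)                        # lane 4
    return vec_result, scalar_result

# e.g. a vec4 colour multiply with a reciprocal running alongside it
rgba, inv = coissue(operator.mul,
                    [1.0, 0.5, 0.25, 1.0], [0.9, 0.9, 0.9, 1.0],
                    lambda x, _: 1.0 / x, 2.0, None)
print(rgba, inv)  # [0.9, 0.45, 0.225, 1.0] 0.5

Whether the hardware also allows splits like 3+2 is exactly the open question.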
 
I'm not gonna say anything... Just read:

http://www.hardocp.com/article.html?art=Nzcx

Especially, this part:

Inside the Smart 3D Memory is what is referred to as a 3D Logic Unit. This is literally 192 Floating Point Unit processors inside our 10MB of RAM. This logic unit will be able to exchange data with the 10MB of RAM at an incredible rate of 2 Terabits per second. So while we do not have a lot of RAM, we have a memory unit that is extremely capable in terms of handling mass amounts of data extremely quickly. The most incredible feature that this Smart 3D Memory will deliver is “antialiasing for free” done inside the Smart 3D RAM at High Definition levels of resolution. (For more of just what HiDef specs are, you can read here.) Yes, the 10MB of Smart 3D Memory can do 4X Multisampling Antialiasing at or above 1280x720 resolution without impacting the GPU. So all of your games on Xbox 360 are not only going to be in High Definition, but all will have 4XAA applied as well.

WHATTA HELL ???!!

:oops: :oops:
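Running the numbers from that quote (the bytes-per-sample figures are my assumption, so treat this as back-of-envelope):

Code:
# Back-of-envelope check on the quoted figures. Byte counts per sample
# (4B colour + 4B Z/stencil) are assumed, not confirmed.
width, height, samples = 1280, 720, 4
bytes_per_sample = 4 + 4   # assumed RGBA8 colour + 24/8 Z/stencil
fb = width * height * samples * bytes_per_sample
print(f"4xAA 720p framebuffer: {fb / 2**20:.1f} MiB vs 10 MB of eDRAM")

terabits = 2e12            # the quoted 2 Tb/s
print(f"2 Tb/s = {terabits / 8 / 2**30:.0f} GiB/s of eDRAM bandwidth")

That works out to ~28.1 MiB of 4xAA colour+Z against 10 MB of eDRAM, so presumably something cleverer than one flat buffer is going on - pure speculation on my part. The 2 Tb/s figure is roughly 233 GiB/s.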
 
Rockster said:
Isn't the figure of 64 concurrent threads derived from 48 that can be assigned to the ALUs and 16 others that are assigned to the TMUs?
Excellent idea!

Jawed
 
eSa said:
I'm not gonna say anything... Just read:

http://www.hardocp.com/article.html?art=Nzcx

Especially, this part:

Inside the Smart 3D Memory is what is referred to as a 3D Logic Unit. This is literally 192 Floating Point Unit processors inside our 10MB of RAM. This logic unit will be able to exchange data with the 10MB of RAM at an incredible rate of 2 Terabits per second. So while we do not have a lot of RAM, we have a memory unit that is extremely capable in terms of handling mass amounts of data extremely quickly. The most incredible feature that this Smart 3D Memory will deliver is “antialiasing for free” done inside the Smart 3D RAM at High Definition levels of resolution. (For more of just what HiDef specs are, you can read here.) Yes, the 10MB of Smart 3D Memory can do 4X Multisampling Antialiasing at or above 1280x720 resolution without impacting the GPU. So all of your games on Xbox 360 are not only going to be in High Definition, but all will have 4XAA applied as well.

WHATTA HELL ???!!

:oops: :oops:

That sounds good.
 
That gives me a little more optimism for how the real games, especially 2nd gen games designed for the new architecture, will perform.

Idea to toss about: does Sony have an early advantage, given that the RSX looks to be an evolution of the current design, plus access to SLI 6800Us with ballpark-comparable performance - versus MS, who is stuck with an SM 2.0 card at 50% of the performance, no eDRAM, and a totally different architecture?

It would seem development for the PS3 would be easier at this point, yes? Or should I say, PS3 game development should be more of a known factor.
 
So they rasterize all transformed triangles to the eDRAM module, where they do Z/stencil, and thus have zero overdraw for pixel shading?
 
The GPU can write pixel fragments or just Z/stencil data. It's not rocket science. You can get twice as many un-coloured Z/stencil pixels per clock as you can get normally coloured fragments.

Jawed
 
That's what I meant: they do a Z-only pass, rasterizing their vertex-shader output to the eDRAM, and only pixel-shade visible fragments.
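In code form that's the classic two-pass scheme - a minimal software sketch of the idea, nothing R500-specific:

Code:
# Minimal sketch of a Z-prepass: pass 1 writes depth only, pass 2 runs
# the (expensive) shader only on fragments that survived. Each tuple below
# stands in for a rasterised fragment: (x, y, depth, material).
W, H = 4, 4
zbuf = [[float("inf")] * W for _ in range(H)]
frags = [(1, 1, 0.8, "far"), (1, 1, 0.3, "near"), (2, 2, 0.5, "solo")]

# Pass 1: Z only - cheap, no shading (and, per the post above, hardware
# can often write Z-only pixels at double rate).
for x, y, z, _ in frags:
    zbuf[y][x] = min(zbuf[y][x], z)

# Pass 2: shade with a depth test of EQUAL, so only the visible fragment
# of each pixel pays for the pixel shader - zero overdraw in shading.
shaded = 0
for x, y, z, mat in frags:
    if z == zbuf[y][x]:
        shaded += 1   # imagine an expensive pixel shader running here
print(f"shaded {shaded} of {len(frags)} fragments")  # -> 2 of 3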
 