FS ATI interview on R500 + Block Diagram

The article is incorrect: each ALU can do 10 floating ops per cycle, 8 + 2 (vec4 + scalar fmadd)
 
If we don't count reciprocal and special ops in NV40 pixel pipes we have:
8 ops * 6 VS + 12 ops * 16 PS = 240 floating ops per clock cycle, half of what R500 can do; dunno about R420
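
For anyone who wants to check the count above, here is a rough sketch of the arithmetic. The per-unit figures are simply the ones quoted in this post (8 flops per NV40 vertex unit, 12 per NV40 pixel pipe, 10 per R500 ALU across 48 ALUs); the function and variable names are made up for illustration.

```python
def flops_per_clock(units):
    """Sum flops per clock over (unit_count, flops_per_unit) pairs."""
    return sum(count * flops for count, flops in units)

# Figures as quoted in the post above, not measured values.
nv40 = flops_per_clock([(6, 8), (16, 12)])  # 6 vertex units + 16 pixel pipes
r500 = flops_per_clock([(48, 10)])          # 48 ALUs, vec4 + scalar fmadd each

print(nv40, r500, nv40 / r500)
```

This only counts peak multiply-add throughput and, as in the post, leaves out reciprocal and special ops.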
 
just wanted to point out:

Geforce 6800 Ultra:
53 ops per clock * 400MHz = 21.2 billions ops per second

Geforce 6800 Ultra Extreme Edition
53 ops per clock * 450MHz = 23.9 billion ops per second

PS3 GPU (RSX)
136 ops per clock * 550MHz = 74.8 billion ops per second

Xbox 360 GPU (R500)
192 ops per clock * 500MHz = 96 billion ops per second
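
The four figures above are all just ops per clock times clock speed. A minimal sketch that reproduces them, taking the ops-per-clock and MHz values exactly as posted (whether those ops-per-clock numbers are comparable is a separate question); the helper name is hypothetical.

```python
def billions_of_ops_per_second(ops_per_clock, clock_mhz):
    # ops/clock * (MHz * 1e6 clocks/s) / 1e9 == ops_per_clock * clock_mhz / 1000
    return ops_per_clock * clock_mhz / 1000

# (ops per clock, clock in MHz) as quoted in the post above.
chips = {
    "GeForce 6800 Ultra": (53, 400),
    "GeForce 6800 Ultra Extreme Edition": (53, 450),
    "PS3 GPU (RSX)": (136, 550),
    "Xbox 360 GPU (R500)": (192, 500),
}

results = {name: billions_of_ops_per_second(*spec) for name, spec in chips.items()}

for name, value in results.items():
    print(f"{name}: {value} billion ops/s")
```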
 
nAo, I feel like a pretzel with all the numbers! :?

You were pretty up front when the shader ops number came out (i.e. you said it was an almost worthless benchmark). I look forward to when you DO get some worthwhile numbers so you can untwist them :D
 
dukmahsik said:
just wanted to point out:

Geforce 6800 Ultra:
53 ops per clock * 400MHz = 21.2 billions ops per second

Geforce 6800 Ultra Extreme Edition
53 ops per clock * 450MHz = 23.9 billion ops per second

PS3 GPU (RSX)
136 ops per clock * 550MHz = 74.8 billion ops per second

Xbox 360 GPU (R500)
192 ops per clock * 500MHz = 96 billion ops per second

192?
 
48 ALUs* 4 FMAD units each* 2 Ops per clock per FMAD per clock * 500Mhz = 192 Billion floating-point ops per sec

maybe?

what about the edram? doesn't it have 192 of something or other too?
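
A quick sketch of the multiplication in this post, using only the numbers given there (48 ALUs, 4 FMAD units each, 2 ops per FMAD, 500MHz). Variable names are illustrative. Worked through this way, the 192 comes out as billions of floating-point ops per second, while the per-clock figure is 384.

```python
alus = 48
fmad_units_per_alu = 4
ops_per_fmad = 2            # a multiply-add counted as two floating-point ops
clock_hz = 500_000_000      # 500MHz

flops_per_clock = alus * fmad_units_per_alu * ops_per_fmad
flops_per_second = flops_per_clock * clock_hz

print(flops_per_clock)                      # per-clock count
print(flops_per_second // 1_000_000_000)    # billions per second
```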
 
dukmahsik said:
just wanted to point out:

Geforce 6800 Ultra:
53 ops per clock * 400MHz = 21.2 billions ops per second

Geforce 6800 Ultra Extreme Edition
53 ops per clock * 450MHz = 23.9 billion ops per second

PS3 GPU (RSX)
136 ops per clock * 550MHz = 74.8 billion ops per second

Xbox 360 GPU (R500)
192 ops per clock * 500MHz = 96 billion ops per second
I don't think these numbers are comparable, nor do I think the 53 ops/clock number for NV40 is true.
There is no odd number of ALUs in a 6800U, therefore an odd number of ops/clock doesn't make sense at all.
 
I think I know where Nvidia gets the 53 instructions per clock number.

Each vertex pipe can issue 1 instruction per clock => 1*6 processors = 6 vertex instructions per clock.

Each pixel pipe can issue 3 instructions per clock: 2 for SU0 and 1 for SU1 (remember this is just the number of issued instructions, not executed operations) => 3*16 processors = 48 pixel instructions per clock.

If we add the numbers up (6 issued vertex instructions/clock + 48 issued pixel instructions per clock), we get 54. Close enough?

Does that sound reasonable, Xmas?

We could do the same for R500 which could issue 2 ops per general ALU => 2*48 = 96 pixel/vertex instructions per clock.

While it seems RSX may have the edge in the number of instructions issued per clock, I think it will be close to R500 in the number of operations per clock that it could execute; remember, R500 has 5D ALUs.
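
The issue-rate guess above can be written out as a small sketch. The unit counts and per-unit issue rates are the ones assumed in this post (1 per NV40 vertex pipe, 3 per NV40 pixel pipe, 2 per R500 ALU); they are the poster's estimate, not confirmed figures, and the function name is made up.

```python
def issued_per_clock(groups):
    """Sum issued instructions per clock over (unit_count, issue_rate) pairs."""
    return sum(units * issue_rate for units, issue_rate in groups)

# Poster's assumptions: counts issued instructions, not executed operations.
nv40_issue = issued_per_clock([(6, 1), (16, 3)])  # vertex pipes + pixel pipes
r500_issue = issued_per_clock([(48, 2)])          # unified ALUs

print(nv40_issue, r500_issue)
```

Note this lands on 54 for NV40, one off from the 53 being discussed.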
 
Just wondering if R500 could run vertex and pixel streams/threads simultaneously among its units or if it has to be either or but not both?
 
This is one of the things I've got to follow up. From something that Bob told me it appears that all 48 ALUs operate on the same shader, which dictates that they are all processing either VS or PS instructions (and would also imply wastage on smaller triangles). However, I'm still having trouble with this for two reasons: the diagram I saw had 8 groupings of 6 ALUs (if they weren't independent quads, why group them like that?), and there are only 16 texture samplers; if they were all working on the same shader at the same time then all of them would want texture data at the same time (presumably).

I'll follow up on this later.
 
Nite_Hawk said:
One does wonder why exactly the PS3 GPU has around twice as many transistors as the xbox360 GPU. From this perspective, it will be interesting to see where the ATI GPU is deficient in comparison. It certainly sounds like ATI is making better use of what transistors they have.

Nite_Hawk
I think it's misleading because some of the logic is on the 10MB EDRAM (smart memory). ATI also does their transistor counts a little more conservatively than Nvidia does, right?
 
Also bear in mind that there is basically nothing in the graphics other than the shader/tessellator/HierZ logic. Even elements such as HierZ, which will eat up lots of transistors on the PC parts because of the resolutions they need to support, will be smaller in Xenos.
 
So what is this all about? Can anybody answer this?


X360 GPU: 9 Million Dot Production Per Second

PS3 GPU: 56 Billion Dot Production Per Second
 
dukmahsik said:
48 ALUs* 4 FMAD units each* 2 Ops per clock per FMAD per clock * 500Mhz = 192 Billion floating-point ops per sec

maybe?

You're comparing shader ops with floating point ops. I'm not sure about your 6800 Ultra number, but the 136/cycle number given for RSX was shader ops, not floating point ops. A single shader op can contain multiple floating point ops (one vector op, a vec4 fmadd, = 8 floating point ops, for example).
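
To make the shader op vs floating point op distinction concrete, here is a minimal sketch. It assumes the counting convention used in this thread, where a multiply-add counts as 2 floating point ops per lane; the function name and parameters are hypothetical.

```python
def flops_in_shader_op(lanes, is_fmadd=True):
    """Floating point ops in one shader op: 2 per lane for an fmadd, else 1."""
    return lanes * (2 if is_fmadd else 1)

vec4_fmadd = flops_in_shader_op(4)    # one vector shader op
scalar_fmadd = flops_in_shader_op(1)  # one scalar shader op

# Two shader ops, but several times as many floating point ops.
print(vec4_fmadd, scalar_fmadd, vec4_fmadd + scalar_fmadd)
```

So a shader-op count and a floating-point-op count for the same hardware can differ by a large factor, which is why the two can't be put side by side.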
 
DaveBaumann said:
tEd said:
Is it true that they only have 4 texture units? I was a little surprised, to say the least.

No, it's 4 groups of 4. They are grouped in fours as these are the most common sampling requirements.

Xenon has 32 memory fetch units: 16 have filtering and address logic (textures) and 16 just do a straight lookup from memory (unfiltered and no addressing modes, AKA vertex fetch).

Unification means that any shader can use either type (filtered or unfiltered) as it sees fit (no concept of dependent reads or otherwise). This means that the XeGPU has an almost CPU-like view of memory.
 