What can be defined as an ALU exactly?

Ailuros

Moderator's note: This discussion was split from this thread.
Let's continue the discussion about ALUs in a more appropriate forum, and in a more appropriate thread too.


radeonic2 said:
Well, ATI's people are generally altered... or enhanced, you might say.
How else do you think they were able to see that they needed a 48-shader-pipeline part while NVIDIA is still messing around (for future products) with a little 24-shader-pipe part and at best a 32?

G70 has 48 ALUs.
 
Is that your own comment or did Nvidia pay you to say that? ;)


Edit: Sorry, I couldn't resist. :)
 
rwolf said:
Is that your own comment or did Nvidia pay you to say that? ;)


Edit: Sorry, I couldn't resist. :)

The net result is that G70 features 48 fragment shaders of the same capabilities, with one of them having to handle the texture processing instructions.


http://www.beyond3d.com/previews/nvidia/g70/index.php?p=02

You may ask Wavey to send his check back first :p
 
Tim said:
Yes and the R590 has 96.

R580 rather, and yes, counting the ADD abilities of the mini-ALUs is perfectly legitimate, but a mini-ALU is not a full ALU either. Read the original message that I replied to. If you count based on MADDs exclusively, there are 48 on G70, just as there are 48 on R580.

A deeper analysis will show a higher ALU throughput on R580, though, for many reasons.
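
To make the two counting conventions in this exchange concrete, here is a minimal Python sketch. The per-pipe layouts are assumptions based on the counts quoted in this thread (24 pipes with two MADD-capable ALUs each for G70; 48 shader units with one MADD-capable ALU plus one ADD-only mini-ALU each for R580). The names `Unit`, `Chip`, and `count_units` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str
    can_madd: bool  # True if the unit can issue a multiply-add


@dataclass(frozen=True)
class Chip:
    name: str
    pipes: int
    units_per_pipe: tuple  # tuple of Unit, one entry per unit in a pipe


# Assumed layouts, derived from the counts quoted in the thread.
G70 = Chip("G70", 24, (Unit("ALU0", True), Unit("ALU1", True)))
R580 = Chip("R580", 48, (Unit("ALU", True), Unit("mini-ALU", False)))


def count_units(chip, madd_only=False):
    """Count arithmetic units, optionally only those that can do a MADD."""
    per_pipe = sum(1 for u in chip.units_per_pipe if u.can_madd or not madd_only)
    return chip.pipes * per_pipe


for chip in (G70, R580):
    print(chip.name, count_units(chip), count_units(chip, madd_only=True))
```

Under these assumptions, counting every unit gives 48 vs. 96, while counting MADD-capable units only gives 48 vs. 48, which is the disagreement in a nutshell.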
 
Ailuros said:
R580 rather, and yes, counting the ADD abilities of the mini-ALUs is perfectly legitimate, but a mini-ALU is not a full ALU either. Read the original message that I replied to. If you count based on MADDs exclusively.

It does not make any sense at all to count MADDs exclusively.
 
Tim said:
It does not make any sense at all to count MADDs exclusively.

More than counting supposed "half-ALUs" as you just did. Alternatively, make it 36 FLOPs per SIMD channel on R580 vs. 16 FLOPs per SIMD channel on G70 and we have an agreement.
 
I think Ailuros makes a fair point. I don't think it's arguable that the R580 has a higher shader throughput at the moment. But to say the G70 is only operating with 24 shader units is a gross oversimplification of the G70 pipeline structure. Its second ALU is certainly not comparable to a mini ADD ALU, and the claim only serves to confuse the masses. Anyone can exaggerate the number of shaders/pipelines when trying to make comparisons, but the actual capabilities of the ALUs are the most important thing being discussed here.
 
Ailuros said:
More than counting supposed "half-ALUs" as you just did.
Not really, Ail. In neither case is the second ALU capable of performing the entire shader functionality on its own; both are merely parts of the entire shader pipeline. Yes, the second ALU on NVIDIA's parts has more capability on its own, but only with the entire pixel shader pipeline can the full shader instruction capabilities be performed.

An ALU is just that, an Arithmetic Logic Unit; the term doesn't dictate what arithmetic ops must be supported. A "Pixel Shader ALU" has some functionality and capability requirements attached to it, and you have to look at the entire pipeline in both cases for that.
 
Dave Baumann said:
An ALU is just that, an Arithmetic Logic Unit; the term doesn't dictate what arithmetic ops must be supported. A "Pixel Shader ALU" has some functionality and capability requirements attached to it, and you have to look at the entire pipeline in both cases for that.

No doubt, yet I did in fact count the ADDs in the FLOP comparison above (while it should be known by now that on GeForces one would have to account for texture OPs as well). I'm reacting only to those funky numbers circulating (see also the last post above) concerning ALU unit counts.

An alternative way would be to count ALUs differently: 12 FLOPs/ALU for the one and 16 FLOPs/ALU minus texture OPs for the other, times core frequency in either case. Both sides' math for purely theoretical throughput is usually exaggerated, and real-time game performance or even current synthetic applications can show any possible differences.
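
The arithmetic suggested here (FLOPs per unit, times unit count, times core frequency) can be sketched as below. The pairing of figures is one plausible reading of the thread, not a confirmed spec: 48 units at 12 FLOPs each for R580 and 24 pipes at 16 FLOPs each for G70. The 500 MHz clock is a placeholder and not a real product clock, `peak_gflops` is a hypothetical helper, and texture OPs are not subtracted.

```python
def peak_gflops(units, flops_per_unit_per_clock, clock_mhz):
    """Theoretical peak in GFLOPs: units * FLOPs per clock * clock rate."""
    flops_per_clock = units * flops_per_unit_per_clock
    return flops_per_clock * clock_mhz / 1000.0  # MHz * FLOPs -> GFLOPs


PLACEHOLDER_CLOCK_MHZ = 500  # assumption: same illustrative clock for both chips

r580 = peak_gflops(48, 12, PLACEHOLDER_CLOCK_MHZ)  # assumed 48 units x 12 FLOPs
g70 = peak_gflops(24, 16, PLACEHOLDER_CLOCK_MHZ)   # assumed 24 pipes x 16 FLOPs

print(f"R580 (assumed): {r580} GFLOPs")
print(f"G70  (assumed): {g70} GFLOPs")
print(f"ratio at equal clock: {r580 / g70}")
```

Using the same clock for both isolates the per-clock difference; plugging in real clocks would change the absolute numbers but not the method.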
 
Dave Baumann said:
Not really, Ail. In neither case is the second ALU capable of performing the entire shader functionality on its own; both are merely parts of the entire shader pipeline. Yes, the second ALU on NVIDIA's parts has more capability on its own, but only with the entire pixel shader pipeline can the full shader instruction capabilities be performed.

An ALU is just that, an Arithmetic Logic Unit; the term doesn't dictate what arithmetic ops must be supported. A "Pixel Shader ALU" has some functionality and capability requirements attached to it, and you have to look at the entire pipeline in both cases for that.

Another issue is the general assumption that functionally equivalent ALUs perform the same. The ALU itself might be pipelined and the performance of various instructions might depend on the number of stages.

http://edu.cs.tut.fi/pd2005/lecture5/node1.html
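
As a rough illustration of why stage count matters, here is a toy model of a pipelined unit. It assumes one op can be issued per cycle, that a dependent op must wait for the previous result to leave the last stage, and that there is no forwarding. The stage counts and function names are hypothetical and do not describe any real GPU.

```python
def cycles_independent(stages, n_ops):
    """Cycles to finish n_ops independent ops, issuing one per cycle."""
    if n_ops == 0:
        return 0
    # The first result appears after `stages` cycles, then one per cycle.
    return stages + n_ops - 1


def cycles_dependent(stages, n_ops):
    """Cycles when every op needs the previous op's result (no forwarding)."""
    return stages * n_ops


for stages in (2, 6):  # two hypothetical ALUs with the same instruction set
    print(
        f"{stages} stages:",
        cycles_independent(stages, 100), "cycles independent,",
        cycles_dependent(stages, 100), "cycles dependent",
    )
```

In this model the two units look almost identical on independent work, but the deeper one is far slower on a dependent chain unless other work is available to fill the pipeline.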
 
For a modern game like FEAR, Q4, SC, etc. does the G70's second ALU do arithmetic ops most of the time or texture ops most of the time?

No doubt there are games where the "unified" ALU is doing texture stuff most of the time, and in that case, its fragment shading ALU power can be discounted.

What we're really talking about here are the benefits of a decoupled architecture vs the benefits of a unified architecture, if you take my meaning. Both are good.

ERK
 
ERK said:
For a modern game like FEAR, Q4, SC, etc. does the G70's second ALU do arithmetic ops most of the time or texture ops most of the time?
IIRC, only the first ALU has tex ability. Corollary, does it matter whether ALU0 or ALU1 can perform tex?
 
ERK said:
For a modern game like FEAR, Q4, SC, etc. does the G70's second ALU do arithmetic ops most of the time or texture ops most of the time?

No doubt there are games where the "unified" ALU is doing texture stuff most of the time, and in that case, its fragment shading ALU power can be discounted.

What we're really talking about here are the benefits of a decoupled architecture vs the benefits of a unified architecture, if you take my meaning. Both are good.

ERK

Considering the minimal performance hit for anisotropic filtering in these titles on G70 hardware, the secondary texture unit cannot be stalling that much. I have a feeling that the increased ALU-to-tex ratio in most games will likely hide the latency associated with it. I believe Ailuros/Demirug have more information on this particular subject, but the primary ALU will never stall completely. The numbers I seem to recall hearing were in the range of 30-40% in a worst-case scenario. If I'm misquoting, those two can feel free to correct me regarding the percentage range.

Chris
 
Ailuros said:
More than counting supposed "half-ALUs" as you just did.
I don't think so. Just look at per-pipe performance of NV40 versus G70. They're very similar, even with complex lighting shaders. The biggest improvement is 23%, but mostly we're talking about single digit gains. In contrast, for several of these shaders R580 gets 2-3x the performance of R520.

NVidia may have the theoretical capability, but somehow the second ALU doesn't do very much. From the data, a G70 shader pipe is a lot closer to one ATI shader pipe than two.
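
A per-pipe comparison of this kind amounts to normalizing a benchmark score by pipe count and clock. The sketch below shows only the method: the scores and the 400 MHz clock are made-up placeholders, not measurements, the pipe counts (16 for NV40, 24 for G70) are the commonly cited figures, and both function names are hypothetical.

```python
def per_pipe_per_clock(score, pipes, clock_mhz):
    """Normalize a benchmark score by pipe count and core clock."""
    return score / (pipes * clock_mhz)


def relative_gain(new, old):
    """Fractional improvement of `new` over `old` (0.10 means +10%)."""
    return new / old - 1.0


# Placeholder inputs for illustration only; these are not real results.
nv40 = per_pipe_per_clock(score=100.0, pipes=16, clock_mhz=400)
g70 = per_pipe_per_clock(score=165.0, pipes=24, clock_mhz=400)

print(f"per-pipe, per-clock gain: {relative_gain(g70, nv40):.1%}")
```

If the second ALU contributed fully, one would expect this normalized gain to be large; a small gain suggests that the extra unit is idle or otherwise limited on that shader.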

Anyway, a lot of this talk is moot because the R5xx architecture is nowhere near as dense as G7x. Just wait until the 7600 comes.
 