Second ("Mini") ALU on R420 Architecture

Dave Baumann

ATI's pixel shader architecture for both the R300 and R420 series has relied on a primary ALU and a secondary ALU without the full capabilities. In R300 we know that the "mini" ALU had DX8.1 modifier capabilities; however, has anyone any evidence of something different in R420? I've heard mention of this unit now being an ADD unit, additional to the primary ALU, but have not seen anything to back that up.
 
Dave, this ADD in some of the R420 press papers is a dirty math trick.

ATI claims in these papers that this mini ALU can do "Y = X+X". This is right, but it is the same as "Y = X*2", and "*2" is an old DX8.1 modifier.
 
Demirug said:
ATI claims in these papers that this mini ALU can do "Y = X+X". This is right, but it is the same as "Y = X*2", and "*2" is an old DX8.1 modifier.
Interesting twist of words. Out of curiosity, how old/recent are these press papers?

Uttar
 
I've never actually seen that described anywhere, although I have heard a few references to add capabilities.
 
Uttar said:
Interesting twist of words. Out of curiosity, how old/recent are these press papers?

Uttar

It was in one of the papers (PDF) we got as part of the press material for the launch report. For clarification, I work as a freelancer for a German print magazine. Maybe I can find it, but I had a hard disk crash lately and I am not sure whether it is in one of the backups.
 
Ah okay, that's more than I wanted to know about those press papers really, so unless someone else or yourself is interested in having more details about it, don't waste time on searching through tons of backups etc. - I've been through a HD crash so I know how annoying those tend to be :cry: Thank god we didn't lose much of anything in it (I know of a videogame company which lost their authentication server and CDKey database in one in the last year or so though...)

Anyhow, back on topic, I think someone mentioned (you? I honestly can't remember) a while ago on these very forums that NVIDIA also had the equivalent of those Mini-ALUs, just they didn't show them in their diagrams. I've yet to see any complete and detailed information from NVIDIA about them; that is, I'm sure plenty of information is available out there, but I've never seen it all in one place.

From what I gather, it seems likely that NVIDIA's equivalent mini-ALUs are just as advanced, but considering the information at my (our?) disposal that seems a bit hard to estimate. On the other hand, most ATI developer presentations tend to emphasize swizzling as something you shouldn't abuse on their architecture, whereas you really should on NVIDIA's. Now that I think about it, I wonder whether the different HLSL profiles take that into consideration - I'd guess so, though. And the compilers help no matter what.


Uttar
 
Yes, I have mentioned that nVidia uses mini-ALUs too, but I am sure that others have written this as well.

We tried to get some information about these mini-ALUs from David Kirk during the NV40 launch phase. All we got was something like: "Maybe developer support will publish these details, but this is not necessary because the driver will do all the hard work". We tried again for the G70, but the only thing David Kirk told us was "Yes, the mini ALUs are still there."

The HLSL compiler doesn't care much about the mini-ALUs if you use an SM2 or higher model, because there is no way to tell the driver to use the mini-ALUs. But I know that MS has built the compiler in such a way that it sometimes prefers one particular way of doing something. Maybe these ways can be mapped better to the mini-ALUs.

Anyway, the dirty work is done in the driver. In the case of nVidia, it first translates a shader into nVidia's internal shader language (IIRC SHD); after this they use different compilers for the different chips, each of which knows what can be done on its chip.

But to this day I have not been able to get a full list. I am very sure, though, that the ALUs contain two mini-ALUs: one for source operations and one for the destination. Likewise, I am sure that the TMU contains a mini-ALU that can scale the results of a texture instruction. This can be used for normal maps.

With much patience it should be possible to get all the information out of the shader tools from nVidia, but this is a long way to go.
 
Demirug said:
All we got was something like: "Maybe developer support will publish these details, but this is not necessary because the driver will do all the hard work".
That is one most original statement he made there (and yes, I do realize you're paraphrasing, that's np :smile:) - obviously, the full details aren't of much use, but knowing whether x8 is "free" or not for example is not exactly what I would call information that is "not necessary" for developers to properly optimize shaders. Obviously, the compiler cannot optimize x9 to x8. Well, I hope it can't, anyway ;) And don't go tell me developers use powers of two anyway because they "are good", it was just an example of some of the things the mini-ALUs can do.

Uttar
 
Uttar said:
That is one most original statement he made there (and yes, I do realize you're paraphrasing, that's np :smile:) - obviously, the full details aren't of much use, but knowing whether x8 is "free" or not for example is not exactly what I would call information that is "not necessary" for developers to properly optimize shaders. Obviously, the compiler cannot optimize x9 to x8. Well, I hope it can't, anyway ;) And don't go tell me developers use powers of two anyway because they "are good", it was just an example of some of the things the mini-ALUs can do.

Uttar

If your shader needs a *9, it does not help you to know that a *8 can be done faster (as an example). The big problem in the PC game business is the variety of hardware. If you work too close to the metal of one chip, it can happen that your shader code will not work very well on another chip. This can even happen with two chips from the same IHV. Because of this, hiding some details from developers can save them from this mistake. In console game development, knowing all the details is a good thing, because you can be sure that your code only has to run on this one chip.

A PC 3D API should allow you to tell the driver what you want, but not how it should be done. This gives the driver the possibility to find the best way for each chip, something like HotSpot JIT compilation in VMs.

But current APIs are not built like this, because there are still too many developers out there who fear losing control over the hardware. As an example, some don't even like the new virtual video memory system in Longhorn/Vista.
 
Couldn't, in theory, scalings of the form (2^n) for floating point numbers be implemented by adding (n) to the exponent field? That doesn't seem to be that complex of a unit, what you would seem to need is an 8-bit integer adder. You would have to clamp to the valid range but this is something that would have to be done anyway.

I'm probably wrong, though. Beats me.
 
That's the way it's done. And because scaling beyond *8 and /8 is quite rare, you can limit one input value to 2 bits + sign, which means even less complexity than a full 8 bit adder. Even with range checking and special value handling, that's a rather inexpensive unit.
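 
For anyone who wants to see the exponent trick from the two posts above spelled out, here is a minimal software sketch in Python. It is only an illustration of the idea on an IEEE 754 single-precision value, not a description of any actual ATI or NVIDIA hardware. The function name `scale_pow2`, the choice to flush denormals and underflow to zero, and the saturation to infinity on overflow are all assumptions made for the example; the shift is limited to -3..+3 (x8 down to /8), matching the "2 bits + sign" remark.

```python
import struct


def _bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def scale_pow2(x: float, n: int) -> float:
    """Scale x (rounded to float32) by 2**n by adding n to the exponent field.

    Hypothetical helper: n is limited to -3..+3, i.e. x8 down to /8.
    """
    if not -3 <= n <= 3:
        raise ValueError("n must be in the range -3..3")

    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits & 0x80000000
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x007FFFFF

    # Special values: Inf and NaN pass through unchanged.
    if exponent == 0xFF:
        return x
    # Zero and denormals: flushed to signed zero (an assumption of this sketch).
    if exponent == 0:
        return _bits_to_float(sign)

    new_exponent = exponent + n  # the small integer add that does the scaling

    # Range check: clamp to the valid exponent range.
    if new_exponent >= 0xFF:
        return _bits_to_float(sign | 0x7F800000)  # overflow -> signed infinity
    if new_exponent <= 0:
        return _bits_to_float(sign)               # underflow -> signed zero

    return _bits_to_float(sign | (new_exponent << 23) | mantissa)


print(scale_pow2(1.5, 3))    # x8
print(scale_pow2(-3.0, -2))  # /4
```

The mantissa is never touched, which is why such a modifier is so much cheaper than a multiplier: apart from the special cases, the whole operation is one narrow integer add plus a range check.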
 
Demirug said:
ATI claim in this papers that this Mini ALU can do "Y = X+X". This is right but it is the same as "Y=X*2". "*2" is an old DX 8.1 modifier.
I've got Eric and Dio saying that it is actually a full adder there, although there appears to be a little confusion as to whether it was added in RV350 or there from the start.
 
It's a full adder that can take two independent inputs, it's been there since R300, and it hasn't significantly changed since R300.

(Sorry for not being around much recently by the way - I have less to say at the moment, and I've not adapted to the new board format. phpBB may be a bit rubbish, but at least I don't have to spend hours before I know how to use it properly).
 
Assuming that what Dio said above is the case, I wonder if there are more than 1 version of these mini adders. If I recall correctly, R3x0 and R4x0 shader processors had 2 mini ALUs and 2 full ALUs, 1 mini and 1 full for vectors and 1 and 1 for scalars. Perhaps the vector mini ALU also has an adder and maybe it is SIMD with 8 independent inputs. Anyone care to clarify?
 
You've even confused me with that one.

There is a 'mini' ALU which is a vector unit and a 'full' ALU which is a vector unit. Both are then split into a vector portion and a scalar portion, but there are data pathways across the vector/scalar split.
 
Dio said:
You've even confused me with that one.

There is a 'mini' ALU which is a vector unit and a 'full' ALU which is a vector unit. Both are then split into a vector portion and a scalar portion, but there are data pathways across the vector/scalar split.
Okay, I edited my question above. Anyways, given that there is an ALU split between the vector and scalar side, are there separate mini adders for both vector and scalar pathways?

Edit: Nevermind, after reading what you wrote, I understand. It's just that ATI released a power point image of the R3x0 pipeline and it showed 4 distinct units in the shader processor; I never thought of it as just a set of 2 vector units, each with its own vector and scalar pathways.
 
You can look at it, and I do, as simply 4 units, since they really are separate. 2 sets of scalar units, 2 sets of vector units. Only when used in full 4 component mode do they work in pairs.
 
So, in conclusion, these GPUs are capable of dual-issuing:

  • vec3 + scalar MUL or ADD or MAD
  • vec3 + scalar ADD
Is that right?

Jawed
 