NV30,35 & R300/R350 Pixel Shader Pipes Compared (New info)

I've looked at those pipeline diagrams, but I find it hard to really compare the computational power of the NV3x and R3x0 pipelines right now. Could somebody translate this into some straight performance numbers for us average (or below average? :oops:) Beyond3D readers?

Is it possible to state the theoretical calculation potential (ignoring the NV3x register weakness for now)? Perhaps some MFLOPS numbers or something like that? It would be interesting to see something like R3x0 MFLOPS, NV35 FP16 MFLOPS and NV35 FP32 MFLOPS, all ignoring the register problems of the NV35.

Thanks!
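For the curious, here's a minimal back-of-the-envelope sketch of how such peak numbers are usually derived. The clocks and pipe counts are the shipping 9800 Pro / 5900 Ultra figures, but the flops-per-pipe-per-clock values are placeholder assumptions (pinning those down is exactly what this thread is about):

```python
# Back-of-the-envelope peak pixel shader throughput:
#   peak MFLOPS = pipes * flops_per_pipe_per_clock * clock_MHz
# The flops-per-pipe values below are PLACEHOLDER ASSUMPTIONS, not
# confirmed figures -- establishing them is the open question here.

def peak_mflops(pipes, flops_per_pipe_per_clock, clock_mhz):
    """Theoretical peak, ignoring register pressure, texturing, etc."""
    return pipes * flops_per_pipe_per_clock * clock_mhz

# R350 (9800 Pro): 8 pipes @ 380 MHz; assume a vec3 MAD plus a scalar
# MAD co-issued per clock = (3 + 1) * 2 = 8 flops/pipe/clock.
print("R350 peak:", peak_mflops(8, 8, 380), "MFLOPS")   # 24320

# NV35 (5900 Ultra): 4 pipes @ 450 MHz; assume, say, two vec4 MADs
# per clock = 2 * 4 * 2 = 16 flops/pipe/clock (pure guesswork).
print("NV35 peak:", peak_mflops(4, 16, 450), "MFLOPS")  # 28800
```

Of course, as the replies below point out, such peaks say little about what a real shader will actually sustain.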
 
Peak numbers aren't necessarily the question. It is better to ask: how well does the architecture map to the programs you actually want to run?
 
Dio said:
Peak numbers aren't necessarily the question. It is better to ask: how well does the architecture map to the programs you actually want to run?
Absolutely. Nevertheless, peak numbers can be interesting, can't they? I mean, the NV3x shaders seem to suffer greatly from the register problems. But let's say the NV40 solved the register problem while keeping the rest of the pipelines as they are. Would that bring the NV40 on par with the R3x0 pipelines? That's where peak numbers might give us a clue. Or what do you think?
 
I think it's meaningless. You can make a machine that can run at 10x performance if it gets 10 identical MUL instructions in a row, but since you don't, what use is that extra peak performance?

The architecture has to match the typical workload to get good performance.
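To make that concrete, here's a toy model (purely hypothetical, not any real part): a machine that can retire one vec3 op plus one scalar op per clock only hits its two-instructions-per-clock peak when the stream happens to pair up that way:

```python
# Toy co-issue model: per clock the machine can retire one vec3 op
# paired with one scalar op. Purely illustrative; not any real part.

def cycles(program):
    """Count cycles for a list of ops tagged 'vec3' or 'scalar'."""
    ops, i, n = list(program), 0, 0
    while i < len(ops):
        # Pair a vec3 with an adjacent scalar when possible.
        if i + 1 < len(ops) and {ops[i], ops[i + 1]} == {"vec3", "scalar"}:
            i += 2
        else:
            i += 1
        n += 1
    return n

ideal = ["vec3", "scalar"] * 4      # perfectly paired: 8 ops in 4 clocks
real  = ["vec3"] * 8                # all-vec3 stream:  8 ops in 8 clocks
print(cycles(ideal), cycles(real))  # -> 4 8
```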
 
Dio said:
Peak numbers aren't necessarily the question. It is better to ask: how well does the architecture map to the programs you actually want to run?

For current games it's more like how well does the program map to the hardware. ;)
 
madshi said:
Dio said:
Peak numbers aren't necessarily the question. It is better to ask: how well does the architecture map to the programs you actually want to run?
Absolutely. Nevertheless, peak numbers can be interesting, can't they? I mean, the NV3x shaders seem to suffer greatly from the register problems. But let's say the NV40 solved the register problem while keeping the rest of the pipelines as they are. Would that bring the NV40 on par with the R3x0 pipelines? That's where peak numbers might give us a clue. Or what do you think?

Well it won't AFAIK. So in this case at least, not too interesting ;)


Uttar
 
sireric said:
We are working hard on improving our current PS compiler, so that it can map PS ops to our HW in an optimal way. The current stuff is pretty simple. The HW is naturally very fast and executes well.
This all lies on OpenGL guy. Our performance lies in his hands. ;)
 
Luminescent said:
sireric said:
We are working hard on improving our current PS compiler, so that it can map PS ops to our HW in an optimal way. The current stuff is pretty simple. The HW is naturally very fast and executes well.
This all lies on OpenGL guy. Our performance lies in his hands. ;)
Actually, I don't work on the shader stuff much... go bug Dio ;)

Now, if you have problems with AA, then come see me :D
 
Anyway, it's andypski who always seems to be on the receiving end of "the fate of the company is in your hands."
 
phffft! The fate of the company is on his shoulders and he's larking around on something as insignificant as a "honeymoon"?? ;)
 
DaveBaumann said:
phffft! The fate of the company is on his shoulders and he's larking around on something as insignificant as a "honeymoon"?? ;)

One day, Dave, you'll learn what's RL. :LOL:
Oh wait. What's RL? :oops:


Uttar
 
Highly interesting thread :oops:

Colourless said:
Tridam said:
Colourless said:
I would imagine that the Mini FP24 units would be used for the PS1.x instruction modifiers. They could possibly be used for other things too though.

http://www.beyond3d.com/forum/viewtopic.php?p=131019#131019

I have strange results with R350/R300. It seems able to do one MUL for free with every instruction.

That behaviour might further suggest that the extra units are intended to do the PS1.x instruction modifiers, since those are all just muls by 0.125, 0.25, 0.5, 2, 4 or 8.

Might have to run a test or two myself. If I'm correct, then you shouldn't get the mul for free if you use an instruction modifier.
Not sure.
Instruction modifiers on 1.4 are input bias, input inversion and the scales you mentioned.

While a full multiplier is overkill for the scales (adding a constant to the exponent part is sufficient), inversion and bias require a full float adder.
So, IMO, if the goal was making the 1.4 modifiers free, they clearly went beyond what is needed. But lo and behold:
Legacy pixel shaders on DirectX® 9 hardware
When designing the RADEON™ 9500/9700 family of chips, one important objective was to create architecture backwards compatible with legacy shader models that would provide the highest performance possible. This resulted in pixel shader engine architecture that natively supports shader instruction co-issue, and most of the source argument and instruction modifiers. Since 2.0 pixel shader model has very limited support for modifiers, they have to be emulated with extra instructions. This means that some of the legacy pixel shaders featuring many modifiers will execute faster than their 2.0 pixel shader equivalents.
Source: Page 19 of this
Umm? Equivalents do require extra instructions, of course. But if the legacy execution model for PS1.4 used the same resources, there wouldn't be a difference in execution speed, as the above snippet suggests there is.
My current bet is on a hidden exponent adder to handle power-of-two muls.
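To illustrate why the power-of-two scales are so cheap: on IEEE-754 floats, multiplying by 2^k is just adding k to the biased exponent field, so no multiplier array is needed. A quick sketch (ignoring zeros, denormals, infinities and NaNs for brevity):

```python
import struct

def mul_pow2(x, k):
    """Multiply float32 x by 2**k by adding k to the IEEE-754 exponent.
    Sketch only: ignores zeros, denormals, infinities and NaNs."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    exp = (bits >> 23) & 0xFF
    bits = (bits & ~(0xFF << 23)) | (((exp + k) & 0xFF) << 23)
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# The PS1.x modifier scales are exactly these: x8, x4, x2, d2, d4, d8
for k in (3, 2, 1, -1, -2, -3):
    print(mul_pow2(1.5, k))   # 12.0, 6.0, 3.0, 0.75, 0.375, 0.1875
```

If the hardware really does have such an adder tucked next to each ALU, the "free" power-of-two mul speculated about above would cost almost nothing in gates.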
 
For your last comment (which I read as saying that the bold closing sentence contradicts the stated expectations): to me, that text just looks like it refers to the compiler's inability to always schedule maximum usage of the units. When modifiers are used, the functionality that can be expressed is simpler and more restricted (the pairing opportunity is specified explicitly by how the shader expresses the operations, whereas for PS 2.0 it must be found by analysis; a much-simplified sketch of that follows below).

I don't think that counters any of the expectations stated earlier, and given the comments about the current state of the PS 2.0 compiler, it even supports them.
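As a rough illustration of the "found by analysis" part: in PS1.x, co-issue is written explicitly in the shader (the second instruction carries a + prefix), whereas a PS2.0 compiler has to discover that an rgb op and an alpha op can share a slot by inspecting write masks and dependencies. A hypothetical, much-simplified pairing check:

```python
# Much-simplified co-issue analysis: two adjacent ops can pair if one
# writes only .rgb and the other only .a, and the second op doesn't
# read what the first wrote. Hypothetical model, not ATI's compiler.

def can_pair(op1, op2):
    disjoint_masks = {op1["mask"], op2["mask"]} == {"rgb", "a"}
    independent = op1["dst"] not in op2["src"]
    return disjoint_masks and independent

a = {"dst": "r0", "mask": "rgb", "src": ["r1", "r2"]}  # mul r0.rgb, r1, r2
b = {"dst": "r0", "mask": "a",   "src": ["r3", "r4"]}  # mul r0.a, r3, r4
print(can_pair(a, b))  # True -> schedule both ops in one cycle
```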

...

On a Completely Different Note, getting other people to say "proxels" brings a smile to my face (see :D), and I appreciate it all the more having recently passed through the ordeal of an Official Day of Getting Older (kids and masochists call them "Birthdays").
 
madshi said:
Uttar said:
Well it won't AFAIK. So in this case at least, not too interesting ;)
You mean the register problem will *not* be solved in NV40? Then I can only say: OUCH.
I think he means the NV40 pipelines won't stay the same as in NV35. Although I don't have an inside source like Uttar does, I'd say this is a pretty safe bet.
 
sireric said:
Well, I would not say "tip of the iceberg", and for short shaders it's very close to optimal, but you should expect some more performance when we release a more advanced compiler. Soon.

Is this a hint as to what is coming with the Catalyst 3.8 drivers?

Catalyst Maker has been tipping them as a big revision of the drivers with exciting stuff in it. I had been assuming we might see things like per-game profiles built into the driver, and perhaps the supersampling AA we saw on the Macintosh release of the Radeons. But it would appear to be more than that, and the timing would be about right for a release alongside Half-Life 2, a nice, heavily shader-intensive game where such an optimising compiler upgrade would shine and coincidentally further widen the performance delta between ATI parts and nVidia parts.
 