Cg Profile Performance - a brief article.

If I understand correctly in GL2 every IHV will have to write their own compiler for gslang.
So the performance of Cg with the NV30 profile is interesting, as it is what nVidia's GL2 driver will likely use.
 
Hyp-X said:
If I understand correctly in GL2 every IHV will have to write their own compiler for gslang.
So the performance of Cg with the NV30 profile is interesting, as it is what nVidia's GL2 driver will likely use.

Because of the decisions made about gslang (no support for reduced precisions) it may be that the ARB Cg profile is more representative of what GL2 on the NV30 will be able to do than the NV30 profile.
 
antlers4 said:
Hyp-X said:
If I understand correctly in GL2 every IHV will have to write their own compiler for gslang.
So the performance of Cg with the NV30 profile is interesting, as it is what nVidia's GL2 driver will likely use.

Because of the decisions made about gslang (no support for reduced precisions) it may be that the ARB Cg profile is more representative of what GL2 on the NV30 will be able to do than the NV30 profile.


Sadly yes. gslang won't provide an implementation of reduced precision (boo!), but they are keywords 'reserved for future use', e.g.

"long short double half fixed unsigned"

So perhaps the NV30 architecture is too far ahead of its time!
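
(For reference - just a sketch, nothing from real code - these are already ordinary declarations in Cg, and under the NV30 profiles they map onto real hardware formats:

half   glossiness;   // 16-bit float
half4  diffuse;      // ...plus the usual vector widths
fixed4 tint;         // NV30's 12-bit fixed-point path, roughly a -2..2 range
float4 position;     // full 32-bit float is still there when you need it

gslang reserves the words but gives you none of the above.)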

[edit]

Actually, some of the comments in the gslang doc make me FUME.

"When writing general programs, programmers have long given up worrying if it is more efficient to do a calculation in bytes, shorts or longs and we do not want shader writers to believe they have to concern themselves similarly."

AAARGH! What absolute, unadulterated monkey poo. IMO.
 
pocketmoon_ said:
Sadly yes. gslang won't provide an implementation of reduced precision (boo!), but they are keywords 'reserved for future use', e.g.

"long short double half fixed unsigned"

So perhaps the NV30 architecture is too far ahead of its time!

Quick question -

Why do you want lower precisions so much? Do you have a particular calculation in mind that for some reason can't be done with higher precision?
 
andypski said:
Why do you want lower precisions so much? Do you have a particular calculation in mind that for some reason can't be done with higher precision?

You want to lower precision for certain architectures like the NV30, where higher precision = lower performance. So if you don't "need" the higher precision, you don't want to use it....at least on the NV30.
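
To make that concrete - a minimal sketch only, the parameter names are invented - the math doesn't change, just the storage, and on NV30 the half/fixed version is the cheap one:

half4 main(float2 uv         : TEXCOORD0,
           float3 normal     : TEXCOORD1,
           uniform float3 lightDir,
           uniform half3 lightColor,
           uniform sampler2D diffuseMap) : COLOR
{
    float3 N = normalize(normal);              // keep fp32 where accuracy actually matters
    half   d = saturate(dot(N, lightDir));     // a 0..1 term - half is plenty
    half3  albedo = tex2D(diffuseMap, uv).rgb; // the texture was 8 bits per channel anyway
    return half4(albedo * lightColor * d, 1);  // all the color math stays low precision
}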
 
Joe DeFuria said:
You want to lower precision for certain architectures like the NV30, where higher precision = lower performance. So if you don't "need" the higher precision, you don't want to use it....at least on the NV30.

Surely implementing lower precision in hardware is inherently faster. That's why Intel etc. give you the CHOICE of using doubles, floats, ints etc. NV30 has that choice - other architectures don't have the flexibility and arguably don't offer the performance they could be offering if they had at least support for half data types.
 
LeStoffer said:
Chalnoth said:
Personally, I don't care that much about how Cg performs with the NV30 profiles.

Pssst: You will for damn sure if you're going to run an NV3X card (the word was twitchy)! ;)
The point I was trying to make was that Cg won't see widespread use unless it's useful across a wide range of video cards.
 
pocketmoon_ said:
Surely implementing lower precision in hardware is inherently faster.

I definitely would not word it that way.

It's probably more accurate to say that if you build two "non flexible" architectures, each handling a different precision, the lower precision one is inherently faster at the same cost.

However, we're talking about two different design trade-offs here, and you have to factor cost into the equation. So you can say that higher precision is inherently more "costly" than lower precision. But you can also say that flexibility to handle different types is also more costly than handling just one.

That's why Intel etc. give you the CHOICE of using doubles, floats, ints etc.

That's also why an Intel CPU is a general purpose processor, not a much more limited and specialized one.

NV30 has that choice - other architectures don't have the flexibility and arguably don't offer the performance they could be offering if they had at least support for half data types.

But that would change the cost. I do understand your argument, but we might as well argue that the NV30 is arguably not as fast as it should be when running high precision calcs.

At least with these particular parts, where clock-for-clock the "inflexible but high precision" architecture seems to be running calcs faster than the LOW precision mode of the "flexible" architecture....well...it seems to me that nVidia may have sacrificed absolute performance for flexibility.

Put another way:

Modern GPUs are not any faster when applying a single perspective-correct, bilinearly filtered texel vs. a simple flat-shaded pixel, even though the former is much more work. Why aren't these parts architected such that flat shading is so much faster than perspective-correct, bilinearly filtered texture mapping?

The answer again is really just economics.
 
pocketmoon_ said:
Surely implementing lower precision in hardware is inherently faster. That's why Intel etc. give you the CHOICE of using doubles, floats, ints etc. NV30 has that choice - other architectures don't have the flexibility and arguably don't offer the performance they could be offering if they had at least support for half data types.

Well, that's one way of looking at it -

Another is that support for lower precision data types (in languages and hardware) comes from legacy implementations, and nVidia's support for, for example, 12 bit fixed comes from the same philosophy.

eg. You start in the distant past with an 8 bit processor, where an int is 8 bits and then you move to a 16 bit processor and int is 16 bits etc. etc.

The presence of the lower precision types in processor architectures could then also be viewed as a legacy issue, where you need to be instruction-compatible with earlier versions of the same architecture. Low precision types in this case don't necessarily run at any faster speed than the higher precision types.

In fact sometimes the situation is quite the opposite and the lower precision types can be slower unless used carefully (for a good example of this look at partial register stalls on the x86 architecture).

A modern x86 CPU does not implement ALU operations on 8-bit values any faster than it does on 32-bit values, so Intel are not giving you this choice because it is faster - actually I suspect that it would simplify their job considerably if they could do away with all support for data types smaller than 32 bits, but this just isn't practical for them due to backwards compatibility.

On new architectures that are uncluttered by legacy support I guess you could take the view that support for small operands then becomes as much a cross-compiler and cross-platform issue as anything else - eg. how do you compile applications that were written assuming smaller types? If you don't provide direct support for 8 and 16-bit data types then you may need multiple instructions and registers to emulate the correct behaviour, and applications written in a legacy style using small data types may then run slowly. As a result you have to produce your wonderful pristine new architecture with a whole load of wasted gubbins (again, look at the partial register stall example above for the type of problems in scheduling that can be introduced).

On the other hand, implementing lower precision in hardware can potentially be significantly faster in some cases because it takes fewer transistors to make an 8 bit adder than a 32 bit one. You may also be able to make the same transistors for your 32 bit adder double up and do, for example, 4x 8-bit adds.

I certainly don't think it's as clear-cut as you make out above, though.
 
Just a quick question: what prevents NVidia from 'silently' supporting half floats in their GLslang compiler, just spitting out a warning like "this shader won't compile on all hardware" but still running it (with increased performance)?
 
Xmas said:
Just a quick question: what prevents NVidia from 'silently' supporting half floats in their GLslang compiler, just spitting out a warning like "this shader won't compile on all hardware" but still running it (with increased performance)?

Us,

Try slipping that past us devs... I'd imagine JC and others would be a lot more vocal than I was about PS_2_0 doing the same thing :devilish:

Edit:
Unless you mean in a non-forced way, but then it would be another language with new datatypes?
 
antlers4 said:
Hyp-X said:
If I understand correctly in GL2 every IHV will have to write their own compiler for gslang.
So the performance of Cg with the NV30 profile is interesting, as it is what nVidia's GL2 driver will likely use.

Because of the decisions made about gslang (no support for reduced precisions) it may be that the ARB Cg profile is more representative of what GL2 on the NV30 will be able to do than the NV30 profile.

Not really.
Look at the full precision NV30 profile results - they are still better (in one test 2x as fast) than the ARB profile.
 
DeanoC said:
Edit:
Unless you mean in a non-forced way, but then it would be another language with new datatypes?

Yes, I mean just that: compiling a shader where you actually write "half" in the source code. It's a reserved word anyway.

The compiler would actually accept a superset of proper GLslang code, but output a warning message that you're leaving the standard path.

That would be comparable to many C/C++ compilers which not only support the ANSI C/C++ standard types but also compiler-specific things like __intN for an integer with N bits (however, without the warning).

edit: oh, I just realized the OpenGL extensions mechanism perfectly suits this issue, so NV_half_float2 should be a given.
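
Something like this, in other words (sketched in Cg syntax since that's what compiles today - the warning text below is entirely made up). A superset front end would accept the 'half' declarations and just flag them:

// hypothetical compiler output:
//   warning: 'half' is a vendor extension - this shader won't compile on all hardware
half4 blend(half4 a, half4 b, half t)
{
    return a + t * (b - a);   // plain arithmetic - nothing here needs full float
}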
 