DemoCoder said:
Doomtrooper said:
Hardly, HLSL is outperforming Cg on a 5600, so how can that above statement be correct? A developer has a much better compiler to use, one that has already been optimized for all hardware, vs. just one.
Doomtrooper, there is no way to be optimal for all hardware. You wouldn't want a single device driver for all video cards that requires register-level compatibility to make it work, and you don't want a single compiler for all video cards.
This statement (though not necessarily disagreement with the isolated statement it quotes, IMO) seems slightly facetious, and largely based on a body of assumptions that you did not establish as valid. You can have "one" compiler behave differently per target, so there being only "one" MS compiler for DX HLSL does not validate your statement. Aside from that, and focusing only on the "ps_2_0" profile, the idea that "a shorter shader is better" has independent merit for shaders, and given the simplicity of expression involved in graphics shading (very specific math and data-fetching operations making up the majority of the workload), it is not the given you propose that "you don't want a single compiler for all video cards".
Designing a chip to execute each instruction as quickly as possible is not something only ATI can choose to do; it is just something that nVidia decided not to do when they designed a chip with so many issues executing a shader workload. Dealing with things after the fact of someone having created such hardware, the "single compiler" hypothetical (i.e., as a label that does not fit what it appears HLSL will be) is indeed undesirable, but so are hardware characteristics that perform poorly relative to other hardware when implementing a specified workload, even when those issues are taken into account.
That's for the "ps_2_0" profile only, and you seem to be making assumptions that neglect even the simple observation that only major register performance issues on nVidia's part prevent their architecture from benefiting from the exact same type of compiler output. Why does that inherently represent other IHVs at all, while ATI's benefit from the "ps_2_0" profile does not? Not being able to take advantage of it might indeed represent other IHVs, but wasn't it a mistake on nVidia's part for floating point shading?
Of course, it might also be dictated by a lack of design capability or by the limitations of design evolution decisions (for nVidia, and for other IHVs), but that possibility does not warrant omitting a discussion that establishes why it is assumed, when it seems vendors would want to avoid it.
You want the compiler to be part of the device driver, and you want each vendor to ship their own optimizer that optimizes it at runtime.
Well, if you're talking about the LLSL, you're describing what MS is doing. I'm not sure if this "You" is Doomtrooper, or if you are just "speaking for what everyone should want"...if the latter, I'd presume you meant HLSL. My reply should hopefully address both cases.
...
All Microsoft's compiler can do is perform generic optimizations like common subexpression elimination, copy propagation, and dead code elimination. It cannot schedule instructions (reorder them to take advantage of each card's architecture) nor alter its register allocation strategy based on card architecture.
A very definite statement, but I'm not sure where you get "cannot". Perhaps I've just failed to understand something about the DX compiler, but I don't see how Cg's custom backends are doing anything the HLSL target/profile system cannot do as well in DX. It might not do so now, and MS may decide it will not (possibly with nVidia's encouragement, if their strategy is for Cg to replace DX, or at least to push toward that as part of a strategy), but "cannot" is a different kettle of fish than "does not". It is not suitable to confuse the two, because doing so presents what seems to be the result of political and economic maneuvering on MS's and nVidia's part ("does not") as a technical limitation ("cannot") without providing support (in what you've said so far, AFAICS) for that assertion.
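For concreteness, here is the sort of "generic" optimization the quote lists, in a deliberately contrived HLSL fragment (the names are hypothetical, purely for illustration):

// Contrived HLSL fragment (hypothetical names) illustrating the "generic"
// optimizations listed above: common subexpression elimination, copy
// propagation, and dead code elimination.
sampler2D baseMap : register(s0);

float4 main(float4 diffuse : COLOR0, float2 uv : TEXCOORD0) : COLOR
{
    float3 base   = tex2D(baseMap, uv).rgb;
    float3 scaled = base * diffuse.rgb;  // used below
    float3 unused = base + base;         // dead code: never read, so it can be removed
    float3 copied = scaled;              // copy propagation: uses of copied become uses of scaled
    float3 result = copied + base * diffuse.rgb;  // common subexpression: the same product as scaled
    return float4(result, 1.0);
}

All of that is target-independent, which is exactly why it doesn't settle the matter: the dispute is over whether per-profile scheduling and register allocation is something the DX toolchain cannot do, or merely something it does not (yet) do.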
Some people expect the device driver to do this on the shader assembly, but that is essentially trying to have the device driver reverse engineer the full semantics of the source code out of compiled binary at runtime, and you won't get the same level of optimizations as if you start with the original source.
The only point I disagree with here is your assurance that the observed deficiencies for the NV3x are deficiencies in the HLSL->LLSL compiler's absolute capability, and not simply a deficiency in the hardware that the compiler does not yet take into account (in a public release). The question in that case relates to this thread and to when and how the answers to some of my questions fall out. Basically, you maintain that the ps_2_a profile isn't a profile that accomplishes what you describe at least as well as Cg does, and I'm pointing out that I don't see why you are sure that this (if true, which it might be at present) is a technical limitation rather than a product of "politics" and/or the result of the majority of nVidia's current lineup being ill-suited to floating point processing.
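To be clear about what I mean by the profile mechanism (a sketch of my understanding of fxc usage, with a hypothetical file name; exact switches may vary by SDK release), the same source can be handed to the DX compiler with different target profiles, and ps_2_a is precisely the extended 2.0 target associated with NV3x-class capabilities:

// One trivial shader, two target profiles (command lines shown as comments):
//
//   fxc /T ps_2_0 /E main shader.hlsl   - baseline 2.0 output
//   fxc /T ps_2_a /E main shader.hlsl   - extended 2.0 output, where the compiler
//                                         is free to use the extra capabilities and
//                                         may weigh instruction count and register
//                                         usage differently for that target
sampler2D baseMap : register(s0);

float4 main(float2 uv : TEXCOORD0) : COLOR
{
    return tex2D(baseMap, uv);
}

Whether the current public release actually takes NV3x's register issues into account under ps_2_a is the open question, but the hook for doing so exists.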
Since both HLSL and GLSLANG contain semantics which are not expressible in the shader assembly languages (e.g. loops, branching, etc.), some of that information will be erased by the time it reaches the device (e.g. branches turned into predication), which means that if the hardware actually contains *real branches* it will have to "infer" this intent somehow from a batch of predicated instructions.
Hmm...you again seem to maintain that the capabilities represented by HLSL profiles are not applicable to the problem you propose. If you have some reason for stating this, please give me an indication of it. Also, while I recognize that the extended 2.0 and 3.0 "LLSL" capabilities might prevent the most efficient utilization of the branching architecture of a specific hardware design, I'm not sure how this is established to be the case currently, or how Cg is demonstrating itself to be better at handling it. With nVidia being "rich" in comparison to other IHVs (except maybe ATI), I'm also puzzled as to how Cg, if you are proposing that it is better in this regard, demonstrates that other IHVs would be able to execute even as well as nVidia has (which, demonstrably, is not very well at the moment) such that the increased "in house" control offered by Cg would provide benefit, and why the DX HLSL could not have evolved in the meantime to provide that benefit to those IHVs, assuming it does not in its current form.
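To make the "erased semantics" point concrete at the ps_2_0 level (again a contrived sketch with made-up names and counts, reflecting only my understanding of how the 2.0 target lowers these constructs):

// Constructs whose high-level structure is gone by the time they reach
// ps_2_0-level output (hypothetical example).
sampler2D baseMap : register(s0);

float4 main(float4 c : COLOR0, float2 uv : TEXCOORD0) : COLOR
{
    float3 sum = 0;

    // ps_2_0 has no pixel shader loop instruction, so a loop with a
    // compile-time trip count like this is unrolled into four copies of its
    // body; hardware with real loop support would have to re-infer the loop
    // from the repeated instruction pattern.
    for (int i = 0; i < 4; i++)
        sum += tex2D(baseMap, uv + i * 0.01).rgb;

    // ps_2_0 has no dynamic branch, so both sides of this if/else are
    // evaluated and the result is selected with a compare/select instruction
    // such as cmp; hardware with real branches would have to recover the
    // branch from that flattened form.
    float3 result;
    if (c.a > 0.5)
        result = sum * c.rgb;
    else
        result = sum;

    return float4(result, 1.0);
}

That much I don't dispute; my question is whether the extended 2.0 and 3.0 targets leave this information unrecoverable in the way you seem to assume.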
I say the device driver should contain the backend of the compiler, and the front end merely does generic optimizations and stores the result as a serialization of the internal representation of the compiler.
Yes, but this contradicts the idea of offering an LLSL/"assembly" paradigm that reflects the full capabilities as an alternative to the HLSL. Some developers want this paradigm, some do not. I'm not disagreeing with the idea of "not", but disagreeing that your commentary above establishes your conjecture as to why that "not" is a given at the moment (though I tend to think it will be, maybe as soon as mainstream PS/VS 3.0 implementations, depending on how the HLSL/profile situation and vendor hardware turn out at that time).
Speaking of DirectX:
MS decided in favor of the standard assembly being the key focus of implementation; OpenGL decided against it and seems to discourage development of further such standards. Cg does the first as well, except that it can be extended to support more than one standard "assembly", and that assembly can end up being suitable for specific hardware. The thing is, the same can be said of HLSL, as both it and the LLSL evolve, but it is a matter of who is providing the compiler expertise and whom the IHVs have to work with if issues/bugs crop up in the framework.
The difference seems to be that:
- With Cg, the vendor has to provide and develop the backend, and therefore has direct control (and direct expense) in return for giving up control of frontend toolset development to nVidia and establishing a political and economic dependency on nVidia.
- With HLSL, the vendor has to depend on MS targeting their hardware's peculiarities for a profile, if that hardware has any with regard to the current shader language featureset, in return for giving up control of frontend toolset development to MS and establishing a political and economic dependency on Microsoft. They also have the benefit, at least at present AFAIK, that it is in Microsoft's interest to help them in this regard if they wish, and they have the alternative (note the bold statement above) of simply designing hardware capable of performing to the spec without outstanding issues.
I don't see the difference as one of shader compiling capability, but as a political and economic one of where control lies, and therefore a matter of how technical execution follows the limitations/possibilities of that control. The "technical" differences between Cg and HLSL are expressions of that control, but are not related to what the compilers are capable of doing in the fashion you propose, at least as far as I've noticed being established as of yet.
BTW, I look forward to GLslang representing an alternative model in line with the non-centralized nature of OpenGL, and to the competition the different approaches will produce going forward. I do think it is the more onerous initial path, but I expect DX to end up being driven to evolve along similar lines unless IHV competition is removed as a factor (which it is at the moment, until "DX 9 and higher" hardware from other IHVs finally makes an appearance, but hopefully that will change).