DemoCoder said:
Doomtrooper said:
Hardly, HLSL is outperforming Cg on a 5600, so how can that above statement be correct? A developer has a much better compiler to use, one that has already been optimized for all hardware, vs. just one.
Doomtrooper, there is no way to be optimal for all hardware. You wouldn't want a single device driver for all video cards that requires register-level compatibility to make it work, and you don't want a single compiler for all video cards.
This statement (though not necessarily disagreement with the isolated statement it quotes, IMO) seems slightly facetious, and largely based on a body of assumptions that you did not establish as valid. You can have "one" compiler behave differently per target, so there being only "one" MS compiler for DX HLSL does not validate your statement. Aside from that, and focusing only on the "ps_2_0" profile, the idea that "a shorter shader is better" has independent merit for shaders, and given the simplicity of expression involved in graphics shading (very specific math and data-fetching operations making up the majority of the workload), it is not the given you propose that "you don't want a single compiler for all video cards".
Designing a chip to execute each instruction as quickly as possible is not something only ATI can choose to do; it is just something that nVidia decided not to do when they designed a chip with so many issues executing a shader workload. Dealing with things after the fact of someone having created such hardware, the "single compiler" hypothetical (i.e., as a label that does not fit what it appears HLSL will be) is indeed undesirable, but so are hardware characteristics that perform poorly relative to other hardware when implementing a specified workload, even when those issues are taken into account.
That's for the "ps_2_0" profile only, and you seem to be making assumptions that neglect even the simple observation that only major register performance issues on nVidia's part prevent their architecture from benefiting from the exact same type of compiler output. Why does that inherently represent other IHVs at all, while ATI's benefit from the "ps_2_0" profile does not? Not being able to take advantage of it might indeed represent other IHVs, but wasn't it a mistake on nVidia's part for floating point shading?
Of course, it might also be dictated by a lack of design capability or by the limitations of design evolution decisions (for nVidia, and for other IHVs), but that possibility does not warrant omitting a discussion that establishes why it is assumed, when it seems vendors would want to avoid it.
You want the compiler to be part of the device driver, and you want each vendor to ship their own optimizer that optimizes it at runtime.
Well, if you're talking about the LLSL, you're describing what MS is doing. I'm not sure if this "You" is Doomtrooper, or if you are just "speaking for what everyone should want"...if the latter, I'd presume you meant HLSL. My reply should hopefully address both cases.
...
All Microsoft's compiler can do is perform generic optimizations like common subexpression elimination, copy propagation, and dead code elimination. It cannot schedule instructions (reorder them to take advantage of each card's architecture) nor alter its register allocation strategy based on card architecture.
A very definite statement, but I'm not sure where you get "cannot". Perhaps I've just failed to understand something about the DX compiler, but I don't see how Cg's custom backends are doing anything the HLSL target/profile system cannot do as well in DX. It might not do so now, and MS may decide it will not (possibly with nVidia's encouragement, if their strategy is for Cg to replace DX, or at least to push toward that as part of a strategy), but "cannot" is a different kettle of fish than "does not". It is not suitable to confuse the two, because doing so presents what seems to be the result of political and economic maneuvering on MS's and nVidia's part ("does not") as a technical limitation ("cannot") without providing support (in what you've said so far, AFAICS) for that assertion.
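For concreteness, here is the sort of "generic" optimization the quote lists, in a deliberately contrived HLSL fragment (the names are hypothetical, purely for illustration):

// Contrived HLSL fragment (hypothetical names) illustrating the "generic"
// optimizations listed above: common subexpression elimination, copy
// propagation, and dead code elimination.
sampler2D baseMap : register(s0);

float4 main(float4 diffuse : COLOR0, float2 uv : TEXCOORD0) : COLOR
{
    float3 base   = tex2D(baseMap, uv).rgb;
    float3 scaled = base * diffuse.rgb;  // used below
    float3 unused = base + base;         // dead code: never read, so it can be removed
    float3 copied = scaled;              // copy propagation: uses of copied become uses of scaled
    float3 result = copied + base * diffuse.rgb;  // common subexpression: the same product as scaled
    return float4(result, 1.0);
}

All of that is target-independent, which is exactly why it doesn't settle the matter: the dispute is over whether per-profile scheduling and register allocation is something the DX toolchain cannot do, or merely something it does not (yet) do.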
Some people expect the device driver to do this on the shader assembly, but that is essentially trying to have the device driver reverse engineer the full semantics of the source code out of compiled binary at runtime, and you won't get the same level of optimizations as if you start with the original source.
The only point I disagree with here is your assurance that the observed deficiencies for the NV3x are deficiencies in the HLSL->LLSL compiler's absolute capability, and not simply a deficiency in the hardware that the compiler does not yet take into account (in a public release). The question in that case relates to this thread and to when and how the answers to some of my questions fall out. Basically, you maintain that the ps_2_a profile isn't a profile that accomplishes what you describe at least as well as Cg does, and I'm pointing out that I don't see why you are sure that this (if true, which it might be at present) is a technical limitation rather than a product of "politics" and/or the result of the majority of nVidia's current lineup being ill-suited to floating point processing.
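To be clear about what I mean by the profile mechanism (a sketch of my understanding of fxc usage, with a hypothetical file name; exact switches may vary by SDK release), the same source can be handed to the DX compiler with different target profiles, and ps_2_a is precisely the extended 2.0 target associated with NV3x-class capabilities:

// One trivial shader, two target profiles (command lines shown as comments):
//
//   fxc /T ps_2_0 /E main shader.hlsl   - baseline 2.0 output
//   fxc /T ps_2_a /E main shader.hlsl   - extended 2.0 output, where the compiler
//                                         is free to use the extra capabilities and
//                                         may weigh instruction count and register
//                                         usage differently for that target
sampler2D baseMap : register(s0);

float4 main(float2 uv : TEXCOORD0) : COLOR
{
    return tex2D(baseMap, uv);
}

Whether the current public release actually takes NV3x's register issues into account under ps_2_a is the open question, but the hook for doing so exists.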
Since both HLSL and GLSLANG contain semantics which are not expressible in the shader assembly languages (e.g. loops, branching, etc.), some of that information will be erased by the time it reaches the device (e.g. branches turned into predication), which means that if the hardware actually contains *real branches* it will have to "infer" this intent somehow from a batch of predicated instructions.
Hmm...you again seem to maintain that the capabilities represented by HLSL profiles are not applicable to the problem you propose. If you have some reason for stating this, please give me an indication of it. Also, while I recognize that the extended 2.0 and 3.0 "LLSL" capabilities might prevent the most efficient utilization of the branching architecture of a specific hardware design, I'm not sure how this is established to be the case currently, or how Cg is demonstrating itself to be better at handling it. With nVidia being "rich" in comparison to other IHVs (except maybe ATI), I'm also puzzled as to how Cg, if you are proposing that it is better in this regard, demonstrates that other IHVs would be able to execute even as well as nVidia has (which, demonstrably, is not very well at the moment) such that the increased "in house" control offered by Cg would provide benefit, and why the DX HLSL could not have evolved in the meantime to provide that benefit to those IHVs, assuming it does not in its current form.
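To make the "erased semantics" point concrete at the ps_2_0 level (again a contrived sketch with made-up names and counts, reflecting only my understanding of how the 2.0 target lowers these constructs):

// Constructs whose high-level structure is gone by the time they reach
// ps_2_0-level output (hypothetical example).
sampler2D baseMap : register(s0);

float4 main(float4 c : COLOR0, float2 uv : TEXCOORD0) : COLOR
{
    float3 sum = 0;

    // ps_2_0 has no pixel shader loop instruction, so a loop with a
    // compile-time trip count like this is unrolled into four copies of its
    // body; hardware with real loop support would have to re-infer the loop
    // from the repeated instruction pattern.
    for (int i = 0; i < 4; i++)
        sum += tex2D(baseMap, uv + i * 0.01).rgb;

    // ps_2_0 has no dynamic branch, so both sides of this if/else are
    // evaluated and the result is selected with a compare/select instruction
    // such as cmp; hardware with real branches would have to recover the
    // branch from that flattened form.
    float3 result;
    if (c.a > 0.5)
        result = sum * c.rgb;
    else
        result = sum;

    return float4(result, 1.0);
}

That much I don't dispute; my question is whether the extended 2.0 and 3.0 targets leave this information unrecoverable in the way you seem to assume.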
I say the device driver should contain the backend of the compiler, and the front end merely does generic optimizations and stores the result as a serialization of the internal representation of the compiler.
Yes, but this contradicts the idea of offering an LLSL/"assembly" paradigm that reflects the full capabilities as an alternative to the HLSL. Some developers want this paradigm, some do not. I'm not disagreeing with the idea of "not", but disagreeing that your commentary above establishes your conjecture as to why that "not" is a given at the moment (though I tend to think it will be, maybe as soon as mainstream PS/VS 3.0 implementations, depending on how the HLSL/profile situation and vendor hardware turn out at that time).
Speaking of DirectX:
MS decided in favor of the standard assembly being the key focus of implementation; OpenGL decided against it and seems to discourage development of further such standards. Cg does the first as well, except that it can be extended to support more than one standard "assembly", and that assembly can end up being suitable for specific hardware. The thing is, the same can be said of HLSL, as both it and the LLSL evolve, but it is a matter of who is providing the compiler expertise and whom the IHVs have to work with if issues/bugs crop up in the framework.
The difference seems to be that:
- With Cg, the vendor has to provide and develop the backend, and therefore has direct control (and direct expense) in return for giving up control of frontend toolset development to nVidia and establishing a political and economic dependency on nVidia.
- With HLSL, the vendor has to depend on MS targeting their hardware's peculiarities for a profile, if that hardware has any with regard to the current shader language featureset, in return for giving up control of frontend toolset development to MS and establishing a political and economic dependency on Microsoft. They also have the benefit, at least at present AFAIK, that it is in Microsoft's interest to help them in this regard if they wish, and they have the alternative (note the bold statement above) of simply designing hardware capable of performing to the spec without outstanding issues.
I don't see the difference as one of shader compiling capability, but as a political and economic one of where control lies, and therefore a matter of how technical execution follows the limitations/possibilities of that control. The "technical" differences between Cg and HLSL are expressions of that control, but are not related to what the compilers are capable of doing in the fashion you propose, at least as far as I've noticed being established as of yet.
BTW, I look forward to GLslang representing an alternative model in line with the non-centralized nature of OpenGL, and to the competition the different approaches will produce going forward. I do think it is the more onerous initial path, but I expect DX to end up being driven to evolve along similar lines unless IHV competition is removed as a factor (which it is at the moment, until "DX 9 and higher" hardware from other IHVs finally makes an appearance, but hopefully that will change).