HLSL 'Compiler Hints' - Fragmenting DX9?

DemoCoder said:
I find the idea of hour long shader compiles to be kind of absurd given the size and volume of shaders. Even the worst C compilers with full optimizations dish out tens of thousands of lines per second. Something is very wrong with the compilers used or the way the build is being done it sounds like.

It's a direct result of the engine design a lot of people are going for (which I personally think is the wrong way to do things), where the shaders are generated automatically from a GUI interface and put in the hands of the artists. Artists then tend to go overboard with nearly a unique shader on every unique object, so you could easily end up with several thousand shaders throughout an entire game. Many of those shaders then need to be recompiled multiple times with different pre-processor definitions (for various different reasons).
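To put some numbers on why those recompiles hurt, here is a rough sketch (shader count, define handling and the ps_2_0 target are all invented for illustration) of the kind of offline build loop that ends up invoking the DX9 HLSL compiler tens of thousands of times:

// Hypothetical build step: every artist-generated shader is compiled once per
// combination of preprocessor switches. 2000 shaders x 2^4 define combinations
// is already 32000 D3DXCompileShader calls in a full rebuild.
#include <d3dx9shader.h>
#include <string>
#include <vector>

bool CompileVariant(const std::string& src, const std::vector<D3DXMACRO>& defines)
{
    // 'defines' is expected to end with a {NULL, NULL} terminator entry.
    LPD3DXBUFFER code = NULL, errors = NULL;
    HRESULT hr = D3DXCompileShader(src.c_str(), (UINT)src.size(),
                                   &defines[0], NULL,        // macros, no #include handler
                                   "main", "ps_2_0", 0,
                                   &code, &errors, NULL);
    if (code)   code->Release();
    if (errors) errors->Release();
    return SUCCEEDED(hr);
}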
 
DemoCoder said:
CPU compilers don't spend as much time on optimizations as they could because compile times are an issue and, in a typical CPU application, 90% of the execution time is spent in 10% of the code, so there is a limit to how much generalized improvements can make a difference.
However, you can specify whether you prefer fast code, small code, or short compile times with compiler options.
With GPUs, shaders are hotspots, critical sections. Optimizations are far more important. With CPUs, a 5% generalized improvement won't necessarily affect the runtime unless it happens to affect the hotspots. With GPUs, shaving off 5% of shaders is more likely to affect the overall game speed.
Shaders aren't always hot spots, but probably will be in most future apps. However, shaving 5% on a shader used 1% of the time doesn't gain you much.
Couple that with the fact that IHVs will likely be adding in "detection" for some games, e.g. "if game == HL2 and shader == water, then set subexpression_elimination_threshold = 0.8" to tweak performance. C compilers have been consistently updated for the last decade.
C compilers get updated, no question about it. But do C compilers contain detection for someone's C programs? No. As you state above, compile times for shaders are generally not an issue, so you're free to take as long as you want to optimize. But application detection raises the problem of confusing ISVs. "Application X does Y fast, yet application Z does Y slowly."
But it will, just as drivers change behavior on each release, sometimes with regressions. Just look at NVidia's.
Uh, this is supposed to be a bad thing. Changes in behavior and regressions should be avoided whenever possible. Adding more complexity to the driver is just inviting more issues.
Yes, I have written several compilers. Compiler theory is a field of academic study, on which there is 30 years of research, which is taught in colleges around the world, and which has numerous how-to papers in the public domain. Any books, college courses, or PhD papers on ICD authoring?
Writing an ICD is not some mystic art. I've never written a compiler and have no idea how to go about writing one. Yet, I could probably create an ICD from "scratch" (I'd cheat and use some other publicly available ICD as a skeleton then put in all the finishing touches).
Despite the fact that I have actual experience writing compilers, if I did not, it would not be difficult to learn. Tools exist to assist novices in generating parsers from specifications, and even to generate ASTs. From ASTs, there exist well known algorithms for generating IR and performing optimizations on the graph. And from IR, there are well known algorithms published for generating the final machine code. In fact, there are "step by step" compiler implementation books available for neophytes.
But you're assuming that computer programming is generally easy. I've never had a course on compilers. I've never had a course on C for that matter. I wouldn't relish the idea of sitting down with books to figure out how to write a compiler.
If you have never learned how to write a Fibonacci heap, doing so would simply be a matter of consulting the literature. If you have never programmed an ICD, the best you can currently do, at least from my research, is to download something like MesaGL and look at the internals. Basically, you have the OpenGL spec itself, and you have sample code, and that appears to be it.
How are your two examples different? It's just a matter of degree. Some people learn well from seeing examples (I'm one of those), others learn better from studying the underlying theory and working everything out for themselves (I can do that if I have to but am inherently a lazy person).
Fact is, ICD writing, at least from my point of view, is a matter of on-the-job experience. Compiler writing is a blend of public knowledge and job experience. Where can I go to learn how to write an ICD?
Job experience, classroom experience. No real difference in my opinion.
Fine. Add it to the driver as an external DLL, like GLUT, and have it generate ARB_fragment_programs. Explain to me how this is a difficult job for you. Seems to me that it could be accomplished with hardly any changes to the actual driver at all. I could overlay an OpenGL2.0 wrapper on your driver today which delegates compilation to an external lib.
That would be fairly simple, however you'd miss out on possible optimizations at the source level. Those are the sort of optimizations that are difficult.
Of course, this is the inelegant "hack" implementation, but perhaps you can explain why it is complex.
Inelegant and suboptimal at once.
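For concreteness, the "external DLL" route being argued over above would look roughly like this; CompileToARBfp() stands in for the hypothetical external compiler, while the gl*ProgramARB calls are the ordinary ARB_fragment_program entry points (obtained through wglGetProcAddress in practice):

// Wrapper that delegates high-level compilation to an external library and
// hands the resulting ARB_fragment_program text to the existing driver path.
#include <GL/gl.h>
#include <GL/glext.h>
#include <string>

std::string CompileToARBfp(const char* highLevelSource);   // hypothetical external DLL

GLuint LoadFragmentShader(const char* highLevelSource)
{
    std::string asmText = CompileToARBfp(highLevelSource);  // -> "!!ARBfp1.0 ..."
    GLuint prog = 0;
    glGenProgramsARB(1, &prog);
    glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, prog);
    glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                       (GLsizei)asmText.size(), asmText.c_str());
    return prog;   // the driver's existing assembly-level optimizer takes over here
}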
 
Ilfirin said:
DemoCoder said:
I find the idea of hour long shader compiles to be kind of absurd given the size and volume of shaders. Even the worst C compilers with full optimizations dish out tens of thousands of lines per second. Something is very wrong with the compilers used or the way the build is being done it sounds like.

It's a direct result of the engine design a lot of people are going for (which I personally think is the wrong way to do things), where the shaders are generated automatically from a GUI interface and put in the hands of the artists. Artists then tend to go overboard with nearly a unique shader on every unique object, so you could easily end up with several thousand shaders throughout an entire game. Many of those shaders then need to be recompiled multiple times with different pre-processor definitions (for various different reasons).
Let's not forget about shaders that look different but compile to the same result. These sorts of duplicates should be removed when possible so that the driver doesn't need to maintain data structures for the duplicates.
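A minimal sketch of that kind of duplicate filtering, assuming the engine keys on the compiled token stream rather than the source text (the types and creation callback are invented for illustration):

// Keep one handle per distinct compiled result; shaders that merely *look*
// different but compile to identical bytecode collapse to a single object.
#include <map>
#include <vector>

typedef std::vector<unsigned long> TokenStream;      // compiled shader bytecode
static std::map<TokenStream, int> g_shaderCache;     // bytecode -> shader handle

int GetOrCreateShader(const TokenStream& tokens, int (*create)(const TokenStream&))
{
    std::map<TokenStream, int>::const_iterator it = g_shaderCache.find(tokens);
    if (it != g_shaderCache.end())
        return it->second;                            // duplicate: reuse existing handle
    int handle = create(tokens);                      // first occurrence: create for real
    g_shaderCache[tokens] = handle;
    return handle;
}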
 
OpenGL guy said:
But application detection raises the problem of confusing ISVs. "Application X does Y fast, yet application Z does Y slowly."

Yes, it raises the issue, but that doesn't mean they won't do it. Didn't ATI do some shader replacements in the driver with functionally identical ones?

You're right, most compilers come with hundreds of variables you can tweak to control the optimizer. This means that OGL2.0 needs an interface to expose the compiler environment to the developer if he needs to tweak it, so the dev has control, instead of the IHV doing the tweak behind his back, e.g.

glSetCompilerFlag*(GL_COMPILER_LOOPUNROLLTHRESHOLD, 0.8f);

(or better, take strings)
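A string-taking variant of the same (equally hypothetical) interface might be as simple as:

glSetCompilerOption("loop_unroll_threshold", "0.8");
glSetCompilerOption("prefer", "speed");   // vs. "size" or "compile_time"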



Writing an ICD is not some mystic art. I've never written a compiler and have no idea how to go about writing one. Yet, I could probably create an ICD from "scratch" (I'd cheat and use some other publicly available ICD as a skeleton then put in all the finishing touches).

But there's a big difference between picking up some open source and hacking it, and tying that source into the low level registers, interrupts, internal MS kernel structures, dealing with the AGP bus, etc. It's far more specific. If writing an ICD was as easy as just picking up the publicly available source, why did it take so long for all of the IHVs to get it right?

Perhaps you didn't have the experience of having to write a compiler, but those getting a BS or MS in CS at good US colleges will run into the concepts: automata theory, principles of programming languages, discrete mathematics, graph theory, etc., and many colleges have compiler courses that teach how to write Scheme or Pascal compilers.

I would say that the population of computer science majors who have had to write a compiler is far larger than the number of people who have authored ICDs.


But you're assuming that computer programming is generally easy. I've never had a course on compilers. I've never had a course on C for that matter. I wouldn't relish the idea of sitting down with books to figure out how to write a compiler.

Then go download free compilers: GCC for example, which has tons of well documented source code, and numerous tutorials and examples for how to extend it, including public documentation with the source code to explain the optimizers. That alone is far more useful than what I can find as far as OpenGL skeletons.

That would be fairly simple, however you'd miss out on possible optimizations at the source level. Those are the sort of optimizations that are difficult.

Which is exactly why I've been arguing that putting the full compiler in the driver is better than the DX "separate compiler tool, driver optimizes assembly only" approach -- missed optimization opportunities. However, I also feel it is a strike against forward compatibility and future proofing, since I am willing to bet that once ATI and nVidia start shipping OGL2.0 drivers, updates will be frequent at first.

You may disagree, but look at the changes that happened between the first ATI 9700 driver release and today's Catalyst. Obviously, the first couple of releases are "make sure it works, meets the spec, and passes all unit tests" type of releases, and ATI's first attempt at OGL2.0 compilation won't be near optimal, just like NVidia's first Cg compiler was atrocious and still needs lots of improvement.
 
demalion said:
On the one hand, you state that HLSL requires statically linked upgrades to "all your programs" to get improvements when you download a new driver.

On the other hand, you state that IHVs have already done "90%" of the work necessary for implementing a glslang compiler by implementing the driver side optimizer for DX "LLSL".

If they've done "90%" of the work for their driver side optimizer, why do you have to "download patches for all your programs" to get the performance improvements from the driver? I have the same question with other numbers that are less than "90%", but not "0%".
Optimization will always be easier to do if starting from a higher level.
 
Ilfirin said:
DemoCoder said:
I find the idea of hour long shader compiles to be kind of absurd given the size and volume of shaders. Even the worst C compilers with full optimizations dish out tens of thousands of lines per second. Something is very wrong with the compilers used or the way the build is being done it sounds like.

It's a direct result of the engine design a lot of people are going for (which I personally think is the wrong way to do things), where the shaders are generated automatically from a GUI interface and put in the hands of the artists. Artists then tend to go overboard with nearly a unique shader on every unique object, so you could easily end up with several thousand shaders throughout an entire game. Many of those shaders then need to be recompiled multiple times with different pre-processor definitions (for various different reasons).

That's a problem with the application, not the API. Switching shaders is an expensive state change, so using a different shader for every object is quite stupid.
The API shouldn't adjust to the needs of badly written applications.
 
OpenGL guy said:
So now you want to require each IHV to provide their own HLSL compiler. IHVs already have to create such a beast for OpenGL 2.0, because the API resides in the driver itself. In this case, the API is not inside the driver. Why should a new API require an IHV to commit more people to driver development?

Is it better that we require MS to commit the same amount of people to write compilers to suit every hardware out there? We're just moving the work onto a third party.

OpenGL guy said:
I think this is a step backwards. More IHVs with more compilers will mean more bugs. Application A works around bug B in an IHV's compiler, then IHV C has to work around the application's incorrect behavior. Or the developer is completely unaware that bug B exists and thinks this is proper behavior. I see this already with assembly level code because of broken HW or drivers, and I don't think it will get better. I also don't think this is good because having the HLSL's behavior potentially change on each driver release will cause confusion among ISVs.

Why this paranoia about bugs? Are IHVs worse at writing compilers than MS is? We already have to deal with bugs in the HLSL compiler. The difference is that with an IHV bug you always have the choice to switch platform until the problem is solved. Or you can use #define to work around the problem for that particular IHV without affecting other IHVs. With a DX9 HLSL bug you're pretty much stuck until MS updates DirectX.

There will always be bugs, inconsistencies between IHVs and other such problems. Nothing new under the sun. DX9 HLSL didn't solve that problem one bit. Thinking that we would be overflowed with bugs because of JIT compilation is IMO mostly just paranoia.

OpenGL guy said:
Writing a compiler from scratch is easier than writing an ICD.
What are you basing this on? Have you written an optimizing compiler from scratch? Have you written an ICD?

Well, here at Luth a course in compiler technology is a standard length course, roughly one month of studies. The closest thing to ICD programming offered would be the course in reactive programming, where we wrote a driver for a simple AD/DA converter to control some odd pendulum device. That was quite hard to get right. I didn't take the compiler technology course, but I haven't heard anyone who took it complain about it being hard or anything, while I've heard plenty of complaints about the reactive programming course. And controlling an AD/DA converter would be a simple task compared to controlling a GPU implementing the full OpenGL spec.
 
OpenGL guy said:
But do C compilers contain detection for someone's C programs?
Actually, this has happened in the past - for the purpose of cheating on benchmarks (Dhrystone, Whetstone cheats are quite legendary and easily found with Google searches).
 
The closest thing to ICD programming offered would be the course in reactive programming, where we wrote a driver for a simple AD/DA converter to control some odd pendulum device.

Been there, done that :)
 
OpenGLGuy, while I understand your objection to putting something as complex as a compiler in the driver, you already have one there. While I have no knowledge of the R300 microcode for shaders, I'm sure it doesn't correspond 1:1 with PS 2.0. So to take advantage of the architecture you need a compiler for PS 2.0 that plays to your hardware's strengths.

The argument then boils down to whether the extra complexity exposed by a higher level "assembler" sent to the driver is prohibitive or not. If we neglect the more complicated parsing of higher level languages (which is fairly trivial to implement anyway) then we are left with optimizing some sort of intermediate representation. But you're already doing this! What aspect of the more high level "assembler" is it that makes it so much more complex to optimize than the current one? That you have to do register allocation? I'm truly curious.

As for bugs in the compilers, they will of course be a problem. But it already is a problem with the current "assembly" compilers in the drivers, and I personally doubt the difference with a higher level IR will be large. I just don't see how changing the abstraction level of the IR will introduce such a huge number of bugs. Aggressive optimizations might, of course, but then the driver developer is free to prioritize stability over performance.

The big question IMHO is whether the extra work imposed on the IHVs by glslang's approach has slowed down implementations to the point where people are switching to Direct3D. We need a high level shader language for OpenGL yesterday. I actually think Cg was fairly good for OpenGL in the sense that there is an (admittedly crappy) compiler for a HLSL that works under OpenGL.
 
The theories involved in understanding compilers are pretty fundamental to understanding all of computation, even metamathematics.

Once you understand how automata, grammars, lambda calculus, and graph theory work, you can apply it to many many problems in computer science.

This web page you're looking at alone uses 3-4 grammars (html, css, http, javascript) plus additional ones on top of that for CSS layout.

Optimizing an OpenGL scene graph, for example, can utilize tree-grammars.

You'd be surprised how often techniques you learn on the way to a compiler class can be applied to solve problems in elegant and efficient ways in many applications, and often in ways that are less buggy and formally verifiable too.

Even the DX9 LLSL can be parsed and then optimized using a grammar to "rewrite" it to microcode (as opposed to maximal munch).
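As a toy illustration of that sort of rewriting (the instruction record and the rule are invented; a real driver would match patterns over a proper IR or tree grammar, not a flat list of strings), folding a mul/add pair into the hardware's mad could look like:

// Rewrite rule: t = mul(a, b); d = add(t, c)  ==>  d = mad(a, b, c),
// valid only if the temporary t is not read anywhere later.
#include <cstddef>
#include <set>
#include <string>
#include <vector>

struct Instr { std::string op, dst, src0, src1, src2; };

bool FoldMulAdd(std::vector<Instr>& code, std::size_t i,
                const std::set<std::string>& readLater)
{
    if (i + 1 >= code.size()) return false;
    const Instr mul = code[i];
    Instr& add = code[i + 1];
    if (mul.op != "mul" || add.op != "add") return false;
    if (add.src0 != mul.dst || readLater.count(mul.dst)) return false;  // temp must be dead
    add.op   = "mad";
    add.src2 = add.src1;       // mad d, a, b, c
    add.src1 = mul.src1;
    add.src0 = mul.src0;
    code.erase(code.begin() + i);                      // drop the folded mul
    return true;
}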
 
I think what OpenGL Guy was trying to point out about bugs in the compiler code was that with DX HLSL at least all the bugs are the same. Since there's only one implementation of the compiler, any bugs in it are obviously going to be the same across every platform. When you go into multiple compiler implementations you end up with the potential for an exponentially larger number of problems, to the point that workarounds for those problems may be extremely difficult, if not impossible, because they break some of the other IHVs' drivers. At the very least there's an exponential increase in complexity unless every IHV manages to provide flawless compilers, something that's definitely not going to happen. You now have to remember the flaws in every single compiler instead of just one.

You're also dealing with a greater amount of resources being needed for driver development. While, yes, the amount of resources that MS needs to develop HLSL would be roughly equivalent to any one driver team, the fact of the matter is that MS is much larger and can afford much greater amounts of staff. There'd be a bar to entry that would be very high, especially for the smaller IHVs. Then you also need to take into account that what MS does once, the IHVs would need to do multiple times, as each one would need its own implementation. Even taking into account hardware specific profiles and optimizations, MS doesn't have to rewrite all of the compiler code to add things like that. Each IHV would need to, from the ground up.

There are many flaws in HLSL which have been pointed out, but it's the more easily controlled solution on a logistic level. That doesn't necessarily make it the best, and looked at from a performance level there's definitely a completely different picture. But it remains to be seen if the logistical, not the technical, problems can be overcome with glslang. It's really not an issue of being able to write the code needed so much as being able to coordinate all of the disparate groups of people in a way that makes development easy and efficient.
 
DemoCoder said:
demalion said:
If they've done "90%" of the work for their driver side optimizer, why do you have to "download patches for all your programs" to get the performance improvements from the driver? I have the same question with other numbers that are less than "90%", but not "0%".

Quite obviously, because there is a limit to how much can be done to optimize DX9 "LLSL" in the driver and therefore improvements have to be made in the compiler itself.

Yes, it is obvious there is a limit; no, it is not obvious that "improvements have to be made in the (HLSL) compiler itself". You keep on turning "might" into "definitely", and allowing your recognition of the LLSL compiler to disappear.

If this statement were not true, then why would the FX profile even need to be created?

To reduce the amount of different types of optimization strategies the HLSL compiler isn't capable of expressing and because the FX has limitations that require special attention.
The fact is, NVidia's driver could only do so much on the PS2.0 FXC output, and therefore needed a special hack to the compiler.

Why is the ability to address an architecture's limitations a "hack" in DX HLSL, and a feature in glslang?

The fact that new profiles have to be consistently created...

No, the fact is that a new profile had to be created for the NV3x.

...is an admission that LLSL does not have representational power to allow the driver to do the best optimizations.

For the NV3x and it slowing down with temporary register usage beyond 2 or 4, yes. This seems to be a problem in a design intended for running long shaders, regardless of API...don't you think? Perhaps that might be relevant to what that issue is an "admission" of?

Now, there are a few questions. #1 Does NVidia have access to the FXC source and FX profile heuristics, or does Microsoft have to maintain this code? #2 How often can NVidia ship updates to the compiler? And #3 If the compiler is updated, how will the improvements be realized in games?

What do these questions relate to?

...
Hey, wouldn't it be neat if you had different "thingies" for changing compiler behavior, based on introducing such architecture specific heuristics as necessary?

Yeah, and wouldn't it be nice if these compiler options were a constant and didn't have to change based on runtime state?

This doesn't seem to change that they're there, which was my actual point in making that comment in reply to your statements. But hey, why bother recognizing that I made my point, when there is an opportunity to restate your preference for a constant and driver specific HLSL->GPU opcode compiler?

Wouldn't it be nice if LLSL worked as well for optimization in the driver as you continue to assert, even though you have no apparent experience writing either compilers or drivers, and can offer no explanation as to why LLSL + profiles remains the best solution for extracting maximum driver performance.

Let's try this again: "For future reference, please don't have a discussion with someone who thinks HLSL is perfect and call them by my handle." When will you be willing to have your discussion of what I "assert" reflect what I've actually stated instead of what you arbitrarily want to argue against? :oops:

OK, so how do these things answer any of the points I brought up?

Because some heuristics can be best decided at runtime or after profiling. Based on driver state, some drivers might even perform a kind of "just in time" compilation, recompiling shaders based on other factors, such as bandwidth usage. OpenGL drivers do this today by deciding whether to allocate memory in video ram or AGP memory based on utilization statistics. A "heuristic" hardcoded into the driver with no regard to actual runtime factors would not be as optimal.

The points I refer to:

demalion said:
It is actually better than the "problem" we have right now for implementing game performance improvements, because targetting the "LLSL" and the "90% of the work of glslang" you propose for IHVs is not statically linked. Your self contradiction is mind boggling, as is this bogeyman of "DX HLSL will make you have to download patches every month!".

Do you see "the HLSL methodology will always extract the best possible optimizations" in there?

But continuing this misplaced, but relevant, discussion:

Based on driver state, some drivers might even perform a kind of "just in time" compilation, recompiling shaders based on other factors, such as bandwidth usage. OpenGL drivers do this today by deciding whether to allocate memory in video ram or AGP memory based on utilization statistics. A "heuristic" hardcoded into the driver with no regard to actual runtime factors would not be as optimal.

An example would be a library call to normalize() which may evaluate to a cube map, slow but accurate macro, or fast but with errors macro, based on what other pipeline state exists.

This does not sound quite "trivial" to implement for a GPU. If it is, here is an opportunity to actually show me something new and meaningful about why glslang will be able to offer significant advantage with the ease you maintain.

Note: PS 2.0 has a normalize macro as well.
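Purely to illustrate the "decide the expansion from other pipeline state" idea quoted above (the driver internals here are invented), the choice could boil down to something like:

// Hypothetical driver-side policy: expand normalize() as a cube-map lookup,
// a cheap approximation, or the exact dp3/rsq/mul sequence, depending on
// precision requirements and how busy the units already are.
enum NormalizeImpl { NRM_EXACT, NRM_FAST_APPROX, NRM_CUBEMAP };

NormalizeImpl PickNormalize(bool lowPrecisionOK, int idleTextureUnits, int aluPressure)
{
    if (lowPrecisionOK && idleTextureUnits > 0)
        return NRM_CUBEMAP;        // trade a texture fetch for ALU work
    if (lowPrecisionOK && aluPressure > 4)
        return NRM_FAST_APPROX;    // e.g. one Newton-Raphson step on rsq
    return NRM_EXACT;              // full-precision dp3 + rsq + mul
}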

DC, stuff like this just doesn't make sense. You proposed that IHVs did a "bad job". I provided a reason why it seems a valid characterization for one particular IHV's compiler: because their hardware needed it badly.

Well, it sounded to me like you were saying that the blame for NVidia's poorly optimizing drivers is because their hardware sucks, instead of that their driver developers have a difficult task.

I'm saying they have such a difficult task for achieving good performance because their hardware is "deficient". Blame for the driver developers' poorly optimizing drivers -> difficult task. Difficult task -> hardware is "deficient". "Sucks" doesn't generally seem appropriate to me; it sounds like an absolute, not a comparison.

BTW, when I said there were obvious problems with your P4/SSE analogy, and that comparing "SSE to 3dnow!" would be required to get past at least some of them, I meant for you to stop and give some thought to there being a problem with just continuing to use it.

...Future GPU's might have bizarre architectures that don't fit what FXC/PS2.0 was designed around today.

Yes, they might. Is this supposed to be something I disagreed with?

OpenGL2.0 provides more semantic awareness to the driver, and more freedom for optimizations, something which might allow "deficient" HW which runs old software poorly to later, as compiler technology advances, reach its full potential.

You see that "might" in there? When you use that word, you're saying things I said in my initial post. When you say "must" and "WILL" and talking about application patches being required every month, or ignoring that there are issues with the way glslang went, you're saying something I disagree with. Please pay attention to me pointing out the distinction this time.

...bad quote, representing the opposite of my viewpoint...
Or it could be that you couch every statement in terms of doubt and "maybes" and half-assertions, and you never state anything of importance.

I do think the issues with glslang implementation are important, as well as accurately considering what it faces in competing with DX HLSL. Your not agreeing with and not being willing to discuss something does not automatically equal "not important".

What can anyone learn from any of your discussions of LLSL and GPU compilers?

If they read as carelessly as you seem to be, obviously nothing. However, I don't claim to be here as a teacher, but as an explorer. I'd say student, but your view of a student seems to be as someone on the receiving end of any abuse you are in the mood for, and who doesn't dare propose a dissenting opinion, which is rather markedly incompatible with my personal view and approach to both roles.

Have you added anything to the discussion besides "well, maybe glslang won't do as advertised, I think, perhaps, but have no information to offer"?
Really.

Why are so many of your questions rhetorical and insulting? What does this "add to the discussion", then?

I started off with a description of my view of the glslang/HLSL comparison, and details about my observations that went into that. If someone hadn't thought about things I proposed, they might learn something, or spark new thoughts outside of what they'd thought of before, if they choose. Or, they could replace consideration of what I'd said with poorly related analogies that simply pre-package their opinion, because they've pre-determined there is nothing to learn from someone.

Those are the two examples that seem to work towards and away from people learning things, IMO. But anyways, where were you going with this?

If I ask you to read my initial discussion of the comparison to the approaches again, or otherwise point out that this was already a topic of my conversation before your analogies steered away from it, will this illustrate the problem with them more clearly?

Why don't you succinctly state it here.

Err...why don't you just go actually read it? :oops: Too much of a stretch? The "advantages", "reality intrusion", "my opinion" summary should make it pretty easy to figure out the advantages, disadvantages, and summary of their situations I propose. I don't see what truncating it and repeating it, and lengthening this already long post, will accomplish.

I propose that more profiles be created as necessary to reflect a LLSL expression suitable for "intermediate->GPU opcode" optimisation by IHVs. How many will that be? Well, that depends on the suitability of the LLSL and profile characteristics itemized at that time.

Well, why don't you tell us how you would evaluate the suitability of LLSL?

I'd look at precision issues (all upcoming hardware whose specifications I know of processes at fp24 for pixel shading) and register issues (unknown) of the hardware. In general, look at factors that cause performance penalties for a base PS 2.0 shader. If there are issues with these, which I do think IHVs will try to avoid, the issue of suitability (in comparison to the glslang methodology) would be how difficult it would be to address this compared to implementing and maintaining a glslang compiler.
I'd then consider possibilities for performance improvement beyond 1 op/clock, and the requirements for achieving that. The issue of suitability would again be the comparative difficulty in addressing and maintaining this compared to the glslang compiler.

Many people, not just me, have already told you the shortcomings.

Well, there are some theoretical shortcomings, and there are your shortcomings like "you'd have to compile an application every month". I've tried to freely discuss the first ones, and here you are focusing on my reaction to the second as if it is the same thing. This makes it seem to me like you're more interested in attacking me than discussing something while there remains the possibility that I might disagree.

And further, I have just told you that runtime or profile based optimization (wherein the developer runs the game, collects profile info from the driver, and then reruns the compilation feeding back the profile information) could do even better.

Yes, you mean that automatic version of what Carmack and Valve have talked about, right? That would be nice to see, but it doesn't exactly sound like something that is going to appear any time soon. If you have information indicating that it will, please share it, as I said above...your proposing just any possibility and that it will manifest "somehow" doesn't do much to answer my commentary.

And just as ATI and NVidia do driver-based game detection, no doubt, they would also be able to do compiler-heuristics tweaking on a per-game basis as well. Point being, your proposal that developers and endusers remain hostage to a centrally managed uber-compiler from Microsoft has serious problems, and you don't seem to even partway get it.

Who is this guy saying HLSL is without issues? I thought I was the guy who was pointing out that you are exaggerating the DX HLSL issues and ignoring any significance of those for glslang?

Your alternative proposes that a new and unique type of compilation paradigm will be 1) introduced every month

No, my position is that N developers will ship M driver updates per year, yielding N*M updates to the game community, today.

Well, IHVs ship driver updates, not game developers. So how do these driver updates get multiplied? Are you proposing each person has "N" unique video card types installed?

I expect M compiler updates per N developers as well, due to the fact that IHVs will be discovering tweaks all the time which yield improved shader performance.

I'll let you convince OpenGL guy he'll be doing this, it would take me too long to try and resolve the conflicts apparent in it at the moment.

It's not a "new and unique" compilation paradigm. It's HL2 gets released, and the following month, ATI and NVidia discover simple heuristic tweaks which yield say, a 5-10% performance boost. Everytime a new games comes out, they might need to issue an update. Or, 3dMark2004 comes out, etc.

OK, who hid the driver side rescheduler on us? They "might" just release a new driver to offer performance improvements through that, right?

2) unable to be resolved by a back end optimizer of the "shorter instruction count" principle intermediary or whatever other characteristics in prior profiles there are.

2) depends on how many different approaches are needed for what IHVs do. We have 2 at the moment, one being "shorter operation itemization count, and executing within the temp register requirement without significant slow down". I think this is a pretty generally useful intermediate guideline that more than one IHV should be able to utilize.

You've been given umpteen examples why LLSL as it exists today is not a good representation format for these optimizations.

No, I've been given a few examples of why it might not be optimal by itself, and watched you arbitrarily turn "might" into "will" as it suited your agreeing with yourself, and complaining that I didn't "add anything" for inconveniently continuing to use "might".
NVidia couldn't resolve the issue in their driver alone because of this.
Well, looking at the hardware performance issues, the factors that prevented the LLSL from serving for this seem to be things that don't make sense for hardware trying to execute complex shaders in real time. Doesn't requiring idle or redundant usage of transistor-expensive computational units on certain clock cycles, to overcome a design limitation in something needed more as shader complexity increases beyond the "PS 1.3" level (i.e. temporary registers), qualify as something pretty simple: a mistake? This seems like something IHVs would be trying to avoid, whether talking about performance in glslang or DX HLSL.
This seems to indicate an extraordinary problem. Why "will" all problems for LLSL be similarly extraordinary?

And overriding principles (shortest register count, shortest instruction count) are too simplistic to capture everything that needs to be done, because they are competing goals.

No, they're not competing goals if you don't have a significant performance penalty simply from increasing temporary register utilization beyond a very low number. Perhaps you mean from a hardware design standpoint? I agree, but then I'm not saying glslang doesn't have a theoretical advantage, I'm saying it is obviously in an IHVs interest to avoid certain mistakes as high priorities with a given goal, if possible.

The most optimal code is not necessarily at the extremes (shortest actual shader, or shader with fewest registers used), but is somewhere in between, and finding the global minimum is extremely hard.

Outside of the NV3x, what type of performance yield are you proposing from this compared to what can be done with the LLSL?

Yes, and that is a superficial analogy, because whether people are arguing about one approach being too complex or prone to bugs compared to another is not the issue, it is whether it actually will be, and why. That's why I deride your insistence on it, and try to encourage discussions that don't depend on its substitution. How is my initial observation about the analogy incorrect?

Because your comments add nothing and history is relevant.

Well, my comments mentioned why your usage of history was not relevant. Simply saying that "adds nothing" so you can say "history is relevant" doesn't do much to show otherwise.

The people claiming that implementation difficulty is a problem lost the argument. Likewise, those claiming compiler implementation problems will be proven wrong as well.

Your airtight argument astounds me.

Note about omitted text for brevity: your not taking time to try and understand something someone said, and demonstrating that profoundly, is not the same as something not existing.

...You can appeal to authority and say "MS compiler developers are just that good", but it's a very weak argument.
I suppose I might as well, as that's the only argument you seem to want to respond to.

No one knows who NVidia and ATI and 3dLabs have hired to work on OGL2.0, and frankly, expertise in Visual C++ compilers doesn't necessarily translate to GPU compilers, using classic Demalion analogy doubting.

Did you really think I meant that I object to all analogies, or are you just trying to propose a statement to support your stance and blithely blame any problems someone might have with it on me?

How about writing a compiler that has to interact directly with all the elements of a large device driver like the OpenGL ICD?

Well, if it's a static compiler, it doesn't have to "interact" with squat; it's just a DLL that the driver delegates shader compilation to. The only "interaction" part is in how you bind the registers to OGL state, and how you upload the code to the GPU. But these are the same issues that ARB_{vertex|fragment}_program have to deal with.

With added complexity in the IHVs added processing compared to the extension if they're trying to find more optimizations than they would in DX. Replacing a complexity with a greater complexity in terms of avoiding bugs and incompatibilities, not replacing complexity with simplicity or the same complexity.

If it's dynamic runtime compilation, it's more difficult, but if you think this is what driver developers have to worry about, then you are tantamount to admitting FXC's static approach which ignores runtime state is insufficient.

How does this change FXC if we're talking about glslang doing this?

What are the challenges in this "black art"? I do think maintaining API compliance with complex interaction is part of it.

Writing a compiler from scratch is easier than writing an ICD. Adding a compiler into an ICD isn't that complex, since, as I stated, the compiler doesn't need to "interact" with anything.

So there is no concern with any other state updates influencing shader output at all on the GPU's end? Perhaps in relation to things like fog, AA, and texture handling? Is this what you mean, or do you just not like the word "interact"?

You call Compile(source), the compiler in the driver gets invoked, compiles to native code, and then the driver must upload it to VRAM and bind OGL variables to it. That is the only "interaction", an interaction, which is in fact, the same as what needs to be done today.

I'm not proposing that the act of compiling arbitrarily affects other parts of the driver, I'm proposing that IHV adaptation of compilation from a higher level introduces more opportunities for unexpected or unique characteristics in the output (bugs and conflicts between the outputs from implementations), as well as offering the possibility of more optimal code.

The compiler only needs to "interact" with OGL (that is, during compilation) if it is doing some kind of runtime optimization that you claim isn't necessary, as we all know, because of your defense of LLSL as sufficient.

Eh? So you're saying, in your roundabout way of making up what to attack again, that glslang will indeed offer significant problems with bugs and conflicts in what you envision?


OpenGL is preferred by the majority of game developers.

Hmm...OK, this doesn't seem to be demonstrated at the moment.

Why not take a poll on B3D and ask which API developers here think is easier to use, cleaner, and which one they would use in ideal circumstances if Microsoft's market power didn't dictate DX's use.

I'm not sure of how much representation there is here in developers, though I am sure that a general poll wouldn't do much to represent that, :LOL:. I would be curious as to the answer, though. That would be very important for glslang, and a result in favor of OpenGL would be encouraging.
 
DemoCoder said:
OpenGL guy said:
But application detection raises the problem of confusing ISVs. "Application X does Y fast, yet application Z does Y slowly."
Yes, it raises the issue, but that doesn't mean they won't do it. Didn't ATI do some shader replacements in the driver with functionally identical ones?
Of course I can't comment on this.
But there's a big difference between picking up some open source and hacking it, and tying that source into the low level registers, interrupts, internal MS kernel structures, dealing with the AGP bus, etc. It's far more specific. If writing an ICD was as easy as just picking up the publicly available source, why did it take so long for all of the IHVs to get it right?
Optimizations. HW bugs. WGL (a nightmare, IMO). Extensions. All of this takes time. If your original source base is doing something very inefficiently, it can take a long time to make serious improvements. Also, look at what version GCC is on... Why has it taken them so long to "get it right"?
Perhaps you didn't have the experience of having to write a compiler, but those getting a BS or MS in CS at good US colleges will run into the concepts: automata theory, principles of programming languages, discrete mathematics, graph theory, etc., and many colleges have compiler courses that teach how to write Scheme or Pascal compilers.

I would say that the population of computer science majors who have had to write a compiler is far larger than the number of people who have authored ICDs.
None of the above really applies to writing an ICD, so maybe it requires less knowledge?
Then go download free compilers: GCC for example, which has tons of well documented source code, and numerous tutorials and examples for how to extend it, including public documentation with the source code to explain the optimizers. That alone is far more useful than what I can find as far as OpenGL skeletons.
I have no interest in learning about compilers (remember I am lazy).
You may disagree, but look at the changes that happened between the first ATI 9700 driver release and today's Catalyst. Obviously, the first couple of releases are "make sure it works, meets the spec, and passes all unit tests" type of releases, and ATI's first attempt at OGL2.0 compilation won't be near optimal, just like NVidia's first Cg compiler was atrocious and still needs lots of improvement.
Of course the driver improves over time, that's called progress. If the driver didn't improve over the course of a year, I think there'd be some people looking for employment...
 
DemoCoder said:
demalion said:
As it is, that idea looks to me like it is useful for something like hinting for predication, which can be expressed in the LLSL specification already.

No, ints and booleans are useful for other reasons. For example, looping and indexing. Some architectures have special loop registers, others do not. Loop counters only need to be integers in the vast majority of cases, which means less transistors need to be used, and less latency on loop math.

If the architecture has a loop register, what is wrong with expressing it using "rep" instructions and nesting, given that the hardware has a dedicated register? LLSL would allow this.

How many loop registers? Architectures will differ. OpenGL2.0 makes no assumptions. DirectX assumes one or zero registers. DX assumes loops can't be nested, etc.

This seems to be an issue of whether DX specification is below the limits of hardware, but a valid one.

PS/VS 3.0 lists loop nesting up to 4.

As a result, the DX compiler is FORCED to inline loops that perhaps shouldn't be inlined. It might be forced to "spill" the loop register so it can be reused. OGL2.0 makes no assumptions on loop nesting, register availability, or register count. It also doesn't assume what "type" a loop register is.

Yes, this does seem a problem for getting optimal performance out of hardware designed in excess of the limitations specified for such shaders, and with PS 3.0 shader length this does seem a possible long term problem if the spec isn't changed by the time hardware is released. Given that length and featureset exposure, a very likely one it seems.

As a result, it is future proof. DX isn't, and even after VS/PS3.0, it will still need evolution. OGL2.0's direct compilation (no LLSL intermediary) means the shaders will automatically work on future hardware to the best degree possible.

Errr...no, it means shaders can work on future hardware to the best degree possible, where DX HLSL cannot without change. What they actually achieve when such shaders are being implemented will depend on API and hardware releases. If we assume both that DX doesn't solve this evident problem and that glslang takes advantage of it, yes, it does mean it "will".


With DX, future generation hardware drivers may be hobbled by being fed shaders compiled for devices which had no looping, or one level of nesting, or 1 register, etc. 2 years from now, someone playing HL2 will have their drivers being fed PS2.0 or PS3.0 shaders, even if their underlying hardware is a PS5.0 equivalent, with uber flexibility.

Yes, they may, though I dare say they might have patched HL2 in 2 years and not been upset to have done so, and that does seem to point out why arbitrary time compression and expansion muddies discussion of this issue.

OpenGL2.0 frees IHVs to expose features in their hardware without requiring assembly language extensions or API updates. MS will continue to require updates to their LLSL past 3.0.

If MS doesn't succeed with their approach, they don't succeed. If they don't adapt to glslang displaying strength, they don't adapt. OpenGL undisputedly pre-eminent in 3D APIs? Doesn't sound bad to me at all. That doesn't alter the aspects of the issue I mention, though.
 
Humus said:
That's a problem with the application, not the API. Switching shaders is an expensive state change, so using a different shader for every object is quite stupid.
The API shouldn't adjust to the needs of badly written applications.

Well, it's done for quality, not performance. And having a different shader for [close to] every different object (that is, unique set of textures) is something that's going to happen down the line. So even though I think that that engine design is really the wrong way to do things, any API that can't handle it is a broken API IMO.
 
Are you going to complain too if a C++ compiler takes half an hour to compile a million lines of autogenerated code? Or that it generates an executable of tens or hundreds of megabytes? Should you conclude that the C++ specification is flawed?

No. Instead you should focus on why the heck you are auto-generating such an assload of code, and fix that problem. An engine that uses a separate shader for every single object is flawed, and I don't see that being the future unless programmers are turning increasingly stupid. The cost of switching shaders would force you to either have single digit framerates or just not have that many objects. I see neither as the future. And as shader complexity increases the cost will only become higher. And compilation times for that matter, so such a design should be rejected quickly.

Having many shaders doesn't equate to better quality, if there's any such line of thought in your post. Just because the artist has the full ability to change the appearance of objects as he likes doesn't mean it needs to generate different shaders for each. Brown stone and grey stone differ only by color. So you pass the color as a constant instead of generating a shader for each, or, in the worst case, even generating completely equivalent shaders for different objects with the same material.
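To make the brown-stone/grey-stone point concrete in DX9 terms: that's one compiled pixel shader plus a per-material constant, not one shader per color (the register number and material layout are made up for the example):

// One stone shader, many stone tints: the color lives in constant register c0,
// set per draw call, so brown and grey stone never need separate compiles.
#include <d3d9.h>

struct StoneMaterial { float tint[4]; };   // e.g. {0.55f, 0.40f, 0.25f, 1.0f} = brown

void DrawStone(IDirect3DDevice9* dev, const StoneMaterial& mat)
{
    dev->SetPixelShaderConstantF(0, mat.tint, 1);   // c0 = material tint
    // ... SetTexture / DrawIndexedPrimitive as usual, same pixel shader stays bound
}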

The problem is easily solved. You want the artist to have the full abilities of shaders, but you don't want so many shaders generated. So what you do is, in the GUI, expose some kind of shader object rather than directly generating shaders from input components. Instead you apply a shader object to your objects in the editor, and the shader object can be altered using the standard GUI stuff you used directly on your objects previously.
 
nelg said:
Demalion, your post caused the death of my scroll wheel. ;)

When Demalion is around I tend to switch to the Page Down button. Scrolling isn't fast enough to get past it all. ;)
 
Eolirin said:
I think what OpenGL Guy was trying to point out about bugs in the compiler code was that with DX HLSL at least all the bugs are the same. Since there's only one implementation of the compiler, any bugs in it are obviously going to be the same across every platform. When you go into multiple compiler implementations you end up with the potential for an exponentially larger number of problems, to the point that workarounds for those problems may be extremely difficult, if not impossible, because they break some of the other IHVs' drivers. At the very least there's an exponential increase in complexity unless every IHV manages to provide flawless compilers, something that's definitely not going to happen. You now have to remember the flaws in every single compiler instead of just one.

There's nothing new here. This problem already exists in every other aspect of the rendering process. You can have VBO broken on nVidia, occlusion query broken on ATI, separate blend broken on XGI, stencil test broken on Trident, volumetric texturing broken on 3dlabs, etc. Nothing new under the sun.

I have yet to hear a good argument why high level compilers would be so inherently more prone to being buggy than other parts of the rendering pipeline. And it's not like such bugs can't be fixed or anything. In shaders you at least have #defines, with which you can probably work around the problem in most cases for the particular driver that would be problematic.

Eolirin said:
There'd be a bar to entry that would be very high, especially for the smaller IHVs.

I don't think the compiler team resources would be that large or even significant compared to other tasks faced, such as actually designing the hardware or writing the ICD.
 