DemoCoder said:
demalion said:
If they've done "90%" of the work for their driver-side optimizer, why do you have to "download patches for all your programs" to get the performance improvements from the driver? The same question applies to any other figure that is less than "90%" but more than "0%".
Quite obviously, because there is a limit to how much can be done to optimize DX9 "LLSL" in the driver and therefore improvements have to be made in the compiler itself.
Yes, it is obvious there is a limit; no, it is not obvious that "improvements have to be made in the (HLSL) compiler itself". You keep turning "might" into "definitely", and allowing your recognition of the LLSL compiler to disappear.
If this statement were not true, then why would the FX profile even need to be created.
To reduce the number of different optimization strategies the HLSL compiler is incapable of expressing, and because the FX has limitations that require special attention.
The fact is, NVidia's driver could only do so much on the PS2.0 FXC output, and therefore needed a special hack to the compiler.
Why is the ability to address an architecture's limitations a "hack" in DX HLSL, and a feature in glslang?
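(For anyone following along: at the API level, "the FX profile" is just a different target string handed to the same compiler. A minimal C++ sketch, assuming the D3DX9 interface of the time; the shader source and entry point here are placeholders of my own, not anything from a real title:)

```cpp
// Minimal sketch: compiling one HLSL source against the generic ps_2_0
// profile and against the NV3x-oriented ps_2_a profile via D3DX9.
// The shader source and entry point name are placeholders.
#include <d3dx9.h>
#include <cstdio>
#include <cstring>

static const char *kSource =
    "float4 main(float2 uv : TEXCOORD0) : COLOR\n"
    "{ return float4(uv, 0.0f, 1.0f); }\n";

static void CompileWith(const char *profile)
{
    LPD3DXBUFFER code = NULL, errors = NULL;
    HRESULT hr = D3DXCompileShader(kSource, (UINT)strlen(kSource),
                                   NULL, NULL,      // no #defines, no #include handler
                                   "main", profile,
                                   0,               // default flags
                                   &code, &errors, NULL);
    printf("%s: %s\n", profile, SUCCEEDED(hr) ? "ok" : "failed");
    if (code)   code->Release();
    if (errors) errors->Release();
}

int main()
{
    CompileWith("ps_2_0");  // baseline PS 2.0 LLSL output
    CompileWith("ps_2_a");  // profile created with NV3x-style constraints in mind
    return 0;
}
```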
The fact that new profiles have to be consistently created...
No, the fact is that a new profile had to be created for the NV3x.
...is an admission that LLSL does not have representational power to allow the driver to do the best optimizations.
For the NV3x and its slowdown when temporary register usage goes beyond 2 or 4, yes. This seems to be a problem in a design intended for running long shaders, regardless of API...don't you think? Perhaps that might be relevant to what that issue is an "admission" of?
Now, there are three questions. #1 Does NVidia have access to the FXC source and FX profile heuristics, or does Microsoft have to maintain this code? #2 How often can NVidia ship updates to the compiler? And #3, if the compiler is updated, how will the improvements be realized in games?
What do these questions relate to?
...
Hey, wouldn't it be neat if you had different "thingies" for changing compiler behavior, based on introducing such architecture specific heuristics as necessary?
Yeah, and wouldn't it be nice if these compiler options were a constant and didn't have to change based on runtime state?
This doesn't seem to change that they're there, which was my actual point in making that comment in reply to your statements. But hey, why bother recognizing that I made my point, when there is an opportunity to restate your preference for a constant and driver specific HLSL->GPU opcode compiler?
Wouldn't it be nice if LLSL worked as well for optimization in the driver as you continue to assert, even though you have no apparent experience writing either compilers or drivers, and can offer no explanation as to why LLSL + profiles remains the best solution for extracting maximum driver performance.
Let's try this again: "For future reference, please don't have a discussion with someone who thinks HLSL is perfect and call them by my handle." When will you be willing to have your discussion of what I "assert" reflect what I've actually stated instead of what you arbitrarily want to argue against?
OK, so how do these things answer any of the points I brought up?
Because some heuristics can be best decided at runtime or after profiling. Based on driver state, some drivers might even perform a kind of "just in time" compilation, recompiling shaders based on other factors, such as bandwidth usage. OpenGL drivers do this today by deciding whether to allocate memory in video ram or AGP memory based on utilization statistics. A "heuristic" hardcoded into the driver with no regard to actual runtime factors would not be as optimal.
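A sketch of the kind of thing I mean, in C++ — the state struct, thresholds, and variant names below are all invented for illustration, not any shipping driver's internals:

```cpp
// Hypothetical sketch of a driver re-selecting a shader variant when
// runtime statistics change, rather than applying a rule fixed when the
// driver shipped. Everything here is made up for illustration.
#include <cstdint>

struct FrameStats {          // counters a driver might already track
    uint64_t bytesReadFromTextures;
    uint64_t bytesAvailablePerFrame;
};

enum class ShaderVariant { FavorAlu, FavorTexture };

// Pick a codegen strategy from observed behaviour.
ShaderVariant ChooseVariant(const FrameStats &s)
{
    const bool bandwidthBound =
        s.bytesReadFromTextures > (s.bytesAvailablePerFrame * 3) / 4;
    return bandwidthBound ? ShaderVariant::FavorAlu
                          : ShaderVariant::FavorTexture;
}

struct CompiledShader { ShaderVariant builtAs; /* ... native code ... */ };

// Called periodically; recompiles only when the preferred strategy flips.
void MaybeRecompile(CompiledShader &sh, const FrameStats &stats)
{
    ShaderVariant want = ChooseVariant(stats);
    if (want != sh.builtAs) {
        // recompile from the retained higher-level form here
        sh.builtAs = want;
    }
}
```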
The points I refer to:
demalion said:
It is actually better than the "problem" we have right now for implementing game performance improvements, because targeting the "LLSL" and the "90% of the work of glslang" you propose for IHVs is not statically linked. Your self-contradiction is mind-boggling, as is this bogeyman of "DX HLSL will make you have to download patches every month!".
Do you see "the HLSL methodology will always extract the best possible optimizations" in there?
But continuing this misplaced, but relevant, discussion:
Based on driver state, some drivers might even perform a kind of "just in time" compilation, recompiling shaders based on other factors, such as bandwidth usage. OpenGL drivers do this today by deciding whether to allocate memory in video ram or AGP memory based on utilization statistics. A "heuristic" hardcoded into the driver with no regard to actual runtime factors would not be as optimal.
An example would be a library call to normalize() which may evaluate to a cube map, slow but accurate macro, or fast but with errors macro, based on what other pipeline state exists.
This does not sound quite "trivial" to implement for a GPU. If it is, here is an opportunity to actually show me something new and meaningful about why glslang will be able to offer a significant advantage with the ease you maintain.
Note: PS 2.0 has a normalize macro as well.
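To make the normalize() case concrete, a driver-internal chooser might look roughly like the following; the state fields and variant names are made up for illustration, and the "full precision" case is roughly the dp3/rsq/mul sequence the PS 2.0 nrm macro stands for:

```cpp
// Hypothetical sketch: how a driver-side compiler might lower a
// normalize() call differently depending on other pipeline state.
// All names here are invented for illustration.

struct PipelineState {
    bool lowPrecisionAcceptable;  // e.g. result only feeds a color blend
    bool textureUnitToSpare;      // a sampler is free for a lookup table
};

enum class NormalizeLowering {
    CubeMapLookup,   // normalization cube map: one fetch, limited precision
    FastApproximate, // cheap refinement, fast but with errors
    FullPrecision    // roughly the dp3/rsq/mul sequence behind PS 2.0's nrm macro
};

NormalizeLowering ChooseNormalize(const PipelineState &ps)
{
    if (ps.lowPrecisionAcceptable && ps.textureUnitToSpare)
        return NormalizeLowering::CubeMapLookup;
    if (ps.lowPrecisionAcceptable)
        return NormalizeLowering::FastApproximate;
    return NormalizeLowering::FullPrecision;
}
```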
DC, stuff like this just doesn't make sense. You proposed that IHVs did a "bad job". I provided a reason why it seems a valid characterization for one particular IHV's compiler: because their hardware needed it badly.
Well, it sounded to me like you were saying that the blame for NVidia's poorly optimizing drivers lies with their hardware sucking, rather than with their driver developers having a difficult task.
I'm saying they have such a difficult task achieving good performance because their hardware is "deficient". Blame for the driver developers poorly optimizing -> difficult task. Difficult task -> hardware is "deficient". "Sucks" doesn't generally seem appropriate to me; it sounds like an absolute, not a comparison.
BTW, when I said there were obvious problems with your P4/SSE analogy, and that comparing "SSE to 3dnow!" would be required to get past at least some of them, I meant for you to stop and give some thought to there being a problem with just continuing to use it.
...Future GPU's might have bizarre architectures that don't fit what FXC/PS2.0 was designed around today.
Yes, they might. Is this supposed to be something I disagreed with?
OpenGL2.0 provides more semantic awareness to the driver, and more freedom for optimizations, something which might allow "deficient" HW which runs old software poorly to later, as compiler technology advances, reach its full potential.
You see that "might" in there? When you use that word, you're saying things I said in my initial post. When you say "must" and "WILL", and talk about application patches being required every month, or ignore that there are issues with the way glslang went, you're saying something I disagree with. Please pay attention to me pointing out the distinction this time.
...bad quote, representing the opposite of my viewpoint...
Or it could be that you couch every statement in terms of doubt and "maybes" and half-assertions, and you never state anything of importance.
I do think the issues with the glslang implementation are important, as well as accurately considering what it faces in competing with DX HLSL. Your not agreeing with, and not being willing to discuss, something does not automatically make it "not important".
What can anyone learn from any of your discussions of LLSL and GPU compilers?
If they read as carelessly as you seem to be, obviously nothing. However, I don't claim to be here as a teacher, but as an explorer. I'd say student, but your view of a student seems to be as someone on the receiving end of any abuse you are in the mood for, and who doesn't dare propose a dissenting opinion, which is rather markedly incompatible with my personal view and approach to both roles.
Have you added anything to the discussion besides "well, maybe glslang won't do as advertised, I think, perhaps, but have no information to offer"?
Really.
Why are so many of your questions rhetorical and insulting? What does this "add to the discussion", then?
I started off with a description of my view of the glslang/HLSL comparison, and details about my observations that went into that. If someone hadn't thought about things I proposed, they might learn something, or spark new thoughts outside of what they'd thought of before, if they choose. Or, they could replace consideration of what I'd said with poorly related analogies that simply pre-package their opinion, because they've pre-determined there is nothing to learn from someone.
Those are the two examples that seem to work towards and away from people learning things, IMO. But anyways, where were you going with this?
If I ask you to read my initial discussion of the comparison to the approaches again, or otherwise point out that this was already a topic of my conversation before your analogies steered away from it, will this illustrate the problem with them more clearly?
Why don't you succinctly state it here.
Err...why don't you just go actually read it?
Too much of a stretch? The "advantages", "reality intrusion", "my opinion" summary should make it pretty easy to figure out the advantages, disadvantages, and summary of their situations I propose. I don't see what truncating it and repeating it, and lengthening this already long post, will accomplish.
I propose that more profiles be created as necessary to reflect an LLSL expression suitable for "intermediate->GPU opcode" optimization by IHVs. How many will that be? Well, that depends on the suitability of the LLSL and the profile characteristics itemized at that time.
Well, why don't you tell us how you would evaluate the suitability of LLSL?
I'd look at precision issues (all upcoming hardware whose specifications I know of processes pixel shading at fp24) and register issues (unknown) of the hardware. In general, look at factors that cause performance penalties for a base PS 2.0 shader. If there are issues with these, which I do think IHVs will try to avoid, the issue of suitability (in comparison to the glslang methodology) would be how difficult it would be to address this compared to implementing and maintaining a glslang compiler.
I'd then consider possibilities for performance improvement beyond 1 op/clock, and the requirements for achieving that. The issue of suitability would again be the comparative difficulty in addressing and maintaining this compared to the glslang compiler.
Many people, not just me, have already told you the shortcomings.
Well, there are some theoretical shortcomings, and there are the shortcomings you assert, like "you'd have to compile an application every month". I've tried to freely discuss the first; here you are focusing on my reaction to the second as if it were the same thing. This makes it seem to me like you're more interested in attacking me than in discussing something while there remains the possibility that I might disagree.
And further, I have just told you that runtime or profile based optimization (wherein the developer runs the game, collects profile info from the driver, and then reruns the compilation feeding back the profile information) could do even better.
Yes, you mean that automatic version of what Carmack and Valve have talked about, right? That would be nice to see, but it doesn't exactly sound like something that is going to appear any time soon. If you have information indicating that it will, please share it, as I said above...your proposing just any possibility and that it will manifest "somehow" doesn't do much to answer my commentary.
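(For what it's worth, the workflow being described would presumably look something like the sketch below; every function name in it is invented, since I know of no shipping tooling that does this, so treat it as shape only:)

```cpp
// Hypothetical sketch of the "compile, run, feed profile data back" loop.
// None of these functions exist in any real driver or SDK; they only
// show the shape of the workflow.
#include <string>
#include <vector>

struct ProfileData { /* per-shader counters gathered during a play session */ };
struct CompiledShader { /* native code for the target GPU */ };

// First pass: compile with the driver's default heuristics. (stub)
CompiledShader CompileShader(const std::string &source) { (void)source; return {}; }

// Developer plays through representative scenes while the driver records
// counters (register pressure stalls, texture waits, etc.). (stub)
ProfileData RunAndCollectCounters(const std::vector<CompiledShader> &shaders)
{ (void)shaders; return {}; }

// Second pass: recompile with the observed data as hints. (stub)
CompiledShader RecompileWithHints(const std::string &source, const ProfileData &pd)
{ (void)source; (void)pd; return {}; }

void BuildOptimizedShaderSet(const std::vector<std::string> &sources)
{
    std::vector<CompiledShader> firstPass;
    for (const std::string &s : sources)
        firstPass.push_back(CompileShader(s));

    ProfileData pd = RunAndCollectCounters(firstPass);

    for (const std::string &s : sources)
        RecompileWithHints(s, pd);   // cache or ship these builds instead
}
```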
And just as ATI and NVidia do driver-based game detection, no doubt they would also be able to do compiler-heuristics tweaking on a per-game basis as well. Point being, your proposal that developers and end users remain hostage to a centrally managed uber-compiler from Microsoft has serious problems, and you don't seem to even partway get it.
Who is this guy saying HLSL is without issues? I thought I was the guy who was pointing out that you are exaggerating the DX HLSL issues and ignoring any significance of those for glslang?
Your alternative proposes that a new and unique type of compilation paradigm will be 1) introduced every month
No, my position is that N developers will ship M driver updates per year, yielding N*M updates to the game community, today.
Well, IHVs ship driver updates, not game developers. So how do these driver updates get multiplied? Are you proposing each person has "N" unique video card types installed?
I expect M compiler updates per N developers as well, due to the fact that IHVs will be discovering tweaks all the time which yield improved shader performance.
I'll let you convince OpenGL guy he'll be doing this, it would take me too long to try and resolve the conflicts apparent in it at the moment.
It's not a "new and unique" compilation paradigm. It's: HL2 gets released, and the following month ATI and NVidia discover simple heuristic tweaks which yield, say, a 5-10% performance boost. Every time a new game comes out, they might need to issue an update. Or 3dMark2004 comes out, etc.
OK, who hid the driver side rescheduler on us? They "might" just release a new driver to offer performance improvements through that, right?
2) unable to be resolved by a back end optimizer of the "shorter instruction count" principle intermediary or whatever other characteristics in prior profiles there are.
2) depends on how many different approaches are needed for what IHVs do. We have two at the moment, one being "shorter operation itemization count, and executing within the temp register requirement without significant slowdown". I think this is a pretty generally useful intermediate guideline that more than one IHV should be able to utilize.
You've been given umpteen examples why LLSL as it exists today is not a good representation format for these optimizations.
No, I've been given a few examples of why it might not be optimal by itself, and watched you arbitrarily turn "might" into "will" as it suited your agreeing with yourself, and complain that I didn't "add anything" for inconveniently continuing to use "might".
NVidia couldn't resolve the issue in their driver alone because of this.
Well, looking at the hardware performance issues, the factors that prevented the LLSL from serving here seem to be things that don't make sense for hardware trying to execute complex shaders in real time. Doesn't requiring idle or redundant usage of transistor-expensive computational units on certain clock cycles, to overcome a design limitation in something needed more as shader complexity increases beyond the "PS 1.3" level (i.e. temporary registers), qualify as something pretty simple: a mistake? This seems like something IHVs would be trying to avoid, whether talking about performance in glslang or DX HLSL.
This seems to indicate an extraordinary problem. Why "will" all problems for LLSL be similarly extraordinary?
And overriding principles (shortest register count, shortest instruction count) are too simplistic to capture everything that needs to be done, because they are competing goals.
No, they're not competing goals if you don't have a significant performance penalty simply from increasing temporary register utilization beyond a very low number. Perhaps you mean from a hardware design standpoint? I agree, but then I'm not saying glslang doesn't have a theoretical advantage; I'm saying it is obviously in an IHV's interest to avoid certain mistakes as high priorities for a given goal, if possible.
The most optimal code is not necessarily at the extremes (shortest actual shader, or shader with fewest registers used), but is somewhere in between, and finding the global minimum is extremely hard.
Outside of the NV3x, what type of performance yield are you proposing from this compared to what can be done with the LLSL?
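(To mark where I do agree: on hardware that penalizes temporary register usage, the two goals can pull against each other. A toy C++ model with made-up penalty numbers, loosely NV3x-flavored and not real data:)

```cpp
// Toy cost model: extra instructions vs extra live temporary registers.
// The penalty figures are invented; on hardware that slows down past a
// small register budget, recomputing a value can beat keeping it live.
#include <cstdio>

struct Schedule {
    int instructions;   // issued ALU operations
    int liveRegisters;  // peak temporary registers in flight
};

// Invented model: 1 cycle per instruction, plus a steep penalty for each
// temp register beyond a budget of 2.
int EstimatedCycles(const Schedule &s)
{
    int overBudget = s.liveRegisters > 2 ? s.liveRegisters - 2 : 0;
    return s.instructions + overBudget * 4;
}

int main()
{
    Schedule keepLive  = { /*instructions=*/10, /*liveRegisters=*/4 }; // shortest code
    Schedule recompute = { /*instructions=*/13, /*liveRegisters=*/2 }; // fewest registers
    printf("keep value live: %d cycles\n", EstimatedCycles(keepLive));   // 18
    printf("recompute value: %d cycles\n", EstimatedCycles(recompute));  // 13
    return 0;
}
```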
Yes, and that is a superficial analogy, because whether people are arguing about one approach being too complex or bug-prone compared to another is not the issue; the issue is whether it actually will be, and why. That's why I deride your insistence on it, and try to encourage discussions that don't depend on its substitution. How is my initial observation about the analogy incorrect?
Because your comments add nothing and history is relevant.
Well, my comments mentioned why your usage of history was not relevant. Simply saying that "adds nothing" so you can say "history is relevant" doesn't do much to show otherwise.
The people claiming that implementation difficulty is a problem lost the argument. Likewise, those claiming compiler implementation problems will be proven wrong as well.
Your airtight argument astounds me.
Note about omitted text, for brevity: your not taking the time to try and understand something someone said, and demonstrating that profoundly, is not the same as that something not existing.
...You can appeal to authority and say "MS compiler developers are just that good", but it's a very weak argument.
I suppose I might as well, as that's the only argument you seem to want to respond to.
No one knows who NVidia and ATI and 3dLabs have hired to work on OGL2.0, and frankly, expertise in Visual C++ compilers doesn't necessarily translate to GPU compilers, using classic Demalion analogy doubting.
Did you really think I meant that I object to all analogies, or are you just trying to propose a statement to support your stance and blithely blame any problems someone might have with it on me?
How about writing a compiler that has to interact directly with all the elements of a large device driver like the OpenGL ICD?
Well, if it's a static compiler, it doesn't have to "interact" with squat; it's just a DLL that the driver delegates shader compilation to. The only "interaction" part is in how you bind the registers to OGL state, and how you upload the code to the GPU. But these are the same issues that ARB_{vertex|fragment}_program have to deal with.
With added complexity in the IHV's added processing, compared to the extension, if they're trying to find more optimizations than they would in DX. That replaces a complexity with a greater complexity in terms of avoiding bugs and incompatibilities, not a complexity with simplicity or with the same complexity.
If it's dynamic runtime compilation, it's more difficult, but if you think this is what driver developers have to worry about, then that is tantamount to admitting FXC's static approach, which ignores runtime state, is insufficient.
How does this change FXC if we're talking about glslang doing this?
What are the challenges in this "black art"? I do think maintaining API compliance with complex interaction is part of it.
Writing a compiler from scratch is easier than writing an ICD. Adding a compiler into an ICD isn't that complex, since, as I stated, the compiler doesn't need to "interact" with anything.
So there is no concern with any other state updates influencing shader output at all on the GPU's end? Perhaps in relation to things like fog, AA, and texture handling? Is this what you mean, or do you just not like the word "interact"?
You call Compile(source), the compiler in the driver gets invoked, compiles to native code, and then the driver must upload it to VRAM and bind OGL variables to it. That is the only "interaction", an interaction which is, in fact, the same as what needs to be done today.
I'm not proposing that the act of compiling arbitrarily affects other parts of the driver, I'm proposing that IHV adaptation of compilation from a higher level introduces more opportunities for unexpected or unique characteristics in the output (bugs and conflicts between the outputs from implementations), as well as offering the possibility of more optimal code.
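(For reference, the "Compile(source)" flow described above maps onto the ARB shader-object entry points glslang shipped with. A minimal C++ sketch, assuming a current GL context and extension loading (e.g. via GLEW) are already handled, with placeholder shader text:)

```cpp
// Minimal sketch of the application-side glslang flow via ARB_shader_objects:
// hand source to the driver, let its internal compiler do everything,
// then bind variables and use the result. Assumes a current GL context
// and that the ARB entry points have been loaded (e.g. with GLEW).
#include <GL/glew.h>

static const char *kFragmentSource =
    "void main() { gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0); }"; // placeholder

GLhandleARB BuildProgram()
{
    GLhandleARB shader = glCreateShaderObjectARB(GL_FRAGMENT_SHADER_ARB);
    glShaderSourceARB(shader, 1, &kFragmentSource, NULL);
    glCompileShaderARB(shader);              // driver-internal compiler runs here

    GLhandleARB program = glCreateProgramObjectARB();
    glAttachObjectARB(program, shader);
    glLinkProgramARB(program);               // driver produces and uploads native code

    glUseProgramObjectARB(program);          // bind it; uniforms are then set via
                                             // glGetUniformLocationARB / glUniform*ARB
    return program;
}
```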
The compiler only needs to "interact" with OGL (that is, during compilation) if it is doing some kind of runtime optimization that you claim isn't necessary, as we all know, because of your defense of LLSL as sufficient.
Eh? So you're saying, in your roundabout way of making up what to attack again, that glslang will indeed offer significant problems with bugs and conflicts in what you envision?
OpenGL is preferred by the majority of game developers.
Hmm...OK, this doesn't seem to be demonstrated at the moment.
Why not take a poll on B3D and ask which API developers here think is easier to use, cleaner, and which one they would use in ideal circumstances if Microsoft's market power didn't dictate DX's use.
I'm not sure how much developer representation there is here, though I am sure that a general poll wouldn't do much to represent it. I would be curious as to the answer, though. That would be very important for glslang, and a result in favor of OpenGL would be encouraging.