HLSL 'Compiler Hints' - Fragmenting DX9?

Humus · Oct 16, 2003

DaveBaumann said:
Someone mentioned that this was the "OpenGL2.0 way of doing it", which is in fact not the case AFAIK - the OGL2.0 way of doing it puts the compiler in the hands of the IHVs, which means that from a single source of "The OpenGL Shader Language" (ugh) code, assembly that is optimised specifically for the underlying hardware can be generated. IMO this is a good approach, but it does require each IHV to have a tight compiler that compiles to the right thing.

Unless I'm mistaken I think that's what's been said all the time, that OpenGL directly targets the underlying hardware.

Joe DeFuria · Oct 16, 2003

Humus said:
Nope. The API is still the same (and the OS too).

No kidding.

The point is, I'm not advocating MS control anything past the compiler level. So carrying out my "logic" beyond that extent is as irrelevant as me saying that you guys would advocate all IHVs to have their own APIs.

demalion · Oct 16, 2003

Driver based HLSL->GPU opcode compiler. Everything is done in one place, and controlled by the IHVs.

The advantages of this are that absolutely all possible HLSL->GPU opcode optimization opportunities that can be implemented are accessible, with the right compiler, and the IHV has complete control over releasing HLSL compilation improvements, if necessary, as part of driver updates.

Reality intrusion: Finding all optimization possibilities doesn't come for free, and neither does creating bug and conflict free compilers across multiple architectures while attempting to find them, let alone updating them with significantly more frequency than they might be done otherwise. Also, by virtue of being a standard, there is still "one body" imposing limits on the behavior, though the standard is the only place where "strengths" and "weaknesses" of that body are guaranteed to be brought to bear. However, if IHVs can agree to share and develop common compiler technology that is applicable to them all, and succeed, this seems a possible way to reduce the significance of this hurdle quite a bit. Hopefully, politics and economics don't interfere with this exercise.

My opinion: The goal of optimal compilation is clearly visible on the far side of this rather large obstacle, though it is not yet established that this approach has unique benefits. If all obstacles are overcome in a timely fashion, however, that goal should give the best possible result if reached...though this observation neither guarantees actually achieving better compilation, nor negates the issues that need to be overcome.

API based HLSL->intermediate compiler, driver based intermediate->GPU opcode compiler. One body controls the first part as a standard, and the IHVs still have control over compilation for each GPU.

The advantage of this is that standardizing an intermediate specifies more reproducible compiler behavior, "nailing things down" more firmly. Also, the strengths of the "one body" can be brought directly to bear on one part of the task, depending on how much communication there is between IHVs and the "one body".

Reality intrusion: politics and economics can cause interference with communication between companies, and optimal performance is limited by the suitability of the intermediate representation for implementing optimized GPU opcode. Also, "weaknesses" in the "one body" controlling the first part might possibly be universally applied.

My opinion: I think we're already seeing good performance here, and a demonstration of how the intermediate representation can be successful. However, we're also seeing indications of bugs that hinder that, and might be related to economics and politics.
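
The two pipelines described above can be sketched as a toy model in Python. Everything in it is hypothetical: the op names (`sincos`, `sin`, `cos`, `mul`), the two fictional GPUs, and the function names are invented for illustration and do not correspond to any real compiler, driver, or shader model. It only shows where the "suitability of the intermediate representation" can matter: if the intermediate format cannot express an op that one GPU runs natively, a naive second stage never sees it.

```python
# Toy model of the two compilation pipelines discussed above.
# All op names, GPU names, and functions are hypothetical.

# High-level program: a list of (op, dest, src) tuples.
SOURCE = [("sincos", "r0", "t0"), ("mul", "r1", "r0")]

# Ops each fictional GPU executes natively.
NATIVE_OPS = {
    "gpu_a": {"sincos", "sin", "cos", "mul"},
    "gpu_b": {"sin", "cos", "mul"},
}


def expand(op, dest, src):
    """Expand one op into the primitives a fixed intermediate format allows."""
    if op == "sincos":
        return [("sin", dest + ".x", src), ("cos", dest + ".y", src)]
    return [(op, dest, src)]


def compile_in_driver(program, gpu):
    """Single-stage model: the driver sees the high-level source directly."""
    out = []
    for instr in program:
        if instr[0] in NATIVE_OPS[gpu]:
            out.append(instr)          # keep the op the hardware runs natively
        else:
            out.extend(expand(*instr))  # otherwise fall back to primitives
    return out


def compile_to_intermediate(program):
    """Stage one of the two-stage model: lower everything to primitives."""
    out = []
    for instr in program:
        out.extend(expand(*instr))
    return out


def translate_intermediate(intermediate, gpu):
    """Stage two: a naive 1:1 translation with no pattern recovery."""
    assert all(op in NATIVE_OPS[gpu] for op, _, _ in intermediate)
    return list(intermediate)
```

In this toy, `gpu_a` ends up with 2 instructions from the single-stage path and 3 from the two-stage path, while `gpu_b` gets 3 either way. A real driver could try to recognise the expanded pattern and fold it back, which is exactly the extra work the second stage has to take on.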

...

I'm still waiting to see how things pan out in comparing the success of the two approaches...they both have advantages and disadvantages.

I'm hoping GLSLang is able to take off and overcome the associated hurdles to at least catch up with DX 9 HLSL, because its hurdles are, AFAIK, significant and uncharted. With current hardware, I have my doubts on that happening...perhaps the R420 and NV40, along with the hopefully extensive work IHVs have been doing in this regard already, will allow it to show advantage compared to the current DX HLSL implementation when those products are released. The intermediate->GPU opcode compilation seems to be making significant progress, which leaves IHVs having to match more of MS's general compiler experience for the GLSLang approach to compete favorably. How that is going to be done remains to be seen.

More directly to the thread topic, I think HL 2 overall illustrates that the issues with the NV3x are independent of HLSL and LLSL, as indicated by the commentary about the shader changes required, and that the basic HLSL model (implemented properly to spec) is flexible enough to accommodate differing hardware. I think the quoted commentary relates to that.

As far as that results in "annoyance" for HLSL profiles, someone has mentioned there is a query to return the optimal profile for current hardware. This seems to indicate that no annoyance is being added besides executing the query and using the result, at least as far as the annoyances that are actually due to the DX 9 HLSL mechanics for supporting different hardware.
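
A minimal sketch of that "execute the query and use the result" flow, in Python. Nothing here calls Direct3D: `query_best_profile`, `compile_shader`, and the `device_caps` dictionary are hypothetical stand-ins (the real query is believed to be `D3DXGetPixelShaderProfile` in D3DX9, which is not used here), and the preference order is an assumption made for the example.

```python
# Hypothetical sketch: pick the HLSL compile target from a device query.
# Nothing here calls Direct3D; the caps dictionary is an invented stand-in.

# Profiles in assumed order of preference, best first.
PROFILE_PREFERENCE = ["ps_2_a", "ps_2_0", "ps_1_4", "ps_1_1"]


def query_best_profile(device_caps):
    """Stand-in for the runtime query: return the best supported profile."""
    for profile in PROFILE_PREFERENCE:
        if profile in device_caps["supported_profiles"]:
            return profile
    raise RuntimeError("device supports no known pixel shader profile")


def compile_shader(source, profile):
    """Stand-in for the HLSL compiler: it only records the chosen target."""
    return {"target": profile, "source": source}


def build_for_device(source, device_caps):
    # One query, then the result is passed straight to the compiler.
    return compile_shader(source, query_best_profile(device_caps))
```

The application code stays the same for every card; only the string handed to the compiler changes.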

This does seem to leave issues (for both approaches) that have yet to be addressed, and how things will compare, and when, an open question.

Bjorn · Oct 16, 2003

demalion said:
My opinion: I think we're already seeing good performance here, and a demonstration of how the intermediate representation can be successful. However, we're also seeing indications of bugs that hinder that, and might be related to economics and politics.

I think that it's a bit premature to say that we're already seeing good performance here since we don't really have anything to compare with at this moment.

demalion · Oct 16, 2003

Bjorn said:
demalion said:
My opinion: I think we're already seeing good performance here, and a demonstration of how the intermediate representation can be successful. However, we're also seeing indications of bugs that hinder that, and might be related to economics and politics.

I think that it's a bit premature to say that we're already seeing good performance here since we don't really have anything to compare with at this moment.

Well, my metric is the types of effects that we are seeing done in real time, and I do indeed think the performance for them at the moment is quite good.
That we don't know how things compare between these approaches yet is something I view as separate from this evaluation, at least until the basis for that comparison is available...I'm hoping it is shown to be as "poor" as possible at that time, because that would mean things would be that much better than they already are.

Bjorn · Oct 16, 2003

demalion said:
Well, my metric is the types of effects that we are seeing done in real time, and I do indeed think the performance for them at the moment is quite good.

I saw that as rather irrelevant in this discussion. And it's not just about getting one card/generation from one IHV to perform reasonably (relative to what it could achieve with the other method, not compared to other IHVs).

That we don't know how things compare between these approaches yet is something I view as separate from this evaluation, at least until the basis for that comparison is available...I'm hoping it is shown to be as "poor" as possible at that time, because that would mean things would be that much better than they already are.

I agree, although not about the separate evaluation part.

GameCat · Oct 16, 2003

If the performance was so good to begin with why is there a ps_2_a target at all?

Having some sort of intermediate language between the driver and the programmer-visible high level language might be a good idea, but it obviously needs to be higher level than both ARB_fragment_program and PS 2.0.

Since the IHVs' compilers probably will spend most of their time optimizing, not parsing, the benefit of a bytecode language is unclear. It might be the case that glslang is not low level enough to be compiled but I doubt HLSL is, it still lets you specify in what registers you pass interpolants for christ's sake. PS 2.0 also exposes registers while ARB_fragment_program does not.

MikeC · Oct 16, 2003

GameCat said:
For those of you that don't know, this is how the OpenGL shading language works: The high level shading language is the same for all IHVs but the compiler is part of the driver so the compiled code should be optimal for the architecture you're running on.

Give this man an A+. I posted a story at nV News back in August that contained the following shader schematic for D3D and OGL.

http://www.3dlabs.com/support/developer/ogl2/presentations/index.htm

WaltC · Oct 17, 2003

GameCat said:
If the performance was so good to begin with why is there a ps_2_a target at all?

...

Yes, and what is the ps_2_a target, specifically, and how does it differ from ps_2_0?

Demirug · Oct 17, 2003

PS_2_A permits the compiler to use the additional NV3X pixelshader features.

Joe DeFuria · Oct 17, 2003

Demirug said:
PS_2_A permits the compiler to use the additional NV3X pixelshader features.

What features are those?

I was under the impression that PS_2_A simply compiles the same features, but in a more FX friendly way than the base compiler.

Hyp-X · Oct 17, 2003

Joe DeFuria said:
Demirug said:

PS_2_A permits the compiler to use the additional NV3X pixelshader features.

What features are those?

ps_2_a: Same as the ps_2_0 profile, with the following additional capabilities available for the compiler to target:
Number of Temporary Registers (r#) is greater than or equal to 28
Arbitrary source swizzle
Gradient instructions: dsx, dsy
Predication
No dependent texture read limit
No limit for the number of texture instructions
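
One way to read the list above is as a second row in a capability table. The Python sketch below is only an illustration: the `Profile` and `ShaderNeeds` types and the `fits` helper are invented for this example, the ps_2_0 limits (12 temporaries, dependent read depth 4, 32 texture instructions) are filled in from memory as an assumption, and the ps_2_a temporary count is simply the figure quoted in the post.

```python
# Sketch: the ps_2_a additions from the list above as data, next to ps_2_0.
# None means "no limit". The ps_2_0 limits and all type names are
# assumptions for illustration; the ps_2_a temp count is the quoted figure.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Profile:
    name: str
    min_temps: int
    arbitrary_swizzle: bool
    gradients: bool
    predication: bool
    dependent_read_limit: Optional[int]
    texture_instr_limit: Optional[int]


PS_2_0 = Profile("ps_2_0", 12, False, False, False, 4, 32)
PS_2_A = Profile("ps_2_a", 28, True, True, True, None, None)


@dataclass(frozen=True)
class ShaderNeeds:
    temps: int
    uses_gradients: bool
    dependent_reads: int
    texture_instrs: int


def fits(needs, profile):
    """True if a shader's requirements stay within the profile's limits."""
    def within(value, limit):
        return limit is None or value <= limit

    return (needs.temps <= profile.min_temps
            and (profile.gradients or not needs.uses_gradients)
            and within(needs.dependent_reads, profile.dependent_read_limit)
            and within(needs.texture_instrs, profile.texture_instr_limit))
```

A shader that needs gradients, or more texture instructions than the base profile allows, fails `fits(..., PS_2_0)` but passes against `PS_2_A`. Note that this says nothing about speed: a profile permitting more temporaries does not mean using them is fast on the hardware.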

silhouette · Oct 17, 2003

Hyp-X said:
ps_2_a: Same as the ps_2_0 profile, with the following additional capabilities available for the compiler to target:
Number of Temporary Registers (r#) is greater than or equal to 28
Arbitrary source swizzle
Gradient instructions: dsx, dsy
Predication
No dependent texture read limit
No limit for the number of texture instructions

Can someone verify with a complex shader whether the HLSL compiler generates code that actually uses more temporary registers for the ps_2_a profile (since it has more registers), or generates code that uses as few registers as possible (because of the limitations of the FX architecture)?

bloodbob · Oct 18, 2003

Hmmmmm, question: are the two compiler paths in the same binary?
Question 2: is the source for the two compilers available?

( just thinking of possible cheats for cheaters )

WaltC · Oct 18, 2003

Hyp-X said:
ps_2_a: Same as the ps_2_0 profile, with the following additional capabilities available for the compiler to target:
Number of Temporary Registers (r#) is greater than or equal to 28
Arbitrary source swizzle
Gradient instructions: dsx, dsy
Predication
No dependent texture read limit
No limit for the number of texture instructions

If that's the case then I don't see that it's much of a big deal. nVidia's complaints all year long have been not that ps2.0 doesn't support the ps2.0+ hardware nVidia is selling, but that they think ps1.x is "better." Many of the comments I've read by nVidia specifically talk about moving ps2.x instructions down to ps1.x for "better performance."

I had thought any significant DX9 compiler changes in regard to "favoring nVidia hardware" would address that aspect. nVidia has certainly directly addressed it enough this year.

Xmas · Oct 18, 2003

silhouette said:
Can someone verify with a complex shader whether the HLSL compiler generates code that actually uses more temporary registers for the ps_2_a profile (since it has more registers), or generates code that uses as few registers as possible (because of the limitations of the FX architecture)?

It will try to use as few registers as possible and a more FX-friendly instruction ordering, but it will also compile code that requires many temp values.
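
To illustrate why instruction ordering and register count are linked, here is a small Python sketch that counts how many temporaries are live at once for two orderings of the same computation. It is a toy under stated assumptions: the instruction format, the names (`t0`, `in0`, and so on), and `peak_live_temps` are all hypothetical, every destination name is written exactly once, and a source that dies at an instruction is assumed to free its register for that instruction's result. Whether the ps_2_a orderings actually look like this is not established in this thread.

```python
# Toy liveness count: how many temporaries are live at once for a given
# instruction order. Instruction format and names are hypothetical.

def peak_live_temps(instrs, outputs):
    """instrs: list of (dest, sources); outputs: names that must survive.

    Returns the maximum number of simultaneously live temporaries.
    """
    # Record the index of the last instruction that reads each value.
    last_use = {}
    for i, (_, srcs) in enumerate(instrs):
        for s in srcs:
            last_use[s] = i
    for o in outputs:
        last_use[o] = len(instrs)  # outputs stay live past the end

    live = set()
    peak = 0
    for i, (dest, srcs) in enumerate(instrs):
        # Sources read for the last time here free their registers.
        live -= {s for s in srcs if last_use.get(s) == i}
        live.add(dest)
        peak = max(peak, len(live))
    return peak


# Same math, two orderings: all texture loads first, or interleaved.
LOADS_FIRST = [
    ("t0", ["in0"]), ("t1", ["in1"]), ("t2", ["in2"]), ("t3", ["in3"]),
    ("a", ["t0", "t1"]), ("b", ["a", "t2"]), ("c", ["b", "t3"]),
]
INTERLEAVED = [
    ("t0", ["in0"]), ("t1", ["in1"]), ("a", ["t0", "t1"]),
    ("t2", ["in2"]), ("b", ["a", "t2"]),
    ("t3", ["in3"]), ("c", ["b", "t3"]),
]
```

With these inputs, `peak_live_temps(LOADS_FIRST, ["c"])` is 4 and `peak_live_temps(INTERLEAVED, ["c"])` is 2, even though both orderings compute the same result.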

Colourless · Oct 18, 2003

My concern with the OpenGL model would be that as a developer, it would be hard to have things like program length limits defined, and I would need to have contingencies in place for every shader should it fail to compile. Using the Direct3D method, you will at least know that after testing, the shader will still work with all compliant hardware, even if it's slow.

So pretty much it's like this:
Direct3D = Will work, might be slow
OpenGL = Might work, should be faster

I seriously tend to lean towards the former (D3D) rather than latter.
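
The "contingencies in place for every shader" concern can be sketched as a fallback chain: try the preferred shader, and if the driver's compiler rejects it, try a simpler variant. The Python below is a hypothetical illustration only; `ShaderCompileError`, `first_working_shader`, and `fake_driver_compile` are invented names, and the 64-instruction limit is an arbitrary stand-in for whatever limit a given driver might impose.

```python
# Hypothetical sketch of per-shader compile contingencies.
# The error type, function names, and the length limit are all invented.

class ShaderCompileError(Exception):
    pass


def first_working_shader(candidates, compile_fn):
    """Try (name, source) candidates in order; return (name, compiled)."""
    errors = []
    for name, source in candidates:
        try:
            return name, compile_fn(source)
        except ShaderCompileError as exc:
            errors.append((name, str(exc)))
    raise ShaderCompileError(f"no candidate compiled: {errors}")


def fake_driver_compile(source, max_instructions=64):
    """Stand-in driver compiler with an invented program-length limit."""
    count = len(source.splitlines())
    if count > max_instructions:
        raise ShaderCompileError(f"{count} instructions > {max_instructions}")
    return {"instructions": count}
```

The cost being described is that the developer has to author and test the reduced variants up front, for limits that are not known until the shader is compiled on the user's machine.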

Xmas · Oct 18, 2003

Colourless said:
I seriously tend to lean towards the former (D3D) rather than latter.

That's why I (and others in this thread) proposed a combination of both worlds.

Humus · Oct 18, 2003

Colourless said:
My concern with the OpenGL model would be that as a developer, it would be hard to have things like program length limits defined, and I would need to have contingencies in place for every shader should it fail to compile. Using the Direct3D method, you will at least know that after testing, the shader will still work with all compliant hardware, even if it's slow.

So pretty much it's like this:
Direct3D = Will work, might be slow
OpenGL = Might work, should be faster

I seriously tend to lean towards the former (D3D) rather than latter.

OpenGL does away with many limits, so you don't need to worry about them at all. You don't need to worry about instruction count, temporaries used and such. OpenGL guarantees that it will work. If it needs to multipass, it'll do it for you. As a last resort, if that's not possible, you'll get software.

DemoCoder · Oct 19, 2003

The way I look at it, this issue is similar to the issue of ICD vs MCD OpenGL drivers.

Many years ago, people were arguing that it was too hard for IHVs to write high quality ICDs and thus MCD would yield more consistent performance across most IHVs and fewer bugs, due to the smaller amount of code to be developed and tested.

Now we are arguing that compilers and optimizers are too hard to write and that MS should provide a one-size-fits-all cross-compiler. But the fact of the matter is, IHVs already have to include compilers and optimizers in their DX9 drivers that deal with "DX9 intermediate representation" (if you subscribe to demalion's view), or, in my view, DX9 assembly which must be translated and optimized at run time.

Either way, they already have to do 90% of the work of a full compiler: register allocation, instruction scheduling, instruction selection, etc. TODAY. The front-end of the compiler: parsing, intermediate optimizations, etc. can and WILL likely be provided as reusable open source in DDKs. 3DLabs is already shipping free code people can use to write the frontend part of the compiler. Nvidia ships a Cg frontend which can deal with DX9 HLSL syntax too.

Personally, if OpenGL2.0 ICDs are harder to write, but yield more optimization opportunities, well, tough luck to IHVs without good driver dev teams.

I'd rather have an API architecture that is flexible enough to deliver performance if the IHV knows how to write good drivers instead of a one-size-fits-all approach that ties the hands of good IHVs so crappy IHVs can ship lower performing and less complex drivers and have their performance equalized with the good IHVs. That's why MCDs were bad, and it's why static compilation/linking is bad.

And the idea that you have to RELINK applications to get performance from driver/compiler updates is ludicrous and an unworkable enduser distribution scheme.

Imagine if every time a new Detonator or Catalyst driver came out, you'd have to download patches for all of your games to get any improvement (because drivers were statically linked). It's crazy. Save that approach for the consoles.

The fact that a future upgrade to a driver (or compiler) can break some game's pixel shaders is an issue for the IHV's QA department. They have to deal with regression testing ANYWAY since even simple changes to an OGL driver can break older games. This should not hold back runtime compilation in the driver.

Any new shared code can exhibit regressions. This is not an argument against shared code.
