MfA said:
C isn't all that high level, and something with fully virtualized resources can't really be termed low level. They will probably never allow custom functions ... but the only other fundamental difference will just be that variables have numbers instead of names, AFAICS.
Well, you can view just about any instruction set as "virtualized" if you want to. That's essentially how Intel and AMD view the x86 instruction set these days. There are also Java chips that run Java bytecode in HW.
On the other hand, no one will deny that the x86 "virtual" instruction set causes problems, both by restricting CPU architecture and by making compilers harder to write. For example, an assembly language programmer can typically beat an x86 compiler by 50%, but on a SPARC or StrongARM, human and machine will come within 10% of each other.
The choice of intermediate representation is important, not just for freeing HW manufacturers from a specific implementation, but also for making compilers easier to write. GCC is a classic example of this problem: the compiler's machine-independent passes produce substandard code on the x86.
The problem is that although the PS is virtualized, the DX compiler is making choices about instruction selection, ordering, and register labeling that obscure the original source expressions and make it harder for the driver to "recognize" patterns of instructions.
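To make this concrete, here is a hedged sketch of the kind of pattern recognition a driver might attempt. The mini instruction format, register names, and the "fuse mul+add into mad" peephole are all invented for illustration, not real DX assembly or any actual driver's logic:

```python
# Hypothetical driver-side peephole: fuse a multiply whose result feeds
# the immediately following add into a single "mad" instruction.
# Instructions are (opcode, dst, src_a, src_b) tuples; names are made up.

def find_mad_pairs(program):
    """Return indices where a mul's result feeds the very next add."""
    pairs = []
    for i in range(len(program) - 1):
        op1, dst1, _, _ = program[i]
        op2, _, src2a, src2b = program[i + 1]
        if op1 == "mul" and op2 == "add" and dst1 in (src2a, src2b):
            pairs.append(i)
    return pairs

# What the source expression d = a*b + c looks like if emitted naively:
straightforward = [
    ("mul", "r0", "a", "b"),
    ("add", "r1", "r0", "c"),
]

# The same computation after an intermediate compiler reorders and
# relabels: an unrelated instruction is scheduled between the pair.
reordered = [
    ("mul", "r2", "a", "b"),
    ("mov", "r3", "e", "e"),   # unrelated instruction hoisted here
    ("add", "r4", "r2", "c"),
]

print(find_mad_pairs(straightforward))  # pattern visible at index 0
print(find_mad_pairs(reordered))        # same computation, pattern missed
```

A real driver would look across wider windows than adjacent instructions, but the point stands: every choice the intermediate compiler makes is another transformation the driver has to see through.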
Let me explain it this way: the original source code has variables with labels A1, A2, ..., AN, which form an interference graph in the DX compiler. The output assembly has labels R0, R1, ..., RM, which form a new interference graph for the driver compiler. The two graphs will not be identical, but will be isomorphic with some additional constraints. By the well-known graph isomorphism problem, it is very difficult for the driver compiler to find the isomorphism back to the original source graph. That means some structures amenable to HW optimization may be obscured by the morphed graph.
It doesn't matter that the two graphs are essentially identical: simply relabel the vertices and add some edges, and a compiler that can easily detect optimizable constructs in one graph will no longer easily detect the same patterns in the other. In fact, this property is exploited in cryptographic zero-knowledge proof systems.
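A tiny sketch of the relabeling problem, with invented variable and register names. The source-level interference graph is a simple chain over A1..A4; the "driver-side" graph is the same chain relabeled to R0..R3 with one extra interference edge added by scheduling. Recovering the correspondence means searching over relabelings:

```python
# Brute-force search for a relabeling that embeds the source interference
# graph into the driver-side one. All names here are hypothetical.
from itertools import permutations

# Source-level interference graph: A1-A2-A3-A4 (a chain).
source_edges = {("A1", "A2"), ("A2", "A3"), ("A3", "A4")}

# Driver-side graph after relabeling to R0..R3, plus one extra
# interference edge introduced by the compiler's scheduling choices.
asm_edges = {("R2", "R0"), ("R0", "R3"), ("R3", "R1"), ("R2", "R1")}

def is_embedding(mapping, small, big):
    """Does `mapping` carry every edge of `small` onto an edge of `big`?"""
    undirected = {frozenset(e) for e in big}
    return all(frozenset((mapping[u], mapping[v])) in undirected
               for u, v in small)

src_nodes = sorted({v for e in source_edges for v in e})
asm_nodes = sorted({v for e in asm_edges for v in e})

# Finding the original structure means trying all N! relabelings --
# trivial for 4 nodes, hopeless at the size of a real shader.
found = [dict(zip(src_nodes, p)) for p in permutations(asm_nodes)
         if is_embedding(dict(zip(src_nodes, p)), source_edges, asm_edges)]
print(len(found), "relabelings embed the source graph")
```

Note that the extra edge also makes several relabelings fit equally well, so even an exhaustive search can't tell the driver which one the DX compiler actually used.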
Anyway, this is one of the reasons why architecture-specific compilers can trash generic compilers like GCC.