Which API is Better?

  • DirectX9 is more elegant, easier to program
  • Both about the same
  • I use DirectX mainly because of market size and MS is behind it

  Total voters: 329
Well, in the industry I'm in, it's not the cost of HW or SW that matters. Many times, the choice is influenced by politics or "strategic partnerships". Use company X's product, because we have a relationship or deal with company X. Managers don't dictate using the "cheapest" means to accomplish a task. A manager's biggest concern is usually delivering software on time and with the fewest bugs.

In the 90s, everyone was buying expensive Sun server HW, Sun RAIDs, etc. instead of cheap Linux servers, despite the fact that the Sun HW wasn't really any easier to manage. In the 80s, everyone bought IBM, despite the fact that clones of IBM mainframes were faster and cheaper.

At best, the choice to use DX is simply one of installed base and support from MS. But companies that have chosen MS technology in the past frequently learn that the costs and benefits look cheap in the beginning, but you have to be in it for the long haul, because of MS bugs and design flaws. You have to bet that MS will fix the problems eventually. And they usually do -- several years later.

Anyone who worked on WinCE, the first .NET, or the early DXes, for example, will know what I'm talking about. If you're willing to wait 3-5 years for MS to mature and fix stuff, fine. DX is finally starting to mature, but that was small consolation to the people who started on DX3 or DX5. Today, the flaws in the shader model won't manifest themselves because of the paltry lack of diversity in DX9 HW, and the lack of DX9 titles. But in 2 years, it will be quite evident that the MS model is flawed, and I think you will find that DX10 will feature compilation in the driver model, just like DirectX has adopted most other OGL ideas.
 
DemoCoder said:
Well, in the industry I'm in, it's not the cost of HW or SW that matters. Many times, the choice is influenced by politics or "strategic partnerships".

Yes...that does happen. Though there is, of course, a limit to how much politics can override going with the "better" or "cheaper" alternative.

Managers don't dictate using the "cheapest" means to accomplish a task. A manager's biggest concern is usually delivering software on time and with the fewest bugs.

It's both. We want to deliver software on time and with the fewest bugs, because that means less cost or more revenue. (Just as being "cheaper" means less up front cost.)

But in 2 years, it will be quite evident that the MS model is flawed...

Again, that depends entirely on your perspective. You are implying that in two years we may find out that the DX model is flawed, yet right now there is hardware and an API to support DX, so I can certainly say the GL model is actually flawed right now.

So what's better? Being definitely flawed now or perhaps flawed later?

....and I think you will find that DX10 will feature compilation in the driver model, just like DirectX has adopted most other OGL ideas.

Perhaps that will be true. However, that doesn't mean that MS is adopting anything. It can simply mean that Microsoft believed the best route to take with DX9 was not to adopt compilation in the driver.
 
Joe DeFuria said:
DiGuru said:
Just ask any manager. They're not interested in technical details anyway.

I don't know any managers for gaming development houses.
I'm a manager for other types of development. And I'm not necessarily interested in the tech details either....I'm mostly interested in the job getting done.

And that means I ask the people doing the job what they need. If it doesn't cost me anything...
If that isn't a giveaway statement ...
You say you're a manager. You say you're not interested in technical details. At the same time, you're arguing furiously (no pun intended) over the perceived disadvantages of using GL. Wouldn't it be proper to delegate such judgement to someone who's more interested in (and thus presumably more familiar with) the technology? I.e., just ask programmers about the relative effort in work hours, for a start.

Joe DeFuria said:
If choosing OGL would be very much cheaper, it might be different. But only if a lot of others would use and recommend it as well.

You've got it backwards. If OGL isn't more expensive, then there's no reason I wouldn't allow them to use it if that's what they're asking for.

No offense, but it sounds to me like you're talking out of your ass a bit with respect to "managers."
"If OGL isn't more expensive <...>". You're assuming it is, without having reasons to back it up. And no, I don't count "Think consumer space" as a reason, you'd have to add more meat to that. I also don't count "other companies do it, so there must be a good reason even though I don't know it". The point is: you don't know, and you never really wanted to know. You just don't listen.
 
zeckensack said:
If that isn't a giveaway statement ...

??

You say you're a manager.

Yup.

You say you're not interested in technical details.

No, I AM interested in technical details, but to the extent that they impact costs or revenues. I'm not interested in technical details for the sake of being interested in the technical details.

At the same time, you're arguing furiously (no pun intended) over the perceived disadvantages of using GL. Wouldn't it be proper to delegate such judgement to someone who's more interested in (and thus presumably more familiar with) the technology?

Sigh.

I see you're not a manager.

I'm not going to delegate "judgement" about technology. I make the judgement on technology. I demand of my staff that they be able to give me certain facts. For instance:

* My staff tells me: "Using technology A, we should be able to complete the task in X man-hours. Using technology B, it would take Y man-hours."

(The above is grossly simplified, but I hope you get my point.)

I, being the manager, know the details about things that my technical staff doesn't. Such as other (non-labor) costs...or what I could hire OTHERS (or contract others) to do the job for, etc.

Edit: In other words, the input from my staff is critical, but it's still only one consideration for choosing which technology to pursue. /end edit.

I.e., just ask programmers about the relative effort in work hours, for a start.

Exactly. I don't delegate to them what tools they use. I ask them about work effort, and that data, compiled with other data, gives ME the information that I need to make the decision.

"If OGL isn't more expensive <...>". You're assuming it is, without having reasons to back it up.

Huh? I'm certainly NOT assuming that coding for GL is "more time consuming" than coding for DX. There's more to costs than software development costs. There's IHV driver development cost, for example.

And there's more to "what's best for consumers" than what costs the least. Best for consumers is not necessarily the cheapest.

The point is: you don't know, and you never really wanted to know. You just don't listen.

No, I'm listening, and I don't see EITHER side presenting "proof" that "their way" is better. It's called a difference of opinion. I at least acknowledge the advantages of the GL platform (software interface).

You're just again in that closed minded world of "if it's not best for the software developers, it can't be best for consumers."

All I see is a lot of DirectX model bashing based on little more than "software developers prefer coding for GL."
 
With DirectX Next, won't DX be setting the standard for the graphics hardware? And then OpenGL will have to play catch up. It looks like MS can dictate the minimum required feature set and the hardware manufacturers will simply have to comply. So, will OpenGL never be the "forward looking API" again? :(
 
krychek said:
With DirectX Next, won't DX be setting the standard for the graphics hardware? And then OpenGL will have to play catch up. It looks like MS can dictate the minimum required feature set and the hardware manufacturers will simply have to comply. So, will OpenGL never be the "forward looking API" again? :(

I don't think it quite works that way. Do you think MS just pulls stuff out of the air and forces IHVs to make hardware for it? Why do companies like nvidia and ati have R&D? Why not just have 'D'?
 
No, I didn't mean MS is going to put random stuff in the DX specs. For Longhorn, it needs some features to be supported by the hardware, and those go into the spec. These will have to be supported by the IHVs. If those required features are big enough, the IHVs might not really bother adding anything that comes out of their own R&D departments. It could be that MS's required features and the stuff coming out of the IHVs' R&D turn out to be the same, but when new chips come out with speedups for special cases (like stencil shadows) instead of spending that die area on more functionality, I really don't think innovative features are as high on their priority list as performance and supporting a spec are.

So it's nice that MS can push the hardware forward if it wants to, but is it the right one to do so? The ARB would seem nice for this, but it looks slow in ratifying specs.

Now, pick apart my incoherent n00b post :mrgreen:.
 
For its Longhorn, it needs some features to be supported by the hardware which are put into the spec.

You don't need a DX "Next" for Longhorn. DX9 satisfies its feature requirements (as it stands, it's currently using a version called DirectX 9.0L). The big changes for Longhorn have more to do with the driver implementation than with the DirectX API.


In any case ATI and Nvidia are probably getting a lot of practice on OS X since the concept is the same...
 
Oh, I was under the impression that Longhorn needed a command processor in the GPU like in the P10. I remember reading that virtual memory and other features in the P10 would be useful in Longhorn. Anyway thanks for pointing that out.
 
Humus said:
JohnH said:
Humus said:
No, the problem is the intermediate language. You can't optimize for both architectures with a common profile. Yes, the GFFX hardware is slow too, but that's just another problem. Both vendor's hardware could be faster had they had the opportunity to compile the code themselves, though the difference would probably be larger on the GFFX side.
I would be prepared to bet a large sum of money that you are completely wrong here (well, within a few percentage points).

John.


Well, be prepared to pay then. 8)
One example is that removing common subexpressions is a good optimization for the R300 since register usage is for free. On the NV30 on the other hand it's not an optimization at all, rather the opposite since register usage is costly. There's no way a common intermediate version can be optimal for both, so the compiler needs to either unfairly favor one, or come up with a good compromise that's decent but not optimal for either card.

What common sub-expressions are you talking about exactly?
 
Humus said:
JohnH said:
Humus said:
The R9700 also has major problems with certain GL_ARB_fragment_program code.
Such as? Last I heard DoomIII didn't have any issues with ARB frag on ATi HW. Seriously, out of interest, what sort of problems?

For instance, non-native swizzles. They can expand into many instructions.

On the basis of other arguments here, including additional functionality beyond some base profile is supposed to be a good thing... Maybe you're not arguing against it? Don't forget GLSlang goes massively beyond just simple non-native swizzles; surely, in the same vein, that makes it even worse than the ARB frag extension?

John.
 
DemoCoder said:
Humus said:
Well, be prepared to pay then. 8)
One example is that removing common subexpressions is a good optimization for the R300 since register usage is for free. On the NV30 on the other hand it's not an optimization at all, rather the opposite since register usage is costly. There's no way a common intermediate version can be optimal for both, so the compiler needs to either unfairly favor one, or come up with a good compromise that's decent but not optimal for either card.

Also, LRP and CMP are expensive on the NV3x, but SINCOS is very cheap. FXC has an affinity for choosing these over other constructs. IF_PRED would be more optimal for the NV30, and SINCOS is way cheaper than a power series expansion. The SINCOS expansion is devastatingly inefficient on the NV3x because it eats up multiple extra registers.
Err, but you get those instructions down at the driver; they're all part of the intermediate format, no information is lost there (well, other than the annoying bug with sincos).

Another example is BIAS and SHIFT operations, which on DX8 HW and some DX9 HW are hardware supported. But DX9 can't represent them, so code for "(x - 0.5)*2" generates code like

def c0, 0.5, 2.0, 0, 0
sub r0, v0, c0.xxxx  // r0 = x - 0.5
mul r0, r0, c0.yyyy  // r0 = (x - 0.5) * 2

Which requires the driver to do some real heavy lifting to figure out what the hell is going on, since it will have to inspect the contents of the constant registers themselves to figure out if it could generate a HW BIAS/SCALE modifier or not. And if the code is 2*X - 1, a GLSLANG compiler could still figure out how to use HW bias/scale via strength reduction techniques, but FXC will merrily generate raw code for this.
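As a rough illustration (a sketch, not from the quoted post), assuming ps_1_x-class hardware where the _bx2 source modifier computes (x - 0.5) * 2 for free, this is what the driver would ideally collapse the sequence above back into:

// sketch: the bias and scale are absorbed into a source modifier
mov r0, v0_bx2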
Actually it's not that heavy duty to spot these types of optimisations, but yes, it is more work than if you had higher-level information available to you.

Oh, did I mention that FXC doesn't do constant folding correctly and that I noticed that sometimes it would actually waste a register to add two constants together that could have been rewritten with a fold?
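To illustrate the kind of fold meant here (a sketch, not actual FXC output), folding "2.0 + 3.0" at compile time saves both an instruction and a temporary register:

// unfolded: the constant add runs on the GPU and burns a temp
def c0, 2.0, 3.0, 0, 0
add r1, c0.x, c0.y
mul r0, v0, r1

// folded: the constant arithmetic is done once by the compiler
def c0, 5.0, 0, 0, 0
mul r0, v0, c0.x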
Isn't that a bug with the HLSL compiler? It has no impact on the intermediate format.

Basically none of these issues need lead to less efficient code, but yes, it does lead to some effort in writing the driver-based compiler, I won't deny it.

FXC is more efficient for a register-combiner-like phased pipeline (e.g. R300), and unfortunately, it doesn't take kindly to the NV3x's choice of going with specialized SINCOS and predication HW.

I don't believe the NV3x's pipeline will ever beat an R300, if both are optimized to the max. The issue is not whether the R300 is a killer card that destroys the NV3x. The issue is whether or not DX9 will restrict future pipelines that have more flexibility. It's hurt the NV3x already, and I'm just worried that when they try to introduce real HW branching into the R300 successor, we are going to run into significant problems.
By the time those future pipelines are available there will be new profiles to take advantage of them. On top of this, existing profiles will run faster as basic instruction throughput increases, so using those profiles now won't cripple the performance of anything written with them.

Later,
John.
 
Xmas said:
JohnH said:
Xmas said:
DX9 intermediate representation is, from a technical POV, flawed. This has nothing to do with either ATI or NVidia, or PVR for that matter.
It's only flawed if you try to write a shader that exceeds a specific profile; if that's the case, use a higher-specification profile. HW doesn't support one? Well, that's what the profiles are there for.
No, it is flawed because it drops vital information on what that shader is really supposed to do. And the profiles are flawed because they a) are based on the flawed IR, b) do not accurately represent HW limitations and capabilities and are therefore much too limited, c) there are no 3.0 profiles yet, and d) the best profile is not automatically chosen at runtime.
Actually, most of the information it drops would qualify as "useful" as opposed to "vital"; it's possible to sort out most things from what you've been given (yes, not all, but most), and this I know from experience. Looking at this from my POV, right now, GLSlang is about as far away from any available HW as you can get; not a very good profile.
What is flawed, in the majority of this thread, is the assumption that it's a good idea to be able to try to compile an arbitrary shader for an arbitrary piece of HW without any way of telling in advance whether it's going to succeed, or, if it succeeds, whether it will run at a reasonable speed. Hey, there was even the beginning of a discussion of how to fix this, but that seemed to fail to go forward for some reason.
I think validation tools for runtime compilation are the way to go. It's a small effort for the IHVs, but it doesn't have the flaws of profiles: no flawed IR, accurate representation of the hardware, automatically runs the best way possible.
I realize there is a problem with an IHV presenting hardware that is more limited than the current "least-capable" shader hardware (meaning GLslang-capable). This will be a problem for now, while there is no "legacy shader hardware" to target. But maybe in one or two years developers will have decided on e.g. Volari as their lowest target, and any new hardware will be more capable than that.
Actually, I think the validation would work if coupled with some defined support levels, as this would minimise some of the effort associated with generating something you know is going to work in the field.

John
 
JohnH said:
Err, but you get those instructions down at the driver; they're all part of the intermediate format, no information is lost there (well, other than the annoying bug with sincos).

A HLSL branch, depending on the code, can be written as MIN/MAX, CMP, LRP, IF, or IF_PRED. On some HW it is more efficient to use CMP, on some LRP/MIN/MAX, on others branching, and on others still, predicates. The FXC compiler is forced to pick one of these, so let's say it picks CMP to implement HLSL if(cond). Well, CMP performs badly on NV30, so you are now asking their driver to take low-level assembly, which has been inlined and reordered, and reverse engineer the branches out of it.
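As a rough sketch (not actual FXC output), a simple conditional like "c = (x >= 0) ? a : b" might come out as a single CMP:

cmp r0, r1, r2, r3  // r0 = (r1 >= 0) ? r2 : r3

which is cheap on hardware with a fast CMP, but on the NV3x the driver would much rather have seen the original branch or a predicate than have to recover it from this.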

And IHVs are supposed to have an easier time developing DX9 drivers because of this? Why don't they just put a DECOMPILER back to SOURCE in there while they're at it?

Your answer to everything is "profiles, profiles, and more profiles!" Never mind that the number of profiles will have to grow quite large, that Microsoft will have to maintain all of them, and that it still doesn't remove the burden on the IHVs of writing compilers to deal with DX9 assembly.


Actually it's not that heavy duty to spot these types of optimisations, but yes, it is more work than if you had higher-level information available to you.

It's not that heavy duty if they remain relatively intact like I showed you above, but it will be hellishly difficult if the instructions get reordered through a scheduler, registers get packed, and some of those intermediate results are reused by the compiler.



Isn't that a bug with the HLSL compiler? It has no impact on the intermediate format.

Yes, but I am listing the inadequacies of the whole platform. And right now, Microsoft has one poorly optimizing compiler to rule them all. MS expands any and all DX9 macros. MS fubar's constant folding. And when will this be fixed? And when people are playing HL2, will their SINCOS units be sitting idle?

Basically none of these issues need lead to less efficient code, but yes, it does lead to some effort in writing the driver-based compiler, I won't deny it.
^^^^^^^^^^^^^^^^^^^^^^

Are you listening, Joe DeFuria?
 
DemoCoder said:
Are you listening, Joe DeFuria?

Yes I am.

I already said that the consequence of the DX model is not being able to get "as optimal" a translation as when working from full source.

Does making a few "harder" optimizations with the DX model vs. the GL model ultimately require more or fewer resources than coding support to handle code that your hardware doesn't support at all?
 
JohnH said:
What common sub-expressions are you talking about exactly?
Common subexpression elimination is a fairly common and well-known compiler optimization. Consider e.g. the following two statements:
Code:
A = B+C+D+5;
X = B+C+D+7;
In this example, the sub-expression B+C+D is common to the two statements, so you or the compiler can optimize the code by evaluating B+C+D only once:
Code:
temp = B+C+D;
A = temp+5;
X = temp+7;
which saves a couple of instructions on most platforms and is done by most popular compilers. The problem is that in this case you need to keep 3 variables (A, X, temp) instead of just 2 (A, X) to hold intermediate results, so this optimization generally will increase the number of temp variables or registers needed. So you can choose: do it or not? Doing it may penalize the NV3x architectures hard because of the extra registers needed; not doing it will penalize the R3xx architectures hard because they are forced to execute redundant instructions. Doing it on an assembly-like intermediate representation is harder and more error-prone than doing it on an HLL parse tree and is as such not a very good option.
 
JohnH said:
What common sub-expressions are you talking about exactly?

Uhm, pretty much any kind, unless the sub-expression is very large. Like this:

A = (X + Y) * Z;
B = (X + Y) * W;

On R300 this would optimally be done like this:

temp = X + Y;
A = temp * Z;
B = temp * W;

This adds an extra register, however, which is suboptimal on the NV30. So instead it would be preferable to do:

A = X + Y;
A *= Z;
B = X + Y;
B *= W;

One instruction more, but less register usage. With the MS compiler making these decisions rather than the driver, one GPU will be at a disadvantage.

Edit: arjan de lumens beat me to it.
 
JohnH said:
On the basis of other arguments here, including additional functionality beyond some base profile is supposed to be a good thing... Maybe you're not arguing against it? Don't forget GLSlang goes massively beyond just simple non-native swizzles; surely, in the same vein, that makes it even worse than the ARB frag extension?

Except that a compiler can take advantage of the swizzling hardware of the NV30 to cut down instructions, which is great for them. For the R300, though, being fed such a shader means it will expand into many more instructions than it needs to. With an HLSL, on the other hand, the driver can decide how to make the best use of the hardware.
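To make that concrete, a rough sketch (the exact set of natively supported swizzles varies by chip): an arbitrary swizzle such as

mov r0, r1.yzxw  // fine where the swizzle unit handles it directly

may have to be expanded per channel on hardware that only supports a limited set of swizzles natively:

mov r0.x, r1.y
mov r0.y, r1.z
mov r0.z, r1.x
mov r0.w, r1.w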
 
Just thought I'd note that the difference between 2 and 3 registers on the NV3x is minimal. Unless you're at one of the "steps" where adding a register costs you a lot, a register isn't gonna kill you.

Of course, should you choose between maybe 7 or 8 registers in a 50-instruction program, 7 registers and 51 or 52 instructions is highly preferable.
8 registers and 40-45 instructions is still preferable to 7 registers and 50 instructions though, AFAIK. I'm just guesstimating based on the numbers I've seen (keep in mind the NV35 got lower register usage penalties and the thepkrl numbers are from the NV30!), but I'd say you shouldn't generalize too much.
Which is precisely why a developer thinking stuff like "on the NV3x, register usage has got to be minimal, so I'm ready to increase the instruction count quite a bit" would most likely just waste his time writing a shader slower, or at least no faster, than the original one...

If he's lucky, though, and he's on a step case for example, then he might gain something like 10-15% performance. In fact, a compiler which could reason about this stuff, trading instruction count for registers or vice versa, would be very hard to implement because of these "steps". An optimization that sometimes delivers worse results than the original is not a desirable thing IMO.


Uttar
 