What was that about Cg *Not* favoring Nvidia Hardware?

I don't think much of anybody is going to write PS2.0 shaders in assembly. HLSL and Cg are now available, so there's little reason to bother to compile to PS 1.4 on a card that supports PS 2.0.

I'd be surprised if many developers wrote the assembly themselves for DX9. Occasionally we may see a developer who will write the HLSL, compile it, and then attempt to optimize the resulting assembly. Hopefully we'll instead see more runtime compiling.
 
nooneyouknow said:
The reality for DX9 is this:

I can't speak for all developers, only for myself.

1 - Developers will be compiling their software, for late summer / xmas 2003 shipping titles.

Yep, we have a title in that range.
DX9 support is a sure thing, I've already ported the engine to DX9 - took about half a day. ;)
Vertex Shaders are much better in DX9 even if you only use 1.1, so it's definitely worth it.

2 - Developers for the most part will NOT have PS 2.0 shaders. The reason why is that developers really do not know what to do that fully uses PS 1.4, so it's even harder for them to imagine what to do with 2.0. This makes sense because of the huge install base of ATI DX8.1 parts AND the growing install base of DX9 parts.

The one sure thing we'll have is a PS 2.0 based Shadow Buffer implementation.
That requires rewriting every pixel combiner case in PS 2.0.

I agree that writing PS2.0-only effects has very little benefit for the user base - especially as it'll be an RTS game, not an FPS.
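
For what it's worth, a minimal sketch of what that kind of PS 2.0 shadow buffer lookup boils down to (names invented, and not our actual implementation - it just assumes light-space depth has already been rendered into a texture and the projective coordinate arrives from the vertex shader):

Code:
// Sketch only: depth-compare shadow buffer lookup in ps_2_0.
sampler2D ShadowMap;                       // depth rendered from the light

float4 main(float4 shadowCoord : TEXCOORD0,
            float4 diffuse     : COLOR0) : COLOR
{
    // Project into the shadow map.
    float2 uv = shadowCoord.xy / shadowCoord.w;
    float storedDepth = tex2D(ShadowMap, uv).r;

    // Compare this pixel's light-space depth against the stored depth.
    float pixelDepth = shadowCoord.z / shadowCoord.w;
    float lit = (pixelDepth <= storedDepth + 0.0015f) ? 1.0f : 0.0f;

    return diffuse * lit;                  // shadowed pixels lose direct light
}

The per-material pixel combiner code then has to be folded into shaders like this one - that's the "rewrite every pixel combiner case" part.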

3 - In saying the above, there WILL be cases where the developer will write a 2.0 Shader that does exactly the same as the PS 1.4 version, but I really do not see that happening. Now, if they can go beyond PS 1.4 functionality, then of course they will do PS 2.0.

Implementing exactly the same functionality makes sense when PS2.0 allows higher precision or better speed than the 1.4 implementation.
We are using PS1.1 shaders for things that can be done on DX7 hardware, but the pixel shader implementation is much faster.
This will continue, but our minimum requirement is still DX7 hardware, so we cannot have too many things that are only possible with pixel shaders.
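
To make that concrete, here's the flavour of thing I mean - a trivial sketch with invented names, nothing from our engine: an effect that DX7 fixed-function texture stages can also do, but that we'd run as a single ps_1_1 shader where the hardware supports it:

Code:
// Sketch only: base texture modulated by a lightmap - expressible with
// DX7 texture stages as well, but trivially done as a ps_1_1 shader.
sampler2D BaseTex;
sampler2D LightMap;

float4 main(float2 baseUV  : TEXCOORD0,
            float2 lightUV : TEXCOORD1) : COLOR
{
    return tex2D(BaseTex, baseUV) * tex2D(LightMap, lightUV);
}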

4 - All the above statements are made with the assumption of developers' limited Shader knowledge AND their increasingly shorter development cycles. The grim reality of the current state of the industry. This will improve, knowledge-wise, but not this year.
5 - If ATI and NVIDIA help write 2.0 Shaders, then we will see more REAL DX9 titles this upcoming year, and I really believe that will happen. It has to.

I think you completely missed the point this time!

Prototyping a shader requires a surprisingly short time. It involves a programmer (doing the shader) and an artist (doing some demonstration content). I've never seen this take longer than a few days!

Proper implementation of a shader is of course longer work: you have to handle multiple cases and multiple hardware targets, it has to cooperate with the rest of the engine, and there's a lot of optimization work.

But this is almost never the bottleneck! The workload impact on the artists is always much larger than the impact on the programmers.

For example, we prototyped per-pixel lighting (with bump-mapping).
It was considered to be included on most objects.
But:
1. It multiplies the work an artist has to spend on creating an object.
2. It can only be justified if the result is so much better that it's worth the extra work.
3. "Worth it" actually means that you can convince the investors that it will be a selling point, so they invest several times the money in it.
4. Given that that's not going to happen - that means the effect will likely be used on only a few objects.
5. But it's likely that those few objects would look out of place.
6. The effort of implementation (programming) is likely not justified in this case.
7. So there goes the feature.

Note that it had nothing to do with:
- The programmer knowledge
- The amount of help ATI or nVidia provided

So in this way, doing fancy things is limited to special effects, as those have less strict requirements.

Just my 0.02 HUF :)
 
Chalnoth said:
I don't think much of anybody is going to write PS2.0 shaders in assembly. HLSL and Cg are now available, so there's little reason to bother to compile to PS 1.4 on a card that supports PS 2.0.

I'd be surprised if many developers wrote the assembly themselves for DX9. Occasionally we may see a developer who will write the HLSL, compile it, and then attempt to optimize the resulting assembly. Hopefully we'll instead see more runtime compiling.

Hmm let's see the possible cases:

1. The shader in question can be done on PS1.4 hardware

Options:
a.) writing the shader in PS1.4 assembly
b.) writing it in both PS1.4 assembly and PS2.0 assembly
c.) writing it in a high level shader language

Since the shader can be done with PS1.4, it surely cannot use any special features that PS2.0 provides (MRT/MET, more textures, new instructions, etc.).
So option b.) makes no sense, as PS1.4 maps very smoothly to PS2.0.

The only case where doing the PS2.0 version is worth it is when you _do_ make use of some of those extra features.
For example, a normalization cubemap lookup can be replaced with PS2.0 arithmetic vector normalization (see the sketch at the end of this post).

And that's where option c.) makes no sense - since this is an optimization the compiler will never be able to do.
So HLSL cannot have a benefit over option a.) !

2. The shader can only be done on PS2.0 hardware

Options:
a.) writing the shader in PS2.0 assembly
b.) writing it in a high level shader language

This is where option b.) might be worth it - it depends on how complicated the shader really is.


The real problem is: how do you decide whether you fall into case 1 or case 2 if you write only HLSL code?
It might be that the code doesn't compile to PS1.4, yet the effect is possible to implement in PS1.4.
As long as you care about PS1.4, I think there's no real alternative to assembly.
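
To put the normalization example above into HLSL terms, here's a rough sketch (invented names, not code from our engine). Variant A is the PS1.4-style trick of looking the result up in a cubemap whose texels store pre-normalized directions packed into [0,1]; Variant B is the PS2.0 arithmetic replacement that frees the sampler:

Code:
// Sketch only. Assumes the application fills NormalizerCube with
// normalize(direction) values packed into [0,1].
samplerCUBE NormalizerCube;

float3 NormalizeByLookup(float3 v)        // Variant A (PS1.4-style)
{
    // Expand the packed [0,1] texel back to a [-1,1] vector.
    return texCUBE(NormalizerCube, v).xyz * 2.0f - 1.0f;
}

float3 NormalizeByMath(float3 v)          // Variant B (PS2.0)
{
    // Full precision, no sampler used; roughly a dp3/rsq/mul sequence.
    return normalize(v);
}

float4 main(float3 lightVec : TEXCOORD0) : COLOR
{
    float3 n = NormalizeByMath(lightVec); // swap in NormalizeByLookup to compare
    return float4(n * 0.5f + 0.5f, 1.0f); // visualize the normalized vector
}

Something like fxc /T ps_2_0 /E main normalize.hlsl builds the arithmetic version; the point is that only the author knows which variant the surrounding shader actually wants.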
 
nooneyouknow wrote:
Developers for the most part will NOT have PS 2.0 shaders. The reason why is that developers really do not know what to do that fully uses PS 1.4, so it's even harder for them to imagine what to do with 2.0.

I don't expect to see 2.0 shaders really used until 2004, but this is because of slow market saturation for DX9 hardware. It is definitely not because developers don't know what to do. I have already blown through every instruction slot 2.0 allows for our game (I've cut back, of course, for performance reasons), and I'm itching for more.

E.g., 2.0 hardware can only just about pull off many of the HDR image-space techniques that have recently become a reality. This is the stuff that's going to be the "next big thing" after per-pixel lighting (heh, it seems so dated to me now, despite no games using it yet!). Tone mapping and post-processing effects such as a bloom filter to simulate glare/scattering can really let us go wild and simulate the optics of a camera or even our eyes. For example, when you emerge from a dark room into broad daylight, everything's going to look significantly overexposed while your pupils dilate, which can be simulated with tone-mapping techniques. And you automatically get flares on any illuminating objects that have high contrast with the surrounding environment - no more hacking in billboards for cheesy effects. I'd love to be doing this stuff right now, but we'd need to delay the game a year or two before anyone could play it!
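
For the curious, a bare-bones sketch of the kind of exposure/tone-mapping pass I mean (made-up names and a deliberately simple operator - this isn't from our game): the application keeps Exposure high right after you leave the dark room and eases it down as the "pupils" adjust, so everything reads as blown out until it settles.

Code:
// Sketch only: full-screen tone-mapping post-process over an HDR scene
// texture, with Exposure animated by the application.
sampler2D SceneTex;
float Exposure;

float4 main(float2 uv : TEXCOORD0) : COLOR
{
    float3 hdr = tex2D(SceneTex, uv).rgb;

    // Simple exponential exposure curve: while Exposure is still high,
    // bright areas saturate toward white, which reads as glare/overexposure.
    float3 ldr = 1.0f - exp(-hdr * Exposure);

    return float4(ldr, 1.0f);
}

A bloom/glare filter would then be additional blurred passes over the bright parts of SceneTex, composited on top of this.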
 
Hyp-X said:
The only case where doing the PS2.0 version is worth it is when you _do_ make use of some of those extra features.
For example, a normalization cubemap lookup can be replaced with PS2.0 arithmetic vector normalization.

And that's where option c.) makes no sense - since this is an optimization the compiler will never be able to do.
So HLSL cannot have a benefit over option a.) !

Never is too strong a word.

A compiler for a sufficiently high level language could certainly do that optimization. A good HLSL would have a normalize() library call which would be compiled to either a texture lookup or assembly normalization based on what was more optimal. (It actually might be the case that the cubemap lookup would be faster on some architectures, if it could be parallelized with other computations in a compute-bound shader.)


The runtime would have to take care of housekeeping chores like implicitly binding the cubemap texture and setting all the D3D state, but it would be done. You could also use a code-generator approach to generate C++ init routines on a per-primitive basis for the developer. Then all you'd have to do is call a generated setup routine before each primitive or batch of primitives, and the appropriate states would be set up.
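
As a rough illustration of that kind of housekeeping (sketched in DX9 effect-file syntax with invented names - this is not a claim about what the HLSL compiler itself does today): sampler state and the shader can already sit side by side in an .fx file, so a runtime that binds them together on the developer's behalf isn't far-fetched.

Code:
// Sketch: the shader plus the state it needs, declared in one place.
texture NormalCubeTexture;                // supplied by the application

samplerCUBE NormalizerCube = sampler_state
{
    Texture   = <NormalCubeTexture>;
    MinFilter = LINEAR;
    MagFilter = LINEAR;
};

float4 NormalizePS(float3 dir : TEXCOORD0) : COLOR
{
    float3 n = texCUBE(NormalizerCube, dir).xyz * 2.0f - 1.0f;
    return float4(n * 0.5f + 0.5f, 1.0f);
}

technique Demo
{
    pass P0
    {
        PixelShader = compile ps_2_0 NormalizePS();
    }
}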


And if you went for maximal flexibility and abstraction, you'd use a scene-graph architecture + HLSL and let the compiler and runtime reorder everything as needed.


But even a peephole optimizer could detect a sequence of PS2.0 instructions which corresponds to normalization and substitute a texture lookup. You'd have to put logic into the device driver to automatically supply the cubemap texture, though. This is actually an optimization that could be done in a DX9 device driver.
 
Hyp-X said:
4 - All the above statements are made with the assumption of developers' limited Shader knowledge AND their increasingly shorter development cycles. The grim reality of the current state of the industry. This will improve, knowledge-wise, but not this year.
5 - If ATI and NVIDIA help write 2.0 Shaders, then we will see more REAL DX9 titles this upcoming year, and I really believe that will happen. It has to.

I think you completely missed the point this time!

Prototyping a shader requires a surprisingly short time. It involves a programmer (doing the shader) and an artist (doing some demonstration content). I've never seen this take longer than a few days!

Proper implementation of a shader is of course longer work: you have to handle multiple cases and multiple hardware targets, it has to cooperate with the rest of the engine, and there's a lot of optimization work.

But this is almost never the bottleneck! The workload impact on the artists is always much larger than the impact on the programmers.

Hyp-X, what I meant was that most game developers out there have limited 3D knowledge and have a hard time even understanding what a Pixel Shader is, let alone what they can really do with one. I completely agree that art resources are more impactful than programming resources. For me, the problem is that programmers/artists don't really know what they can do with today's hardware. Hence my comments. Remember, I am talking about the majority of developers, not all by any stretch. Plus, if they work for a big publisher who wants a cheap game out quickly, they have even less time to rewrite their rendering engine to add Shader support, decide which shaders to write, and then allocate art resources that they most likely did not even count on needing when negotiating the cost with the Publisher.

Note: The film industry actually hires Shader Designers.
 
DemoCoder said:
A good HLSL would have a normalize() library call which would be compiled to either a texture lookup or assembly normalization based on what was more optimal.

The compiler won't do anything of the sort, nor should it. Developers would be a bit miffed if a texture slot vanished, unbeknownst to them, because a simple macro created and sampled a normalization cube map! It would suck even harder if you already had a normalization map assigned to another slot for your own lookup and the compiler created another one! A compiler can end up being too clever for its own good, and this kind of stuff definitely crosses that line.

BTW, the cubemap lookup still tends to be a tad faster than the assembly normalize on 2.0 hardware, but with the arithmetic version you're gaining precision, freeing up a sampler, making your shader tidier, and reducing memory bandwidth. Still, it's a decision the developer must make.
 
790 said:
DemoCoder said:
A good HLSL would have a normalize() library call which would be compiled to either a texture lookup or assembly normalization based on what was more optimal.

The compiler won't do anything of the sort, nor should it. Developers would be a bit miffed if a texture slot vanished, unbeknownst to them, because a simple macro created and sampled a normalization cube map! It would suck even harder if you already had a normalization map assigned to another slot for your own lookup and the compiler created another one! A compiler can end up being too clever for its own good, and this kind of stuff definitely crosses that line.

Rubbish. (Sorry to be so blunt ;))

If you're compiling a shader then why would you be distressed by a texture slot vanishing? You have supplied the compiler with a complete description of the shading required, so it already knows if you're using all the texture slots or not, by definition, and is therefore free to make use of spare ones as needed. It is essentially free to do whatever optimisation it feels like provided it doesn't push the output out of specification, and does it in a way that is invisible to you.

In the above case, why would you have a normalisation map attached in one place and be making a normalise() call in another? The only reasons I can think of would be to:

- Enforce some precision behaviour (the normalisation map would probably be lower precision than doing it in the pipeline)

- Do some funky 'not quite normalisation' that your shader relies on, in which case the 'normalisation' map would be useless for proper normalisation anyway.

- Try to hand optimise the shader by balancing ALU ops vs. texture ops. If you are doing this then you really need to write in assembly, rather than in a high level language, and bear in mind that the optimal arrangement will probably be different on differing hardware.

When writing HLSL you should largely let the compiler decide, for a given shader and hardware combination, what is:

a. Optimal.
b. Possible.
c. Within spec.

You appear to be trying to second-guess the compiler/driver as to what is optimal - unless you are an expert on the underlying 3D hardware pipeline (which, in consumer 3D graphics, pretty much no software developers outside of the IHVs are - they don't have sufficient details of what is going on 'under the hood'), you will probably be wrong.

The only thing you have a right to be 'miffed' about is if the compiler does an 'optimisation' that goes outside the specification and causes incorrect behaviour of your shader. Beyond that you pretty much agreed to give up your right to decide on the exact code that is produced as soon as you wrote to a high level language instead of in assembly, just as on current CPUs...

The only thing that I see that might be a problem here is that having a normalisation map for optimisation purposes will use some additional texture memory that might not have been accounted for, but it should be possible to handle this intelligently.
 
andypski said:
The only thing that I see that might be a problem here is that having a normalisation map for optimisation purposes will use some additional texture memory that might not have been accounted for, but it should be possible to handle this intelligently.

Since games run at a variety of resolutions with forced AA today, this is currently a problem. I don't see it being any more of a problem with things like this.
 
andypski said:
Rubbish. (Sorry to be so blunt ;))

Complete Rubbish. Compilers should operate at a very defined abstraction level. Their scope should be at the language level only. Otherwise, it's NOT a compiler, it's a singing, dancing, presumptuous little high-level utility. It shouldn't become a D3DX utility that starts inserting textures and using up samplers behind my back.

Your other points are rubbish too. To answer them:
1. You'd use a normalization map because you didn't realize your compiler was going to jump up to the D3D API level and start inserting textures to 'aid' you.

2. If a slot gets used, it's very significant to developers. The developer sets the texture states, controls srgb correction, blending modes, keeps track of video memory and allocations, and needs to be very aware of all samplers for performance, bugs, and general usage.

3. I've already told you: doing assembly normalization is a trade-off - it's speed vs. the niceties - and these decisions are well outside the scope of a compiler.

Again, you're not describing a compiler, you're describing some toolkit that is tightly bound at the API and application level. I was the first one to jump for joy when I got my hands on the HLSL compiler, but I'd never want a D3DX utility that presumes to start inserting textures because it thinks texCUBE(sNormalizer,coords); is too complicated for me to handle. The number of problems I have with such a utility is endless. I no longer have a language, I can't port it to Cg or OGL HLSL, I can't just use the assembly, and I've bound myself to a complete toolkit with a higher abstraction level than even Java would dare presume. Thankfully MS and NVIDIA agree with me - we want the shader equivalent of C/C++, not the shader equivalent of Visual Basic script slash D3DRM.
 
*boggle*

On my DSP cross compiler, it sometimes uses shift lefts, and sometimes multiply by 2.

I personally don't care which method it uses, as long as

x = x * 256

gives me the right result.

Shouldn't it be the same for shaders?
 
Rubbish * 2.

Normalize() is a builtin library call in shader-oriented languages, just like memset/memcpy/bzero/bcopy are in C. However, compilers today can replace a call to memcpy with a compiler-specialized inline version. This is called semantic inlining.

Most programmers no longer write their own memory clear routines, but use the memset/bzero alternatives instead. The same will be true of normalize, faceforward, noise, etc.


The purpose of any high level language is to allow the programmer to declare a high level representation of his calculation, or the intent of his calculation. The compiler should only guarantee that the result will meet the expected precision, be correct, and run fast. There is no guarantee that when you write X = X * 4 the compiler will actually generate a MUL instruction, and there is no guarantee that the code will be run in the exact order it was input (except in strange cases where that is required, like with the volatile keyword in C).

Likewise, if my HLSL language allows me to completely specify all the pipeline state I require abstractly, then I no more care that it uses extra texture registers or substitutes builtin texture-library lookups than I care about a C compiler using extra registers.

You are assuming some half-assed situation where people are writing half HLSL, half assembly, mixed with DX7 TSS-style texture calls. Well, in that case, you have a mess.

But theoretically, there is no reason a HLSL can't be developed that manages all of the OGL/D3D state for you so that you don't have to worry about the compiler binding textures and overwriting your slots.

Perhaps DX9 HLSL/OGL GLSLANG/CG aren't it. That's why we need more languages. We can't stop with these three.
 
Yes, an optimal compiler would manage all low-level hardware resources. Only in this way can the compiler properly optimize for the hardware, and the compiler should definitely be hardware-specific. Clearly we still have a little way to go yet.

Until then, we will still probably need to write vendor-specific shaders.

I still think that the absolute best thing for the 3D industry right now would be a sort of "standard meta shader language." The best way to do this would be to have it work across all APIs, though this isn't absolutely necessary. The more that this language encompasses, the better. I call it a meta shader language because it should not be designed to be written in directly, but instead partially compiled to by one of many available language compilers developed by any one of many companies (Microsoft, SGI, whoever). Then the compiled meta code would be further compiled by a vendor-specific driver at runtime.
 
DemoCoder said:
You are assuming some half-assed situation where people are writing half HLSL, half assembly, mixed with DX7 TSS-style texture calls. Well, in that case, you have a mess.

No, I'm not. DX9 still uses sampler stages, and YOU must set them: MipFilter, SRGBEnable, AnisotropyLevel, etc., etc.

I'm not expecting to know the underlying instructions; that's the job of a compiler, to abstract to a certain level. But I draw the line at a COMPILER getting involved with creating and referencing textures! Using shader assembly registers is akin to CPU registers, and I'm fine with that - this is not.

A more valid analogy would be your C++ compiler inspecting an initialization routine for an array, and deciding that it should dump the array contents to a file or an exe offset, then transparently read from that instead of doing the calculations at load time! In fact it's rather worse than this, due to the limited hardware resources.

To reiterate, I'm fine with the compiler taking things out of my hands and managing the best-path case at the assembly level - if I wasn't, I wouldn't be using HLSL. Furthermore, I'm fine with using an extension library such as D3DX, and I have a lot of code built on it, such as our skeletal animation system and the effects files. BUT the proposition of a library secretly creating textures to emulate a normalize function is far beyond any of that! It is extremely complicated (read: +6 months added to DX9 development, because the compiler has to be integrated into a high-level resource system), extremely messy, it completely blows the lid off what the compiler is right now, and would cause a host of problems for VERY little gain - in fact no gain in the case of normalization, because, as I've said, it's a high-level trade-off, not a compiler decision.
 
790 said:
Complete Rubbish. Compilers should operate at a very defined abstraction level. Their scope should be at the language level only. Otherwise, it's NOT a compiler, it's a singing, dancing, presumptuous little high-level utility. It shouldn't become a D3DX utility that starts inserting textures and using up samplers behind my back.

Oh well, at least I apologised for being blunt (and perhaps a bit rude - if so I apologise again, I was being a bit tongue-in-cheek)

The compiler should (in an ideal situation) be able to generate an optimal version of the shader you want within the specifications of the API, and more importantly one that runs on the target platform. Since a shader is a complete description of the surface properties and includes all information about what resources are requested (by you) to render the surface, the compiler is aware of what additional resources are available.

Beyond this a compiler that is very tightly bound to the API should therefore be able to enforce additional optimisations that you are unaware of, and that don't alter the correct running of your program. DX9 HLSL is not this tightly bound, but in the future shading languages may be, and in these cases the compiler could easily replace library calls with whatever is required to implement the shading efficiently, including texture lookups.

Your other points are rubbish too. To answer them:
1. You'd use a normalization map because you didn't realize your compiler was going to jump up to the D3D API level and start inserting textures to 'aid' you.

Please read my post again - I stated explicitly - 'Why would you use a normalise() call in one place and a normalisation map in another?' I then gave three reasons you might do such an implementation, with the implication being that if you are using a normalise() call in one place you must have some reason (beyond your irrational fear of the compiler using up the resources that you have left unused) for wanting different behaviour from the two normalisations.

2. If a slot gets used, it's very significant to developers. The developer sets the texture states, controls srgb correction, blending modes, keeps track of video memory and allocations, and needs to be very aware of all samplers for performance, bugs, and general usage.

It's irrelevant to developers, provided the implicit state switching that occurs happens invisibly (as it should). When you bind a shader you would also bind the appropriate resources for that shader to the state hooks in the API, and really you should probably explicitly unbind any additional textures from unused stages.

Again, you're not describing a compiler, you're describing some toolkit that is tightly bound at the API and application level. I was the first one to jump for joy when I got my hands on the HLSL compiler, but I'd never want a D3DX utility that presumes to start inserting textures because it thinks texCUBE(sNormalizer,coords); is too complicated for me to handle. The number of problems I have with such a utility is endless. I no longer have a language, I can't port it to Cg or OGL HLSL, I can't just use the assembly, and I've bound myself to a complete toolkit with a higher abstraction level than even Java would dare presume. Thankfully MS and NVIDIA agree with me - we want the shader equivalent of C/C++, not the shader equivalent of Visual Basic script slash D3DRM.

I am describing a run-time shader compiler that is tightly bound to the API - i.e. it compiles shaders to the API, based on the underlying hardware, for maximum efficiency. For a compiler tied to a specific API such as DirectX such an implementation is perfectly possible - a compiler produces code for a target, and I just think your definition of what a target is is too narrow.

I understand the difference between what I describe and the current state of affairs, and I don't mean to start a religious war. For a run-time environment a tightly bound compiler has many advantages, since it can generate a far more 'appropriate' program for the highest performance, which is still the main target of real-time graphics. A shader compiler that is not tightly bound to the API has to miss out on many (perfectly reasonable) optimisation opportunities, since it cannot know the platform. It seems to me that state and shader are becoming more and more inextricably linked anyway.
 
Fair enough, but you have to admit that it places a considerable burden on the environment and on portability to have such a complex and high-level runtime compiler, not to mention increasing the learning curve and scope for the developer (they must be aware of what can go on behind the scenes, or they'll be like, where the hell did those two samplers go?). All the extra complexities, though they can be dealt with, are not outweighed by any advantages that I can see. And until the benefits are considerably in favour of such an extended compiler, we won't see such a utility.
 
790 said:
they (developers) must be aware of what can go on behind the scenes, or they'll be like, where the hell did those two samplers go?

I think you're still missing the point.

You won't notice those samplers missing because either:
a) you're not requiring them, so the compiler is free to use them.
b) you do require them, so the compiler won't use them.

The only time I care about what registers the compiler is using on the DSP cross compiler I use is when the code isn't operating correctly and I've exhausted the possibility that the written logic in C is incorrect.
 
RussSchultz said:
b) you do require them, so the compiler won't use them.

If it's hardware that can't do normalize in assembly, and I am already using all the samplers, it'll then have to fail. Therefore, you still must know what is going on behind the scenes to know why it failed (Err: On ps1.x hardware, the normalize function uses one sampler and one cube map, please free up a sampler. Note that you only need 1 sampler for n normalize calls). This sort of defeats the idea that these hardware-specific optimizations can be done transparently, since the shader may now fail depending on other conditions (here, namely, the samplers used).

That is *not* like the situation with compilers using registers, because that's transparent. You can't make a resource as limited as samplers transparent, though - not yet. Maybe when we have 512+ we can abstract them too, but not yet.
 