Blurb about general C code on GPUs at Ace's Hardware...

Guden Oden

Dunno if you guys spotted it yet, but the snippet they have on the page mentions 20 Gflops performance, roughly equivalent to a 10 GHz P4 (though the exact GPU that produced this figure isn't mentioned).

However, if GPUs are ever going to be usable for running general code, surely we HAVE to get rid of all these god-damned cheating/"optimizing" drivers. It's one thing if there's a little less 'shiny shine' on the graphics than the game programmers intended after the video driver has "optimized" the shader, but if you're doing *real* code, that could be pretty much fatal.

Hopefully, the "optimizations" are limited to specific titles, and not general in nature, so that not EVERY sequence of xxx, yyy, zzz (etc.) instructions is replaced with something fairly similar but not equal.

I saw something a couple of months ago about M$ tightening up the WHQL certification process to put an end to Nvidia's antics re. "optimization" of various software. Did they actually do this, and what changes were made? Is Nvidia (or others) finding ways to circumvent these new, tighter specs - if that's the case - or did the changes have the desired effect?

Anyone have anything interesting to say 'bout this (general code on DX9 GPUs), and the potential issues with "optimization"? :)
 
Uhhhhhhh...

You understand that shader replacement is shader specific, no? So shader replacement and running general code on a GPU have absolutely nothing to do with each other? Cheating isn't automagic--it takes driver developers sitting down and writing much simpler shaders that produce a very similar effect.
 
The Baron said:
You understand that shader replacement is shader specific, no? So shader replacement and running general code on a GPU have absolutely nothing to do with each other? Cheating isn't automagic--it takes driver developers sitting down and writing much simpler shaders that produce a very similar effect.
And you know for sure the driver doesn't drop down to FP16 on its own?

-FUDie
 
FUDie said:
The Baron said:
You understand that shader replacement is shader specific, no? So shader replacement and running general code on a GPU have absolutely nothing to do with each other? Cheating isn't automagic--it takes driver developers sitting down and writing much simpler shaders that produce a very similar effect.
And you know for sure the driver doesn't drop down to FP16 on its own?

-FUDie
I said similar shaders, not lower precision. Testing to see if a driver switches to FP16 is simple enough, though, unless of course NVIDIA magically managed to defeat all possible tests of current FP precision.
 
The Baron said:
Uhhhhhhh...

You understand that shader replacement is shader specific, no?

Thanks for the patronizing attitude, man, and a merry Christmas to you too. ;) ANYway... Can we really be SURE of what you say?

That driver writers will initially replace individual shaders in popular titles is already confirmed... *cough* 3dmk 2003 *cough* However, as the shader code compiler evolves, I think there could be a distinct possibility of it detecting particular instruction sequences rather than entire shaders. After all, there are only so many ways to do a certain effect efficiently.

I'm not saying that's the way it works now, but if it IS, then this would effectively screw with the usage of GPUs as general code processors.
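
Just to illustrate the kind of thing I mean by "detecting instruction sequences" - this is a made-up toy in C with invented opcodes, not how any actual driver works - a simple peephole pass could swap a normalize sequence for a cheaper approximation:

  #include <stdio.h>

  /* Toy opcode set -- the names are invented for this sketch,
     not real driver or shader ISA opcodes. */
  typedef enum { OP_DOT3, OP_RSQ, OP_MUL, OP_ADD, OP_NORM_LOOKUP } opcode;

  /* Replace every DOT3, RSQ, MUL triple with a single hypothetical
     "normalization lookup" instruction, compacting the stream in place. */
  static int peephole_normalize(opcode *code, int n)
  {
      int out = 0;
      for (int i = 0; i < n; ) {
          if (i + 2 < n &&
              code[i] == OP_DOT3 && code[i+1] == OP_RSQ && code[i+2] == OP_MUL) {
              code[out++] = OP_NORM_LOOKUP;  /* cheaper, but only approximate */
              i += 3;
          } else {
              code[out++] = code[i++];
          }
      }
      return out;  /* new instruction count */
  }

  int main(void)
  {
      opcode prog[] = { OP_ADD, OP_DOT3, OP_RSQ, OP_MUL, OP_ADD };
      int n = peephole_normalize(prog, 5);
      printf("instructions after pass: %d\n", n);  /* prints 3 */
      return 0;
  }

Once something like that sits in the compiler, it fires on YOUR instruction sequence too, whether you're drawing pixels or crunching numbers.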
 
I doubt we'll ever see some ultra-advanced shader compiler that dynamically generates, on the fly, new shaders that are mathematically almost equivalent but much faster. First of all, it's a lot of work, and I doubt that replacing small groups of instructions is enough to drastically improve speed. Second, the backlash from developers would be tremendous.

Then again, you could just get Futuremark to "disable the Unified Shader Compiler" with a patch. :)
 
Both of you have apparently missed the point, at least as far as this project goes.

There are optimizations, and there are "optimizations".

Optimizations are done for CPUs all the time; they increase performance while producing the same output. So they are not a problem for running general code if done properly, and their impact grows as the complexity of the task, the length of the shader, and the quality of the optimizations increase.

This isn't any more of a concern for this project than it is for all the other optimizing compilers used elsewhere (which isn't to say it isn't a concern at all, just that successfully maintaining identical output solves the problem).

"Optimizations", on the other hand, can be a disaster for running general code, because they don't produce identical output. When used while claiming to be representative of performance for the same workload, this is why they are really cheating. It is this type of word game that results in confusing conversations and concerns about the danger of "optimizing" code.

"Optimizations" are also indeed easily done with with shader replacements. However, the "optimizations" that could be disasterous for the Brook project, for the NV3x family, can consist of reducing below the specification precision where it isn't asked for (_pp, for example, is below the specification when you don't ask for it), which can be done with or without shader replacement.

But this doesn't have anything to do with optimizations, except as part of the above word game.

There are actually already optimizing compilers for both ATI and nVidia, and they are likely still evolving in quality (ATI, who discusses it distinctly, has said as much fairly clearly, at least). It is just that nVidia refers to optimizing and "optimizing" as one thing, and confuses the issue.

In any case, a 10% increase in performance is certainly significant when it is achieved, and between HLSL optimization and the run-time compilers, 10% seems a pretty conservative figure for comparing optimized and unoptimized code. At first glance, there should be similar opportunities with the BrookGPU extension.
 
There's a simple definition for a true compiler optimization: one that does not change the value of the computed result, or changes it only within the range of expected error (*).

(*) In floating point, simply reordering an expression according to normal algebra can result in a value which is not bit-identical to the original, which is why you shouldn't expect A == B to work in floating point unless you really know what you're doing. On the other hand, if you wanted 6 significant digits of precision and you only got 2, then the optimization is not valid. Therefore, the compiler has to be careful when reordering expressions.
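
A quick C illustration of that footnote (standard IEEE-754 float behaviour, nothing GPU specific) - the two sums below are algebraically identical, yet they compute different values:

  #include <stdio.h>

  int main(void)
  {
      float a = 1e8f, b = -1e8f, c = 1.0f;

      /* Algebraically identical, numerically different: */
      float left  = (a + b) + c;   /* = 1.0f : the cancellation happens first   */
      float right = a + (b + c);   /* = 0.0f : c is absorbed by the huge values */

      printf("left=%g right=%g equal=%d\n", left, right, left == right);
      return 0;
  }

Both orderings are "correct" within floating point's expected behaviour, which is exactly why testing floats with == is asking for trouble.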
 
The Baron said:
I doubt we'll ever see some ultra-advanced shader compiler that dynamically generates new shaders that are mathematically almost equivalent but much faster on the fly. First of all, it's a lot of work, and I doubt that the replacement of small groups of instructions is enough to drastically improve speed. Second, the backlash from developers would be tremendous.
It's not THAT hard to make optimizations that do not depend on full shader replacement. Basic data flow analysis, as described in the "dragon book" (recommended in order to get a clue about what compilers actually can and cannot do) and taught in compiler courses for decades, is enough to do "optimizations" like:
  • Replacing all use of FP32 that doesn't influence texture lookups with FP16
  • Recognizing sequences like DOT3/RSQ/MUL (which is used to normalize vectors) and replacing them with e.g. a normalization cubemap lookup.
Such optimizations sound rather likely for a GPU vendor to implement (a few man-months at most; the data flow analysis phase is probably already in place in order to schedule instructions properly) and will usually not give a noticeable image quality hit for 3D graphics, but they can severely affect the usefulness of GPUs for non-graphics tasks.
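
For a rough feel of the damage, here's a toy C sketch of my own (it only mimics FP16's 11 significant mantissa bits via frexpf/ldexpf and ignores the reduced exponent range), comparing a small dot product in FP32 against the same arithmetic rounded to FP16-ish precision at every step:

  #include <math.h>
  #include <stdio.h>

  /* Crude stand-in for FP16: keep only 11 significant bits of the mantissa
     (ignores FP16's limited exponent range -- precision loss only). */
  static float fp16ish(float x)
  {
      int e;
      float m = frexpf(x, &e);              /* x = m * 2^e, 0.5 <= |m| < 1 */
      m = rintf(m * 2048.0f) / 2048.0f;     /* round mantissa to 11 bits   */
      return ldexpf(m, e);
  }

  int main(void)
  {
      /* Dot product of two made-up 4-vectors in FP32 and in simulated FP16. */
      float a[4] = { 1.001f, 2.003f, 3.005f, 4.007f };
      float b[4] = { 4.009f, 3.011f, 2.013f, 1.015f };

      float d32 = 0.0f, d16 = 0.0f;
      for (int i = 0; i < 4; i++) {
          d32 += a[i] * b[i];
          d16 = fp16ish(d16 + fp16ish(fp16ish(a[i]) * fp16ish(b[i])));
      }
      printf("fp32=%.7f fp16-ish=%.7f rel.err=%.2e\n",
             d32, d16, fabsf(d32 - d16) / d32);
      return 0;
  }

You typically lose everything past the third or fourth significant digit - invisible in a pixel colour, but enough to wreck a numerical code that keeps iterating on the result.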
 
nelg said:
Someone should ask Microsoft to rewrite their reference rasterizer to use this.

...Which brings me back to my original post about M$ allegedly tightening up their WHQL procedure to eliminate, or at least reduce, the "optimizations" we've been seeing lately.

Anyone know anything about this?
 