The Experts Speak.. "Automatic Shader Optimizer"

Humus said:
Quite expensive. The DX9 docs list the sincos macro as taking 8 instruction slots. And then we haven't included reduction to [-pi,pi], which takes another 3 instructions.
I believe these are the maximum numbers of instruction slots allowed to be taken. That doesn't mean you can't do better.
 
jimbob0i0 said:
Simon F said:
Dio said:
Frightening, innit. Done so much work inside drivers I don't know what it's like outside any more...

Could we call that "Coding Xenophobia"?

Surely you mean "Coding Agoraphobia" ... unless you are referring to NV having "Coding Xenophobia" fearing anything apart from their own 'species' of coding language ;)

That's all Greek to me ;)
 
While on the subject of shader optimisation, a paper titled "Automatic Shader Level of Detail" (linked here) was presented at Graphics Hardware 2003. This discussed ways to automatically generate cheaper/approximate versions of shader code so that as objects were moved into the distance, the shader would be swapped. The change was usually undetectable.
 
Simon F said:
While on the subject of shader optimisation, a paper titled "Automatic Shader Level of Detail" (linked here) was presented at Graphics Hardware 2003. This discussed ways to automatically generate cheaper/approximate versions of shader code so that as objects were moved into the distance, the shader would be swapped. The change was usually undetectable.
I heard a similar talk but it's not impressive. As the object with the complex shader gets further away, you are shading fewer pixels, so it's less of a burden. It's the nearby objects that cause you to do the most work, and they have to look the best.
 
OpenGL guy said:
I heard a similar talk but it's not impressive. As the object with the complex shader gets further away, you are shading fewer pixels, so it's less of a burden. It's the nearby objects that cause you to do the most work, and they have to look the best.
But as objects get further away you should be able to see more of them so, potentially, there are still a lot of complex pixels to be shaded.
 
Mintmaster said:
Just curious, but why did you decide not to use a lookup texture? Did you need a lot of accuracy?

Laziness I suppose :)
It's much quicker to type "sin" in HLSL than to write code to generate a sine texture, then upload, bind and sample it. Not that that takes a whole lot of time either ... but there would be minimal performance gains, since the objects I applied this shader to are small.
 
Humus said:
Mintmaster said:
Just curious, but why did you decide not to use a lookup texture? Did you need a lot of accuracy?

Laziness I suppose :)
It's much quicker to type "sin" in HLSL than to write code to generate a sine texture, then upload, bind and sample it. Not that that takes a whole lot of time either ... but there would be minimal performance gains, since the objects I applied this shader to are small.
I'd imagine that before long, as has happened with CPUs, using a LUT/texture for these simple types of functions will become slower than computing the result with explicit code.
 
Simon F said:
I'd imagine that before long, as has happened with CPUs, using a LUT/texture for these simple types of functions will become slower than computing the result with explicit code.

mmmhhh.....
mmmhhhh.....

Hint ?!
 
Ingenu said:
Simon F said:
I'd imagine that before long, as has happened with CPUs, using a LUT/texture for these simple types of functions will become slower than computing the result with explicit code.

mmmhhh.....
mmmhhhh.....

Hint ?!
:rolleyes: You conspiracy theorists....

A texture lookup is going to behave like a LUT access on a CPU. If the value is in the cache then it will run reasonably fast, but if it isn't it'll cost a number of cycles. Furthermore, it eats into memory bandwidth and pushes something else that's just as important out of the cache.
 
Simon F said:
A texture lookup is going to behave like a LUT access on a CPU. If the value is in the cache then it will run reasonably fast, but if it isn't it'll cost a number of cycles. Furthermore, it eats into memory bandwidth and pushes something else that's just as important out of the cache.

There we go, from the horse's mouth! PVR5 will have built-in intrinsics and be out soon!

Your wagging tongue reveals too much!
 
Ingenu said:
Simon F said:
I'd imagine that before long, as has happened with CPUs, using a LUT/texture for these simple types of functions will become slower than computing the result with explicit code.

mmmhhh.....
mmmhhhh.....

Hint ?!


AAAAAAAAAAAGGGGGGGGGGGHHHRRRR... quick someone unplug SimonF's keyboard :LOL:
 
This is already true on CPUs; in SSE, it's faster to do a polynomial expansion to compute log, exp, sin, cos, etc. than it is to do it by tables. And the results are very good (if you have a mathemagician handy to work out the difficult bits).
 
Kristof said:
quick someone unplug SimonF's keyboard :LOL:
Security-mindedness has banned wireless keyboards? Or they just don't produce the response you need for Quake3 / Warcraft III / Jedi Knight (delete as applicable)
 
Simon F said:
A texture lookup is going to behave like a LUT access on a CPU. If the value is in the cache then it will run reasonably fast, but if it isn't it'll cost a number of cycles. Furthermore, it eats into memory bandwidth and pushes something else that's just as important out of the cache.

Umm, no - GPUs are very different from CPUs in this respect. In a CPU, a cache miss usually means that you stall the instruction stream and thus the entire processor - in a GPU, a texture cache miss only means that you switch processing to another pixel (modern GPUs are easily capable of holding several tens or hundreds of pixels in flight and swap processing freely between them to maximize efficiency). If several pixels suffer texture cache misses, the memory controllers in modern GPUs are easily capable of pipelining the memory requests for the texture cache misses, usually to such an extent that you sustain ~90-95% of either theoretical texel fillrate or effective memory bandwidth, whichever limitation you hit first.
 
Dio said:
Kristof said:
quick someone unplug SimonF's keyboard :LOL:
Security-mindedness has banned wireless keyboards? Or they just don't produce the response you need for Quake3 / Warcraft III / Jedi Knight (delete as applicable)

Still needs a physical attachment to a PS/2 or USB port in some way though ;)
 
Dio said:
This is already true on CPUs; in SSE, it's faster to do a polynomial expansion to compute log, exp, sin, cos, etc. than it is to do it by tables. And the results are very good (if you have a mathemagician handy to work out the difficult bits).
In the old SGL driver we used to use a Newton-Raphson method to compute 1/x (or was it x^(-1/2)?) rather than use a lookup because it was much faster. Sadly it was faster than the built-in equivalent Pentium intrinsic(s) :oops:
 
arjan de lumens said:
Umm, no - GPUs are very different from CPUs in this respect. In a CPU, a cache miss usually means that you stall the instruction stream and thus the entire processor - in a GPU, a texture cache miss only means that you switch processing to another pixel (modern GPUs are easily capable of holding several tens or hundreds of pixels in flight and swap processing freely between them to maximize efficiency). If several pixels suffer texture cache misses, the memory controllers in modern GPUs are easily capable of pipelining the memory requests for the texture cache misses, usually to such an extent that you sustain ~90-95% of either theoretical texel fillrate or effective memory bandwidth, whichever limitation you hit first.
You're telling me how GPUs work? To think, I must have been asleep here for the past 10 years...

Dio said:
Kristof said:
quick someone unplug SimonF's keyboard :LOL:
Security-mindedness has banned wireless keyboards? Or they just don't produce the response you need for Quake3 / Warcraft III / Jedi Knight (delete as applicable)
Nahh ... it stops someone from walking off with your keyboard. Anyway, who has time for Q3/etc etc. I haven't played something like that since I got over my multiplayer Descent addiction.
 
RussSchultz said:
Simon F said:
A texture lookup is going to behave like a LUT access on a CPU. If the value is in the cache then it will run reasonably fast, but if it isn't it'll cost a number of cycles. Furthermore, it eats into memory bandwidth and pushes something else that's just as important out of the cache.

There we go, from the horse's mouth! PVR5 will have built-in intrinsics and be out soon!

Your wagging tongue reveals too much!
:LOL:

Stop, I'm getting the giggles again! :LOL:
 
RussSchultz said:
There we go, from the horse's mouth! PVR5 will have built-in intrinsics and be out soon!

Your wagging tongue reveals too much!
Code:
#include <stdio.h>

int main(void)
{
    for (int i = 0; i < 100; i++)
        printf(":roll:");
    return 0;
}
 
Simon F said:
RussSchultz said:
There we go, from the horse's mouth! PVR5 will have built-in intrinsics and be out soon!

Your wagging tongue reveals too much!
Code:
#include <stdio.h>

int main(void)
{
    for (int i = 0; i < 100; i++)
        printf(":roll:");
    return 0;
}
Your vehement denial will get you nowhere! The cat is out of the bag.
 
Oh, I can't take it anymore. I confess. We're putting a Cray Y-MP on a chip which will emulate the x86 in microcode, which in turn runs the refrast.
 