The Experts Speak.. "Automatic Shader Optimizer"

Humus said:
Quite expensive. The DX9 docs list the sincos macro as taking 8 instruction slots. And then we haven't included reduction to [-pi,pi], which takes another 3 instructions.
I believe these are the maximum numbers of instruction slots allowed to be taken. That doesn't mean you can't do better.
 
jimbob0i0 said:
Simon F said:
Dio said:
Frightening, innit. Done so much work inside drivers I don't know what it's like outside any more...

Could we call that "Coding Xenophobia"?

Surely you mean "Coding Agoraphobia" ... unless you are referring to NV having "Coding Xenophobia" fearing anything apart from their own 'species' of coding language ;)

That's all Greek to me ;)
 
While on the subject of shader optimisation, a paper titled "Automatic Shader Level of Detail" (linked here) was presented at Graphics Hardware 2003. This discussed ways to automatically generate cheaper/approximate versions of shader code so that as objects were moved into the distance, the shader would be swapped. The change was usually undetectable.
 
Simon F said:
While on the subject of shader optimisation, a paper titled "Automatic Shader Level of Detail" (linked here) was presented at Graphics Hardware 2003. This discussed ways to automatically generate cheaper/approximate versions of shader code so that as objects were moved into the distance, the shader would be swapped. The change was usually undetectable.
I heard a similar talk but it's not impressive. As the object with the complex shader gets further away, you are shading fewer pixels, so it's less of a burden. It's the nearby objects that cause you to do the most work, and they have to look the best.
 
OpenGL guy said:
I heard a similar talk but it's not impressive. As the object with the complex shader gets further away, you are shading fewer pixels, so it's less of a burden. It's the nearby objects that cause you to do the most work, and they have to look the best.
But as objects get further away you should be able to see more of them so, potentially, there are still a lot of complex pixels to be shaded.
 
Mintmaster said:
Just curious, but why did you decide not to use a lookup texture? Did you need a lot of accuracy?

Laziness I suppose :)
It's much quicker to type "sin" in HLSL than to write code to generate a sine texture, then upload, bind and sample it. Not that that takes a whole lot of time either ... but there would be minimal performance gains, since the objects I applied this shader to are small.
 
Humus said:
Mintmaster said:
Just curious, but why did you decide not to use a lookup texture? Did you need a lot of accuracy?

Laziness I suppose :)
It's much quicker to type "sin" in HLSL than to write code to generate a sine texture, then upload, bind and sample it. Not that that takes a whole lot of time either ... but there would be minimal performance gains, since the objects I applied this shader to are small.
I'd imagine that before long, as has happened with CPUs, using a LUT/texture for these simple types of functions will become slower than computing the result with explicit code.
 
Simon F said:
I'd imagine that before long, as has happened with CPUs, using a LUT/texture for these simple types of functions will become slower than computing the result with explicit code.

mmmhhh.....
mmmhhhh.....

Hint ?!
 
Ingenu said:
Simon F said:
I'd imagine that before long, as has happened with CPUs, using a LUT/texture for these simple types of functions will become slower than computing the result with explicit code.

mmmhhh.....
mmmhhhh.....

Hint ?!
:rolleyes: You conspiracy theorists....

A texture lookup is going to behave like a LUT access on a CPU. If the value is in the cache then it will run reasonably fast, but if it isn't it'll cost a number of cycles. Furthermore, it eats into memory bandwidth and pushes something else that's just as important out of the cache.
 
Simon F said:
A texture lookup is going to behave like a LUT access on a CPU. If the value is in the cache then it will run reasonably fast, but if it isn't it'll cost a number of cycles. Furthermore, it eats into memory bandwidth and pushes something else that's just as important out of the cache.

There we go, from the horse's mouth! PVR5 will have built-in intrinsics and be out soon!

Your wagging tongue reveals too much!
 
Ingenu said:
Simon F said:
I'd imagine that before long, as has happened with CPUs, using a LUT/texture for these simple types of functions will become slower than computing the result with explicit code.

mmmhhh.....
mmmhhhh.....

Hint ?!


AAAAAAAAAAAGGGGGGGGGGGHHHRRRR... quick someone unplug SimonF's keyboard :LOL:
 
This is already true on CPUs; in SSE, it's faster to do a polynomial expansion to compute log, exp, sin, cos, etc. than it is to do it by tables. And the results are very good (if you have a mathemagician handy to work out the difficult bits).
 
Kristof said:
quick someone unplug SimonF's keyboard :LOL:
Security-mindedness has banned wireless keyboards? Or they just don't produce the response you need for Quake3 / Warcraft III / Jedi Knight (delete as applicable)
 
Simon F said:
A texture lookup is going to behave like a LUT access on a CPU. If the value is in the cache then it will run reasonably fast, but if it isn't it'll cost a number of cycles. Furthermore, it eats into memory bandwidth and pushes something else that's just as important out of the cache.

Umm, no - GPUs are very different from CPUs in this respect. In a CPU, a cache miss usually means that you stall the instruction stream and thus the entire processor - in a GPU, a texture cache miss only means that you switch processing to another pixel (modern GPUs are easily capable of holding several tens or hundreds of pixels in flight and swap processing freely between them to maximize efficiency). If several pixels suffer texture cache misses, the memory controllers in modern GPUs are easily capable of pipelining the memory requests for the texture cache misses, usually to such an extent that you sustain ~90-95% of either theoretical texel fillrate or effective memory bandwidth, whichever limitation you hit first.
 
Dio said:
Kristof said:
quick someone unplug SimonF's keyboard :LOL:
Security-mindedness has banned wireless keyboards? Or they just don't produce the response you need for Quake3 / Warcraft III / Jedi Knight (delete as applicable)

Still needs a physical attachment to a PS/2 or USB port in some way though ;)
 
Dio said:
This is already true on CPUs; in SSE, it's faster to do a polynomial expansion to compute log, exp, sin, cos, etc. than it is to do it by tables. And the results are very good (if you have a mathemagician handy to work out the difficult bits).
In the old SGL driver we used to use a Newton-Raphson method to compute 1/x (or was it x^(-1/2)?) rather than use a lookup because it was much faster. Sadly it was faster than the built-in equivalent Pentium intrinsic(s) :oops:
 
arjan de lumens said:
Umm, no - GPUs are very different from CPUs in this respect. In a CPU, a cache miss usually means that you stall the instruction stream and thus the entire processor - in a GPU, a texture cache miss only means that you switch processing to another pixel (modern GPUs are easily capable of holding several tens or hundreds of pixels in flight and swap processing freely between them to maximize efficiency). If several pixels suffer texture cache misses, the memory controllers in modern GPUs are easily capable of pipelining the memory requests for the texture cache misses, usually to such an extent that you sustain ~90-95% of either theoretical texel fillrate or effective memory bandwidth, whichever limitation you hit first.
You're telling me how GPUs work? To think, I must have been asleep here for the past 10 years...

Dio said:
Kristof said:
quick someone unplug SimonF's keyboard :LOL:
Security-mindedness has banned wireless keyboards? Or they just don't produce the response you need for Quake3 / Warcraft III / Jedi Knight (delete as applicable)
Nahh ... it stops someone from walking off with your keyboard. Anyway, who has time for Q3/etc etc. I haven't played something like that since I got over my multiplayer Descent addiction.
 
RussSchultz said:
Simon F said:
A texture lookup is going to behave like a LUT access on a CPU. If the value is in the cache then it will run reasonably fast, but if it isn't it'll cost a number of cycles. Furthermore, it eats into memory bandwidth and pushes something else that's just as important out of the cache.

There we go, from the horse's mouth! PVR5 will have built-in intrinsics and be out soon!

Your wagging tongue reveals too much!
:LOL:

Stop, I'm getting the giggles again! :LOL:
 
RussSchultz said:
There we go, from the horse's mouth! PVR5 will have built-in intrinsics and be out soon!

Your wagging tongue reveals too much!
Code:
#include <stdio.h>

int main(void)
{
    for (int i = 0; i < 100; i++)
        printf(":roll:");
    return 0;
}
 
Simon F said:
RussSchultz said:
There we go, from the horse's mouth! PVR5 will have built-in intrinsics and be out soon!

Your wagging tongue reveals too much!
Code:
#include <stdio.h>

int main(void)
{
    for (int i = 0; i < 100; i++)
        printf(":roll:");
    return 0;
}
Your vehement denial will get you nowhere! The cat is out of the bag.
 
Oh, I can't take it anymore. I confess. We're putting a Cray Y-MP on a chip which will emulate the x86 in microcode, which in turn runs the refrast.
 