Programming and Optimizing Drivers

Lately there's been a lot of talk about ATI finally writing new OpenGL drivers for their video cards, given the lack of stunning performance compared to nVidia's solutions. Maybe this is a ridiculous question, but it just occurred to me that I have no idea how a company optimizes and upgrades drivers when it comes to performance. What kinds of measures are taken? What is the genius behind a 20% or 40% increase in frame rates from a brand spankin' new driver release? What about the drivers actually changes?

Just curious. :rolleyes:
 
I assume it's like any other piece of coding..

pretty much anything you write (anything more complex than a "hello world" proggie, naturally) can be tweaked for better speed/efficiency...

when you are dealing with games and new architectures, the drivers can be designed to better utilise the available hardware and tweak out speed increases.. look at the nVidia and ATI drivers for D3D for an example...

of course I am holding my breath till ATI actually delivers on this new driver promise... the inherent problem with new and re-written code is bugs...
 
basically, when coding, less is more.

if you get the program to do something with 3 words, it is so much faster than if you use 3 paragraphs.

efficiency is key - the less extra bullshit, the faster it will run
 
Snarfy said:
basically, when coding, less is more.

if you get the program to do something with 3 words, it is so much faster than if you use 3 paragraphs.

efficiency is key - the less extra bullshit, the faster it will run

Which is of course why optimisers unroll loops and such. ;)
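To make that concrete, here's a rough C++ sketch of manual 4x unrolling (the function names are made up for illustration). The unrolled version is *more* source code, yet typically faster: fewer loop-bound tests per element, and the four independent accumulators break the serial add dependency chain.

Code:
#include <cstddef>

float sum_plain(const float* a, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += a[i];
    return s;
}

// More code, but fewer branch tests per element. (Reassociating float adds
// can change the result slightly, which is one reason compilers won't
// always do this on their own.)
float sum_unrolled(const float* a, std::size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    float s = s0 + s1 + s2 + s3;
    for (; i < n; ++i) s += a[i];   // leftover tail when n isn't a multiple of 4
    return s;
}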
 
There are different types of optimizations. Some relate to optimizing code, others relate to making sure the HW is being used efficiently. For example, if you are completely HW limited, looking at CPU utilization is likely to be pointless. The converse is also true: if you are CPU limited, then there's not much point in trying to make the HW go faster until you relieve the CPU bottleneck (if possible).
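To make the distinction concrete, here's a rough C++ sketch of classifying a frame; cpu_work() and gpu_wait() are made-up stand-ins (simulated with sleeps) for real command-buffer building and a real fence/swap wait. The timing comparison is the point:

Code:
#include <chrono>
#include <cstdio>
#include <thread>

// Stand-ins for real per-frame work: building commands on the CPU,
// then blocking until the GPU finishes the frame.
static void cpu_work() { std::this_thread::sleep_for(std::chrono::milliseconds(4)); }
static void gpu_wait() { std::this_thread::sleep_for(std::chrono::milliseconds(12)); }

int main() {
    using clk = std::chrono::steady_clock;
    auto t0 = clk::now();
    cpu_work();                     // CPU side of the frame
    auto t1 = clk::now();
    gpu_wait();                     // time spent waiting on the hardware
    auto t2 = clk::now();

    auto ms = [](clk::duration d) {
        return std::chrono::duration<double, std::milli>(d).count();
    };
    double cpu = ms(t1 - t0), gpu = ms(t2 - t1);
    std::printf("CPU %.1f ms, GPU wait %.1f ms -> %s limited\n",
                cpu, gpu, cpu > gpu ? "CPU" : "GPU");
}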
 
Snarfy said:
basically, when coding, less is more.

if you get the program to do something with 3 words, it is so much faster than if you use 3 paragraphs.

efficiency is key - the less extra bullshit, the faster it will run

This is a misconception. A less verbosely specified program is not necessarily faster after compilation. Fewer instructions != faster execution. In functional programming languages like Haskell, you can specify complex algorithms in just a few lines of code. For example, Quicksort can be specified in a single line of code.
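A toy C++ illustration of fewer lines != faster (not driver code, just the principle): the one-line recursive Fibonacci below is exponentially slower than the longer iterative version.

Code:
// The short version is dramatically slower: naive recursion is O(2^n),
// while the longer iterative version is O(n).
long fib_short(int n) { return n < 2 ? n : fib_short(n - 1) + fib_short(n - 2); }

long fib_long(int n) {
    long a = 0, b = 1;
    for (int i = 0; i < n; ++i) {
        long next = a + b;
        a = b;
        b = next;
    }
    return a;
}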

The reason there is performance to squeeze out of drivers is that when developing anything, your first task is correctness: make sure the program actually does what it is specified to do. Thus, the "first pass" on any large project is to get it working and bug-free first, then go back and try to squeeze out more performance.

The more optimization tricks and special cases you add on the first pass of development, the more testing you need and the more potential for bugs.
 
also, you don't need to optimise everything.
optimized code is harder to read and understand than non-optimized code.
so the best approach is to find the bottleneck ("hotspot") and only optimize that part of the code.
in the general case, you spend 80% of the time in 20% of your code. This is the 80/20 rule (you'll also hear of the 90/10 rule). So why optimize the other 80% of the code... for 20% of the time?

so, like DemoCoder said, you code clean the first time. then, if you need more performance, you profile your code, find the hotspot, and optimize that part. you benchmark again, and if you still need more performance you repeat the profiling/optimization cycle until you are happy with the performance or there's nothing else you can do.
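the crudest way to find a hotspot is a scoped timer you drop into suspect functions. real work would use a proper sampling profiler (VTune, gprof), but this rough C++ sketch shows the idea (suspected_hotspot() is a made-up example):

Code:
#include <chrono>
#include <cstdio>

// Prints how long the enclosing scope took when it exits.
struct ScopedTimer {
    const char* name;
    std::chrono::steady_clock::time_point start;
    explicit ScopedTimer(const char* n)
        : name(n), start(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        auto end = std::chrono::steady_clock::now();
        std::printf("%s: %.3f ms\n", name,
                    std::chrono::duration<double, std::milli>(end - start).count());
    }
};

void suspected_hotspot() {
    ScopedTimer t("suspected_hotspot");
    volatile double x = 0;                        // volatile so the loop isn't optimized away
    for (int i = 0; i < 1000000; ++i) x = x + i * 0.5;
}

int main() { suspected_hotspot(); }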
 
To actually find an optimization, it's a matter of logically going through and catching every little thing that isn't needed. A really basic example: if you have to calculate a number for something, but it comes out the same every time, that's a spot you can improve on. Or if you know objects behind a given wall aren't visible, you don't have to worry about rendering them. Or take anisotropic filtering: if a wall is almost perpendicular to the angle you're viewing it at, doing 16 different sets of calculations just for blending purposes is kind of pointless on something that ends up being 1 pixel wide on the screen.
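That first case, where the answer is the same every time, is just hoisting an invariant calculation out of a loop. A rough C++ sketch (hypothetical functions; a good compiler will often do this for you, but not when it can't prove the computation really is the same every iteration):

Code:
#include <cmath>
#include <cstddef>

void scale_slow(float* v, std::size_t n, float angle) {
    for (std::size_t i = 0; i < n; ++i)
        v[i] *= std::cos(angle);       // same cos(angle) recomputed n times
}

void scale_fast(float* v, std::size_t n, float angle) {
    const float c = std::cos(angle);   // computed once
    for (std::size_t i = 0; i < n; ++i)
        v[i] *= c;                     // reused n times
}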

Also, from a hardware standpoint it's helpful to give the GPU something to do, then go back to the CPU and do some work there, so both are working at the same time instead of waiting on each other.
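A rough C++ sketch of that overlap; submit_to_gpu() and prepare_next_frame_on_cpu() are made-up stand-ins simulated with sleeps:

Code:
#include <chrono>
#include <future>
#include <thread>

// Stand-ins: a real driver would submit a command buffer, then build the
// next frame's commands while the hardware chews on the current one.
static void submit_to_gpu()             { std::this_thread::sleep_for(std::chrono::milliseconds(10)); }
static void prepare_next_frame_on_cpu() { std::this_thread::sleep_for(std::chrono::milliseconds(8)); }

int main() {
    auto gpu = std::async(std::launch::async, submit_to_gpu);  // GPU works on frame N
    prepare_next_frame_on_cpu();                               // CPU builds frame N+1 meanwhile
    gpu.wait();   // synchronize only once both sides are done
}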

These are some really basic examples, but they give you an idea of what they're doing.
 
A few points to add:
-watch out for overhead (function calls to function calls to function calls, etc.); flatten it where the code still remains clean and generic (no hacks if future development is wanted; see the sketch after this list)
-drivers run on the CPU, so unrolling loops is not always the best optimization, because of the processor cache (tight, efficient code runs inside the cache, i.e. no cache misses)
-avoid repeating tasks which can be done only once (not sure if this applies to drivers though)
-generally, write rather simple code, as compilers tend to optimise simple code better (the optimizer has more choices when the coder has not made the code too complicated)

-most importantly, when writing a driver from scratch (like ATI's OGL driver), design it carefully. It takes a long time, but it shortens the development time, makes the driver much easier to maintain, and lets many of the big optimisations happen at the design phase.
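Here's that sketch of call overhead: a rough C++ illustration where the small helpers are visible to the compiler (defined inline in a header, say), so the layering stays clean but the optimizer can flatten the whole chain. All names are made up:

Code:
#include <cstddef>

// Each call layer looks clean, and because the helpers are visible to the
// compiler, the chain of three calls per pixel can be inlined away entirely.
inline bool in_bounds(std::size_t i, std::size_t n)             { return i < n; }
inline void raw_write(unsigned* buf, std::size_t i, unsigned v) { buf[i] = v; }
inline void set_pixel(unsigned* buf, std::size_t n, std::size_t i, unsigned v) {
    if (in_bounds(i, n)) raw_write(buf, i, v);
}

void clear(unsigned* buf, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        set_pixel(buf, n, i, 0);   // inlines down to roughly buf[i] = 0
}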
 
D3D and OpenGL are state machines, so I would expect the majority of the optimisations to be in state change ordering (i.e. re-ordering changes to best suit the hardware), removal of unnecessary state changes, batching of commands, memcpys, and so on.
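A rough C++ sketch of one of those, removing redundant state changes: the driver caches the current value of each piece of state and drops changes that are no-ops. (GLenum here is just a stand-in typedef, and the cache shape is made up for illustration.)

Code:
#include <unordered_map>

using GLenum = unsigned int;   // stand-in for the real GL typedef

struct StateCache {
    std::unordered_map<GLenum, bool> current;

    // Returns true only when the hardware actually needs reprogramming;
    // redundant enables/disables are filtered out before they cost anything.
    bool enable(GLenum cap, bool on) {
        auto it = current.find(cap);
        if (it != current.end() && it->second == on)
            return false;          // no-op state change: drop it
        current[cap] = on;
        return true;               // caller flushes this change to the HW
    }
};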
 
Thanks for the replies guys, I'm finally at the point where I understand what's going on (at least a little more...).

More answers are totally welcome and will be very interesting to me.

Thanks again guys!
 