Question about ATi/Nvidia support of 64-bit/SIMD etc.

This is a driver-related question, but it's more of a serious overall performance-minded question than one specifically about drivers, so I am putting it in this forum.

I just got a laptop with an Athlon 64, and it has a host of special instruction sets it can support simultaneously.

3DNow! Technology:
Enhanced 3DNow! Technology:
3DNow! Professional Technology:
IA MMX Technology:
IA Streaming SIMD Extensions:
IA SSE 2:

On top of its 64bit capabilities.
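For anyone who wants to check these flags themselves, here is a minimal sketch in C. It assumes the usual CPUID bit positions (leaf 1 EDX for MMX/SSE/SSE2, extended leaf 0x80000001 EDX for 3DNow! and long mode). The names `cpu_features`, `decode_features` and `query_host_features` are made up for this example, and the host query only compiles with GCC or Clang on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* EDX bits of CPUID leaf 1 (standard features). */
#define STD_EDX_MMX      (1u << 23)
#define STD_EDX_SSE      (1u << 25)
#define STD_EDX_SSE2     (1u << 26)
/* EDX bits of CPUID leaf 0x80000001 (AMD extended features). */
#define EXT_EDX_LM       (1u << 29)  /* long mode, i.e. 64-bit capable */
#define EXT_EDX_3DNOWEXT (1u << 30)  /* Enhanced 3DNow! */
#define EXT_EDX_3DNOW    (1u << 31)

/* Hypothetical result struct: each field is 0 or 1. */
typedef struct {
    int mmx, sse, sse2, now3d, now3d_ext, long_mode;
} cpu_features;

/* Decode raw EDX values into individual flags. Pure bit tests, portable. */
void decode_features(uint32_t std_edx, uint32_t ext_edx, cpu_features *out)
{
    assert(out != NULL);
    out->mmx       = (std_edx & STD_EDX_MMX)      != 0;
    out->sse       = (std_edx & STD_EDX_SSE)      != 0;
    out->sse2      = (std_edx & STD_EDX_SSE2)     != 0;
    out->long_mode = (ext_edx & EXT_EDX_LM)       != 0;
    out->now3d_ext = (ext_edx & EXT_EDX_3DNOWEXT) != 0;
    out->now3d     = (ext_edx & EXT_EDX_3DNOW)    != 0;
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
/* Query the CPU this code runs on. Returns 0 if a leaf is unavailable. */
int query_host_features(cpu_features *out)
{
    unsigned int a, b, c, std_edx, ext_edx;
    assert(out != NULL);
    if (!__get_cpuid(1, &a, &b, &c, &std_edx)) return 0;
    if (!__get_cpuid(0x80000001u, &a, &b, &c, &ext_edx)) return 0;
    decode_features(std_edx, ext_edx, out);
    return 1;
}
#endif
```

Note that the long mode flag only says the CPU is capable of 64-bit operation; it says nothing about whether the OS has enabled it.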

I know that back in the day there was a push to get SSE and 3DNow! support into drivers. But has that continued to today? Are drivers really taking advantage of all these additional CPU capabilities available today? What types of benefits could be seen from fully utilizing these features, if any? Where would the proper and perhaps innovative use of the above benefit modern 3D rendering?

Which brings me to my second question.

Can drivers be programmed to directly address the CPU and not need a 64-bit OS to get some additional speed/processing benefits?

Any ATi or Nvidia guys out there, or any devs in the know, I would appreciate your views on this.
 
I know that the driver cannot bypass the OS.

For SSE and 3DNow! ... I remember that in old ATi drivers you could disable the optimisation with a registry key ... so you can try to benchmark with and without it.
 
Hellbinder said:
Can drivers be programmed to directly address the CPU and not need a 64-bit OS to get some additional speed/processing benefits?

"Directly access the CPU"? Um, the CPU is what runs the driver, so naturally the driver is "accessing" it. :)
 
It seems more accurate to say it the other way around--the driver "runs" the CPU. Or, better yet, it runs on the CPU. Even more accurately, it runs on the CPU via the OS.

I doubt MS will allow ATi/nV/world direct access to the CPU. That sounds like an invitation to crashes and viruses.

Of course, I know next to nothing about OS or driver dev, so I'm ready to be proven hilariously wrong. :)
 
I'm no driver guy either, and I haven't read up on this subject in a long time, so some of this may not be 100% right, but the CPU can run at privilege levels called rings. On x86 you have rings 0-3 IIRC; normal applications run in ring 3 (limited permissions), while the OS runs in ring 0 (unlimited permissions). A process running in ring 3 that tried to switch to 64-bit execution mode would most likely cause an interrupt, and control would be switched over to the OS, which would simply terminate the process. A driver can do a lot more than a regular app though. IIRC rings 1 and 2 are never used under Windows, so that would mean a driver runs in ring 0 (at least the kernel-mode part of it). So I guess it would be possible for the driver to switch to 64-bit mode. But if it attempted anything like that, it had better ensure there are no interrupts while it's in that mode, and be sure to restore the state to whatever it was when it's done.
 
SSE and 3DNow driver support is one thing (and there isn't much that needs to be done in drivers wrt SSE/3DNow) ... software support is another, and it's much more important.
 
Humus, I don't see why a processor could not be set to queue interrupts whilst a temporary mode switch is in effect, perhaps even alert the 64-bit code to relinquish processor control and switch mode back before any delay is imposed on the interrupt being addressed.
 
But is it possible for a driver to fully exploit the A64's "64-bitness" without a 64-bit-aware OS? I'm under the impression it's not possible. I've read about Windows' rings, but the rings are still limited by the OS in terms of what CPU features they can expose, no?
 
Since most applications are multithreaded, it would seem to me to be problematic for anything but the OS to use 64-bit extensions.
 
Dave B(TotalVR) said:
Humus, I don't see why a processor could not be set to queue interrupts whilst a temporary mode switch is in effect, perhaps even alert the 64-bit code to relinquish processor control and switch mode back before any delay is imposed on the interrupt being addressed.

I guess it could, but I'm not sure how useful that would be. And more importantly, I don't think x86 has support for it. There are instructions to enable and disable interrupts (STI/CLI), so any code running in ring 0 can guarantee uninterrupted execution if it wants to temporarily switch to 64bit mode.
 
Hrm, but that still seems like it would be problematic. For how long would you actually want to have a driver run uninterrupted? The best case for a video driver would be that it is a thin layer between the software and the GPU. I wouldn't think you'd want to spend enough time in the driver to make the switch worth it.
 
I doubt that any driver that attempts to change the execution mode of the CPU would EVER get approval from Microsoft.
 
Chalnoth said:
Hrm, but that still seems like it would be problematic. For how long would you actually want to have a driver run uninterrupted? The best case for a video driver would be that it is a thin layer between the software and the GPU. I wouldn't think you'd want to spend enough time in the driver to make the switch worth it.

Well, if you limit it to shorter than a regular thread timeslice, I don't think there will be any problems. I don't know the length of a thread timeslice in Windows, but I don't think it's extremely fine-grained. You'd probably have enough time to do something useful. I don't know exactly what tasks 64-bit mode would be beneficial for though.
 
Colourless said:
I doubt that any driver that attempts to change the execution mode of the CPU would EVER get approval from Microsoft.

But how would they know, except by disassembling the binary and analysing it?
 
Humus said:
Well, if you limit it to shorter than a regular thread timeslice, I don't think there will be any problems. I don't know the length of a thread timeslice in Windows, but I don't think it's extremely fine-grained. You'd probably have enough time to do something useful. I don't know exactly what tasks 64-bit mode would be beneficial for though.
Except there's probably a penalty for switching modes.

x86-64 should be good for some FP-intensive calculating tasks that require many registers for optimal operation (more registers than x86 offers). I don't know what in a driver would require this, though...
 
Humus said:
Colourless said:
I doubt that any driver that attempts to change the execution mode of the CPU would EVER get approval from Microsoft.
But how would they know, except by disassembling the binary and analysing it?
Run it on a virtual machine?
 
Switching to 64-bit mode in a driver can be very expensive. You need to set up your own page tables and segments while maintaining the connection to the outside world (the API/programs accessing the driver). Using STI/CLI to disable interrupts will practically disable the context-switch mechanism of the OS, and that can be problematic. Furthermore, even STI/CLI cannot disable NMI (which is, of course, very rare and mostly generated by hardware errors).
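To give a rough idea of what "set up your own page tables and segments" touches, here is a small sketch in C that only tests the control bits involved in activating long mode on x86-64. The bit positions are the architectural ones; the `cpu_state` struct and the function name are invented for illustration. It reads no real registers, because that needs ring 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CR0_PE   (1u << 0)    /* protected mode enabled */
#define CR0_PG   (1u << 31)   /* paging enabled */
#define CR4_PAE  (1u << 5)    /* physical address extension */
#define EFER_LME (1ull << 8)  /* long mode enable (MSR 0xC0000080) */

/* Hypothetical snapshot of the registers a mode switch would have to manage. */
typedef struct {
    uint32_t cr0;
    uint32_t cr4;
    uint64_t efer;
} cpu_state;

/* Returns 1 if the register values satisfy the preconditions under which the
   CPU activates long mode: protected mode, PAE paging and EFER.LME all set.
   Even then, 64-bit code additionally needs a code segment descriptor with
   the L bit set, and page tables in the long-mode format. */
int long_mode_preconditions_met(const cpu_state *s)
{
    assert(s != NULL);
    return (s->cr0 & CR0_PE) && (s->cr0 & CR0_PG) &&
           (s->cr4 & CR4_PAE) && (s->efer & EFER_LME);
}
```

A 32-bit OS that does not use PAE has none of this in place, so a driver would have to build all of it and tear it down again around every excursion into 64-bit code.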
 
olivier said:
I know that the driver cannot bypass the OS.

For SSE and 3DNow! ... I remember that in old ATi drivers you could disable the optimisation with a registry key ... so you can try to benchmark with and without it.
3DNow! and SSE are most beneficial for emulating fixed function T&L or vertex shaders.

This was a pretty big deal back when the cards didn't have hardware for that. Nowadays I don't think it will matter much. Besides, even if we're talking about cards that don't have vertex shader hardware, newer DirectX Graphics versions have their own software vertex processing built-in (which heavily uses 3DNow!/SSE I guess), and it may be sufficient to just rely on that.
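As a rough illustration of why SIMD helps with software vertex processing, here is a sketch in C of the core of fixed-function transform: multiplying each vertex by a 4x4 matrix. It is not taken from any real driver or from the DirectX runtime. The function name and the column-major layout are assumptions, and the SSE path is only used when the compiler defines `__SSE__`.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* out[i] = M * in[i] for `count` vertices of 4 floats each.
   m is a column-major 4x4 matrix (m[0..3] is the first column). */
void transform_vertices(const float m[16], const float *in, float *out,
                        size_t count)
{
    assert(m != NULL && in != NULL && out != NULL);
#if defined(__SSE__)
    /* One matrix column per register; each vertex costs 4 multiplies and
       3 adds, each of them working on 4 floats at once. */
    __m128 c0 = _mm_loadu_ps(m + 0);
    __m128 c1 = _mm_loadu_ps(m + 4);
    __m128 c2 = _mm_loadu_ps(m + 8);
    __m128 c3 = _mm_loadu_ps(m + 12);
    for (size_t i = 0; i < count; ++i) {
        const float *v = in + 4 * i;
        __m128 r = _mm_mul_ps(c0, _mm_set1_ps(v[0]));
        r = _mm_add_ps(r, _mm_mul_ps(c1, _mm_set1_ps(v[1])));
        r = _mm_add_ps(r, _mm_mul_ps(c2, _mm_set1_ps(v[2])));
        r = _mm_add_ps(r, _mm_mul_ps(c3, _mm_set1_ps(v[3])));
        _mm_storeu_ps(out + 4 * i, r);
    }
#else
    /* Scalar fallback: same math, one float at a time. */
    for (size_t i = 0; i < count; ++i) {
        const float *v = in + 4 * i;
        float r[4];
        for (int k = 0; k < 4; ++k)
            r[k] = m[k] * v[0] + m[4 + k] * v[1] +
                   m[8 + k] * v[2] + m[12 + k] * v[3];
        for (int k = 0; k < 4; ++k)
            out[4 * i + k] = r[k];
    }
#endif
}
```

Both paths do the same arithmetic in the same order; the SSE one just handles all four output components per instruction.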

I think the card that currently benefits the most from SIMD code in the driver is the GeForce4 MX. NVIDIA offers full emulation of ARB_vertex_program for all their cards (even on the TNT2) in OpenGL. The GF4 MX is unique, however, because it supposedly has "half" a vertex shader.

MMX is very useful for texture processing, shifting and replicating bits, and mipmap generation. This is mostly OpenGL related, because of its flexible (read: complex to implement) pack/unpack functionality, and because there's no common runtime that handles it. The driver must implement these things itself.
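To make the mipmap case concrete, here is a plain C sketch of a 2x2 box filter on RGBA8 data, which is the kind of byte-wise loop that packed-integer instructions speed up by handling several channels per instruction. This is a scalar reference only, not MMX code; the function name and the tightly packed layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Produce the next mip level of a tightly packed RGBA8 image.
   src is w x h pixels, dst must hold (w/2) x (h/2) pixels.
   w and h are assumed to be even. */
void downsample_rgba8(const uint8_t *src, size_t w, size_t h, uint8_t *dst)
{
    assert(src != NULL && dst != NULL);
    assert(w % 2 == 0 && h % 2 == 0);
    size_t dw = w / 2, dh = h / 2;
    for (size_t y = 0; y < dh; ++y) {
        for (size_t x = 0; x < dw; ++x) {
            for (size_t c = 0; c < 4; ++c) {
                /* Top-left texel of the 2x2 block, channel c. */
                const uint8_t *p = src + ((2 * y) * w + 2 * x) * 4 + c;
                unsigned sum = p[0] + p[4] + p[w * 4] + p[w * 4 + 4];
                /* Average of four texels, rounded to nearest. */
                dst[(y * dw + x) * 4 + c] = (uint8_t)((sum + 2) / 4);
            }
        }
    }
}
```

The inner channel loop is the part a SIMD version would collapse, since all four channels of a pixel (and several pixels) fit in one register.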

Another thing that may be more universally useful is that SSE has instructions for prefetching and explicitly uncached writes (3DNow! only provides prefetching). These are very nice things for moving data around, and drivers do that all the time. It's also very easy to exploit. Just write a few specialized memcpy replacements, and you're done.
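A "specialized memcpy replacement" along those lines might look roughly like this sketch in C. It uses the SSE2 intrinsic `_mm_stream_si128` for the uncached (non-temporal) stores for convenience; the original SSE equivalent would be `_mm_stream_ps`. The function name, the 256-byte prefetch distance and the 64-byte block size are untuned assumptions, and the code falls back to plain `memcpy` when SSE2 is not available at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Copy `bytes` bytes from src to dst without pulling dst into the cache.
   The buffers must not overlap. */
void stream_copy(void *dst, const void *src, size_t bytes)
{
    assert(bytes == 0 || (dst != NULL && src != NULL));
    if (bytes == 0)
        return;
#if defined(__SSE2__)
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    /* Non-temporal stores need a 16-byte aligned destination, so copy a
       short head the normal way first. */
    size_t head = (size_t)((16 - ((uintptr_t)d & 15)) & 15);
    if (head > bytes)
        head = bytes;
    memcpy(d, s, head);
    d += head;
    s += head;
    bytes -= head;

    size_t i = 0;
    for (; i + 64 <= bytes; i += 64) {
        /* Hint the CPU to start fetching source data we will need soon. */
        if (i + 256 < bytes)
            _mm_prefetch((const char *)(s + i + 256), _MM_HINT_NTA);
        __m128i a = _mm_loadu_si128((const __m128i *)(s + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(s + i + 16));
        __m128i c = _mm_loadu_si128((const __m128i *)(s + i + 32));
        __m128i e = _mm_loadu_si128((const __m128i *)(s + i + 48));
        /* Write around the cache. */
        _mm_stream_si128((__m128i *)(d + i), a);
        _mm_stream_si128((__m128i *)(d + i + 16), b);
        _mm_stream_si128((__m128i *)(d + i + 32), c);
        _mm_stream_si128((__m128i *)(d + i + 48), e);
    }
    _mm_sfence();                    /* make the streamed stores visible */
    memcpy(d + i, s + i, bytes - i); /* remaining tail */
#else
    memcpy(dst, src, bytes);
#endif
}
```

Whether this beats the regular `memcpy` depends on the buffer size and on where the destination lives (it tends to matter most for large copies that won't be read back soon), so it's worth benchmarking rather than assuming.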

(btw the original Athlon [which lacks full SSE support] understands and executes these prefetch and uncached write "SSE" instructions; this was marketed under the "3DNow! enhanced" umbrella)


@Topic,
I don't think there's any useful way to escape from "Legacy mode" for driver code. It may be possible, but even if it is it's a tremendously complex endeavour, and there will be massive performance penalties for mode switching. Before you can execute any code in 64 Bit mode, you need to have the entire execution environment working correctly. You must be able to redirect all interrupts to their respective 32 Bit handlers. You must take care of segment descriptors and all that dirty stuff, because a 32 Bit OS just can't do that for you anymore.

If you want 64 Bit drivers, you'll most certainly need a 64 Bit OS.

If you want to find out if there's any way at all (I'm still not sure), I suggest you start here. This PDF looks promising.
 