when will GPUs run their own driver?

or am I smoking crack here? is this
1) possible
2) likely to gain performance
How would data get from the app to the GPU? The GPU would have to be self-programming for a lot of the things we do in the driver, and that would add a lot of complexity. Also, how would you fix bugs in the HW/driver/app/API?
 
As far as I know, a driver is by definition a software layer that allows communication between applications and hardware, and it provides the necessary abstractions to allow multiple processes to make use of it.

However, I do believe that many tasks which previously were handled by the driver are now done by the hardware itself. To handle thousands of draw calls and state changes at a high framerate while still leaving some CPU cycles to the game you need at least some pretty advanced command queuing and sequencing, or even a real programmable control unit on the GPU.
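
For what it's worth, the usual shape of that command queuing is a ring buffer shared between the driver and a front-end command processor on the GPU. Below is a minimal sketch of the idea; the names (Packet, GpuRing, emit) are made up for illustration and don't correspond to any vendor's actual interface.

```cpp
#include <cstdint>

// Hypothetical packet the driver writes and the GPU's command
// processor later parses. Real hardware uses vendor-specific formats.
struct Packet {
    uint32_t opcode;   // e.g. "set state", "draw"
    uint32_t payload;  // register value, vertex count, ...
};

// A minimal CPU-side view of a command ring. The driver advances
// 'write'; the GPU advances 'read' as it consumes packets.
struct GpuRing {
    Packet*            buffer;  // memory visible to both CPU and GPU
    uint32_t           size;    // number of packets, power of two
    uint32_t           write;   // driver-owned write index
    volatile uint32_t* read;    // GPU-updated read index (e.g. via write-back)
};

// Queue one packet, waiting if the ring is full. A real driver would
// also ring a "doorbell" register so the GPU knows new work arrived.
inline void emit(GpuRing& ring, Packet p) {
    uint32_t next = (ring.write + 1) & (ring.size - 1);
    while (next == *ring.read) {
        // ring full: spin (or sleep) until the GPU catches up
    }
    ring.buffer[ring.write] = p;
    ring.write = next;
}
```

The "programmable control unit" is then essentially whatever sits on the GPU side parsing those packets.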

A bigger question is what will happen in the future with multi-core CPUs? Quad-core is quickly becoming mainstream and Nehalem even runs eight threads simultaneously. Even the most intensive game is going to leave some CPU time unused. Also, with DirectX 10 the state-change bottleneck has been significantly reduced. On the other hand, drivers and/or GPUs now also have to be able to manage memory allocation for different tasks, virtualize memory, balance workloads, pre-empt threads, etc. So they have to take on some typical OS roles.

I guess it would be interesting to discuss how far this can be taken, without going completely overboard. Unless you really want to talk about running entire applications on the GPU... ;)
 
Are you suggesting some artificial intelligence to let the hardware work with the software, i.e. generating the driver on the fly and tailoring it to the system? Interesting topic btw :)
 
Take it wherever you like. I meant only what I said so if you can expound upon the idea or even take it in another direction, feel free.
 
A bigger question is what will happen in the future with multi-core CPUs? Quad-core is quickly becoming mainstream and Nehalem even runs eight threads simultaneously. Even the most intensive game is going to leave some CPU time unused.

One thing that will have to happen (say around DX11 time) is that DirectX gets command buffers on the PC as well. Right now you can't really parallelize rendering tasks, since rendering commands have to be issued sequentially to the D3D device. On the consoles you can distribute render tasks to different threads using command buffers. This is not so much of a problem right now, but once DX11 hits the streets we're likely to have 8-16 core CPUs. There's just no way in hell we'd be able to take advantage of all that power unless we get command buffers on the PC.
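
Just to make that concrete, here's roughly what it could look like from the application side. The Device/CommandBuffer types below are hypothetical stand-ins, not a real D3D interface; the point is simply that recording happens per thread and only submission is serialized.

```cpp
#include <thread>
#include <vector>

// Hypothetical interfaces -- stand-ins for what a command-buffer-aware
// D3D could expose, not real API names.
struct CommandBuffer {
    void setPipelineState(int stateId) { /* record, don't execute */ }
    void draw(int firstVertex, int vertexCount) { /* record */ }
};

struct Device {
    CommandBuffer* createCommandBuffer() { return new CommandBuffer(); }
    void submit(CommandBuffer* cb) { /* kick recorded commands to the GPU */ }
};

// Each worker thread records its slice of the scene independently...
void recordSlice(CommandBuffer* cb, int firstObject, int objectCount) {
    for (int i = firstObject; i < firstObject + objectCount; ++i) {
        cb->setPipelineState(i);   // state changes are just recorded,
        cb->draw(0, 36);           // so no device-wide lock is needed
    }
}

int main() {
    Device device;
    const int kThreads = 8;
    const int kObjectsPerThread = 1000;

    std::vector<CommandBuffer*> buffers(kThreads);
    std::vector<std::thread> workers;

    for (int t = 0; t < kThreads; ++t) {
        buffers[t] = device.createCommandBuffer();
        workers.emplace_back(recordSlice, buffers[t],
                             t * kObjectsPerThread, kObjectsPerThread);
    }
    for (auto& w : workers) w.join();

    // ...and only submission to the device is serialized.
    for (auto* cb : buffers) { device.submit(cb); delete cb; }
}
```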
 
Well, I think a more interesting question is: when will GPUs be used to design future GPUs? Maybe they already are, but at the very least it's coming.

http://www.hpcwire.com/hpc/2280791.html

Meh spice. Let me know when they can do something that is actually a problem in chip design nowadays.

And don't get me started on the outright misinformation contained in that PR fluff! 128/256/512 cores? Yeah right man, but that's nothing compared to my 8088 with its 4K+ cores (of course by cores I mean transistors, but since we're going to redefine previously defined terminology to mean whatever we damn well please, who cares, right?)

Aaron Spink
speaking for myself inc.
 
Given that Larrabee is a Von Neumann architecture, wouldn't that allow running pretty much the whole driver on the GPU, leaving only some simple interfacing for the CPU side?

It seems to me that Larrabee could be upgraded to DirectX 11 and beyond just by installing a new driver/firmware, whereas current 'classic' Harvard architecture GPUs can't even make the jump from DirectX 10 to DirectX 10.1.

What are the chances that NVIDIA and ATI would transition from a 'Harvard GPU' to a 'Neumann GPU' in the not too distant future? Would they be willing to potentially sacrifice performance for ultimate flexibility? In my personal view they could actually gain performance thanks to the added flexibility...
 
GPUs already run their own drivers (firmware).

What you suggest (complete elimination of client-side drivers) can only happen when all vendors agree on a single standard, allowing the OS direct and clean communication with the GPU.
 
Given that Larrabee is a Von Neumann architecture, wouldn't that allow running pretty much the whole driver on the GPU, leaving only some simple interfacing for the CPU side?

It seems to me that Larrabee could be upgraded to DirectX 11 and beyond just by installing a new driver/firmware, whereas current 'classic' Harvard architecture GPUs can't even make the jump from DirectX 10 to DirectX 10.1.

What are the chances that NVIDIA and ATI would transition from a 'Harvard GPU' to a 'Neumann GPU' in the not too distant future? Would they be willing to potentially sacrifice performance for ultimate flexibility? In my personal view they could actually gain performance thanks to the added flexibility...

Larrabee, if it is anything like any x86 in the last 20 years, is a Harvard architecture once you move to the core.

Since GPUs ultimately interface with a single memory pool for both instructions and data, I'd need to hear more of your reasoning as to why Larrabee is different.

Internally, both Larrabee and the GPUs have more specialized storage that treats instructions and data separately.

In addition, I fail to see why a Harvard or Von Neumann architecture is somehow related to DX upgradability.
 
Given that Larrabee is a Von Neumann architecture, wouldn't that allow running pretty much the whole driver on the GPU, leaving only some simple interfacing for the CPU side?
It might be possible to just send commands and all the related data straight to Larrabee, yes, but:
a) Is that really always more efficient? Sometimes you could detect, for example, that some data doesn't need to be sent (yet?) - if you literally ran the entire driver on the GPU, you couldn't do that. So a more moderate approach might be better.
b) That's one less thread (out of only four) to be hiding latency and sending instructions to the SIMD unit for every core that is running driver code at a given time. Of course, if 'driver' overhead is low enough, that doesn't really matter.

As 3dilettante said though, modern CPUs tend to defy some of the rules of a 'true' Von Neumann architecture. So how efficient this can be once again depends on some other implementation details too...
 
In addition, I fail to see why a Harvard or Von Neumann architecture is somehow related to DX upgradability.

I take his comment to mean something like this: being able to rasterize DX9-level graphics entirely in "software" is possible, even though the CPU core isn't graphics-related at all. Sure, it's not fast, but you can "program" the DX level of that interface with new DLLs. Thus, the move to DX10 or DX11 would be a similar affair; performance would likely not be as good (depending on the situation) but the feature set would still exist -- simply by virtue of how "programmable" the underlying hardware is.

However, I don't see why you couldn't do this (to a certain extent) with any existing modern hardware. If you try hard enough, you could abstract it all out and make it work with lots of loops or whatnot; that doesn't mean it will perform well OR that anyone will spend the time to do it.
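
To illustrate the "lots of loops" part, the innermost piece of a software rasterizer really is just edge functions evaluated over a bounding box. A deliberately naive triangle fill (no clipping, no fill rules, one fixed winding assumed) looks something like this:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Signed area term: its sign tells us which side of edge (a,b)
// the point (cx,cy) lies on.
static int edge(int ax, int ay, int bx, int by, int cx, int cy) {
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
}

// Fill one triangle into a framebuffer of width*height pixels
// (fb must already have width*height entries).
void rasterize(std::vector<uint32_t>& fb, int width, int height,
               int x0, int y0, int x1, int y1, int x2, int y2,
               uint32_t color) {
    int minX = std::max(0, std::min({x0, x1, x2}));
    int maxX = std::min(width - 1, std::max({x0, x1, x2}));
    int minY = std::max(0, std::min({y0, y1, y2}));
    int maxY = std::min(height - 1, std::max({y0, y1, y2}));

    for (int y = minY; y <= maxY; ++y) {
        for (int x = minX; x <= maxX; ++x) {
            // Inside the triangle iff the pixel is on the same side
            // of all three edges (flip the >= for the other winding).
            if (edge(x0, y0, x1, y1, x, y) >= 0 &&
                edge(x1, y1, x2, y2, x, y) >= 0 &&
                edge(x2, y2, x0, y0, x, y) >= 0) {
                fb[y * width + x] = color;
            }
        }
    }
}
```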

Most companies would likely prefer to sell you a new piece of hardware rather than expend the considerable development effort to write a new DX software abstraction layer :)
 
It might be possible to just send commands and all the related data straight to Larrabee, yes, but:

that's still a driver...

Seriously, it's a dumb question. Unless you go to directly integrated subsystems targeting an open and standard ISA, you will always have drivers.

Aaron Spink
speaking for myself inc.
 
I take his comment to mean something like this: being able to rasterize DX9-level graphics entirely in "software" is possible, even though the CPU core isn't graphics-related at all. Sure, it's not fast, but you can "program" the DX level of that interface with new DLLs. Thus, the move to DX10 or DX11 would be a similar affair; performance would likely not be as good (depending on the situation) but the feature set would still exist -- simply by virtue of how "programmable" the underlying hardware is.
That's irrelevant to whether an architecture is Von Neumann or Harvard. The only real difference between the two is that a Harvard architecture makes a physical distinction between instruction and data memory. Both can be made fully programmable, so a Harvard architecture design could just as easily be upgraded.

The fact that CPUs, outside of extremely low-end embedded devices (and often even then), internally split their instruction and data memory shows that there is no problem here.

The only disadvantages the Harvard architecture core has are possible underutilization of part of its instruction or data memory, and high overhead when dealing with self-modifying code.

Both GPUs and x86 go to an external memory pool that is combined instruction/data storage, so they are externally Von Neumann.
Both GPUs and x86 have differentiated internal memories that separate instructions from data.
GPUs could be, possibly with extreme effort, made to write to memory space that is read in as instructions.
x86 can do the same thing, but like all other Harvard architectures, the performance price is steep.

Whether this will even come up depends on whether the system context Larrabee operates under will even allow it to write to an instruction page.
In a GPU product, Intel may very well restrict Larrabee from trying it.
 
The relevant bit is that with a Von Neumann architecture you can let the same processor generate code and execute it.
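
As a trivial CPU-side illustration of "the same processor generates code and executes it" (plain x86-64 Linux here, nothing Larrabee- or driver-specific):

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <sys/mman.h>

int main() {
    // x86-64 machine code for: int add(int a, int b) { return a + b; }
    //   mov eax, edi ; add eax, esi ; ret
    const uint8_t code[] = {0x89, 0xF8, 0x01, 0xF0, 0xC3};

    // Same processor, same memory pool: write the bytes as data...
    // (note: W^X policies on hardened systems may refuse this mapping)
    void* mem = mmap(nullptr, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return 1;
    std::memcpy(mem, code, sizeof(code));

    // ...then jump to them as instructions.
    auto add = reinterpret_cast<int (*)(int, int)>(mem);
    std::printf("%d\n", add(2, 3));   // prints 5

    munmap(mem, 4096);
    return 0;
}
```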

The importance of that is that you can do extra optimizations at run-time. For example, say you have statically compiled vertex and pixel shaders that are practically optimal on their own. The vertex shader outputs A and B, and the pixel shader takes B and C as input (using defaults for missing components). Obviously you can further specialize the vertex shader by only computing B, and the pixel shader can propagate constants and eliminate most operations related to C.
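
To make that A/B/C example concrete, here are plain C++ stand-ins for the shaders (illustrative only, obviously not real shader code):

```cpp
// Plain C++ stand-ins for the statically compiled shaders above.

struct VsOut { float A; float B; };

// As compiled offline: computes both outputs.
VsOut vertexShader(float position) {
    return { position * 2.0f,        // A
             position + 1.0f };      // B
}

// As compiled offline: reads B and C (C falls back to a default of 0
// because this particular vertex shader never writes it).
float pixelShader(float B, float C) {
    return B * 0.5f + C * 10.0f;
}

// After specializing this particular VS/PS pair:
//  - A is never consumed, so the vertex shader stops computing it;
//  - C is always the default 0, so the C * 10 term folds away.
float vertexShaderSpecialized(float position) {
    return position + 1.0f;          // only B survives
}

float pixelShaderSpecialized(float B) {
    return B * 0.5f;
}
```

The specialized pair does strictly less work, but it is only valid for this particular VS/PS combination - which is exactly why the recompilation below has to happen per combination.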

You can do all that in the driver, but while doing the optimization the GPU might (partially) run out of work. Recompilation ideally has to be done for every new combination of shaders, input streams/buffers, output streams/buffers, etc. And with future GPUs being tasked with more than just graphics I don't see it getting easier to do it all in the driver. As both CPUs and GPUs get independently working cores, the communication bottleneck between them only gets worse.

Enter Von Neumann. By letting each GPU core handle its own recompilations you get all the benefits of run-time specialization.

You'd also get a higher level of abstraction without sacrificing a lot of performance. Current DirectX 10 cards could probably support DirectX 11 by putting a lot of effort into driver abstractions or using CUDA/CTM. But it's not going to be efficient. With cores that can generate their own code you can handle any future specification without relying entirely on the driver.

Intel would be utterly foolish not to use that potential of Larrabee. There's no reason I can think of to have it based on x86 other than the possibility to do dynamic code generation.
 