It's Dead Jim - a debate about the future of the graphics API

B3D News · Dec 7, 2011

Will our old friend the graphics API die? Will it morph into something more tightly fitting around the metal hidden under the respectable coolers of modern GPUs? We sail forth seeking to find out!


Davros · Dec 7, 2011

I'm not surprised no one from NV was championing CTM, as it gives insight into the hardware and NV are very guarded: while they have a Linux driver, it's totally closed source, unlike Intel/AMD, who publish register specs etc.

Fun fact: AMD are porting their open source Linux driver (not the feature-complete Windows driver) to Windows Embedded Compact 7:
http://lists.freedesktop.org/archives/dri-devel/2011-October/015244.html

PS: Top marks for that article.

Nick · Dec 8, 2011

B3D News said:
Will our old friend the graphics API die? Will it morph into something more tightly fitting around the metal hidden under the respectable coolers of modern GPUs?

The APIs have already evolved into something much closer to the hardware architecture, shifting much of the complexity onto the application's graphics engine.

The result is also that relatively speaking there are ever fewer graphics engines. Not so very long ago every game had its own engine, but nowadays we're seeing the same engines being used for a wide range of applications. And these engines themselves are composed of various abstraction layers and components for specific effects.

I expect these trends to continue since application developers want ever more flexibility (to be able to sell more diverse software). The APIs we know today are turning into generic programming languages, and all of the real functionality becomes part of software libraries. The Application Programming Interface really depends on what combination of software you decide to use. The sky is the limit.

Unfortunately the industry is at a bit of an impasse since the GPU manufacturers don't want to expose their entire architecture. But the solution might come from a different direction: CPUs have always exposed their full ISA. Software renderers can implement all existing APIs for backward compatibility, and with the IGP-less version of Haswell expected to deliver 1 TFLOPS there should be plenty of processing power to start creating unique, high-performance software. If nothing else these powerful CPUs will force GPU manufacturers to rethink their stance on offering lower level access to the hardware.

And it's not just about getting access to the hardware, it's about getting rid of limitations in functionality too. It is extremely annoying to have to buy new hardware every few years just to get support for the latest graphics features. This is why lots of people buy the latest and greatest CPU that will last many years, and a mediocre GPU they intend on upgrading. It would be more satisfying for consumers to be able to buy a generic high performance GPU that lasts many years (without any loss of revenue for the IHVs).

Sooner or later we'll be able to download Shader Model 6 support.

hoho · Dec 8, 2011

I could be wrong here, but isn't the real bottleneck not the API itself but the way the OS takes your userland GPU calls and translates them to kernel level?

DeanoC · Dec 8, 2011

Nick said:
And it's not just about getting access to the hardware, it's about getting rid of limitations in functionality too. It is extremely annoying to have to buy new hardware every few years just to get support for the latest graphics features. This is why lots of people buy the latest and greatest CPU that will last many years, and a mediocre GPU they intend on upgrading. It would be more satisfying for consumers to be able to buy a generic high performance GPU that lasts many years (without any loss of revenue for the IHVs).

Sooner or later we'll be able to download Shader Model 6 support.

When was the last time a CPU extension came with backwards compat? I'm thinking of the software float exception handler (yes, I'm an old git). Try using an AES instruction on an older x86 CPU and watch it go pop; it's no different for GPUs.
Another really good example is popcnt: both GPUs and CPUs have only recently got the instruction, so a software fallback is still required.

It's not that current GPUs couldn't (in many cases) emulate a new SM standard, it's just that it would suck. There is no reason that Intel/AMD couldn't provide an emulator for every new ISA extension, but it would suck so they don't.

It's less visible on CPUs, because no one actually uses the new ISAs exclusively for decades, so you rarely notice the line where x86 code stops working on your older x86 CPU.
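
The popcnt point can be made concrete with a short sketch. Python is used purely for illustration (the thread names no language): `int.bit_count` stands in for a hardware-backed instruction that only newer targets provide, and `popcount_fallback` is a hypothetical name for the software path an older target would still need.

```python
def popcount_fallback(x: int) -> int:
    """Count set bits in a 32-bit value using only shifts, masks and adds."""
    x &= 0xFFFFFFFF
    x = x - ((x >> 1) & 0x55555555)                  # 2-bit partial sums
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)   # 4-bit partial sums
    x = (x + (x >> 4)) & 0x0F0F0F0F                  # 8-bit partial sums
    return ((x * 0x01010101) & 0xFFFFFFFF) >> 24     # add the four bytes


# Pick the implementation once, much as a program might check CPUID at start-up.
if hasattr(int, "bit_count"):  # available from Python 3.10
    def popcount(x: int) -> int:
        return (x & 0xFFFFFFFF).bit_count()
else:
    popcount = popcount_fallback
```

Callers only ever see `popcount`, so the line where the "instruction" stops being available stays invisible to them, which is the effect described above.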

hoom · Dec 8, 2011

It seems to me that some kind of superset/subset of OpenGL ES & the acceleratable parts of HTML 5+ is the future main API.

As far as I see the mainstream future of GPUs seems to be all about ARM SoCs running Android, iOS or Cloud based VMs on server farms.

I think these will probably keep using subsets of the increasingly general but more marginalised hardware/APIs used on PCs etc. to ensure compatibility & performance/watt.

Nick · Dec 8, 2011

DeanoC said:
Try using an AES instruction on an older x86 CPU and watch it go pop; it's no different for GPUs.

It is very different. On a CPU you can easily implement AES functionality without AES-NI instructions. On a GPU however you can't implement Shader Model 4 on Shader Model 3 hardware (without making heavy use of the CPU).

Current GPUs are too focussed on implementing a restricted set of APIs. Users can't implement any new APIs. The CPU on the other hand doesn't have such limitations.

It's not that current GPUs couldn't (in many cases) emulate a new SM standard, it's just that it would suck. There is no reason that Intel/AMD couldn't provide an emulator for every new ISA extension, but it would suck so they don't.

It's less visible on CPUs, because no one actually uses the new ISAs exclusively for decades, so you rarely notice the line where x86 code stops working on your older x86 CPU.

No, it's less visible on CPUs because software has perfectly viable fallbacks for most instructions. High performance software renderers use dynamic code generation to ensure they use the available ISA extensions optimally and provide compatibility. This is still completely out of the question for GPUs.
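
As a rough illustration of the dynamic code generation idea, the sketch below builds the source of a multiply-add routine at run time and compiles it, choosing a fused operation when the host offers one and a plain multiply and add otherwise. It is a Python analogy, not how any particular renderer does it: `math.fma` (added in Python 3.13) stands in for an ISA extension, and `build_madd` is a hypothetical helper.

```python
import math


def build_madd(has_fma: bool):
    """Generate and compile a madd(a, b, c) routine for the detected feature set."""
    # Assumption: the "extension" is math.fma; the fallback is a plain a * b + c.
    body = "math.fma(a, b, c)" if has_fma else "a * b + c"
    source = f"def madd(a, b, c):\n    return {body}\n"
    namespace = {"math": math}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["madd"]


# Detect the feature once, then generate code specialized for it.
madd = build_madd(hasattr(math, "fma"))
fallback_madd = build_madd(False)
```

The generated routine contains no feature check of its own; the decision is baked in when the code is produced, so the same program runs on both older and newer hosts.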

For GPUs to become capable of supporting any API, a more generic architecture capable of both high ILP and high DLP is required. It looks like CPUs are actually much closer to achieving that. Haswell will be pretty much Larrabee in a socket.

And while all of this comes at a performance cost for legacy workloads, there's no denying that GPUs have sacrificed ever more performance for the sake of flexibility as well. If that trend continues, and I bet it will, things are inevitably converging towards a homogeneous architecture.

Nick · Dec 8, 2011

hoom said:
It seems to me that some kind of superset/subset of OpenGL ES & the acceleratable parts of HTML 5+ is the future main API.

As far as I see the mainstream future of GPUs seems to be all about ARM SoCs running Android, iOS or Cloud based VMs on server farms.

That might be true for the short-term future of handhelds, but that doesn't mean it's shaping the mainstream future of GPUs. Next-gen consoles for instance won't be limited to OpenGL ES.

I think these will probably keep using subsets of the increasingly general but more marginalised hardware/APIs used on PCs etc. to ensure compatibility & performance/watt.

Actually mobile hardware is also becoming more programmable at an insane pace. Future chips will support OpenGL ES 3.0, Direct3D 10+, and OpenCL. And that's achieved by having highly generic architectures, very similar to simple CPU cores with wide SIMD units. So it's only a matter of time before the actual hardware will be exposed to developers so they can create custom APIs.

Being restricted by power consumption doesn't mean you have to give up functionality. And the legacy APIs are wasting valuable power so getting rid of them sooner rather than later would maximize performance/Watt while creating greater flexibility.

DeanoC · Dec 8, 2011

Nick said:
It is very different. On a CPU you can easily implement AES functionality without AES-NI instructions. On a GPU however you can't implement Shader Model 4 on Shader Model 3 hardware (without making heavy use of the CPU).

As soon as we got general bit ops, we could emulate most operations, just like a CPU. That's not true for the early SMs, but only because they weren't general purpose; now they are.
Still a bad idea but doable. If you fancy a laugh, porting a byte code interpreter to a GPU is easily doable now; just don't ask about the speeds. I think even an on-GPU JIT compiler is possible in CUDA (though I'm not sure about the ability to jump to and execute arbitrary locations).
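
A byte code interpreter of the kind mentioned here is little more than a fetch/decode loop over integer operations, which is why general bit ops are enough to host one. The sketch below is written in Python for readability rather than as GPU code; the opcode names and encoding are hypothetical and not taken from any real VM.

```python
# Hypothetical opcodes for a tiny stack machine.
PUSH, ADD, MUL, AND, SHL, HALT = range(6)


def run(program):
    """Execute a flat list of opcodes and operands; return the top of the stack."""
    stack = []
    pc = 0
    while True:
        op = program[pc]
        pc += 1
        if op == PUSH:
            stack.append(program[pc])  # the operand follows the opcode
            pc += 1
        elif op == HALT:
            return stack.pop()
        else:
            # Every remaining opcode is a binary operation on 32-bit values.
            b = stack.pop()
            a = stack.pop()
            if op == ADD:
                stack.append((a + b) & 0xFFFFFFFF)
            elif op == MUL:
                stack.append((a * b) & 0xFFFFFFFF)
            elif op == AND:
                stack.append(a & b)
            elif op == SHL:
                stack.append((a << (b & 31)) & 0xFFFFFFFF)
            else:
                raise ValueError(f"unknown opcode {op}")


# Computes (2 + 3) * 4.
result = run([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT])
```

On a GPU every thread would run a loop like this, and threads that decode different opcodes at the same step tend to serialize, which is one likely reason for the caveat about speed.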

Nick said:
No, it's less visible on CPUs because software has perfectly viable fallbacks for most instructions. High performance software renderers use dynamic code generation to ensure they use the available ISA extensions optimally and provide compatibility. This is still completely out of the question for GPUs.

For GPUs to become capable of supporting any API, a more generic architecture capable of both high ILP and high DLP is required. It looks like CPUs are actually much closer to achieving that. Haswell will be pretty much Larrabee in a socket.

And while all of this comes at a performance cost for legacy workloads, there's no denying that GPUs have sacrificed ever more performance for the sake of flexibility as well. If that trend continues, and I bet it will, things are inevitably converging towards a homogeneous architecture.

Dynamic recompilation works just as well on GPUs as on CPUs; why do you think it's not already in use? All GPU code is JITed before submission; how aggressively is simply down to the driver team.

OpenCL/CUDA/DX11(.1) have all made GPU access easy; the hard part is making it worthwhile. CPUs are awesome at lots of code, GPUs aren't, but claiming that CPUs are somehow special is wrong.

OpenCL 1.2 explicitly allows vendor functions, so presumably fixed-function objects and APIs for the other bits of the GPU could start to appear. I would love to have a DrawTris() call running inside a GPU kernel.

Of course OpenCL is even supporting crazy stuff like kernels for FPGA HW. So for all I know (not being a HW guy), in 10 years' time we will be arguing about things that look nothing like what we have now.

hoho · Dec 9, 2011

Nick said:
It is very different. On a CPU you can easily implement AES functionality without AES-NI instructions. On a GPU however you can't implement Shader Model 4 on Shader Model 3 hardware (without making heavy use of the CPU).

Weren't non-accelerated cryptographic functions some 1000 times slower on CPUs? I'd expect a similar kind of slowdown when you try to add a "software" fallback on GPUs for non-supported features.

Though obviously there will be things that are far from trivial to emulate, like adding geometry shader support to an SM3 GPU. Also, cryptography is generally just an insignificant portion of the overall load on a CPU, so having it run 1000x slower won't really matter that much in the end, but if you slow down a part of the GPU pipeline by 1000x it becomes unusable.

Nick · Dec 10, 2011

DeanoC said:
Still a bad idea but doable.

Which shows there's still some distance to go to be able to implement any functionality, requiring a less restrictive architecture.

But I'd like to refer to Andrew Lauritzen's answer to question 12 in the article. Just like differences in performance for branching have pretty much leveled out now, GPUs are destined to become even more tolerant of wider ranges of workloads. Branchy code, code that uses lots of variables, code with irregular memory accesses, code with small data types, code with task dependencies, sequential code, etc.

Again we already know where this will lead us to: a cross between today's GPU and CPU architectures. In other words a homogeneous architecture capable of exploiting ILP, DLP, and TLP. Programming such a thing will not be free of APIs, but looking at the thriving software ecosystem of the CPU some of these APIs will be written by independent software developers and not the hardware vendors.

Dynamic recompilation works just as well on GPUs as on CPUs; why do you think it's not already in use? All GPU code is JITed before submission; how aggressively is simply down to the driver team.

It's still fundamentally different. JIT compilation on the GPU is essentially used to provide backward compatibility. The hardware is designed to support the latest API and all the previous ones, but nothing more. JIT compilation on the CPU on the other hand can provide a great level of forward compatibility as well.

CPUs are awesome at lots of code, GPUs aren't, but claiming that CPUs are somehow special is wrong.

Why would that be wrong? The GPU is actually relying on the CPU for compilation and such. It would be worthless without it. So clearly the CPU is special.

So for all I know (not being a HW guy), in 10 years' time we will be arguing about things that look nothing like what we have now.

I'd be very disappointed if in 10 years from now GPUs weren't as versatile as today's CPUs.

Of course we might no longer call it a Graphics Processing Unit then, but just a Processing Unit. And if it sits in the middle of the motherboard, we'll just call it a Central Processing Unit. ;-)

Nick · Dec 10, 2011

hoho said:
Weren't non-accelerated cryptographic functions some 1000 times slower on CPUs?

Certainly not. Intel claims only 3x-10x for AES-NI, and in practice it's more like 2x.

Also note that AVX2 and BMI may actually make AES-NI much less relevant. And hardware support for Rijndael is probably worthless when other ciphers are requested.

I'd expect a similar kind of slowdown when you try to add a "software" fallback on GPUs for non-supported features.

On a GPU forward compatibility is indeed still a pipe dream, but not on a CPU. If you take the chip size into account (without IGP), there is nowhere near a three orders of magnitude difference between software and hardware rendering. More like one order of magnitude or less (and again it will converge closer with AVX2 since gather support is the most blatant difference between a CPU and GPU today, and FMA won't hurt either). Compared to how helpless a GPU is without a CPU, and how utterly incapable it is of supporting new workloads, I find that quite extraordinary and a clear indication of where things are heading.

So you certainly shouldn't consider forward compatibility to be similar between GPUs and CPUs. Whatever they are converging into, CPU architectures are already much closer to it.

Arwin · Dec 10, 2011

AL : I am not a game developer so I can’t speak to that with any authority, but I think it’s a bit more complicated than just “it costs more to develop”. It’s not unreasonable to expect a low-level interface to enable playable experiences on some piece of hardware that wouldn’t normally provide it through a portable API, thus increasing the potential market. Furthermore I’m not certain that targeting a new graphics API is necessarily a huge cost, assuming that the game is already a multi-platform title. A large fraction of the cost in game development is the asset pipeline, and a low-level graphics API would not necessarily require any changes to that.

This is definitely a big one imho.

hoho · Dec 11, 2011

Nick said:
Certainly not. Intel claims only 3x-10x for AES-NI, and in practice it's more like 2x.

Could have been. I just went by the graphs I remembered for some random ARM chips. Things could be radically different on x86.
