What I want from next generation GPU & API

Rodéric · Nov 11, 2012

Here's a list of what I want and why I want it. I'm curious about other programmers' and designers' thoughts on the topic.

Slim API with standard command stream API.
Software:
Have standard state block(s) that I manage myself; let me send the whole state with each draw call and handle it.
Drivers:
Either much simplified or moved into hardware. (see below)
Hardware:
If the state blocks are standard, you might as well have dedicated hardware to translate them into your GPU-specific states.

Standard texture formats
Software:
No more time-consuming translation from one format to another; just let me set my textures in the final format and be done with it. (Already true for S3TC, I believe.)
(Also a pre-requisite for the next point.)
Drivers:
Simplified.
Hardware:
Support for those standard texture layouts.

MMU(like) access/control
Software:
Let me set up the mapping between physical and logical memory, so I can use the whole memory as a nice cache and use whichever algorithm I want.
No more memory management nightmares, I can map virtual contiguous memory to physical pages and just provide a header for the buffers.
Drivers:
Simplified again.
Hardware:
Real access to the MMU might be a little tricky for system security and the like, but there might be a way to do something similarly powerful.
If the hardware could register page faults with some data, I could load what was missing in the usual just-too-late timeframe.

That's a bit rough, but I think it's pretty much about making the GPU more akin to the CPU when it comes to access and control.
The MMU part is critical to simplify memory management: handling memory as a cache makes things a lot simpler for streaming data, and virtual memory avoids fragmentation issues.
(Except internal fragmentation if the page size is too big. I'd rather have 4KiB pages, but I think I could live with 8/16KiB ones.)
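
To make the "memory as a cache" idea concrete, here is a minimal Python sketch of a fault-driven page cache with LRU eviction, plus the internal fragmentation arithmetic for different page sizes. It is a toy model only: `PagedCache`, `translate` and `internal_waste` are hypothetical names, LRU is just one possible policy, and no real GPU or driver exposes this interface.

```python
from collections import OrderedDict

PAGE_SIZE = 4 * 1024  # 4 KiB, the page size preferred in the post


class PagedCache:
    """Toy model: virtual pages mapped on demand to a fixed pool of physical pages."""

    def __init__(self, physical_pages):
        self.free = list(range(physical_pages))  # unused physical page numbers
        self.table = OrderedDict()               # virtual page -> physical page, LRU order
        self.faults = []                         # virtual pages that faulted, in order

    def translate(self, vaddr):
        """Return the physical address for vaddr, mapping the page on a fault."""
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        if vpage in self.table:
            self.table.move_to_end(vpage)        # mark as most recently used
        else:
            self.faults.append(vpage)            # this is where data would be streamed in
            if self.free:
                ppage = self.free.pop()
            else:
                _, ppage = self.table.popitem(last=False)  # evict least recently used
            self.table[vpage] = ppage
        return self.table[vpage] * PAGE_SIZE + offset


def internal_waste(size_bytes, page_size):
    """Bytes lost to padding when a buffer is rounded up to whole pages."""
    return -size_bytes % page_size
```

With two physical pages, touching virtual pages 0, 1, 0, 2 faults three times and evicts page 1, the least recently used one. For a 10000-byte buffer, `internal_waste` gives 2288 bytes with 4KiB pages and 6384 bytes with 16KiB pages, which is the trade-off mentioned above.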

Dominik D · Nov 11, 2012

Here's why I think it's not going to work.

Slim API with standard command stream API.
Software: I'm absolutely positive that various HW architectures manage state differently and it's done so for a reason. There will be states superfluous from the POV of your API that exist to aid back-compat (Dx, OGL).
Hardware: No way in hell for standard HW - too many patents and too much investment done in various branches of the technology to unify (now or ever).

Standard texture formats
Software: Translation already happens rarely. And I'm quite sure that for the basic, required formats supported by OGL/DX virtually every piece of hardware manages it w/o any translations. This should be true for both uncompressed and compressed formats. Also: recent APIs cut down on the number of formats and their variations supported AFAIK.
Drivers: Hard to simplify something that's already fairly simple.

There are also many things that have to be taken into account that are not. Ability for ThisNewStuff to work side-by-side with DX/OGL applications is a must. Preemption is a must since most (all?) modern operating systems use acceleration for composition. Interop with accelerated 2D, video and compute. I mean: there really isn't much happening in drivers today except for translating generic calls to the HW-specific ones... and as I said this happens for a reason. I can't imagine nVidia going all tile-based deferred or us going with traditional IMR. This just won't happen: too much monies invested in this or that branch of technology, too much expertise in specific implementations, etc. There are, like, a gazillion threads on this forum with "unified hardware and slim API" slogan on them but I simply cannot see how this would work from the business and technology standpoint.

Davros · Nov 11, 2012

Hard to simplify something that's already fairly simple.

Current AMD betas are 224 MB - bloody hell...

My wishes are fairly simple:
1: work
2: don't break backwards compatibility

If I could wish for something out of the ordinary it would be this:
since GPUs now support audio, it would be great if they supported some really high-end audio processing.

ERP · Nov 11, 2012

The issue with the slim API/driver is twofold. First, the front ends of NVIDIA and ATI GPUs are actually surprisingly different, and you have to abstract that somehow. Second, on a PC the driver has to multiplex access to the GPU between applications.

On the texture front the primary issue is swizzling. Last I checked, ATI and NVIDIA used different methods. Internally both allow for unswizzled formats, but there is a dramatic performance cost because of the reduction in coherency for the unswizzled formats. Again, it's an abstraction.
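
As an illustration of why swizzling helps coherency, here is a small Python sketch comparing a plain linear layout with a Morton (Z-order) layout. Morton order is used here only as a well-known stand-in; the actual vendor swizzle patterns are not described in this thread and may differ. The function names are hypothetical.

```python
def part1by1(n):
    """Spread the low 16 bits of n so a zero bit sits between each original bit."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n


def morton_index(x, y):
    """Texel index in a Morton (Z-order) layout: x and y bits interleaved."""
    return part1by1(x) | (part1by1(y) << 1)


def linear_index(x, y, width):
    """Texel index in a plain row-major (unswizzled) layout."""
    return y * width + x


def index_span(indices):
    """Distance between the lowest and highest index touched."""
    return max(indices) - min(indices)


# Indices touched when sampling a 4x4 texel block at the origin of a 1024-wide texture.
block = [(x, y) for y in range(4) for x in range(4)]
morton_span = index_span([morton_index(x, y) for x, y in block])
linear_span = index_span([linear_index(x, y, 1024) for x, y in block])
```

In the Morton layout the 4x4 block occupies 16 consecutive indices, while in the linear layout the same block is spread over a range of more than 3000 texels, so a 2D-local access pattern touches far more distinct memory lines.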

I believe some APUs share the MMU between both the CPU and GPU, but 4K pages aren't practical with multi-gigabyte memory sizes; they cause too much of a performance issue.
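
The post does not say where the cost comes from. One plausible reading is the sheer number of pages to track and the small amount of memory a translation cache can cover. The arithmetic below is a sketch under that assumption; the 512-entry TLB is a made-up figure for illustration, not a number from any real GPU.

```python
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3


def page_count(memory_bytes, page_bytes):
    """Number of pages needed to cover the given amount of memory."""
    return memory_bytes // page_bytes


def tlb_reach(entries, page_bytes):
    """Bytes of memory addressable without a TLB miss."""
    return entries * page_bytes


pages_4k = page_count(4 * GIB, 4 * KIB)     # pages to map 4 GiB with 4 KiB pages
pages_64k = page_count(4 * GIB, 64 * KIB)   # same memory with 64 KiB pages
reach_4k = tlb_reach(512, 4 * KIB)          # hypothetical 512-entry TLB
reach_64k = tlb_reach(512, 64 * KIB)
```

Mapping 4 GiB takes over a million 4 KiB pages versus 65536 pages at 64 KiB, and the hypothetical 512-entry TLB covers 2 MiB versus 32 MiB.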

You get access to all of this on the current consoles, but it's because you have fixed hardware.

Dominik D · Nov 11, 2012

Davros said:
Current AMD betas are 224 MB - bloody hell...

That was specifically about texture conversion in the driver.

since GPUs now support audio, it would be great if they supported some really high-end audio processing.

It'd be cool to accelerate XAudio with GPU. :]

Rodéric · Nov 11, 2012

My first point was to streamline the driver to almost a no-op. Not quite, but instead of tracking states, just let me specify the whole set for each draw call, and write a super fast routine (or even hardware) that translates that state vector into your proprietary format.

Drivers would then mostly be that function and the compiler for HLSL/GLSL to the GPU binary.
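
A minimal Python sketch of that idea, assuming a made-up state block and a made-up register encoding: the application passes the whole state with every draw, and a stateless routine packs it into hardware words. `DrawState`, the register numbers and the `0xFF` draw opcode are all hypothetical and do not correspond to any real GPU or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrawState:
    """Hypothetical standard state block, sent in full with each draw call."""
    blend_enable: bool
    depth_test: bool
    cull_mode: int   # 0 = none, 1 = front, 2 = back
    shader_id: int


def translate(state):
    """Pack a full DrawState into invented (register, value) command words."""
    raster = (state.cull_mode & 0x3) | (int(state.depth_test) << 2)
    blend = int(state.blend_enable)
    return [(0x10, raster), (0x11, blend), (0x20, state.shader_id)]


def build_command_stream(draws):
    """Translate each (state, vertex_count) pair with no tracking between draws."""
    stream = []
    for state, vertex_count in draws:
        stream.extend(translate(state))
        stream.append((0xFF, vertex_count))  # hypothetical DRAW opcode
    return stream
```

Because nothing is remembered between draws, the translation is a pure function of the state block, which is what would make it a candidate for a very small driver routine or dedicated hardware. The cost is that redundant state is re-sent on every draw.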

Dominik D · Nov 11, 2012

But you can't do that. Some things cannot be "translated", they have to be worked around. Let's say that your HW doesn't support point sprites. You can't just translate state "draw triangle with 8px point sprites" (which essentially means draw 3 point sprites with centers at the triangle vertices) because there's no state to translate to.
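
To illustrate the kind of workaround meant here, the Python sketch below expands each point into a screen-space quad made of two triangles, which is one common way to emulate point sprites when no native support exists. The function names are hypothetical and this is not taken from any actual driver.

```python
def expand_point_sprite(cx, cy, size_px):
    """Turn one point sprite into two triangles covering a size_px square."""
    h = size_px / 2.0
    tl = (cx - h, cy - h)
    tr = (cx + h, cy - h)
    bl = (cx - h, cy + h)
    br = (cx + h, cy + h)
    return [(tl, tr, bl), (tr, br, bl)]


def emulate_point_sprites(points, size_px):
    """Replace a list of points with the triangles that draw them as sprites."""
    triangles = []
    for cx, cy in points:
        triangles.extend(expand_point_sprite(cx, cy, size_px))
    return triangles
```

Three 8px point sprites become six triangles, so the driver is generating new geometry rather than mapping one state value onto another.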

Davros · Nov 11, 2012

Dominik D said:
It'd be cool to accelerate XAudio with GPU. :]

No, no, no.
Ds3d, XAudio/XInput etc. are the spawn of Beelzebub.

Dominik D said:
But you can't do that. Some things cannot be "translated", they have to be worked around. Let's say that your HW doesn't support point sprites. You can't just translate state "draw triangle with 8px point sprites" (which essentially means draw 3 point sprites with centers at the triangle vertices) because there's no state to translate to.

If it's part of the API, the hardware should be able to do it, otherwise it wouldn't be able to call itself DX compatible. A DX11 card should be able to do all of DX11 (didn't DX10 do away with caps bits?).

As for the things that can be directly translated: would it be a good idea for IHVs to say to devs, "here's a list of commands the driver sends to the card, along with their corresponding API calls", so your game can issue commands straight to the driver and the driver will send them on to the GPU, bypassing all the API-to-internal-command translation?

Dominik D · Nov 12, 2012

Ds3d? If this was supposed to be D3D or something like that: consider hell my home then.

What you're saying about whatever-compatible is not true at all. The DX Reference Rasterizer is DX9/10/11 compliant and the hardware it's running on is a CPU, which for example does not natively support point sprites. Yet it renders those. As long as your output is valid, it doesn't matter if you have dedicated silicon doing stuff or you render it with programmable pipeline, hacks, tricks and what not. My bet is that everyone does something that way - there's only so much silicon you can use and implementing every corner case is a waste. At the end of the day: you're running a driver which deals with these corner cases. DX-compatible means you pass hardware qualification tests.

And no, it wouldn't be a good idea to let devs build HW-specific command buffers. We've been there with competing APIs which is basically what (some) people are arguing here for (except that this time it's not about functions one calls but about data one pushes through these functions). Different states for different architectures means writing code/data multiple times. And on top of that you'd have to deal with changes between different generations of HW from given vendor, different core revisions. Seriously, there's a reason this stuff is hidden in the driver. :S

ERP · Nov 12, 2012

Davros said:
If it's part of the API, the hardware should be able to do it, otherwise it wouldn't be able to call itself DX compatible. A DX11 card should be able to do all of DX11 (didn't DX10 do away with caps bits?).
As for the things that can be directly translated: would it be a good idea for IHVs to say to devs, "here's a list of commands the driver sends to the card, along with their corresponding API calls", so your game can issue commands straight to the driver and the driver will send them on to the GPU, bypassing all the API-to-internal-command translation?

The problem isn't even at the feature set level, some parts of the hardware bear little to no resemblance to the API.
For example on ATI hardware, there is an additional hidden shader that exists before the vertex shader, that loads vertex data from memory.
Constant loading for shaders also doesn't look at all like the API.

Davros · Nov 12, 2012

Ds3d = DirectSound3D

Rodéric · Nov 12, 2012

Dominik D said:
But you can't do that. Some things cannot be "translated", they have to be worked around. Let's say that your HW doesn't support point sprites. You can't just translate state "draw triangle with 8px point sprites" (which essentially means draw 3 point sprites with centers at the triangle vertices) because there's no state to translate to.

So basically you are saying that we should get rid of legacy options in the API, which fits nicely with the idea of a slim API.
We could very well have feature levels with guaranteed hardware features.

Dominik D · Nov 12, 2012

Not really. Super-legacy stuff (e.g. palettized textures) has probably been removed from HW a loooong time ago (e.g. Windows doesn't really support palettes on Win8/Win8 RT as far as I can see). But a lot of obscure features are actually useful. If your HW supports them (because they may happen to be there "for free" due to the way your HW works) certain scenarios will work better than on the HW w/o support for this or that feature. At the end of the day if something can be accelerated and it's part of the driver: it's most likely that it's pretty well tuned and every developer can benefit from that.

On top of that pretty much every single thing that's out there has some reason for existence. You won't use it all the time but there are perfectly valid scenarios. For example point based sprites give the Halo 4 loading screen its great looks. Had they not existed, you'd have to implement billboarding with plain quads in your engine. And then if you build software for multiple platforms, you'd have to tune your code for each of those. What you have today on the other hand is point rendering support via the unified API, and HW vendors have to make sure that it works as expected and performs great. There's a net gain (less manpower spent) if more than one person is using a given feature and it's implemented in the driver rather than pushed onto the developer.

What, I think, developers would really benefit from is more knowledge about the driver/HW internals and better tools for profiling and fine-tuning. I seriously see no benefit in pushing complexity from the driver to the game code. HW is incredibly complex and expecting it to be documented and open leads to something that's more useless than not.

Davros · Nov 12, 2012

Point Sprites from 3Dmark 2001

milk · Nov 12, 2012

Is there any chance of us seeing some robust and standardised support for efficient multi-fragment rasterisation and rendering in the near future? Like HW support for A-buffers, k-buffers, or some variant of those things? I know there are hacks to get somewhat there, but so was tessellation doable through geometry shaders and look how far it got that way...

Rodéric · Nov 13, 2012

Dominik D said:
On top of that pretty much every single thing that's out there has some reason for existence. You won't use it all the time but there are perfectly valid scenarios. For example point based sprites give the Halo 4 loading screen its great looks. Had they not existed, you'd have to implement billboarding with plain quads in your engine.

That also is legacy, that's been handled by Geometry Shaders since D3D10.
(That's 6 years ago for the hardware & API.)

Dominik D · Nov 13, 2012

You're missing the point: point sprites were just an example off the top of my head. There may also be a valid reason why a given HW would still support native point sprites (mobile scenarios for example - there must be a reason why Windows Mobile is 9.0c only AFAIR).

3dcgi · Nov 14, 2012

Rodéric said:
That also is legacy, that's been handled by Geometry Shaders since D3D10.
(That's 6 years ago for the hardware & API.)

They should bring back the old point sprites since this was IMO only done to ensure something used geometry shaders.

AlexV · Nov 14, 2012

3dcgi said:
They should bring back the old point sprites since this was IMO only done to ensure something used geometry shaders.

In all fairness geometry shaders are extremely interesting (and sadly doomed to eternal underutilization). They are definitely a more flexible way of handling geometry generation / destruction than what we get via HS + TS + DS. We should thank one of the IHVs for ensuring that it was anything but still-born.

Alexko · Nov 14, 2012

Why are they doomed to eternal underutilization?
