Basic API Questions?

invictis · Apr 20, 2020

So how much work and expertise goes into making an API?
If we compare MS's DX12 Ultimate with Sony's PlayStation APIs, how much difference would there be between them?
The advent of DX12 Ultimate has brought in a lot of new features such as VRS, Ray Tracing and Machine Learning, amongst others. These additions are new to this next generation of consoles. MS can and has rolled out their new API already, not only for RDNA2 but also for Nvidia, so a fair bit of trial and error has gone into its development.
If the PS5 has all three additions as stated above, how easy would it be for Sony to add those features to their own API and have them be as good as MS's?
I don't have a lot of knowledge of APIs, so I am wondering if this could be a differentiator between next gen consoles?

JoeJ · Apr 20, 2020

invictis said:
The advent of DX12 Ultimate has brought in a lot of new features such as VRS, Ray Tracing and Machine Learning, amongst others. These additions are new to this next generation of consoles. MS can and has rolled out their new API already, not only for RDNA2 but also for Nvidia,

All the new Turing features have been available in Vulkan for quite some time. (I don't know how long for DX, with developer previews or whatever they have.) It's NV's job to make the necessary extensions. There was also the OptiX RT API from NV long before DXR and RTX, and it is not that different.
So i assume the bulk of the API design work is done (indirectly) by IHVs already. Take Mantle -> DX12 / Vulkan as another example, with all those APIs being very similar to Mantle.
The work of MS / Khronos is then to make sure their APIs cover a common feature set for all IHVs, but i don't see much innovation or development necessary from their side. It's a lot of work, but there is no big unsolved problem they'd need to research first.

invictis said:
If the PS5 has all three additions as stated above, how easy would it be for Sony to add those features to their own API and have them to be as good as MSs?

Unlike MS, Sony does not have to restrict their API design or exposed features to a common base over multiple HW from three different vendors. They only have to care for a single GPU model.
So probably it is easier for Sony, and the resulting API is probably also better, meaning it allows more features and higher efficiency.
Ofc. MS can also add console exclusive features to DX12U to compensate. Usually they do, but i don't know if it's different this time.

In any case Sony has a potential advantage here, but they could decide to avoid going too close to the metal this time to have it easier with BC for PS6.
With Sony's APIs being hidden under NDAs, it will remain hard to discuss in a public forum.

JoeJ · Apr 20, 2020

About ML, Sony might not need to design an API.
AFAIK, DirectML is basically a library of commonly used algorithms. I don't know anything about ML, but i guess this is a similar approach to linear algebra packages like BLAS, which are implemented by multiple IHVs and optimized for their hardware.
So DirectML offers an interface, and IHVs give optimized implementations. That's the primary idea.
Sony can do the same, probably they will, and AMD gives them the same implementation it gives to MS. But they could also leave it entirely to the devs to do this, since there is no need to support multiple IHVs.
Again there is little research necessary, because we already know which algorithms are ML building blocks and need to be supported.

(Ofc. things are never simple in practice, but APIs are just that.)
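
To make the "one interface, several vendor implementations" idea concrete, here is a minimal Python sketch. It is not DirectML's actual API: `MatMulBackend`, `ReferenceBackend`, `VendorBackend` and `pick_backend` are hypothetical names, and the "vendor" path only stands in for a hardware-tuned implementation.

```python
from abc import ABC, abstractmethod


class MatMulBackend(ABC):
    """Hypothetical interface for one ML building block (matrix multiply)."""

    @abstractmethod
    def matmul(self, a, b):
        """Return the matrix product of a and b (lists of rows)."""


class ReferenceBackend(MatMulBackend):
    """Portable fallback: index-based loops, no tuning."""

    def matmul(self, a, b):
        inner, cols = len(b), len(b[0])
        return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
                for row in a]


class VendorBackend(MatMulBackend):
    """Stand-in for an IHV-optimized path: same result, different data layout."""

    def matmul(self, a, b):
        columns = list(zip(*b))  # transpose b once so each dot product walks two rows
        return [[sum(x * y for x, y in zip(row, col)) for col in columns]
                for row in a]


def pick_backend(vendor):
    # Assumption for illustration: only "amd" ships a tuned backend here.
    return VendorBackend() if vendor == "amd" else ReferenceBackend()
```

The caller only ever sees `matmul`, so swapping the implementation per vendor does not change application code. On a single-vendor console the selection step could be dropped entirely.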

invictis · Apr 20, 2020

Thanks for the reply.
So Sony could well end up with the more efficient API then? I had heard that Sony's PS4 API allowed getting far closer to the metal than DirectX did on Xbox. But then you hear that the main APIs like DirectX and Vulkan are pretty close to the metal nowadays compared to what they were?
So if a developer is going to release their game on say PS5, XSX and PC, how would that affect their API choice for PC? Would they by default use DX because they will have to on the Xbox, or is it no big deal to develop the game with three different APIs (PS API, DX on Xbox and Vulkan on PC)?

JoeJ · Apr 20, 2020

invictis said:
So Sony could well end up with the more efficient API then? I had heard that Sony's PS4 API allowed getting far closer to the metal than DirectX did on Xbox. But then you hear that the main APIs like DirectX and Vulkan are pretty close to the metal nowadays compared to what they were?

Yeah, that's what we often hear, so my guess is that Sony's API is potentially more efficient. It's just logical, but i have never used or seen the GNM API, and those who have can't talk.

But i noticed issues in DX/VK where it is not close enough to the metal, and i have serious performance losses from the missing features.
In my case it's Mantle's conditional command buffer execution, which allows avoiding unnecessary work. Using VK, many of my prerecorded commands do no work, but they still cause overhead, mainly from unnecessary cache flushes although nothing has changed.
To avoid the flushes, i could generate command buffers per frame on the CPU, but this requires read backs from GPU to CPU, and that's much slower than just doing the pointless flushes each frame.
The Mantle feature would also make it easy to repeat certain tasks until the error is small enough, or until the time spent on the current frame comes close to the allowed maximum for constant fps.
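
A toy model of the difference, as a hedged Python sketch. It simulates replaying a prerecorded command list on the CPU and does not call any real graphics API. `Command`, `execute` and `iterate_until` are hypothetical names, and "one flush per executed dispatch" is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class Command:
    name: str
    counter_slot: int  # index into a buffer of work counts written on the GPU


def execute(commands, work_counts, predicated):
    """Replay a prerecorded list; return (dispatches_with_work, flushes)."""
    useful, flushes = 0, 0
    for cmd in commands:
        pending = work_counts[cmd.counter_slot]
        if predicated and pending == 0:
            continue  # command skipped entirely, so no flush is paid for it
        flushes += 1  # assumed: every executed dispatch is wrapped in a flush
        if pending > 0:
            useful += 1
    return useful, flushes


def iterate_until(error, step, tolerance, max_iters):
    """Repeat a task until the error is small enough or the budget runs out."""
    iters = 0
    while error > tolerance and iters < max_iters:
        error = step(error)
        iters += 1
    return error, iters
```

With four recorded commands and work counts `[3, 0, 0, 5]`, the unpredicated replay pays four flushes for two useful dispatches, while the predicated one pays two. `iterate_until` mirrors the "repeat until error is small enough" case, with `max_iters` standing in for the frame time budget.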

Other typical low level functionality that lacks full support on PC includes things like subgroup instructions, MSAA sampling positions and occupancy limits.

For next gen, probably the hottest topic would be traversal shaders. AMD's TMU patent implies they are possible with their RT solution, but DX12U has no support.

Another AMD patent about low latency extra CUs, usable e.g. for audio processing, simply requires a custom API because it's custom HW.
Sony can do such things without even thinking about crossplatform restrictions.

invictis said:
So if a developer is going to release their game on say PS5, XSX and PC, how would that affect their API choice for PC? Would they by default use DX because they will have to on the Xbox, or is it no big deal to develop the game with three different APIs (PS API, DX on Xbox and Vulkan on PC)?

It is no big deal to support multiple APIs. DX12 usage will grow relative to DX11 with next gen, but VK will also stay because of Google and Nintendo. There is no big difference between those APIs anyway. Maintaining multiple APIs means some extra cost, but it's nothing compared to the transition from a DX11 engine to DX12.

iroboto · Apr 20, 2020

JoeJ said:
About ML, Sony might not need to design an API.

You wouldn't do data science on an Nvidia card without CUDA.

Well, they did, and it was painful before the introduction of compute shaders. CUDA was made for general-purpose computing and over time, without much competition, became #1 (the library continues to improve).
In the same way, doing ML through compute shaders will be super painful for some deep learning operations. Sony must provide developers with some functionality that is specific to typical ML functions but is not present in graphics APIs.

DavidGraham · Apr 20, 2020

invictis said:
But then you hear that the main APIs like DirectX and Vulkan are pretty close to the metal nowadays compared to what they were?

It's dangerous for APIs to be so close to the metal, specifically in the PC space, where AMD/NVIDIA change architectures all the time. Close-to-the-metal APIs will easily, depending on the circumstances, break compatibility with earlier or future GPU generations, so APIs tend to be general purpose enough to avoid this serious problem. For example, Mantle games broke compatibility with future GCN GPUs (you can't run them using Mantle).

JoeJ · Apr 20, 2020

DavidGraham said:
For example, Mantle games broke compatibility with future GCN GPUs (you can't run them using Mantle).

Probably they could if AMD would still maintain Mantle, but it's discontinued and no longer needed.

DavidGraham said:
It's dangerous for APIs to be so close to the metal, specifically in the PC space, where AMD/NVIDIA change architectures all the time

It's difficult to draw the line between 'too close to the metal' and 'too much compromise to support all vendors and older GPUs'. (Or between 'too cumbersome to work with' and 'too restricted'.)

Now, with BC becoming more important in console space, the same problems apply there too.
It's important to preserve games for the future. It's also important to expose all power and flexibility there is, especially with something new like RT.

To me, the simplest solution for consoles would still be close to the metal, since trying to have an abstraction over everything, including past and future, adds its own complexity and is doomed to fail for the same reason: HW changes too much over time. DX12 / VK won't stay forever either.
But if people want to play all prev gen games, that's no longer possible. I wonder if it would be easier / cheaper to port the most important games to next gen than to be constrained by HW BC.

On the other hand, i would not be surprised if, with next next gen and DX25, things have come to a rest and no further 'revolutions' are to be expected. Everything fine and done.
Probably we'll discuss deprecation more than new features then.

DavidGraham · Apr 20, 2020

JoeJ said:
Probably they could if AMD would still maintain Mantle, but it's discontinued and no longer needed.

This happened before the "official deprecation" of Mantle: Fury cards were not able to run Mantle well, as opposed to the R9 290/280/270 and R7 260 line.

JoeJ said:
It's important to preserve games for the future. It's also important to expose all power and flexibility there is, especially with something new like RT.

When it comes to PC, I would argue the opposite actually. How much more performance will you be able to extract from a completely to-the-metal API? 10% more? 20% more? That is a cheap price to pay to be able to play all the past and future games without the need to swap hardware/software.

JoeJ · Apr 20, 2020

DavidGraham said:
How much more performance will you be able to extract from a completely to-the-metal API? 10% more? 20% more? That is a cheap price to pay

For me, the speed up coming from OpenGL / OpenCL was close to 200% on the compute side (for both NV and AMD). That's enough to convince me, although i don't enjoy working at the low level where everything is so tedious.
But i could get this more easily, with a low level API that adds command lists to avoid a 'draw call for each compute dispatch'. This, plus exposing multiple GPU queues and a multi threaded CPU context, would probably give most of the benefits, and resource transitions and barriers could be automatic and easy.
I think the sweet spot would be somewhere in the middle, but i'm not sure. Somebody who optimizes rendering may be very happy about resource transition control and getting a speed up from there.

If we ever get an API that gives us both, easy to get going and explicit control where we want it, with automatic handling of everything we do not specify explicitly, then i would be impressed by the API design and development.
On the other hand, with game engines becoming more the work of fewer experts, there seems to be no more need for an easy to use API. I assume that's why DX11 will indeed die out, whether we like it or not.
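
As a rough illustration of what "automatic barriers" could mean, here is a small Python sketch of a recorder that inserts a barrier whenever a pass touches a resource written earlier. It is a simplified assumption of how such tracking might work, not how any real driver does it. It only covers read-after-write and write-after-write hazards, and `record_with_auto_barriers` is a hypothetical name.

```python
def record_with_auto_barriers(passes):
    """passes: list of (name, reads, writes). Returns the recorded command stream."""
    stream = []
    dirty = set()  # resources written since the last barrier that covered them
    for name, reads, writes in passes:
        hazards = sorted(dirty & (set(reads) | set(writes)))
        if hazards:
            stream.append(("barrier", tuple(hazards)))
            dirty -= set(hazards)
        stream.append(("dispatch", name))
        dirty |= set(writes)
    return stream


frame = [
    ("build", [], ["grid"]),
    ("shade", ["grid"], ["color"]),
    ("stats", ["input"], ["hist"]),  # independent of earlier writes: no barrier
]
for command in record_with_auto_barriers(frame):
    print(command)
```

An explicit API makes the programmer emit those barrier entries by hand. The trade-off discussed above is that a tracker like this is convenient but may be more conservative than someone who knows exactly which transitions are needed.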

DavidGraham said:
When it comes to PC, I would argue the opposite actually.

I do not disagree in general, but actually i do when it comes to RT:
* It is new and has many issues and open problems.
* The chances that the first API will remain optimal in the long run are small.
Conclusion: If HW vendors have different solutions allowing more flexibility, it should be exposed early. This gives more options and so faster progress. We will converge to a robust standard more quickly, at the cost of later incompatibility of some early software and hardware.
My motivation is not to get 10% more perf from close to metal, but to tackle problems that require more flexibility to be addressable at all.

... which is the primary reason why i prefer Khronos' extension mechanism over MS's fixed HW tiers. Using VK i can target the same HW tiers with the same feature set that MS proposes for games that actually ship, but when it comes to research i can also experiment with features that may or may not become widely adopted in the future. Discussion tends to put too much focus on the games, ignoring the research side.
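
The extension style of feature selection can be sketched in a few lines of Python. The extension names and `select_features` are made up for illustration and are not real Vulkan identifiers. The point is only that a shipping baseline and optional experimental features can come out of the same query.

```python
# Hypothetical baseline a shipping game would require.
BASELINE = frozenset({"ray_tracing", "variable_rate_shading"})


def select_features(reported, wanted_experimental=frozenset()):
    """Return (missing_baseline, enabled) for a device's reported extension set."""
    reported = set(reported)
    missing = BASELINE - reported
    if missing:
        return missing, set()  # device cannot run the shipping path at all
    # Baseline is guaranteed; experimental extras are enabled only if present.
    return set(), set(BASELINE) | (set(wanted_experimental) & reported)
```

A research build would pass something like `{"traversal_shader"}` as `wanted_experimental` and get it enabled only on hardware that reports it, while the shipping build passes nothing and sees the same baseline everywhere.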

Lurkmass · Apr 21, 2020

invictis said:
If we are to compare MS with DX12 Ultimate with Sony's PlayStation APIs, how much difference would there be between them?

We don't, DX12 Ultimate is nowhere near as powerful as GNM with extensions on PS5. D3D on Xbox is very different to D3D on PC and if Microsoft are smart they'll offer console specific extensions like last time unless they want to risk losing their performance advantage compared to their competition.

Binary compatibility isn't much of an issue with AMD compared to other vendors since they've largely converged on an ISA. The vast majority of RDNA/GFX10's instruction encoding is nearly identical to GFX7 instruction encoding based off of the LLVM code. Maintaining binary compatibility might be an issue with Intel or Nvidia but that concern is irrelevant for AMD since they don't follow the same practices as the others.

There are only incentives for Microsoft to diverge in API design between D3D on Xbox and DX12 Ultimate. DX12 Ultimate will never be a true replacement for console APIs unless Microsoft decides to expose PM4 packets and GFX7/GFX10 ISA on PC or make D3D on Xbox available on PC too.

invictis · Apr 22, 2020

Lurkmass said:
We don't, DX12 Ultimate is nowhere near as powerful as GNM with extensions on PS5. D3D on Xbox is very different to D3D on PC and if Microsoft are smart they'll offer console specific extensions like last time unless they want to risk losing their performance advantage compared to their competition.

Binary compatibility isn't much of an issue with AMD compared to other vendors since they've largely converged on an ISA. The vast majority of RDNA/GFX10's instruction encoding is nearly identical to GFX7 instruction encoding based off of the LLVM code. Maintaining binary compatibility might be an issue with Intel or Nvidia but that concern is irrelevant for AMD since they don't follow the same practices as the others.

There are only incentives for Microsoft to diverge in API design between D3D on Xbox and DX12 Ultimate. DX12 Ultimate will never be a true replacement for console APIs unless Microsoft decides to expose PM4 packets and GFX7/GFX10 ISA on PC or make D3D on Xbox available on PC too.

Does MS typically put work into their APIs for console use?
Would you expect them to alter DX12U to make it more console streamlined?

iroboto · Apr 22, 2020

invictis said:
Does MS typically put work into their APIs for console use?
Would you expect them to alter DX12U to make it more console streamlined?

Xbox has its own versions of DX11 and DX12 that expose a little more functionality and have calls that are specific to Xbox hardware.
DX12U is not an API. It is a baseline level of support: a GPU that supports at least these features at specific levels can be labelled as DX12U.
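
Read that way, "is this GPU DX12U?" is just a threshold check over feature tiers. A minimal Python sketch follows. The feature list and tier numbers are my reading of Microsoft's public DX12 Ultimate announcement and should be treated as illustrative assumptions. `DX12U_MINIMUM` and `qualifies` are hypothetical names, not part of any SDK.

```python
# Assumed minimum tiers as (major, minor); illustrative, not an official table.
DX12U_MINIMUM = {
    "raytracing": (1, 1),
    "variable_rate_shading": (2, 0),
    "mesh_shader": (1, 0),
    "sampler_feedback": (0, 9),
}


def qualifies(caps, minimum=DX12U_MINIMUM):
    """True if every required feature is reported at or above its minimum tier."""
    # A feature the device does not report counts as tier (0, 0).
    return all(caps.get(name, (0, 0)) >= tier for name, tier in minimum.items())
```

A device reporting raytracing at `(1, 0)` fails the check even if everything else is present, which is the sense in which DX12U is a label on top of DX12 rather than a separate API.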

invictis · Apr 22, 2020

iroboto said:
Xbox has its own versions of DX11 and DX12 that expose a little more functionality and have calls that are specific to Xbox hardware.
DX12U is not an API. It is a baseline level of support: a GPU that supports at least these features at specific levels can be labelled as DX12U.

So the XSX will get DX12 with support for VRS, DirectML and DXR1.1 added on then?

iroboto · Apr 22, 2020

invictis said:
So the XSX will get DX12 with support for VRS, DirectML and DXR1.1 added on then?

It’s always been DX12 even on PC. Yes the latest version of DX12 has support for those features and it’s on Xbox.

DmitryKo · Apr 23, 2020

invictis said:
If we are to compare MS with DX12 Ultimate with Sony's PlayStation APIs, how much difference would there be between them?
If the PS5 has all three additions as stated above, how easy would it be for Sony to add those features to their own API and have them to be as good as MSs?

Surely console APIs can adopt new features throughout the production run, just like how the Direct3D 12 API was added to the Xbox One in an SDK update that was released several years after initial availability of the console.
Console hardware is fixed (and typically backward compatible) and the GPU 'driver' is statically linked with the game executable to run in an isolated virtual machine.

DavidGraham said:
It's dangerous for APIs to be so close to the metal, specifically in the PC space, where AMD/NVIDIA change architectures all the time

Direct3D 12 and Mantle/Vulkan are not really 'close to metal', there are still several layers of abstraction; what they do is reduce CPU overhead by passing responsibility for resource/memory management into the hands of the programmer.

APIs are required because there are still fixed-function blocks. It's too early to expect that graphics hardware and APIs will drop their 20 years of legacy and become heterogeneous and fully programmable like general-purpose CPUs. Backward compatibility and performance requirements made the GPU and its respective video driver become a specialized computer board (video card) running its own customized OS (user-mode driver).

Now the mesh shader pipeline and the raytracing pipeline are good candidates for compute-only implementations that will break free from legacy fixed-function hardware and API designs, but unlike general-purpose CPUs, graphics processors require a lot of memory bandwidth and thus separate expansion boards with their own graphics memory.

Maybe truly heterogeneous APUs, with a powerful CPU/GPU and HBM2 memory integrated on the same interposer, like AMD X3D and TSMC CoWoS, will be programmable with standard C++ code like general-purpose CPUs and won't require a specialized graphics API anymore. Something like the AMD ROCm framework on Linux, where LLVM-derived C++ HC / HIP (and CUDA) compilers generate native GCN code for the ROCk driver and ROCr runtime. But this would probably require another major revision of the WDDM driver model and changes to the OS scheduler.

This would be a single-chip version of the computing node in the Cray El Capitan supercomputer, which uses an EPYC Genoa (Zen4) CPU and four CDNA2 Radeon Instinct accelerators, interconnected with Infinity Architecture 3 over a PCIe 5.0 bus, with full cache coherency between CPU and GPUs. EPYC Genoa and CDNA2 should debut in 2021-2022, but they are specifically targeted at HPC and it would probably take a few more years before these capabilities trickle down to performance desktop parts, i.e. Ryzen 7/9 CPUs and RDNA3/4 GPUs.

DavidGraham · Apr 23, 2020

DmitryKo said:
Direct3D 12 and Mantle/Vulkan are not really 'close to metal', there are still several layers of abstraction

Yeah, I know that about DX12/Vulkan. I was stating that going closer to the metal than those standards is not ideal. I also think that Mantle is closer to the metal than either DX12 or Vulkan.

JoeJ · Apr 23, 2020

DavidGraham said:
I also think that Mantle is closer to the metal than either DX12 or Vulkan.

I have only skimmed the Mantle API doc, but it did not really look closer to the metal either. (Not sure if it's still online. Could not find it.)
It seems DX12/VK are pretty much the same, but lacking a few things that did not work for other vendors.

Lurkmass · Apr 25, 2020

JoeJ said:
I have only skimmed the Mantle API doc, but it did not really look closer to the metal either. (Not sure if it's still online. Could not find it.)
It seems DX12/VK are pretty much the same, but lacking a few things that did not work for other vendors.

I don't think this is true based off of the Mantle programming guide.

On Mantle, the shaders can be written using a subset of AMDIL which by itself is already specific to a hardware vendor. Renderpasses or subpasses do not have equivalents in Mantle compared to Vulkan.

Pipeline state objects are also a major pain point of both D3D12 and Vulkan. Eric Lengyel highlighted this, pointing out that Vulkan lacks the ability to dynamically set the state for whether a triangle is front facing or not. Mantle is even more powerful with regard to dynamic states, since the blend modes can be dynamic, unlike in D3D12 or Vulkan. It's very painful having a bunch of state tied to pipelines when it's faster for the hardware to just change the dynamic states rather than switching to a whole new pipeline. It was a massive challenge for id Software to keep the number of pipelines down.
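
The arithmetic behind the pipeline count problem is easy to sketch in Python. The state names and counts below are invented for illustration, and the worst case assumes every combination is actually used. `pipelines_needed` is a hypothetical helper.

```python
from math import prod


def pipelines_needed(states, dynamic=()):
    """Worst-case pipeline count: product of the value counts of every baked state."""
    return prod(len(values) for name, values in states.items() if name not in dynamic)


# Made-up example: 10 shader variants, 4 blend modes, 3 cull modes.
states = {
    "shader": range(10),
    "blend_mode": ["opaque", "alpha", "additive", "multiply"],
    "cull_mode": ["none", "front", "back"],
}
print(pipelines_needed(states))                            # all state baked in
print(pipelines_needed(states, dynamic=("blend_mode",)))   # blend set dynamically
```

Every state that moves from "baked into the pipeline" to "dynamic" divides the worst-case count by that state's number of values, which is why dynamic blend modes matter so much for keeping the number of pipelines down.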

If using a subset of AMDIL on Mantle "isn't closer to the metal", then you'd have to argue that using the PTX ISA on CUDA isn't any closer to the metal than the other APIs either. That isn't necessarily true, because there are no other vendors who could map their hardware to the PTX ISA.

JoeJ · Apr 25, 2020

Lurkmass said:
I don't think this is true based off of the Mantle programming guide.

Fair enough. TBH, i only looked at the table of contents and read only the chapters that matter for my actual compute interest, skipping anything related to rasterization and its states. I also did not pay attention to the shading language.
Overall the structure of the API seemed close to VK but different from OpenGL to me.

I remember that, surprisingly, Mantle seemed pretty easy to use and understand. VK seems much more complicated.
Maybe that's also just because i've ignored the details, but it's one point why i think vendor APIs would make sense nowadays.
