Basic API Questions?

Render states and shaders are high-level abstractions, introduced 25-30 years ago in software frameworks (specifically OpenGL and PhotoRealistic RenderMan) for 'workstation' computers that had several dozen MBytes of memory, a single CPU capable of a few MFLOPS, and a simple 'framebuffer' graphics card.

During the GPU revolution, abstractions were necessary to make life simpler for both developers and end-users because there were many different GPU architectures on the market, so proprietary 'direct-to-metal' APIs from 3Dfx (Glide), Rendition (RRedline), and PowerVR (PowerSGL) were ultimately displaced by MiniGL / OpenGL.

But today the hardware landscape is quite different. As more fixed-function blocks are replaced and emulated with compute shaders - and meshlet rasterization and programmable BVH ray tracing displace the traditional triangle rasterization pipeline - legacy state-management APIs and shader languages will be supplanted by parallel processing frameworks written in C/C++ and Python. IMHO this will happen overnight once AMD brings their ROCm compilers and runtime to the Windows platform, as it's source-code compatible with CUDA.
 
As more fixed-function blocks are replaced and emulated with compute shaders - and meshlet rasterization and programmable BVH ray tracing displace the traditional triangle rasterization pipeline - legacy state-management APIs and shader languages will be supplanted by parallel processing frameworks written in C/C++ and Python. IMHO this will happen overnight once AMD brings their ROCm compilers and runtime to the Windows platform, as it's source-code compatible with CUDA.

Unfortunately, AMD does not intend to bring ROCm to Windows. According to Gregory Stoner, there were potential plans at the time for the OpenCL team to develop a new compute stack on Windows, but things changed once Khronos released the new Vulkan API standard: AMD instead assigned its OpenCL team to develop its Vulkan graphics stack, since it was arguably more important to invest in an industry standard than in a vendor-specific one.

Had Vulkan not panned out in the Khronos Group, there might've been a real possibility that we could at the very least have seen the release of the HIP API on Windows. ROCm was meant to be a successor to the work they did on HSA, since the HSA Foundation ground to a halt: none of the other members like ARM, ImgTec, Qualcomm or Samsung really wanted what AMD was building, so HSAIL died since only AMD was ever committed to supporting it. AMD wanted to do compute APIs right this time around, and the first thing they did was avoid vendor-neutral intermediate languages/representations like DXBC, DXIL, HSAIL and SPIR/SPIR-V; instead, the HIP API offers a fully offline compilation model where HIP kernels get compiled into raw GCN binaries. A truly offline compilation model has traditionally been seen only on console APIs, where for instance PSSL shaders are also compiled into raw GCN binaries on GNM. Offline compilation has the advantage of having no JIT compilation overhead, and even CUDA offers this with its "fat binaries" if you want to avoid runtime compilation from PTX. The other thing that AMD did with ROCm/HIP is offer tools like hipify to convert CUDA kernels into HIP kernels, to make porting CUDA applications to the HIP API easier.
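To give a feel for the kind of mechanical translation hipify performs, here is a toy Python sketch that rewrites a few common CUDA identifiers to their HIP equivalents with plain text replacement. The real hipify tools are far more thorough (hipify-perl uses regexes, hipify-clang a full source rewriter), and the mapping table below is only a small illustrative subset:

```python
# A small subset of the CUDA -> HIP identifier mapping that tools like
# hipify-perl apply across a source file. (Illustrative only; the real
# tools cover hundreds of API calls, types, and macros.)
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaFree": "hipFree",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cuda_runtime.h": "hip/hip_runtime.h",
}

def hipify_lite(source: str) -> str:
    # Replace longest identifiers first so e.g. cudaMemcpyHostToDevice
    # is not partially rewritten by the shorter cudaMemcpy rule.
    for cuda_name in sorted(CUDA_TO_HIP, key=len, reverse=True):
        source = source.replace(cuda_name, CUDA_TO_HIP[cuda_name])
    return source

cuda_snippet = ("#include <cuda_runtime.h>\n"
                "cudaMalloc(&buf, n); "
                "cudaMemcpy(buf, h, n, cudaMemcpyHostToDevice);")
print(hipify_lite(cuda_snippet))
```

Because HIP's runtime API deliberately mirrors CUDA's one-to-one, this kind of rename covers a surprisingly large fraction of a typical port; the hard residue is code that touches warp-size assumptions or inline PTX.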

You're pretty much stuck with Linux and specific hardware models for the foreseeable future if you want to do any serious compute applications on AMD. Both OpenCL and HSAIL are dead for obvious political reasons. Only Intel really cares about making any effort to support Khronos' SYCL standard, despite still not having a conformant implementation. If history is anything to go by, then Khronos' new effort in the making is likely already DOA: it's been almost 3 years and we've yet to hear real details about their new compute API, assuming they're even still working on it anymore ...
 
I believe AMD is focused on delivering ROCm 4.0 for the HPC platform, to support the growing demand for EPYC / Radeon Instinct processing nodes, as used in the HPE/Cray El Capitan supercomputer (see the Financial Analyst Day 2020 presentations - Driving GPU Leadership, slide 16, Data Center Software Evolution).

[Image: AMD CDNA / Radeon Instinct roadmap slide, via WCCFTech]

Hopefully they will have enough resources to support the desktop computing platform as well.

Going forward, API and language fragmentation remains a problem. The ISO C++ committee has been discussing HPC-friendly language extensions for modern C++20/23/26. The latest proposal is a unified execution-control model based on 'executor' interfaces and properties (proposals P0443, P1341, P1436) to replace vendor-specific extensions and framework constructs (such as defer, define_task_block, dispatch, strand<>, future::then, async, invoke, post, etc.).
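Python's concurrent.futures already offers a rough analogue of what the C++ executor proposals aim for: one submission interface over interchangeable execution resources. A sketch of that idea (the `run_on` helper and `work` function are mine, just to make the point):

```python
from concurrent.futures import ThreadPoolExecutor

def work(x):
    return x * x

def run_on(executor_factory, items):
    # The call site is identical regardless of which execution resource
    # backs the executor (thread pool, process pool, a GPU offload
    # executor, ...) -- the uniformity the C++ P0443 family is after.
    with executor_factory() as ex:
        return list(ex.map(work, items))

# Swap in ProcessPoolExecutor (or any conforming executor) without
# touching the submission code above.
threaded = run_on(lambda: ThreadPoolExecutor(max_workers=4), range(8))
print(threaded)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The open question for C++ is whether one such interface can span CPUs, GPUs and distributed nodes without giving up vendor-specific performance knobs, which is exactly where the 'properties' part of P0443/P1436 comes in.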

See the IWOCL / DHPC++ 2019 presentation Towards Heterogeneous and Distributed Computing in C++ for preliminary details.

But it will probably take a decade for current vendor and consortium-specific language extensions, APIs and frameworks to catch up...
 
ROCm™ – Open Software Ecosystem for Accelerated Compute
July 6, 2020

In this blog, I am happy to announce our first set of on-demand videos on the ROCm technology. You can find them below. In these videos you will learn about AMD GPUs and how to develop applications that can utilize their compute power to accelerate your applications. You will learn how the GPU works, how threading works on it, and how to write your programs using the HIP API in the ROCm SDK.

ROCm Video Series
1) Introduction to AMD GPU Hardware: Link
2) GPU Programming Concepts Part 1 – Porting with HIP: Link
3) GPU Programming Concepts Part 2 – Device Management, Synchronization and MPI Programming: Link
4) GPU Programming Concepts Part 3 – Device Code, Shared Memory and Thread Synchronization: Link
5) GPU Programming Software – Compilers, Libraries and Tools: Link
6) Porting CUDA to HIP: Link
https://www.hpcwire.com/2020/07/06/rocm-open-software-ecosystem-for-accelerated-compute/
 
Shader compilation stuttering is one of the biggest issues with PC gaming at the moment. I wish there was a way to download precompiled shaders and have them cached so that the issue would be reduced/eliminated. Some games already do pre-compile shaders at startup, but not enough games do it.

I don't mind waiting 10-15min for shaders to compile and cache the first time I run the game, and then have a great smooth experience.

Hopefully something can be done about that in the future.
 
Nvidia has an option to save the shaders to disk:
[Image: Nvidia Control Panel 'Shader Cache' setting]

I have it disabled because I don't know how much drive space they take up (could be an issue when you have hundreds of games installed).
 
Nvidia has an option to save the shaders to disk:
[Image: Nvidia Control Panel 'Shader Cache' setting]

I have it disabled because I don't know how much drive space they take up (could be an issue when you have hundreds of games installed).
Yeah, you really shouldn't turn that setting off. That will REALLY affect the performance in many games. Generally they're not going to take up a huge amount of space, although I suppose they could grow if you never do clean driver installations and have played tons and tons of games. I have heard of some people having massive shader caches in the gigabytes, but I think that's pretty rare and not really applicable to a typical user. Even still, if you find your cache getting too big for your liking you can simply delete the contents of the folder and rebuild it around the games you're currently playing.

The issue of course is that to cache shaders you have to play the game, and the first playthrough is the most important. It's the one the user bases their experience around... and having a first experience full of hitches while a cache is being generated isn't really acceptable. Something has to give here. Developers either have to start letting us generate a shader cache for our systems before we begin playing, or better yet, some really smart people figure out how to eliminate the issue by some other means.
 
Shader compilation stuttering is one of the biggest issues with PC gaming at the moment. I wish there was a way to download precompiled shaders and have them cached so that the issue would be reduced/eliminated. Some games already do pre-compile shaders at startup, but not enough games do it.

It's not going to happen ...

What if an IHV (Nvidia) wants to change their ISA? In that case they make a new compiler for the new architecture, and the shaders that were compiled and saved before become invalidated, since the old binaries only work on the previous architecture.

What's more, what if an IHV (AMD) has some binary compatibility but still wants to update their compiler? While the old binaries might still be valid to run on the new hardware, an updated compiler will provide new optimization paths which can potentially have different code-gen results. In this case a driver update could invalidate the old binaries as well, because the code gen between different versions of the compiler isn't the same.
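The invalidation logic described above can be sketched as a cache key that folds in everything the compiled binary depends on: the shader source, the driver/compiler version, and the target ISA. Change any one of them and the lookup misses, forcing a recompile. This is a hypothetical sketch, not any real driver's actual scheme (the version and ISA strings are made up):

```python
import hashlib

def shader_cache_key(shader_source: str, driver_version: str, gpu_isa: str) -> str:
    # Any change to the source, the compiler (driver) version, or the
    # target ISA produces a different key, so stale binaries are never
    # reused -- they simply miss in the cache and get recompiled.
    h = hashlib.sha256()
    for part in (shader_source, driver_version, gpu_isa):
        h.update(part.encode())
        h.update(b"\x00")  # separator so adjacent fields can't run together
    return h.hexdigest()

old = shader_cache_key("void main(){}", "driver-450.2", "gfx1010")
new = shader_cache_key("void main(){}", "driver-451.0", "gfx1010")
print(old != new)  # True: a driver update alone invalidates the cached binary
```

This is also why a driver update wipes caches across every installed game at once: the driver version is part of every key, whether or not that particular update actually changed code generation.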

I don't mind waiting 10-15min for shaders to compile and cache the first time I run the game, and then have a great smooth experience.

Borderlands 3 did this with its D3D12 backend and some people felt it negatively impacted the game's experience. I can only imagine that this will get old very soon after a new driver update or a new hardware release ...

It's preferable instead for developers to just split compilation across level loading times or do the compilation in the background.
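The background-compilation idea can be sketched with a thread pool: kick off compilation for every shader a level needs as soon as loading starts, and only block on the ones that aren't ready when first used. A toy Python sketch, where `compile_shader` merely stands in for the real (slow) driver compile call:

```python
from concurrent.futures import ThreadPoolExecutor

def compile_shader(name):
    # Stand-in for the real, expensive driver compile call.
    return f"binary<{name}>"

class ShaderWarmup:
    def __init__(self, shader_names, workers=4):
        self.pool = ThreadPoolExecutor(max_workers=workers)
        # Launch every compile up front, during the level load screen.
        self.futures = {n: self.pool.submit(compile_shader, n)
                        for n in shader_names}

    def get(self, name):
        # Only stalls if this particular shader hasn't finished yet;
        # anything already compiled in the background returns instantly.
        return self.futures[name].result()

warmup = ShaderWarmup(["sky", "terrain", "water"])
print(warmup.get("water"))  # binary<water>
```

The catch on PC is that the engine must know the full pipeline state ahead of the draw that needs it; shaders discovered only at draw time still stall, which is where the stutter comes from.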

Hopefully something can be done about that in the future.

There are solutions right now; some are good, like we see on consoles, and most are bad or far from ideal, like we see on PC.

On consoles, games are shipped with precompiled binaries so runtime shader compilation doesn't exist.

On PC, all shaders can be compiled ahead of time or you could compile them in smaller chunks for different loading sections and both have their own annoying drawbacks ...
 
Borderlands 3 did this with its D3D12 backend and some people felt it negatively impacted the game's experience. I can only imagine that this will get old very soon after a new driver update or a new hardware release ...

It's preferable instead for developers to just split compilation across level loading times or do the compilation in the background.

Constant stuttering during play gets old far quicker than waiting for shaders to compile once. I mean, who's really updating their hardware and GPU drivers mid-game all the time, to the point where you'd constantly be rebuilding a shader cache? The people who complain about that stuff simply don't realize what's happening and what's causing the stutters. They think it's "lazy developers" or "bad optimization" and they write it off... It's annoying. Developers need to allow people who would prefer the option to pre-compile shaders to do so, and let those who don't suffer with the stutters.

There are solutions right now; some are good, like we see on consoles, and most are bad or far from ideal, like we see on PC.

On consoles, games are shipped with precompiled binaries so runtime shader compilation doesn't exist.

On PC, all shaders can be compiled ahead of time or you could compile them in smaller chunks for different loading sections and both have their own annoying drawbacks ...

Here's my question... the more games become threaded and the more optimized engines become at streaming in data just-in-time, utilizing the ever-increasing number of cores PC CPUs have... could the issue be alleviated, or generally solved? I don't know enough about engines and programming to understand how it works, but as a gamer, I wonder why shaders have to stall the game completely until compiled, and why they don't use more threads and compile in the background well ahead of when they are needed?

With the new consoles making loading screens a thing of the past, I wonder what's going to happen?

Just give me the option to pre-compile the shaders for my hardware and I'll do it every single time. Like I said, something has to give.
 
Constant stuttering during play gets old far quicker than waiting for shaders to compile once. I mean, who's really updating their hardware and GPU drivers mid-game all the time, to the point where you'd constantly be rebuilding a shader cache? The people who complain about that stuff simply don't realize what's happening and what's causing the stutters. They think it's "lazy developers" or "bad optimization" and they write it off... It's annoying. Developers need to allow people who would prefer the option to pre-compile shaders to do so, and let those who don't suffer with the stutters.

Recompilation doesn't affect just one game, it affects ALL of your games, and every time you reopen any application after a hardware upgrade/driver update it will trigger that process again. By comparison, the stutter caused by compilation usually persists for no more than an hour ...

Here's my question... the more games become threaded and the more optimized engines become at streaming in data just-in-time, utilizing the ever-increasing number of cores PC CPUs have... could the issue be alleviated, or generally solved? I don't know enough about engines and programming to understand how it works, but as a gamer, I wonder why shaders have to stall the game completely until compiled, and why they don't use more threads and compile in the background well ahead of when they are needed?

With the new consoles making loading screens a thing of the past, I wonder what's going to happen?

Just give me the option to pre-compile the shaders for my hardware and I'll do it every single time. Like I said, something has to give.

Even on PC some precompilation does happen when HLSL shaders get compiled into DXIL/SPIR-V binaries, but a JIT compiler is still needed to translate those into native GPU machine code. I don't believe there is any effective way to circumvent this expensive compilation model. The only good solution is to converge on an ISA so that precompiled binaries can be shipped like we see on consoles ...

Shaders have to be fully compiled; otherwise your only choice is skipping a draw, which can cause visual artifacts or game bugs/crashes, since some game logic can rely on those compiled shaders ...
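The two-stage model described above (offline HLSL to DXIL/SPIR-V, then a driver JIT from the IR to the GPU's native code) has a convenient everyday analogue in CPython: source is compiled ahead of time to portable bytecode, which a machine-specific runtime then lowers and executes. A sketch of the same split, borrowing Python's own `compile()` as the stand-in for the offline stage:

```python
# Stage 1 (portable, can be done offline and shipped with the game):
# source -> intermediate representation. For HLSL this is dxc emitting
# DXIL or SPIR-V; here Python's compile() plays that role.
source = "result = sum(x * x for x in range(4))"
portable_ir = compile(source, "<shader>", "exec")

# Stage 2 (machine-specific, must happen on the user's box): the runtime
# lowers/executes the IR for the local hardware -- the step that causes
# in-game stutter when a graphics driver has to do it at draw time.
namespace = {}
exec(portable_ir, namespace)
print(namespace["result"])  # 14
```

Stage 1 is cheap to ship because the IR is vendor-neutral; stage 2 can't be shipped precisely because its output depends on the ISA and driver version of each user's machine, which is the convergence problem the quoted post is pointing at.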
 
Recompilation doesn't affect just one game, it affects ALL of your games, and every time you reopen any application after a hardware upgrade/driver update it will trigger that process again. By comparison, the stutter caused by compilation usually persists for no more than an hour ...
I know that. As I said.. how many games are people playing through at one time, and how many times are they upgrading hardware/drivers in the middle of a playthrough? I think it's largely exaggerated just how much of a "bother" it really is to wait ~5 min to build a shader cache, likely once for an entire playthrough. Detroit: Become Human does this, and it took ~2 min 30 sec to build the shader cache on my 3900X.

The first playthrough is everything.. it's what you base your impressions off of. It's ridiculous that a person has to experience stuttering cutscenes, camera cuts, effect assets, and open-world streaming due to this as a first impression.

Like I said... they've either got to solve this by putting more effort into ensuring that the main thread isn't being stalled, or they've got to give the people that give a damn about it the option to pre-compile shaders and generate caches before we play the first time. I think it's hilarious, the touting from companies like Intel, AMD, and Nvidia about higher-performance CPUs/GPUs, super-high framerates, FreeSync/G-Sync, 144/165/240Hz+ refresh rates... and yet you have to deal with stupid shader stutter. It's a problem that needs to be solved, because as consoles blur the lines and reach high enough fidelity and framerates, and reduce/eliminate loading and have smoother performance, it's going to become more noticeable how PCs are stuttering more and more.

Even on PC some precompilation does happen when HLSL shaders get compiled into DXIL/SPIR-V binaries, but a JIT compiler is still needed to translate those into native GPU machine code. I don't believe there is any effective way to circumvent this expensive compilation model. The only good solution is to converge on an ISA so that precompiled binaries can be shipped like we see on consoles ...

Shaders have to be fully compiled; otherwise your only choice is skipping a draw, which can cause visual artifacts or game bugs/crashes, since some game logic can rely on those compiled shaders ...
You said the only good solution is to converge on an ISA so precompiled binaries can be shipped like they are on consoles... well I'm saying.. they've got to do what they've got to do. If they can do that somehow, then they should. The powers that be (Nvidia/AMD/Intel) and developers should all be focusing on this increasingly important issue. It should be a top priority. Anything that can help mitigate, eliminate, reduce, or improve this issue should be done.

I 100% admit I don't understand the complexity of the issue on a programming level and that everyone is different when it comes to tolerances to certain things... but I can only say that it's absolutely an issue that everyone on PC experiences, whether they admit it or not, and downplaying the issue isn't going to ever help it get better. More people need to speak up and bring attention to this stuff, and understand it for what it is. I can't count how many times I've read forum posts about people that play a game, complaining they have stutter, then are offered some kind of advice, then they go back and play the part over and say "oh it's great now, must just have been my PC".... No.. nothing was fixed... you just have a shader cache now...

It's not like every game suffers from this.. there's lots of well optimized games out there. But there are lots that DO have issues, and I think with the coming generation it's going to become more and more apparent unless developers and IHVs work together to improve it.
 
I know that. As I said.. how many games are people playing through at one time, and how many times are they upgrading hardware/drivers in the middle of a playthrough? I think it's largely exaggerated just how much of a "bother" it really is to wait ~5 min to build a shader cache, likely once for an entire playthrough. Detroit: Become Human does this, and it took ~2 min 30 sec to build the shader cache on my 3900X.

On Borderlands 3 it takes 10+ minutes to build your shader cache, so I can easily foresee pathological cases in the future reaching upwards of half an hour to compile all the shaders. How willing would you be now to tolerate the timescales that I mentioned?

You said the only good solution is to converge on an ISA so precompiled binaries can be shipped like they are on consoles... well I'm saying.. they've got to do what they've got to do. If they can do that somehow, then they should. The powers that be (Nvidia/AMD/Intel) and developers should all be focusing on this ever increasingly important issue. It should be a top priority. Anything that can help mitigate, eliminate, reduce, improve this issue should be done.

The best possible solution on PC would be to expose an API extension to use native binaries or a vendor-specific intermediate language like AMDIL/NVPTX ...

AMD could offer this option to developers, seeing as Mantle's shaders used AMDIL and it was originally designed to accept other binary formats as well. For Nvidia, this will likely never be an option, including exposing NVPTX for graphics APIs, since they don't trust game developers to do the right thing for their hardware the way they trust CUDA developers.

The obvious downside to having multiple shader binaries is the higher maintenance burden and the size bloat of needing to ship separate binaries for different IHVs ...
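Shipping per-IHV binaries as described would reduce, at load time, to a lookup like the following. This is purely hypothetical packaging (no current PC graphics API actually accepts AMDIL or PTX from applications); the file names and vendor keys are made up for illustration:

```python
# Hypothetical game package shipping one shader binary per IHV format,
# plus portable IR as the fallback for anything unrecognized.
SHIPPED_BINARIES = {
    "amd": "shader.amdil",
    "nvidia": "shader.ptx",
}
FALLBACK_IR = "shader.spv"  # portable SPIR-V, JIT-compiled by the driver

def pick_shader_binary(vendor: str) -> str:
    # A native/vendor-IR path skips the expensive runtime frontend;
    # unknown vendors fall back to the portable IR + JIT path.
    return SHIPPED_BINARIES.get(vendor.lower(), FALLBACK_IR)

print(pick_shader_binary("AMD"))    # shader.amdil
print(pick_shader_binary("intel"))  # shader.spv
```

The maintenance cost the post mentions is visible right in the table: every shader now exists in N+1 forms, and each vendor entry has to be rebuilt and re-QA'd whenever that vendor's toolchain changes.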
 
On Borderlands 3 it takes 10+ minutes to build your shader cache, so I can easily foresee pathological cases in the future reaching upwards of half an hour to compile all the shaders. How willing would you be now to tolerate the timescales that I mentioned?



The best possible solution on PC would be to expose an API extension to use native binaries or a vendor-specific intermediate language like AMDIL/NVPTX ...

AMD could offer this option to developers, seeing as Mantle's shaders used AMDIL and it was originally designed to accept other binary formats as well. For Nvidia, this will likely never be an option, including exposing NVPTX for graphics APIs, since they don't trust game developers to do the right thing for their hardware the way they trust CUDA developers.

The obvious downside to having multiple shader binaries is the higher maintenance burden and the size bloat of needing to ship separate binaries for different IHVs ...
I'm pretty sure I've already said multiple times I have absolutely NO issue with waiting to compile all the shaders. I'm BEGGING developers to do it at this point. Like I said... give us the option. I'm not downloading and installing new drivers 3-4 times during a single playthrough of a game... I'm not changing hardware every few days....

Let me put it this way.. I would wait as long as it takes.. it can take up as many GBs of drive space as it needs... I want a smooth stutter free experience when I first play the game... not after an hour or 2 of stutter city while I build a cache..

There are people with slow internet connections for whom a game takes hours or days to download; for others, it's merely minutes. I realize that some people have slower PCs and thus compiling shaders would take that much longer... but in my case I could probably download the game and spend 30 min compiling shaders, and still be done before many people would even have the game downloaded.

In my opinion, the experience you have OUTSIDE of the game.. meaning the wait to download, the install process, and everything leading up to the main menu... is separate from the experience when you click that button and begin the game. So if I have to wait an extra 10 min or 30 min while my game "decrypts" or "compiles shaders" once... then so be it. I want the experience once the game launches to be as good as it can possibly be. And we don't have to act like EVERY game will have issues or needs to cache EVERY shader... but the AAA, graphically intense, open-world etc. games should offer this option.
 
Can't these shaders be compiled once and then downloaded for each game and driver combination? Seems like an obvious improvement for every PC gamer.

That way the individual's PC would never have to do the compiling itself unless a precompiled version wasn't found online.
 
Can't these shaders be compiled once and then downloaded for each game and driver combination? Seems like an obvious improvement for every PC gamer.

That way the individual's PC would never have to do the compiling itself unless a precompiled version wasn't found online.

That approach is pretty much intractable too ...

As the library of software grows, you could end up downloading hundreds of gigabytes worth of data just for the games that you own. Game updates can become another potential source of shader recompilations, along with the different driver versions. Congested network traffic would be another concern as well, since ISPs would have an incentive to throttle this much data ...

There's also the question of whether IHVs should maintain this library of shader caches, or whether it should rest on developers. Do they also maintain only the latest versions of these shader caches alongside the latest drivers, or should older versions of shader caches be available for download as well, up to a certain date?
 
Just take a look at what e.g. Steam does. When users run a game via Steam as a launcher, Steam hooks the graphics APIs and dumps all distinct pipeline states, as well as the corresponding driver caches. (At least for Vulkan, DX12, and recently also OpenGL.)

Then Steam distributes both the dumped state caches (see e.g. Valve's Fossilize layer for Vulkan) to all users (for local pre-compilation outside the shipped application, it's simply replaying the pipeline state creations), as well as specific driver caches from DX12/Vulkan titles to users with the same driver release and hardware.

Letting the CDN do all the work is a good choice here, as it can easily crowd-source the shader compilation process on demand, and the developer isn't required to play catch-up with every new driver release on their own. Pre-compilation as a developer is only viable if your platform is essentially fixed. Say, you are shipping for a console.

Pre-compilation in-application is likewise bad: pre-caching should not happen only on first start, but should already happen during distribution. As a user, you wouldn't notice when an update to the shader cache costs another 5 minutes alongside a software update you did during off hours, but stalling for the same 5 minutes when you just want to launch the game is unacceptable.

IHVs could maintain a cache repository, but let's be realistic: that requires a beefy CDN foremost, and there is also the catch that IHVs are not aware of software updates or versioning. That is to say, they wouldn't even know which specific state-cache set you would require.
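The record-then-replay idea behind Fossilize can be sketched as: dump a description of every pipeline state the game creates during play, distribute that dump, and replay the descriptions on other machines before launch so the driver cache is already warm. A toy sketch with the pipeline "compile" simulated (the state dictionaries and `create_pipeline` helper are invented for illustration, not the Fossilize format):

```python
import json

recorded = []

def create_pipeline(state, record=True):
    # Stand-in for a vkCreateGraphicsPipelines call; the driver caches
    # the compiled result keyed by the full state description.
    if record:
        recorded.append(state)
    return f"pipeline<{state['vs']},{state['fs']}>"

# --- During gameplay on a "seed" machine: states get recorded. ---
create_pipeline({"vs": "sky_vs", "fs": "sky_fs"})
create_pipeline({"vs": "water_vs", "fs": "water_fs"})
dump = json.dumps(recorded)  # this is what gets uploaded and distributed

# --- On another user's machine, before launch: replay the recorded
# --- states so every compile happens outside gameplay.
warmed = [create_pipeline(s, record=False) for s in json.loads(dump)]
print(warmed)
```

The key property is that the dump contains state descriptions rather than compiled binaries, so the same dump warms the cache correctly on any driver or GPU; only the replay cost is paid locally.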
 
I see an opportunity to repeat my request for caching BVH for RT.
I think it's a much larger problem than compiling shaders, which I can avoid by.... using fewer shaders :)
But I can't do that for BVH. If I traverse an open world, every time I switch LOD for some chunk of geometry I have to rebuild its BVH at runtime. That's a total waste of CPU and a serious limitation. It even makes it impossible to handle the 'unlimited and insane detail' that is becoming the norm.

So I want API support to pregenerate BVH and cache it to disk. To kill our SSDs at maximum efficiency :D
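A sketch of the requested workflow: build the BVH once per (geometry, LOD) pair, serialize it to disk keyed by a hash of the inputs, and reload it instead of rebuilding on later runs. This uses a toy flat-leaf "BVH" stand-in; real acceleration structures are opaque and driver-specific, which is exactly why the API support asked for above doesn't exist yet:

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

def build_bvh(triangles):
    # Stand-in for the expensive runtime build: here just a sorted
    # flat list of per-leaf (min, max) bounds.
    return sorted((min(t), max(t)) for t in triangles)

def cached_bvh(triangles, lod, cache_dir):
    # Key the cached file on everything the build depends on.
    key = hashlib.sha256(repr((triangles, lod)).encode()).hexdigest()
    path = Path(cache_dir) / f"{key}.bvh"
    if path.exists():
        return pickle.loads(path.read_bytes())  # reload: no rebuild
    bvh = build_bvh(triangles)
    path.write_bytes(pickle.dumps(bvh))
    return bvh

tris = [(3, 1, 2), (9, 7, 8)]
with tempfile.TemporaryDirectory() as d:
    first = cached_bvh(tris, lod=0, cache_dir=d)   # builds and writes
    second = cached_bvh(tris, lod=0, cache_dir=d)  # hits the disk cache
print(first == second)  # True
```

For real RT APIs the cache key would additionally need the driver version and GPU, for the same invalidation reasons discussed for shaders earlier in the thread.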
 
The issue of course is that to cache shaders you have to play the game,
There's a project for you.
Create an Nvidia compiled-shader repository website:
get people to play games with the "cache shaders to disk" option enabled, put the caches in a zip and upload them to the site.
Then people can download them.
 