OpenCL 3.0 [2020]

DmitryKo · Jun 7, 2021

Lurkmass said:
OpenCL over D3D12 is probably hot garbage. I can't see how Microsoft could fully implement OpenCL C since DXIL lacks support for pointers last time I checked

DXIL certainly supports memory pointers, right from the first public commit.

https://github.com/Microsoft/DirectXShaderCompiler/blob/master/docs/DXIL.rst#memory-accesses

Indexable thread-local and groupshared variables are represented as variables and accessed via LLVM C-like pointers.... The following pointer types are supported:

Non-indexable thread-local variables.
Indexable thread-local variables (DXBC x-registers).
Groupshared variables (DXBC g-registers).
Device memory pointer.
Constant-buffer-like memory pointer.

The type of DXIL pointer is differentiated by LLVM addrspace construct.

(root descriptors don't count)

Resource descriptor structure does include a memory pointer to the actual 'texture' data, although descriptors use opaque hardware-specific formats. When you construct UAV/SRV/CBV descriptor heaps and link them to your shaders with root descriptors and descriptor tables, these are translated to actual memory addresses for execution.

LLVM IR =/= SPIR/SPIR-V (SPIR/SPIR-V being forked off of LLVM IR just introduces design divergence between them)

SPIR and DXIL are rather 'frozen at' specific LLVM version, but they still constitute legitimate LLVM IR bitcode version which can be read even by current LLVM releases.

DXIL uses standard LLVM assembly instructions (such as Add, FAdd, Sub, FSub, Mul, FMul, UDiv, SDiv, FDiv, etc.) and external functions which have to be implemented in LLVM assembly by each individual vendor - an example would be trigonometric functions (Cos, Sin, Tan, Acos, Asin, Atan, Hcos, Hsin, Htan, Exp, Frc, Log, Sqrt, etc.) which are expanded to Taylor series approximations by the HLSL compiler (see DxilExpandTrigIntrinsics.cpp).

SPIR also uses standard LLVM assembly instructions and data types, and only a few 'built-in' functions (see the SPIR specifications registry).

Another way to see that having a SPIR-V backend was a disaster is that Intel still doesn't have a oneAPI backend for their GPUs in Tensorflow which is the most popular ML framework so they're making less progress on outside projects ...

Not sure why you have to blame Intel and SPIR-V for just about everything that's wrong in this world...

Intel and Microsoft already had working oneDNN and DirectML forks of TensorFlow 1.1x. Their changes were not merged because TF developers started an overhaul to support multiple pluggable GPU devices - this should be available in TensorFlow 2.5, which is still not ready.
https://github.com/tensorflow/community/pull/243#issuecomment-837383825

Good practices don't outweigh bad design principles. Vendors designing their own compiler for source languages was the greatest sin committed against portability

Call it 'practices' or 'principles', major vendors don't bother with proprietary OpenCL C compilers anymore and maintain Clang/LLVM forks or branches instead (which offers an opportunity to move away from unsafe C-style language constructs to safe C++ STL abstractions like containers, iterators and constructors with move semantics).

You forget that SPIR/SPIR-V is independently developed from LLVM IR by Khronos Group so technically I am correct

SPIR 1.2/2.0 is a subset of LLVM 3.2/3.4 (as per feature table from the Khronos SPIR page), just like DXIL 1.x is a subset of LLVM 3.7. SPIR-V is indeed defined by Khronos, but it can be mapped to LLVM IR as well.

Intel open sourced their offline compiler in addition to open sourcing their SPIR-V backend in upstream LLVM

It was their design decision to compile DPC++/SYCL source into SPIR-V target, because they wanted to support third-party FPGA accelerators and they even acquired one (and their SPIR-V to machine code translator is also more compact in comparison to LLVM).

AMD already implemented their machine code translator as an LLVM back-end, so they just need to support intrinsic functions and assembly instructions issued by the SPIRV-LLVM Translator.

When we take a look at the amount of code between the ROCm runtime and Intel's compute runtime, the Intel runtime has like over ~200k more LOC compared to the ROCm equivalent

ROCr is just a tiny user-mode runtime. AFAIK the bulk of open-source ROCm work goes into the actual Clang/LLVM compiler front-end and AMDGPU back-end, as well as ROCd Kernel driver and the Linux kernel. Development mostly happens in a proprietary AMD repository though, and public GitHub repositories are updated with bulk commits only once in a while.

Lurkmass · Jun 7, 2021

DmitryKo said:
DXIL certainly supports memory pointers, right from the first public commit.

https://github.com/Microsoft/DirectXShaderCompiler/blob/master/docs/DXIL.rst#memory-accesses

Indexable thread-local and groupshared variables are represented as variables and accessed via LLVM C-like pointers.... The following pointer types are supported:

Non-indexable thread-local variables.

Indexable thread-local variables (DXBC x-registers).

Groupshared variables (DXBC g-registers).

Device memory pointer.

Constant-buffer-like memory pointer.

Does DXIL support pointers to global memory ? If it did then why do developers keep asking Microsoft to expose them in HLSL ?

DmitryKo said:
Resource descriptors do include a memory pointer to the actual data, although descriptor has an opaque hardware-specific format. When you construct UAV/SRV/CBV descriptor heaps and link them to your shaders with root descriptors and descriptor tables, these are translated to actual memory addresses for execution.

Descriptors may contain GPU VAs but aside from root descriptors you can't pass this information to the shaders! The resource bindings have to be done and accessed through the root signature which is strictly limited to 64 DWORDs in space and you can't do any of the fun stuff such as creating complex data structures like linked lists as you would normally expect from a real pointer ...

If D3D12 did truly support pointers like other APIs such as OpenCL or Vulkan did then we wouldn't need painful abstractions such as root signatures and we'd be able to place pointers directly in our resources like SRV/UAV/CBVs ...

DmitryKo said:
SPIR and DXIL are rather 'frozen at' a specific LLVM version, but they still constitute legitimate LLVM IR bitcode version which can be read even by current LLVM releases.

DXIL uses standard LLVM assembly instructions (such as Add, FAdd, Sub, FSub, Mul, FMul, UDiv, SDiv, FDiv, etc.) and external functions which have to be implemented in LLVM assembly by each individual vendor - an example would be trigonometric functions (Cos, Sin, Tan, Acos, Asin, Atan, Hcos, Hsin, Htan, Exp, Frc, Log, Sqrt, etc.) which are expanded to Taylor series approximations by the HLSL compiler (see DxilExpandTrigIntrinsics.cpp).

SPIR also uses standard LLVM assembly instructions and data types, and only a few 'built-in' functions (see the SPIR specifications registry).

DXIL has been "diverging" from LLVM as well since Microsoft keeps updating its specifications ...

DmitryKo said:
Not sure why you have to blame Intel and SPIR-V for just about everything in this world...

SPIR-V has its uses, namely being a useful portable abstraction for graphics shaders, but you and I know that it hasn't really lived up to the promise of being a portable abstraction for compute kernels ...

SPIR-V is literally the best thing that's ever happened to the Khronos Group since it played a big role in their success in developing Vulkan and in it becoming widely adopted across the industry. Anyone would've wished for the same to happen for OpenCL/SYCL but most vendors had their own plans instead ...

DmitryKo said:
Call it practices or principles, vendors don't bother with proprietary OpenCL C compilers anymore, they maintain Clang/LLVM forks or branches (which offers the opportunity to move away from unsafe C-style language constructs and embrace C++ abstractions like iterators and constructors with move semantics).

That might be true in the future because vendors plan on deprecating OpenCL! The reality was very different in the past, and still is now: projects like Blender infamously kept blaming AMD's OpenCL C compiler limitations for their inability to work around them, until AMD themselves had to intervene and fix the project by submitting patches, which was supposed to be the project's own responsibility. OpenCL C compilers matter a lot because it's what drivers will accept and it's what led to a failure of portability ...

DmitryKo said:
It was their design decision to compile DPC++/SYCL source into SPIR-V target, because they want to support FPGA accelerators - so each device class has its own SPIR-V to machine code translator (which is also more compact compared to LLVM).

Last I checked, Intel FPGAs only supported offline compilation so I don't think they support SPIR-V yet if they even plan to ...

DmitryKo said:
ROCr is just a tiny user-mode runtime. AFAIK the bulk of open-source ROCm work goes into the actual Clang/LLVM compiler front-end and AMDGPU back-end, as well as ROCd Kernel driver and the Linux kernel. Development mostly happens in a proprietary AMD repository though, and public GitHub repositories are updated with bulk commits only once in a while.

Everybody is aware that ROCm is not a community project which makes it very hard for outsiders to develop patches for them ...

DmitryKo · Jun 11, 2021

Lurkmass said:
Does DXIL support pointers to global memory ? If it did then why do developers keep asking Microsoft to expose them in HLSL ?

If D3D12 did truly support pointers like other APIs such as OpenCL or Vulkan did then we wouldn't need painful abstractions such as root signatures and we'd be able to place pointers directly in our resources like SRV/UAV/CBVs

These abstractions exist by design, Resource Binding is a fundamental concept of Direct3D 12 (and WDDM 2.0 GPUMMU memory management model), and this is not going to change by a GitHub request.

HLSL (and NVIDIA's Cg) was developed at the time of Direct3D 9 and fixed function shader units with limited parameter memory - though Direct3D 11/12 shader hardware evolved to hugely multithreaded general purpose SIMD processors, SM 2.x/4.x concepts of input / output / constant memory were retained for backward compatibility, and resource views (and resource descriptors) were added to Direct3D API (and WDDM drivers) for a limited form of random access.

To entirely retire these abstractions, and expose unified, cache-coherent memory access to system memory and video local memory from both CPU and GPU, as would be possible in future AMD CDNA2 and Intel Xe-HPC GPUs, it would probably take a new shader model and a major Direct3D version (SM 7.0? Direct3D 13?) - maybe even a new C++ derived shader language or a single-source C++ library.

Also remember how most of the enhancements in WDDM 2.x were actually presented at WinHEC 2006 but it took major vendors a dozen years to actually implement them in hardware and expose the benefits through new APIs (AMD Mantle and Direct3D 12).

As for DXIL implementation details, LLVM IR supports addrspace (address space) attributes to tag different types of memory (i.e. thread-local, indexable thread-local, group shared, constant buffer, device memory etc. pointers above), so global virtual address space could be added in new revisions of HLSL if needed; generic resource pointers are on the roadmap, and Vulkan SPIR-V target has been maintained as a community contribution.

Pointer size seems to be limited to 32-bit, but the documentation is still stuck at DXIL 1.2 (SM 6.2) while the most current revision is DXIL 1.6 (SM 6.6), and experimental branches like HLSL 2021 include function template syntax from C++ 98 and SM 6.7.

The resource bindings have to be done and accessed through the root signature which is strictly limited to 64 DWORDs in space

Root Signatures were designed to contain links to descriptor heaps/tables (UAV/SRV/CBV), which will in turn contain millions of descriptors (with Resource Binding tier 3). While you can also store root constants and root descriptors, only a limited number would fit in the size of the structure.
https://microsoft.github.io/DirectX-Specs/d3d/ResourceBinding.html#root-signature
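
To put rough numbers on the 64 DWORD budget, here is a minimal sketch of the accounting. The per-parameter costs follow the D3D12 root signature limits documentation (descriptor table 1 DWORD, root constant 1 DWORD per 32-bit value, root descriptor 2 DWORDs); the helper names and the example layout are hypothetical, made up only for illustration.

```python
# Cost, in DWORDs, of each root parameter kind (per the D3D12 documentation).
DWORD_COST = {
    "descriptor_table": 1,
    "root_constant": 1,      # one 32-bit value each
    "root_descriptor": 2,    # a 64-bit GPU virtual address
}
ROOT_SIGNATURE_LIMIT = 64    # maximum root signature size in DWORDs


def root_signature_size(params):
    """Total DWORD cost of a {kind: count} layout (hypothetical helper)."""
    return sum(DWORD_COST[kind] * count for kind, count in params.items())


def fits(params):
    """True if the layout stays within the 64 DWORD limit."""
    return root_signature_size(params) <= ROOT_SIGNATURE_LIMIT


# Hypothetical layout: 4 tables, 16 root constants, 6 root descriptors.
layout = {"descriptor_table": 4, "root_constant": 16, "root_descriptor": 6}
print(root_signature_size(layout), fits(layout))  # 32 True
```

A handful of descriptor tables costs almost nothing, while root descriptors alone would exhaust the budget after 32 entries.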

you can't do any of the fun stuff such as creating complex data structures like linked lists as you would normally expect from a real pointer

Walking bidirectional lists in system memory with compute shaders is not a good idea IMHO. GPUs are designed to process large contiguous chunks of input data, like buffers or resources (textures), in local video memory, and random access would easily kill the performance.
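
For reference, when raw pointers are not available, linked structures are commonly emulated by storing element indices in a buffer instead of addresses. A minimal sketch of that idea follows, with plain Python lists standing in for GPU buffers; the sentinel value and the function name are assumptions made for the example, not part of any API.

```python
END = -1  # sentinel index marking the end of the list (hypothetical convention)


def walk(values, next_index, head):
    """Follow index links through two flat arrays and collect visited values."""
    out = []
    i = head
    while i != END:
        out.append(values[i])   # payload of the current node
        i = next_index[i]       # "pointer" to the next node, stored as an index
    return out


values = [30, 10, 20]
next_index = [END, 2, 0]        # list order: node 1 -> node 2 -> node 0
print(walk(values, next_index, head=1))  # [10, 20, 30]
```

Each hop is still a dependent and potentially scattered read, so this changes how the links are expressed, not the access pattern.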

You can access system memory buffers using OpenCL 2.0 global address space (Shared Virtual Memory), or similar APIs in CUDA and SYCL. Memory management would be tricky though, as on existing hardware the GPU driver would have to page full 64 Kbyte blocks from system memory to local video memory, even if you only access a single variable.

https://software.intel.com/content/...opencl-20-shared-virtual-memory-overview.html
https://rocmdocs.amd.com/en/latest/...-programming-guide.html#shared-virtual-memory
https://developer.amd.com/fine-grain-svm-with-examples/
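
A back-of-the-envelope sketch of the paging overhead mentioned above, assuming the 64 Kbyte block granularity quoted in the post and 4-byte accesses. The function names are hypothetical, and real drivers may batch or prefetch differently.

```python
PAGE_SIZE = 64 * 1024  # 64 Kbyte block size quoted above


def pages_touched(offsets, access_size=4):
    """Set of page numbers covered by accesses of access_size bytes."""
    pages = set()
    for off in offsets:
        first = off // PAGE_SIZE
        last = (off + access_size - 1) // PAGE_SIZE
        pages.update(range(first, last + 1))
    return pages


def bytes_migrated(offsets, access_size=4):
    """Bytes that would be paged if every touched block is moved in full."""
    return len(pages_touched(offsets, access_size)) * PAGE_SIZE


one = bytes_migrated([123456])
print(one, one // 4)  # 65536 16384
```

Under these assumptions, reading a single 4-byte variable moves 16384 times more data than is actually used, while sequential reads within one block amortize the same transfer.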

OpenCL C compilers matter a lot because it's what drivers will accept and it's what led to a failure of portability
projects like Blender infamously kept blaming AMD's OpenCL C compiler limitations for their inability to work around them
it wasn't until AMD themselves had to intervene and fix the project itself by submitting patches to them which was supposed to be the project's responsibility

So Blender Cycles team, which has several developers from Nvidia on the payroll, released faulty OpenCL code and they blamed it on AMD OpenCL runtime, until AMD developers fixed errors in their code? Not sure why that's Khronos' or AMD's fault.

DXIL has been "diverging" from LLVM as well since Microsoft keeps updating its specifications

It's still a subset of LLVM 3.7 - adding new HLSL intrinsic functions or supporting additional LLVM data types wouldn't really break bytecode compatibility.

it hasn't really lived up to the promise of being a portable abstraction for compute kernels

Since there are no released implementations, you can't really say anything about portability. 'Not lived up to the promise' presumes it has been tested and found to be lacking.

SPIR-V is literally the best thing that's ever happened to the Khronos Group
Anyone would've wished for the same to happen for OpenCL/SYCL but most vendors had their own plans instead

That's OK as long as they would support C++ source code in either OpenCL or their proprietary APIs.

Ethatron · Jun 13, 2021

DmitryKo said:
To entirely retire these abstractions, and expose unified, cache-coherent memory access to system memory and video local memory from both CPU and GPU, as would be possible in future AMD CDNA2 and Intel Xe-HPC GPUs, it would probably take a new shader model and a major Direct3D version (SM 7.0? Direct3D 13?) - maybe even a new C++ derived shader language or a single-source C++ library.

Microsoft could experiment with it inside the C++ AMP run-time, without requiring re-tooling (on our side).

DmitryKo · Jun 13, 2021

Microsoft C++ AMP uses Direct3D 11 runtime, and WDDM / DXGK 1.x don't support shared virtual address space, so it's even more page copying and address patching behind the scenes.

Vulkan (Mantle) and D3D 12/WDDM 2.0 drivers can allocate a device local, host visible memory pool (limited to legacy 256 Mbytes until very recently), and a host visible, host cached pool in system memory; CUDA and ROCm/HIP also support 'pinned' host memory. However, each pool can only be write cached by its local processor, and memory coherency is implemented by actually disabling (flushing) the CPU cache, which incurs significant overhead.

HIP Programming Guide - Host Memory - Coherency Controls
https://rocmdocs.amd.com/en/latest/Programming_Guides/hip-programming-guide.html#host-memory

Memory management in Vulkan and DX12
Adam Sawicki (AMD)
(Powerpoint slides)
https://gpuopen.com/events/gdc-2018-presentations/

Differences in memory management between Direct3D 12 and Vulkan
https://www.asawicki.info/articles/memory_management_vulkan_direct3d_12.php5

OTOH future GPUs and CPUs would use directory-based coherency protocols like Gen-Z/CXL over PCIe in Intel Xe-HPC parts, or Infinity Architecture in future AMD EPYC Genoa/CDNA2 parts, instead of just snooping the other processor's cache over PCIe:

https://wccftech.com/amd-next-gen-e...u-accelerator-power-el-capitan-supercomputer/
https://www.tomshardware.com/news/amd-infinity-fabric-cpu-to-gpu

https://www.nextplatform.com/2020/04/03/cxl-and-gen-z-iron-out-a-coherent-interconnect-strategy/

https://www.nextplatform.com/2019/09/18/eating-the-interconnect-alphabet-soup-with-intels-cxl/

JoeJ · Jun 13, 2021

If i got that correctly, AMP also lacks a definition of LDS memory, which is why i've ruled it out back then.
That's also the point where i see issues with something like modern C++ on GPU. Too many abstractions. HW limits like LDS memory size, subgroup size, register file size, etc. seemingly prevent convenient abstractions from becoming possible. :/

Lurkmass · Jun 17, 2021

DmitryKo said:
These abstractions exist by design, Resource Binding is a fundamental concept of Direct3D 12 (and WDDM 2.0 GPUMMU memory management model), and this is not going to change by a GitHub request.

HLSL (and NVIDIA's Cg) was developed at the time of Direct3D 9 and fixed function shader units with limited parameter memory - though Direct3D 11/12 shader hardware evolved to hugely multithreaded general purpose SIMD processors, SM 2.x/4.x concepts of input / output / constant memory were retained for backward compatibility, and resource views (and resource descriptors) were added to Direct3D API (and WDDM drivers) for a limited form of random access.

To entirely retire these abstractions, and expose unified, cache-coherent memory access to system memory and video local memory from both CPU and GPU, as would be possible in future AMD CDNA2 and Intel Xe-HPC GPUs, it would probably take a new shader model and a major Direct3D version (SM 7.0? Direct3D 13?) - maybe even a new C++ derived shader language or a single-source C++ library.

Also remember how most of the enhancements in WDDM 2.x were actually presented at WinHEC 2006 but it took major vendors a dozen years to actually implement them in hardware and expose the benefits through new APIs (AMD Mantle and Direct3D 12).

As for DXIL implementation details, LLVM IR supports addrspace (address space) attributes to tag different types of memory (i.e. thread-local, indexable thread-local, group shared, constant buffer, device memory etc. pointers above), so global virtual address space could be added in new revisions of HLSL if needed; generic resource pointers are on the roadmap, and Vulkan SPIR-V target has been maintained as a community contribution.

Pointer size seems to be limited to 32-bit, but the documentation is still stuck at DXIL 1.2 (SM 6.2) while the most current revision is DXIL 1.6 (SM 6.6), and experimental branches like HLSL 2021 include function template syntax from C++ 98 and SM 6.7.

Compatibility should never serve as a reason to hinder the future development of a source language. Having pointers in HLSL is one of the oldest requests by developers to Microsoft ...

DmitryKo said:
Root Signatures were designed to contain links to descriptor heaps/tables (UAV/SRV/CBV), which will in turn contain millions of descriptors (with Resource Binding tier 3). While you can also store root constants and root descriptors, only a limited number would fit in the size of the structure.
https://microsoft.github.io/DirectX-Specs/d3d/ResourceBinding.html#root-signature

Walking bidirectional lists in system memory with compute shaders is not a good idea IMHO. GPUs are designed to process large contiguous chunks of input data, like buffers or resources (textures), in local video memory, and random access would easily kill the performance.

You can access system memory buffers using OpenCL 2.0 global address space (Shared Virtual Memory), or similar APIs in CUDA and SYCL. Memory management would be tricky though, as on existing hardware the GPU driver would have to page full 64 Kbyte blocks from system memory to local video memory, even if you only access a single variable.

https://software.intel.com/content/...opencl-20-shared-virtual-memory-overview.html
https://rocmdocs.amd.com/en/latest/...-programming-guide.html#shared-virtual-memory
https://developer.amd.com/fine-grain-svm-with-examples/

On Mantle, you could build similarly complex data structures via "hierarchical descriptor set" with nested descriptor sets and it also supported pointers as well. Even GLSL with its comparatively backwards design was able to have standardized pointers on Vulkan so why must D3D/HLSL be the last one to hold out ?

DmitryKo said:
So Blender Cycles team, which has several developers from Nvidia on the payroll, released faulty OpenCL code and they blamed it on AMD OpenCL runtime, until AMD developers fixed errors in their code? Not sure why that's Khronos' or AMD's fault.

Even if Nvidia is funding them, it's mostly to work on Cycles' CUDA/OptiX backend rather than sabotaging the OpenCL backend. Don't you see the problem yet ? With OpenCL, developers have no idea whether what they're doing is even right or wrong. AMD's interference is just a sign that developers ultimately can't be trusted ...

DmitryKo said:
It's still a subset of LLVM 3.7 - adding new HLSL intrinsic functions or supporting additional LLVM data types wouldn't really break bytecode compatibility.

DXIL stopped being a subset of LLVM, especially with the release of DX12 Ultimate features. DXIL, just like SPIR-V, is diverging from LLVM as Khronos/Microsoft intended ...

DmitryKo said:
Since there are no released implementations, you can't really say anything about portability. 'Not lived up to the promise' presumes it has been tested and found to be lacking.

Well, there were implementations from ARM and Intel, and AMD attempted a similar concept in the past. AMD canned SPIR support for their OpenCL implementation for whatever reason they deemed it unfit. Before that, AMD also supported HSAIL, but no other vendors were interested, so it wasn't a viable option for portability either. That leaves us with ARM's Mali devices, for which we have no idea about the quality of the implementation, and Intel, who keeps adding more SPIR-V extensions, which is bound to introduce more divergence ...

DmitryKo said:
That's OK as long as they would support C++ source code in either OpenCL or their proprietary APIs.

Soon, your idea is going to be put to the test ...

Ethatron · Jun 18, 2021

DmitryKo said:
Microsoft C++ AMP uses Direct3D 11 runtime ...

You said one needs to invent/derive a language. I noted that there is already one available, MS only needs to experiment with DX13 under the hood. The CPU targeting for verification is also readily available.

JoeJ said:
If i got that correctly, AMP also lacks a definition of LDS memory, which is why i've ruled it out back then.

Ofc it has: tile_static

JoeJ · Jun 18, 2021

Ethatron said:
Ofc it has: tile_static

Oh, no wonder i did not find it. MS really is creative with renaming things.
Seems a very attractive option then...

DmitryKo · Jun 26, 2021

Ethatron said:
You said one needs to invent/derive a language. I noted that there is already one available, MS only needs to experiment with DX13 under the hood

Microsoft is not likely to reimplement C++ AMP on top of Direct3D 12 (or 13, if that exists). Herb Sutter has since lost his moustache and is now more interested in improving standard C++, rather than baking another set of proprietary extensions...

Lurkmass said:
Compatibility should never serve as a reason to hinder the future development of a source language. Having pointers in HLSL is one of the oldest requests by developers to Microsoft

It's not just about HLSL shaders - this is how the entire resource binding API has been designed.

On Mantle, you could build similarly complex data structures via "hierarchical descriptor set" with nested descriptor sets and it also supported pointers as well.
why must it be D3D/HLSL the last one to hold out ?

Because it will be a significant departure from their Direct3D 10/11 programming model, which was largely retained in Direct3D 12 (in hopes of bringing mobile GPUs to the Windows 10 platform, which never materialized). Mantle was redesigned from scratch to support one single architecture, AMD GCN.

DXIL stopped being a subset of LLVM especially with the release of DX12 Ultimate features

It's still a subset of LLVM 3.7 - however recent LLVM tools cannot generate old versions of the bitcode, so Microsoft is unable to rebase their code on the latest Clang/LLVM 13; they would probably need another clean break for SM 7.0.

Soon, your idea is going to be put to the test

Well, it's still in an early stage, just the Intel DPC++ fork of LLVM/Clang compiler merging the AMD ROCm branch of AMDGPU LLVM backend. But if they could make it work and port their changes back to the LLVM/Clang tree, that would be a significant step toward unification...

Don't you see the problem yet ? With OpenCL, developers have no idea whether what they're doing is even right or wrong

Writing unportable OpenCL code is indeed a problem, but it needs to be solved by better developer tooling.

we have Intel who keeps adding more SPIR-V extensions

At least they port their changes back to open-source repositories.

DegustatoR · Jul 12, 2021

https://twitter.com/x/status/1414585336437555207

536571616e74 · Jul 13, 2021

DmitryKo said:
Again, these are implementation details for processing obvious programming errors.

This is from the current draft of the C2x standard,
http://www.open-std.org/jtc1/sc22/wg14/www/projects#9899

3.4.3
undefined behavior

Behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this document imposes no requirements

Note 1 to entry: Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

I.e. non-portable or erroneous code can be prevented from compiling and silently discarded, or translated according to some architecture-specific behavior, and this can change with each different target machine architecture or new/updated programming environment.

A Guide to Undefined Behavior in C and C++, Part 1
https://blog.regehr.org/archives/213
Part 2
https://blog.regehr.org/archives/226
Part 3
https://blog.regehr.org/archives/232

Only because machine code translators can decide how to process these errors on a specific machine architecture - i.e. signed integer overflow could be silently wrapped around to a two's complement representation, or could throw a runtime error when the overflow flag is detected.
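
The two outcomes named above can be sketched as follows. This is only a simulation of 32-bit signed addition in Python (whose own integers never overflow), meant to show how the same source-level operation can yield different observable results depending on what the target does. Both function names are hypothetical.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def add_wrapping(a, b):
    """32-bit signed add that silently wraps around in two's complement."""
    r = (a + b) & 0xFFFFFFFF
    return r - 2**32 if r > INT32_MAX else r


def add_trapping(a, b):
    """32-bit signed add that raises an error when the result overflows."""
    r = a + b
    if not INT32_MIN <= r <= INT32_MAX:
        raise OverflowError("signed 32-bit overflow")
    return r


print(add_wrapping(INT32_MAX, 1))  # -2147483648
try:
    add_trapping(INT32_MAX, 1)
except OverflowError as err:
    print("trapped:", err)
```

A program that relies on either result is depending on one particular target's behavior, not on anything the language standard guarantees.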

LLVM IR actually preserves undefined behavior coming from the C/C++ source by using 'undef', 'poison' and 'freeze' attributes to mark potentially undefined results - see below for details:

Taming Undefined Behavior in LLVM
https://blog.regehr.org/archives/1496
https://www.microsoft.com/en-us/research/publication/taming-undefined-behavior-llvm/

https://llvm.org/docs/LangRef.html#undefined-values
https://llvm.org/docs/LangRef.html#poison-values

Alive2 Part 1: Introduction
https://blog.regehr.org/archives/1722
Alive2 Part 2: Tracking miscompilations in LLVM using its own unit tests
https://blog.regehr.org/archives/1737
Alive2 Part 3: Things You Can and Can’t Do with Undef in LLVM
https://blog.regehr.org/archives/1837

On this planet, C and C++ are portable languages which discourage writing unportable code, but do not expressly forbid it, trusting the programmer to understand the implications on their specific machine architecture.

Relying on programmers to stop making obvious programming errors was surely a mistake. That's why modern C/C++ compilers include strict warning levels, code analysis, sanitizers, and debug builds/runtimes, to help the programmer discover and correct these errors.

Undefined behaviour is often completely non-obvious, and its existence is the reason that companies that use C/C++ invest heavily in tooling / sanitisers to avoid them, most notably memory and lifetime issues. Despite this, Microsoft estimates that 70% of security issues arise from undefined behaviour (memory/lifetime issues). Also, all non-C languages were created in response to the difficulty of avoiding undefined behaviours.

Undefined behaviours have nothing to do with the obviousness of the error. They’re literally just undefined behaviours from the perspective of the language spec, originally intended to improve performance by leaving the choice to do the right thing in the programmer's hands.

Lastly, LLVM IR is something that is consumed only by the LLVM compiler. IRs are intermediate representations of higher level languages. I have no insight into whether they would ever be consumed by a driver, but if it were it would be purely to compile byte code. I could imagine uses for that, for instance if you wanted to optimise some ML pipeline based on the features of data known only at runtime.

Lurkmass · Aug 18, 2021

AMD has reached their final decision. They're going to introduce their HIP API over the WDDM kernel driver on Windows and Blender is going to be the first application to support this environment ...

It's only a matter of time before OpenCL disappears for good. There can be no future behind portable source languages or reusable binaries in the long-term ...