OpenCL 3.0 [2020]

DmitryKo · Apr 27, 2020

OpenCL 3.0 has been released, with the kernel language specification upgraded to 'C++ for OpenCL', a C++17 and OpenCL C 2.0 compatible language mode implemented in Clang/LLVM, which replaces 'OpenCL C++' from v2.2.

khronos.org/news/press/khronos-group-releases-opencl-3.0
khronos.org/assets/uploads/developers/library/2020-iwocl-syclcon/OpenCL-3.0-Launch-Apr20.pdf

khronos.org/opencl/
khronos.org/registry/OpenCL/

khronos.org/developers/library/2020-iwocl-syclcon
iwocl.org/iwocl-2020

Lurkmass · Apr 27, 2020

@DmitryKo Things are turning out worse than I had imagined at the Khronos Group ...

I thought we were going to have a true OpenCL successor, but instead they made a compatibility-breaking update to OpenCL to get rid of the so-called 'cruft' from OpenCL 2.x, just like how OpenGL 3.x introduced "compatibility profiles" to put the OpenGL 1.x/2.x APIs into deprecation.

I guess we can now pretend that OpenCL 2.x never really happened and that OpenCL 1.x and 3.0 are the 'canon' versions. AMD still remains deathly silent about OpenCL 3.0, so I still don't see them putting any real effort into OpenCL, since their OpenCL 2.0 driver implementation supports more features than a base OpenCL 3.0 implementation would.

All in all, AMD can't be too happy with the way things played out. They were originally one of the biggest proponents of OpenCL 2.0, and seeing their biggest compute legacy shattered inside the Khronos Group must leave them feeling very bitter.

I'm convinced that after Khronos' new press release, AMD has to stop wasting their time on stupid industry consortia like that and focus on making their own ROCm/HIP compute stack, because they realize nobody else is coming to help them. It's the same deal with Intel and their oneAPI stack. I don't think either ROCm or HIP is planned to ever come to Windows, and I expect it to be an exclusively CDNA-based product feature ... (ROCm doesn't even officially support GFX10)

Khronos Group has had a pretty crap record so far, aside from maybe Vulkan. OpenGL went into a dumpster fire despite Khronos insisting that Vulkan somehow "wouldn't be its successor", when that's exactly what's happening right now. OpenCL de facto died out in favour of CUDA, and then some Khronos members formed the HSA Foundation, but that died out too, so now we're in this spot where we have AMD ROCm/HIP and Intel oneAPI/DPC++ (SYCL with Intel-specific extensions). Even WebGL is going to die out once WebGPU comes around. OpenCL 3.0 can go into the trash bin where it truly belongs. :smile2:

DmitryKo · Apr 27, 2020

This is not the 'OpenCL Next' they've been talking about for a while; it's rather a maintenance release with improved tools and API interfaces, meant to bring support to mobile platforms and lay the foundation for new features in OpenCL Next.

While the 'C++ for OpenCL' specification is substantially different from the 'OpenCL C++' 2.2 kernel language, no vendor ever supported the latter. Intel has an OpenCL 2.1 runtime, but only uses OpenCL C 2.0 and SYCL 1.2. AMD never implemented OpenCL 2.1 with SPIR-V and the 'OpenCL C++' language; it's only OpenCL 2.0, OpenCL C, and the 'C++ Static' extension. NVIDIA never implemented OpenCL 2.0 in the first place.

The new 'C++ for OpenCL' is a superset of OpenCL C 2.0, with the usual C++ compiler improvements like stricter checking of implicit type conversions, nullptr, initializers, constructors/destructors, etc. The OpenCL C 1.x dialect is supported as well.
Given that the ROCm HC/HIP compiler uses the same Clang/LLVM infrastructure, it will be relatively easy for AMD to implement C++ language features and SPIR-V bytecode (and probably native binaries) for the updated OpenCL 3.0 runtime. There are also OpenCL-on-Vulkan translation layers.
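A minimal sketch of what constructors/destructors, default member initializers, brace initialization and nullptr look like in kernel-style helper code. It is plain C++17 so it compiles with an ordinary host compiler; it deliberately leaves out OpenCL address-space qualifiers and builtins, and `Index2D`, `BufferView` and `ScopedCounter` are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Hypothetical 2D index: default member initializers plus a user-provided constructor.
// Brace initialization rejects narrowing, so Index2D{1.5, 2} would not compile.
struct Index2D {
    int x{0};
    int y{0};
    Index2D() = default;
    Index2D(int x_, int y_) : x{x_}, y{y_} {}
    // Row-major offset into a buffer that is `width` elements wide.
    int linear(int width) const {
        assert(x >= 0 && x < width);
        return y * width + x;
    }
};

// Hypothetical read-only view; nullptr (rather than 0 or NULL) marks "no buffer bound".
struct BufferView {
    const float* data = nullptr;
    int width = 0;
    bool bound() const { return data != nullptr; }
    float at(Index2D idx) const { return data[idx.linear(width)]; }
};

// Hypothetical RAII counter: the destructor runs automatically at scope exit.
struct ScopedCounter {
    int& count;
    explicit ScopedCounter(int& c) : count(c) { ++count; }
    ~ScopedCounter() { --count; }
};
```

In actual C++ for OpenCL code the pointer in `BufferView` would also carry an address-space qualifier such as `__global`; that part is not modelled here.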

What else would you expect from 'a true OpenCL successor' ?

Lurkmass · Apr 28, 2020

DmitryKo said:
This is not 'OpenCL Next', rather a maintenance release laying the foundation for new features with improved tools and API interfaces.

It's an admission of defeat is what it is. OpenCL 3.0 is literally a rehash of OpenCL 1.2 ...

While changes from the 'OpenCL C++' 2.2 kernel language seem substantial, the new 'C++ for OpenCL' is a superset of OpenCL C, with the usual C++ compiler improvements like implicit type checking, nullptr, initializers and constructors/destructors etc.
I don't think AMD ever supported OpenCL 2.1 with SPIR-V bytecode and 'OpenCL C++' language - it's only OpenCL 2.0, OpenCL C, and C++ Static extension. Given that ROCm HC/HIP compiler uses the same Clang/LLVM infrastructure, it will be relatively easy for AMD to implement C++ language features and SPIR-V (and probably native binaries) for the updated OpenCL 3.0 runtime.
NVIDIA never supported OpenCL 2.0 features in the first place.

'C++ for OpenCL' sucks too. It can't do any of the fun stuff such as templates, lambdas, or classes like we would see with CUDA or HIP kernels, so that means no single-source programming model.

AMD also does not want SPIR-V for their compute APIs. In fact, their ROCm OpenCL implementation does offline compilation as well. Introducing SPIR-V means that AMD would need to implement two compilers, which would potentially introduce more bugs: one compiler to translate OpenCL kernels into SPIR-V bytecode and another to translate SPIR-V bytecode into GCN binaries.

What else would you expect from 'a true OpenCL successor' ?

I don't expect anything since the Khronos Group obviously can't get it together so might as well chalk it up as another "design by committee" failure ...

Deleted member 2197 · Apr 28, 2020

Good discussion on OpenCL (track 12:18)
January 17, 2020

DmitryKo · Apr 28, 2020

Lurkmass said:
It can't do any of the fun stuff such as templates, lambdas, or classes like we would see with CUDA or HIP kernels

'C++ for OpenCL' does support templates, lambdas, and classes - it's standard ISO C++17 and OpenCL C 2.0, except where the specification says otherwise.
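For illustration, a hypothetical helper combining a class template, a function template and a lambda. Again this is a plain C++17 sketch rather than actual C++ for OpenCL source: in a real kernel each work-item would handle one index (typically obtained from `get_global_id(0)`), while here a loop stands in for the NDRange. `Accumulator` and `transform_and_sum` are made-up names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class template: a running sum for any arithmetic type T.
template <typename T>
class Accumulator {
public:
    void add(T v) { sum_ += v; }
    T value() const { return sum_; }

private:
    T sum_{};  // value-initialized to zero
};

// Hypothetical function template: applies the callable `op` (e.g. a lambda) to each
// element of `in`, stores the result in `out`, and returns the sum of the results.
// In a real kernel each work-item would handle one index; the loop stands in for that.
template <typename T, typename Op>
T transform_and_sum(const T* in, T* out, std::size_t n, Op op) {
    assert(in != nullptr && out != nullptr);
    Accumulator<T> acc;
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = op(in[i]);
        acc.add(out[i]);
    }
    return acc.value();
}
```

The same template works unchanged for `int` and `float` inputs; the lambda is inlined at each instantiation, which is the main reason this style is attractive for device code.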

Introducing SPIR-V means that AMD would need to implement two compilers

SPIR-V bytecode is optional in OpenCL 3.0, just like other vendor-specific intermediate languages. Executable device code binaries have been supported since OpenCL 1.0, with no changes in OpenCL 3.0 - these may contain platform-specific native code or vendor-specific intermediate code, depending on the implementation.

Lurkmass · Apr 28, 2020

DmitryKo said:
'C++ for OpenCL' does support templates, lambdas, and classes - it's standard ISO C++17 and OpenCL C 2.0, except where the specification says otherwise.

It finally catches up to SYCL, which was only released over 5 years ago! I don't have much hope at this point for any of the new OpenCL kernel languages gaining traction beyond OpenCL C ...

SPIR-V bytecode is optional in OpenCL 3.0, just like other vendor-specific intermediate languages. Executable device code binaries have been supported since OpenCL 1.0, with no changes in OpenCL 3.0 - these may contain platform-specific native code or vendor-specific intermediate code, depending on the implementation.

If you only ship vendor-specific binaries, then that pretty much kills OpenCL's 'portability' argument altogether, since the software author would have to explicitly develop against multiple differing implementations to target a wider range of hardware. So what exactly would the incentive be for AMD to develop OpenCL again when their HIP API is much more capable?

If you're going to generate native binaries, possibly even for different vendors, why not target vendor-specific APIs too, since they'll likely have more features, higher performance and, most important of all, less maintenance for the vendor? AMD doesn't get anything out of supporting OpenCL since it has fewer features, lower performance, and it's more maintenance too.

JoeJ · Apr 28, 2020

DmitryKo said:
NVIDIA never implemented OpenCL 2.0 in the first place.

AFAIK, they did for Quadros. (See the second paragraph here: https://www.heise.de/newsticker/mel...100-mit-16-GByte-HBM2-und-NVLink-3617609.html)

I did not know there seems to be consumer support now too: https://streamhpc.com/blog/2017-02-22/nvidia-enables-opencl-2-0-beta-support/
The mentioned limitations are not really limitations. But personally I never used CL 2.0 because NV did not intend to support it, obviously for political reasons.

Lurkmass said:
If you only ship vendor-specific binaries, then that pretty much kills OpenCL's 'portability' argument altogether, since the software author would have to explicitly develop against multiple differing implementations to target a wider range of hardware. So what exactly would the incentive be for AMD to develop OpenCL again when their HIP API is much more capable?

If you're going to generate native binaries, possibly even for different vendors, why not target vendor-specific APIs too, since they'll likely have more features, higher performance and, most important of all, less maintenance for the vendor? AMD doesn't get anything out of supporting OpenCL since it has fewer features, lower performance, and it's more maintenance too.

Yes. Especially if we're still talking about games, I'm not sure a comfortable generic compute API is possible, because HW differences force us to optimize per vendor anyway if we want the best performance.
Can a C++-like standard deal with different wavefront sizes? Can it give a reasonable abstraction over LDS memory? Can it utilize an (eventually upcoming) generalization of task shaders for general-purpose compute? Can all IHVs implement a CL 2.0 device-side enqueue mechanism efficiently?
And also: would we have control over async execution of e.g. VK for graphics and CL for some compute at all?
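One common answer to the wavefront-size question is to make the width a compile-time parameter and instantiate the code once per target. The sketch below is a hypothetical CPU emulation of a wavefront-level sum reduction with the width as a template parameter (e.g. 32 or 64 lanes); on real hardware this would map to subgroup or wave intrinsics, which are not modelled here, and `wave_reduce_sum` is a made-up name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical CPU emulation of a wavefront-level sum. The wavefront width is a template
// parameter, so the same source can be instantiated for 32-lane or 64-lane hardware.
template <std::size_t Width>
int wave_reduce_sum(std::array<int, Width> lanes) {
    static_assert(Width > 0 && (Width & (Width - 1)) == 0,
                  "wavefront width must be a power of two");
    // Each step halves the number of active lanes, in the style of a shuffle-down tree:
    // lane i adds the value held by lane i + offset.
    for (std::size_t offset = Width / 2; offset > 0; offset /= 2) {
        for (std::size_t lane = 0; lane < offset; ++lane) {
            lanes[lane] += lanes[lane + offset];
        }
    }
    return lanes[0];
}
```

Passing a `std::array<int, 32>` or `std::array<int, 64>` instantiates the 32-lane or 64-lane variant. This only covers the width question; LDS, task shaders and device-side enqueue are untouched by it.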

The goal seems too high and not yet practical, maybe even for business and scientific applications.
I expect we would get there faster with vendor APIs, and after some time, if vendors converge on similar HW, a generalization would emerge without much effort.

But what I don't get is all that constant ranting against Khronos. It's all we have, and even MS, with its smaller range of software and hardware to maintain, does not do any better. AMP? pfffhh... :/

DmitryKo · Apr 28, 2020

Lurkmass said:
It finally catches up to SYCL which was only released over 5 years ago

SYCL 1.x is not a language/compiler specification, it's an abstraction layer - a class framework implemented in ISO C++11, with a few features from C++14/17. Upcoming SYCL 2020/2021 will be based on ISO C++17/20.

'C++ for OpenCL' is implemented with Clang/LLVM, so it will support C++20/23/26 the same moment Clang supports it.

If you only ship vendor-specific binaries, then that pretty much kills OpenCL's 'portability'

The point is, OpenCL gives you a choice between using source code or intermediate bytecode, with offline or online compilation, or native executable code binaries.

Applications will probably continue compiling from OpenCL source code directly to the native binaries, and most vendors won't bother with implementing SPIR-V. It's definitely not required for native code compilation.
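The choice described above maps onto three host API entry points: `clCreateProgramWithBinary`, `clCreateProgramWithIL` (SPIR-V, optional in OpenCL 3.0) and `clCreateProgramWithSource`. The selection policy below is a hypothetical sketch, with `DeviceInfo` and `choose_program_source` invented for illustration; it does not call the OpenCL API, it only models the decision an application might make after querying the device.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the three ways an OpenCL host program can obtain a kernel program.
enum class ProgramSource {
    NativeBinary,   // clCreateProgramWithBinary: since OpenCL 1.0, device/driver specific
    SpirV,          // clCreateProgramWithIL: SPIR-V, optional in OpenCL 3.0
    OpenCLCSource   // clCreateProgramWithSource: online compilation by the driver
};

// Hypothetical capability summary an application might assemble from its own binary cache
// and from device queries (e.g. CL_DEVICE_IL_VERSION, which is empty without IL support).
struct DeviceInfo {
    bool has_cached_binary;   // a binary previously built for this exact device/driver
    std::string il_version;   // e.g. "SPIR-V_1.0", or empty
};

// Prefer a cached native binary, then SPIR-V if the device accepts it,
// and fall back to compiling OpenCL C source at runtime.
ProgramSource choose_program_source(const DeviceInfo& dev) {
    if (dev.has_cached_binary) {
        return ProgramSource::NativeBinary;
    }
    if (dev.il_version.find("SPIR-V") != std::string::npos) {
        return ProgramSource::SpirV;
    }
    return ProgramSource::OpenCLCSource;
}
```

The ordering is just one possible policy: source compilation always works, which is why it sits at the bottom as the fallback.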

what exactly would the incentive be for AMD to develop OpenCL again when their HIP API is much more capable
AMD doesn't get anything out of supporting OpenCL

The OpenCL 3.0 upgrade requires minimal effort. The API is an extension of OpenCL 2.0, and C++ for OpenCL uses the same Clang/LLVM compiler infrastructure as HIP (which is based on the CUDA 8.0 dialect of C++11).

I don't really understand why AMD can't port ROCm/HIP to Windows either.

If you're going to generate native binaries, possibly even for different vendors, why not target vendor-specific APIs too, since they'll likely have more features, higher performance and, most important of all, less maintenance for the vendor?

Compute APIs are straightforward; the most troublesome part is vendor- and framework-specific conventions. C++-based abstraction layers are trying to solve this, but ultimately we need a unified ISO C++ specification.

BTW hipSYCL compiler can translate SYCL 1.2.1 source into HIP (CUDA) source code, consumable by respective compilers. Also DPC++ has been extended to target CUDA in addition to OpenCL.

DmitryKo · Apr 28, 2020

JoeJ said:
Can a C++-like standard deal with different wavefront sizes? Can it give a reasonable abstraction over LDS memory?
Would we have control over async execution of e.g. VK for graphics and CL for some compute at all?

OpenCL/SYCL developers are working with the ISO C++ committee on a unified execution control model, based on 'executor' interfaces/properties to control such fine details. It's currently planned for C++23 or C++26.

what I don't get is all that constant ranting against Khronos. It's all we have

It's common wisdom that an open standard can never succeed in replacing established proprietary standards, at least not until all patents have expired and royalties are no longer collectable.
https://www.explainxkcd.com/wiki/index.php/927:_Standards

Lurkmass · Apr 29, 2020

JoeJ said:
Yes. Especially if we're still talking about games, I'm not sure a comfortable generic compute API is possible, because HW differences force us to optimize per vendor anyway if we want the best performance.
Can a C++-like standard deal with different wavefront sizes? Can it give a reasonable abstraction over LDS memory? Can it utilize an (eventually upcoming) generalization of task shaders for general-purpose compute? Can all IHVs implement a CL 2.0 device-side enqueue mechanism efficiently?
And also: would we have control over async execution of e.g. VK for graphics and CL for some compute at all?

The goal seems too high and not yet practical, maybe even for business and scientific applications.
I expect we would get there faster with vendor APIs, and after some time, if vendors converge on similar HW, a generalization would emerge without much effort.

But what I don't get is all that constant ranting against Khronos. It's all we have, and even MS, with its smaller range of software and hardware to maintain, does not do any better. AMP? pfffhh... :/

I doubt GPUs will be able to seamlessly cope with standard C++ until at least a decade from now. Why else would we have to use nonstandard C++ compilers like NVCC or HCC? If CPU folks had their way, they wouldn't want to duplicate significant parts of their codebase just to rewrite their algorithms into kernels for device execution, and they would prefer that APIs like CUDA or HIP just disappeared into the compilers altogether ...

As far as Khronos is concerned, their failure is a big part of the reason why their members wasted so much effort in vain while CUDA went on to dominate for years, and still does now. AMD literally developed 3 different OpenCL driver stacks (ORCA/PAL/ROCm), and only god knows how many times Intel entirely remade their OpenCL drivers. Compared to C++ AMP, which died off rather gracefully, Khronos' OpenCL standard drove every one of its members into constant hell.

DmitryKo said:
SYCL 1.x is not a language/compiler specification, it's an abstraction layer - a class framework implemented in ISO C++11, with a few features from C++14/17. Upcoming SYCL 2020/2021 will be based on ISO C++17/20.

'C++ for OpenCL' is implemented with Clang/LLVM, so it will support C++20/23/26 the same moment Clang supports it.

You mean like how AMD had their own "OpenCL Static C++ Kernel Language Extension" years ago, which just shows how late OpenCL 3.0 is? C++ for OpenCL is dead on arrival. If people weren't going to touch an AMD extension years ago to use such useful functionality, then nothing is going to change now.

The point is, OpenCL gives you a choice between using source code or intermediate bytecode, with offline or online compilation, or native executable code binaries.

Applications will probably continue compiling from OpenCL source code directly to the native binaries, and most vendors won't bother with implementing SPIR-V. It's definitely not required for native code compilation.

Runtime compilation from source? Yuck ...

Now you're just falling into the same trap that plagued GLSL, which was part of the reason why OpenGL also died out. Shipping GLSL source in applications ended up being massively painful because of the inconsistencies in each different vendor's shader compiler. Being at the mercy of multiple compiler implementations is worse than having one bytecode to rule them all, like we see with DXIL/SPIR-V shaders for D3D12 or Vulkan.

The OpenCL 3.0 upgrade requires minimal effort. The API is an extension of OpenCL 2.0, and C++ for OpenCL uses the same Clang/LLVM compiler infrastructure as HIP (which is based on the CUDA 8.0 dialect of C++11).

I don't really understand why AMD can't port ROCm/HIP to Windows either.

OpenCL 3.0 can't be any more minimal than it is, because it's ultimately a step backwards! Also, it is not an extension of OpenCL 2.0, because an application that specifically uses one of its features can't be guaranteed to run on all OpenCL 3.0 implementations ...
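To make the "optional features" point concrete: OpenCL 3.0 devices advertise optional OpenCL C functionality through feature names such as `__opencl_c_device_enqueue` or `__opencl_c_generic_address_space`, so applications have to check before relying on them. This hypothetical sketch assumes the names have already been collected into a vector of strings (e.g. from a `CL_DEVICE_OPENCL_C_FEATURES` query); `has_feature` and `can_use_2x_kernel_path` are made-up helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns true if the device-reported feature list contains `name`.
bool has_feature(const std::vector<std::string>& features, const std::string& name) {
    return std::find(features.begin(), features.end(), name) != features.end();
}

// Hypothetical: decide whether a kernel variant written against these three
// 2.x-era features can be built on this device, otherwise use a 1.2-style fallback.
bool can_use_2x_kernel_path(const std::vector<std::string>& features) {
    return has_feature(features, "__opencl_c_device_enqueue") &&
           has_feature(features, "__opencl_c_generic_address_space") &&
           has_feature(features, "__opencl_c_program_scope_global_variables");
}
```

In practice this means shipping two kernel variants (or two code paths) if you want a 2.x-style feature and still run on a minimal 3.0 device.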

Also, a former AMD engineer explained a technical limitation behind why AMD can't have ROCm/HIP running on Windows, but I have no idea whether this has changed since then. AMD does not want any part of OpenCL anymore, so they won't even implement SPIR-V kernels ... (Have you even seen how buggy their OpenCL drivers have gotten in recent years?)

Compute APIs are straightforward; the most troublesome part is vendor- and framework-specific conventions. C++-based abstraction layers are trying to solve this, but ultimately we need a unified ISO C++ specification.

BTW hipSYCL compiler can translate SYCL 1.2.1 source into HIP (CUDA) source code, consumable by respective compilers.

Standard C++ isn't going to solve this by the end of this decade. hipSYCL isn't even usable for production.

DmitryKo · Apr 30, 2020

Lurkmass said:
I doubt GPUs will be able to seamlessly cope with standard C++

They are already doing this, although with some vendor- and framework-specific extensions.

AMD had their own "OpenCL Static C++ Kernel Language Extension" years ago which just shows how late OpenCL 3.0

Better late than never. It's standard ISO C++17/20 which makes a difference for tedious work, such as designing framework abstractions.

Shipping GLSL source in applications ended up being massively painful because of the inconsistencies in each different vendor's shader compiler. Being at the mercy of multiple compiler implementations is worse compared to having one bytecode to rule them

Yes, there are compromises to make, and that's the reason why intermediate representations exist.

a former AMD engineer explained a technical limitation behind why AMD can't have ROCm/HIP running on Windows

To me, it looks like their hardware actually needs some kernel hacks to work around virtual address space limitations and cache coherency issues.

OpenCL 3.0 can't be any more minimal than it is, because it's ultimately a step backwards!

A step backwards from a specification that no vendor has ever fully implemented.

AMD does not want any part of OpenCL anymore

OK, good for them. There are translation layers that run on top of Vulkan or Direct3D 12.

Standard C++ isn't going to solve this by the end of this decade.

Why not? It could be used as a universal abstraction for proprietary extensions and APIs, with tools to convert/recompile sources between proprietary implementations.

Lurkmass · May 1, 2020

DmitryKo said:
They are already doing this, although with some vendor- and framework-specific extensions.

No, they really can't. GPU kernel languages still don't support features like exception handling, RTTI or the C++ standard library like we would see with standard C++ on CPUs.

Better late than never. It's standard ISO C++17/20 which makes a difference for tedious work, such as designing framework abstractions.

It's not even required functionality for conformant OpenCL 3.0 implementations, last I checked, so why would a new extension which offered the same functionality change anything in the grand scheme of things?

Yes, there are compromises to make, and that's the reason why intermediate representations exist.

The big players like AMD, Intel(?), and Nvidia stopped wanting compromises altogether so we have nothing in the end!

The only vendor left that will even implement SPIR-V kernels is Intel and even they'll probably offer tons of extensions that'll bypass it as much as possible because it's advantageous for their competitors to avoid it as well.

To me, it looks like their hardware actually needs some kernel hacks to work around virtual address space limitations and cache coherency issues.

It's mostly because Microsoft doesn't really care about HSA, and even the AMD engineers admit that it's a Windows kernel limitation, since not even the Linux subsystem will work with ROCm. CUDA also has fewer limitations on Linux, so you might want to come to terms with the fact that Linux will always have the superior compute stack compared to either macOS or Windows, because even Nvidia doesn't want to be at the mercy of Apple/Microsoft if it means losing features and performance or, in the extreme case (macOS), losing your compute stack altogether on a platform.

A step backwards from a specification that no vendor has ever fully implemented.

It's a dead end regardless ...

OK, good for them. There are translation layers that run on top of Vulkan or Direct3D 12.

Pretty much doomed to fail, because neither D3D12 nor Vulkan has all of the same capabilities as OpenCL. Now good luck with that, because you'd arguably have a better chance at implementing a multi-vendor OpenCL solution using Mesa's clover stack, but even a community project like that is pretty much dead as well.

Idealists like Codeplay are pretty much toothless without an army of driver or compiler engineers at their disposal.

Why not? It could be used as a universal abstraction for proprietary extensions and APIs, with tools to convert/recompile sources between proprietary implementations.

Just to give you an idea, not even state-of-the-art stuff like Nvidia's libcu++ can mimic the C++ standard library in its entirety ...

DmitryKo · May 1, 2020

Lurkmass said:
exception handling, RTTI

These will be supplemented by more efficient static constructs in C++20/23.

or the C++ standard library
not even ... libcu++ can mimic the C++ standard library in its entirety.

SYCL is focused on standard ISO C++17/20/23 instead of vendor-specific implementations.

It's not even required functionality for conformant OpenCL 3.0 implementations

C++ for OpenCL and OpenCL C languages are not optional.

The big players ... stopped wanting compromises

These same vendors sent their engineers to sit on the committee and vote for the specification, then chose not to implement it. Not sure how this is Khronos Group's fault.

Also, you don't like C++ source code any more than native binary code or intermediate bytecode. So what would be your no-compromise solution, exactly?

It's mostly because Microsoft doesn't really care about HSA and even the AMD engineers admit that it's a Windows kernel limitation

This had nothing to do with HSA (Heterogeneous System Architecture). WSL cannot load Linux drivers; it's not an isolated virtual machine, it's a 'lightweight' VM where Linux kernel calls are mapped to the Windows kernel. WSL2 does include a customized Linux kernel, but still no hardware isolation.

As for user-space work queues, WDDM drivers have done work in user space since Windows Vista.

CUDA also has fewer limitations on Linux

This specific error was fixed in 2015 with a driver update.

It's a dead end regardless
Pretty much doomed to fail
Idealists like Codeplay are pretty much toothless

OK, I get your idea.

Lurkmass · May 2, 2020

DmitryKo said:
These will be supplemented by more efficient static constructs in C++20/23.

So your conclusion is that GPU kernels won't have either exception handling or RTTI? :smile2:

So does SYCL, which is focused on standard ISO C++17/20/23 instead of vendor-specific implementations.

I think you need to slow down because I specifically said, "not even Nvidia's libcu++ can mimic the C++ standard library" and no SYCL specification currently even attempts to solve this ...

C++ for OpenCL and OpenCL C languages are not optional.

There are no conformance tests for this, so it can't really be enforced no matter how much you desire the feature. If AMD's vendor extension is anything to go by, then it'll likely stay as an extension in the final specs ...

These same vendors sent their engineers to sit on the committee and vote for the specification, then chose not to implement it. Not sure how this is Khronos Group's fault.

Also, you don't like C++ source code any more than native binary code or intermediate bytecode. So what would be your no-compromise solution, exactly?

Yeah, AMD and Nvidia sent just one representative each, compared to Codeplay and Intel sending in tons of representatives, so neither AMD nor Nvidia really cares about OpenCL anymore. Nvidia doesn't even send any representatives for SYCL! At most, both AMD and Nvidia are only in the OpenCL working group to maintain potential future compatibility in case their own compute standard fails. There are wolves (AMD & Nvidia) pretending to be among the sheep herd (Codeplay & Intel), so kudos to Apple for being the only one showing its true feelings?

The "no compromise" solution is pretty much what both AMD and Nvidia are doing right now, which is providing vendor-specific APIs like CUDA or HIP while ignoring BS like OpenCL as much as possible. BTW, Intel can have both OpenCL and SYCL, since most serious developers will start to only target their implementation, so they can pretty much forget about getting their OpenCL/SYCL applications running on anything besides Intel ...

This had nothing to do with HSA (Heterogeneous System Architecture). WSL cannot load Linux drivers; it's not an isolated virtual machine, it's a 'lightweight' VM where Linux kernel calls are mapped to the Windows kernel. WSL2 does include a customized Linux kernel, but still no hardware isolation.

As for user-space work queues, WDDM drivers have done work in user space since Windows Vista.

It pretty much does have to do with HSA because ROCm is just an extension of AMD's HSA kernel drivers and until WDDM ships with at least the same capabilities then there's no room for ROCm on Windows.

This specific error was fixed in 2015 with a driver update.

Nope ...

I think it's time to embrace Linux as the true platform for compute, because that's where the market is and there are no conflicts of interest with either Apple or Microsoft ...

DmitryKo · May 5, 2020

Lurkmass said:
GPU kernels won't have either exception handling or RTTI

Just like the majority of real-world C++ projects which avoid these features because of significant programming and performance overhead.
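For reference, the usual exception-free style in GPU-oriented code is to return a status or an empty value instead of throwing. A minimal C++17 host-side sketch using `std::optional` (a swapped-in illustration of the idiom, not the C++20/23 proposals mentioned earlier in the thread); `safe_load` and `load_or_zero` are hypothetical names.

```cpp
#include <cassert>
#include <optional>

// Hypothetical device-style helper: instead of throwing on a bad index, return an empty
// optional and let the caller decide what to do. No exceptions or RTTI are involved.
std::optional<float> safe_load(const float* buf, int size, int index) {
    if (buf == nullptr || index < 0 || index >= size) {
        return std::nullopt;
    }
    return buf[index];
}

// Caller-side pattern: fall back to a default value when the load fails.
float load_or_zero(const float* buf, int size, int index) {
    return safe_load(buf, size, index).value_or(0.0f);
}
```

The error path costs a flag check rather than stack unwinding, which is why this style is common where exceptions are unavailable or disabled.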

"not even Nvidia's libcu++ can mimic the C++ standard library" and no SYCL specification currently even attempts to solve this

It does not need to. SYCL is a C++ class abstraction framework, it relies on standard C++17 compilers to provide C++ Standard Library and Standard Template Library, and most implementations use Clang/LLVM.

libcu++ allows parallel heterogeneous memory access to objects in local and system memory, and SYCL is supposed to do the same by analyzing the source code with an optimizing compiler for each specific implementation's device targets and runtime.

The "no compromise" solution ... is providing vendor-specific APIs like CUDA or HIP while ignoring BS like OpenCL

You want to replace open APIs and open-source tools for multiple platforms with a single proprietary API (since HIP is a verbatim copy of CUDA) that's only available on Linux and Windows. It's probably fine for HPC workloads, which run on specific fixed hardware and software configurations, but it will be a nightmare for anyone else.

neither AMD nor Nvidia really cares about OpenCL anymore. Nvidia doesn't even send any representatives for SYCL
there are wolves (AMD & Nvidia) pretending to be among the sheep herd (Codeplay & Intel)

Is this really Khronos' fault? Anyway, there are several open-source SYCL implementations which target OpenCL C source code, CUDA/HIP source code, Vulkan SPIR-V byte code, CUDA PTX bytecode, and even DXIL bytecode (with DXIL translation tools).

There are no conformance tests for this, so it can't really be enforced

Khronos does perform OpenCL conformance tests for its members.

Nope

Looks like some MMU page table restriction to me. Nvidia GPUs have supported a large virtual address space (49-bit on Pascal / Turing) since Kepler, but their CUDA implementations only allow GPU memory 'oversubscription' (i.e. allocating system memory to the GPU virtual address space) on Linux, not on Windows or macOS, and their WDDM drivers only support a 40-bit GPU virtual address space (1TB).
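The address-space figures quoted above work out as follows (plain arithmetic, no new data): 2^40 bytes is 1 TiB (what the post calls 1TB), and 2^49 bytes is 512 TiB, i.e. 512 times the WDDM limit.

```cpp
#include <cstdint>

// Bytes addressable with a virtual address of the given width: 2^bits.
constexpr std::uint64_t address_space_bytes(unsigned bits) {
    return std::uint64_t{1} << bits;
}

// 1 TiB = 2^40 bytes.
constexpr std::uint64_t kTiB = std::uint64_t{1} << 40;

// 40-bit GPU VA (the WDDM limit quoted above) = 1 TiB.
static_assert(address_space_bytes(40) == kTiB, "40-bit VA is 1 TiB");
// 49-bit GPU VA (quoted for Pascal/Turing) = 512 TiB, i.e. 2^9 times larger.
static_assert(address_space_bytes(49) == 512 * kTiB, "49-bit VA is 512 TiB");
```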

until WDDM ships with at least the same capabilities then there's no room for ROCm on Windows.

WDDM 2.7 and Windows 10 version 2004 introduce a lightweight compute-only driver model (MCDM) and a new feature level (1_0_CORE), which support only the Direct3D 12 compute pipeline, not the display or rendering pipeline. This is similar to Tesla Compute Cluster (TCC) mode for Tesla / Quadro drivers.

Lurkmass · May 6, 2020

DmitryKo said:
Just like the majority of real-world C++ projects which avoid these features because of significant programming and performance overhead.

It still doesn't change the fact that GPU kernel languages are missing standard C++ features, so nobody can claim that they have full support for C++.

It does not need to. SYCL is a C++ class abstraction framework, it relies on standard C++17 compilers to provide C++ Standard Library and Standard Template Library, and most implementations use Clang/LLVM.

Clang/LLVM has many backends, but not all of them (like Nvidia's PTX backend) can support standard C++. Clang/LLVM support does not by itself confer support for standard C++, so it's absolutely necessary for the target hardware itself to support these features to claim full support for standard C++.

libcu++ allows parallel heterogeneous memory access to objects in local and system memory, and SYCL is supposed to do the same by analyzing the source code with an optimizing compiler for each specific implementation's device targets and runtime.

There might be a SYCL extension or an implementation that could support features similar to those found in libcu++, but such support is absolutely not required by SYCL.

You want to replace open APIs and open-source tools for multiple platforms with a single proprietary API (since HIP is a verbatim copy of CUDA) that's only available on Linux and Windows. It's probably fine for HPC workloads, which run on specific fixed hardware and software configurations, but it will be a nightmare for anyone else.

CUDA/HIP's intended market is high-end compute, and CUDA has become the de facto industry standard, so I don't see your issue with having to maintain another vendor-specific standard. Politics will prevent multi-vendor standards from proliferating in this case, so it's a necessity for each vendor to bring their own standard to compete against each other. Also, HIP's implementation is open source, so any vendor can see for themselves and implement it if they want, even if the specs are controlled by AMD. Maybe Intel should be the one to implement HIP, instead of having AMD waste time with another potential zombie standard like OpenCL/HSAIL/SYCL, if they're so interested in converging on a multi-vendor industry standard?

Is this really Khronos' fault? Anyway, there are several open-source SYCL implementations which target OpenCL C source code, CUDA/HIP source code, Vulkan SPIR-V byte code, CUDA PTX bytecode, and even DXIL bytecode (with DXIL translation tools).

Khronos isn't entirely at fault, but yes, they do hold some responsibility for this mess since they kept placating a couple of terrible members like Apple or Nvidia. Having SYCL as a translation layer over superior APIs isn't sustainable in the long term. Codeplay doesn't have loads of driver or compiler engineers lying around to come anywhere close to making a quality SYCL implementation over CUDA, and hipSYCL is the work of a single person. Compute standards just aren't meant to be community projects: even Mesa's Clover project died out despite having the most community backing of all these projects.

If portability is as pressing an issue as you make it out to be, then relying on Khronos isn't the answer IMO. Let AMD and Intel handle this with tools like hipify or the DPC++ Compatibility Tool, which convert high-level CUDA source into HIP source or DPC++ source. How about that? CUDA is arguably a stronger proposition in terms of portability than either Khronos' OpenCL or SYCL standard, because even AMD and Intel think that CUDA is the way to go ...
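
To give a rough idea of what a source-to-source tool like hipify does, here is a minimal Python sketch of the regex-based renaming approach (hipify-perl works roughly along these lines, while hipify-clang uses a real Clang front end instead). The mapping table is a tiny hypothetical subset picked for illustration, not hipify's actual table, and the function name `toy_hipify` is made up.

```python
import re

# Hypothetical, tiny subset of CUDA -> HIP runtime renames, for illustration only.
# The real hipify tools ship far larger tables and handle much more than renames.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# One alternation wrapped in word boundaries, so "cudaMemcpy" does not match the
# prefix of "cudaMemcpyHostToDevice" (the following character is a word character).
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(CUDA_TO_HIP, key=len, reverse=True))
    + r")\b"
)


def toy_hipify(source: str) -> str:
    """Rename known CUDA runtime identifiers to their HIP equivalents."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(1)], source)


cuda_src = """#include <cuda_runtime.h>
float *d;
cudaMalloc(&d, n * sizeof(float));
cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
cudaFree(d);
"""

print(toy_hipify(cuda_src))
```

Identifiers missing from the table, such as `cudaMallocManaged`, pass through unchanged, which hints at why pure text substitution only gets you part of the way and real tools need a compiler front end.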

Khronos does perform OpenCL conformance tests for its members.

I see conformance tests for OpenCL C++, but I don't see any tests for C++ for OpenCL, so I assume it isn't required for OpenCL 3.0, since nothing explicitly states otherwise.

Looks like some MMU page table restriction to me. Nvidia GPUs have supported a large virtual address space (49-bit on Pascal/Turing) since Kepler, but their CUDA implementations only allow GPU memory 'oversubscription' (i.e. allocating system memory in the GPU virtual address space) on Linux, not on Windows or macOS, and their WDDM drivers only support a 40-bit GPU virtual address space (1 TB).
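
The address-space figures above are easy to sanity-check: an n-bit virtual address space spans 2^n bytes, so 40 bits is 1 TiB and 49 bits is 512 TiB. A quick Python check (the helper name `va_space_bytes` and the 2 TiB working set are hypothetical, and the comparison ignores everything else already mapped into the GPU's address space):

```python
TIB = 2 ** 40  # bytes in one tebibyte


def va_space_bytes(bits: int) -> int:
    """Size in bytes of an address space addressed by `bits` address bits."""
    return 1 << bits


for label, bits in [("WDDM GPU VA limit", 40), ("Pascal/Turing GPU VA", 49)]:
    print(f"{label}: {bits}-bit -> {va_space_bytes(bits) // TIB} TiB")

# A hypothetical 2 TiB oversubscribed working set: too big for a 40-bit
# address space, but it fits easily into a 49-bit one (simplified check).
working_set = 2 * TIB
print(working_set <= va_space_bytes(40), working_set <= va_space_bytes(49))
```

This prints 1 TiB and 512 TiB for the two limits, then `False True` for the working-set check.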

Which is exactly why you should ditch Windows; otherwise you can't use cudaMallocManaged to do oversubscription for you ...

Just to further drive home the point that Microsoft doesn't care about high-end compute: AMD once offered an 'upgraded' version of C++ AMP known as HCC. Not even Microsoft had any real successor to C++ AMP, while AMD's HCC API was the only option if you wanted a spiritual successor ...

Lurkmass · May 8, 2020

And just like that, AMD deprecated the only sane way left to develop OpenCL applications on Windows. CodeXL is no longer maintained by AMD, and they dropped it shortly after the release of the OpenCL 3.0 provisional specs ...

JoeJ · May 8, 2020

Lurkmass said:
CodeXL isn't being maintained anymore by AMD

Oh great. When I moved to Vulkan there were no AMD profiling tools yet, so I kept an OpenCL branch just to use CodeXL.
Now AMD first dropped OpenCL support on their CPUs, and now this. I do not understand it. OpenCL was their only option to compete with CUDA on consumer hardware. This is still important; think of Adobe etc.
So why do they lose all interest in OpenCL although there is no alternative?

Now I hope the profiling tools for Vulkan are fine. Those game frame analyzing tools are not very useful for compute. Last time I tried Radeon GPU Profiler, I could not even find register and LDS usage per shader. But this was very early and I hope it has gotten better.

I'd also love to see some proper shading language with pointers for Vulkan. GLSL/HLSL suck: you cannot cast pointers to change the type of LDS memory. And I surely don't want to invent my own language and SPIR-V transpiler crap.
I hoped Vulkan would adopt OpenCL C shaders. There seems to be some stuff around, but it's never foolproof and easy. Too lazy to figure out how to use it with Vulkan buffers, textures and constants.

Lurkmass · May 8, 2020

JoeJ said:
Oh great. When I moved to Vulkan there were no AMD profiling tools yet, so I kept an OpenCL branch just to use CodeXL.
Now AMD first dropped OpenCL support on their CPUs, and now this. I do not understand it. OpenCL was their only option to compete with CUDA on consumer hardware. This is still important; think of Adobe etc.
So why do they lose all interest in OpenCL although there is no alternative?

It's not as important as you think anymore, because many OpenCL applications/backends have fallen out of maintenance and there's only one Adobe product that uses OpenCL. I don't think it's true that there are no alternatives to OpenCL in their case. CPUs, for instance, are becoming powerful, and if you're only looking to do compute, you can check out HIP. Others will transition to industry standards like D3D12 or Vulkan.

These are ultimately the alternatives AMD will fill the vacuum with in the future.

Now I hope the profiling tools for Vulkan are fine. Those game frame analyzing tools are not very useful for compute. Last time I tried Radeon GPU Profiler, I could not even find register and LDS usage per shader. But this was very early and I hope it has gotten better.

RGP 1.3 does show data like register and LDS usage for SPIR-V or DXIL shaders.

I'd also love to see some proper shading language with pointers for Vulkan. GLSL/HLSL suck: you cannot cast pointers to change the type of LDS memory. And I surely don't want to invent my own language and SPIR-V transpiler crap.
I hoped Vulkan would adopt OpenCL C shaders. There seems to be some stuff around, but it's never foolproof and easy. Too lazy to figure out how to use it with Vulkan buffers, textures and constants.

You can use buffer references in SPIR-V shaders, which are a limited form of pointers, but as you mentioned they don't work with local memory; they only apply to global memory. Chances are slim that Khronos will ever expose OpenCL C kernels in Vulkan.

You're going to have to move on eventually. The future looks brighter than ever with CPUs, and new industry standards like D3D12/Vulkan will offer quite a bit of longevity on a practical basis. If you insist on a compute API, then CUDA is probably the way to go, and if you want to target more hardware, you should look at hipify or the DPC++ Compatibility Tool, because I'm very skeptical about the long-term support of other APIs like HIP or DPC++. I even think the lifespan of CUDA support is too short, given the deprecation of Kepler and first-gen Maxwell: at most, CUDA supports an architecture for 6 to 7 years after its launch before it's retired. If you're looking to use the newest features, then the payoff of using vendor-specific compute APIs might not be too bad ...

Deleted member 2197
