GPU kernels won't have either exception handling or RTTI
Just like the majority of real-world C++ projects, which avoid these features because of their significant programming and performance overhead.
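To illustrate what code written without these features tends to look like, here is a minimal sketch, not taken from any particular project. The names `parse_digits`, `ShapeKind`, `Shape` and `area` are hypothetical. Failure is reported through `std::optional` rather than a thrown exception, and type dispatch uses an explicit tag rather than `dynamic_cast` or `typeid`.

```cpp
#include <limits>
#include <optional>
#include <string_view>

// Hypothetical helper: parse a non-negative decimal integer.
// Bad input or overflow yields std::nullopt instead of a thrown exception.
inline std::optional<int> parse_digits(std::string_view s) {
    if (s.empty()) return std::nullopt;
    int value = 0;
    for (char c : s) {
        if (c < '0' || c > '9') return std::nullopt;
        const int digit = c - '0';
        // Reject input that would overflow int.
        if (value > (std::numeric_limits<int>::max() - digit) / 10) return std::nullopt;
        value = value * 10 + digit;
    }
    return value;
}

// Hypothetical tagged type: the kind is stored explicitly,
// so no RTTI is needed to tell the variants apart.
enum class ShapeKind { Circle, Square };

struct Shape {
    ShapeKind kind;
    float size;  // radius for Circle, side length for Square
};

inline float area(const Shape& s) {
    switch (s.kind) {
        case ShapeKind::Circle: return 3.14159265f * s.size * s.size;
        case ShapeKind::Square: return s.size * s.size;
    }
    return 0.0f;  // unreachable for valid tags
}
```

Code in this style should build with flags such as `-fno-exceptions -fno-rtti` on GCC and Clang, which is roughly the subset a device compiler accepts.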
"not even Nvidia's libcu++ can mimic the C++ standard library" and no SYCL specification currently even attempts to solve this
It does not need to. SYCL is a C++ class abstraction framework; it relies on standard C++17 compilers to provide the C++ Standard Library and the Standard Template Library, and most implementations use Clang/LLVM.
libcu++ allows parallel heterogeneous memory access to objects in local and system memory, and SYCL is supposed to do the same by analyzing the source code with an optimizing compiler for each specific implementation's device targets and runtime.
The "no compromise" solution ... is providing vendor specific APIs like CUDA or HIP while ignoring BS like OpenCL
You want to replace open APIs and open-source tools for multiple platforms with a single proprietary API (since HIP is a verbatim copy of CUDA) that's only available on Linux and Windows. It's probably fine for HPC workloads, which run on specific fixed hardware and software configurations, but it will be a nightmare for anyone else.
neither AMD or Nvidia really cares about OpenCL anymore. Nvidia doesn't even send any representatives for SYCL
there's wolves (AMD & Nvidia) pretending to be among the sheep herd (Codeplay & Intel)
Is this really Khronos' fault? Anyway, there are several open-source SYCL implementations which target OpenCL C source code, CUDA/HIP source code, Vulkan SPIR-V bytecode, CUDA PTX bytecode, and even DXIL bytecode (with DXIL translation tools).
There's no conformance tests for this so this can't really be enforced
Khronos does perform OpenCL conformance tests for its members.
Looks like some MMU page table restriction to me. Nvidia GPUs have supported a large virtual address space since Kepler (49-bit on Pascal / Turing), but their CUDA implementations only allow GPU memory 'oversubscription' - i.e. allocating system memory to the GPU virtual address space - on Linux, but not Windows or macOS, and their WDDM drivers only support a 40-bit GPU virtual address space (1 TB).
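For scale, the two address widths quoted above work out as follows. This is just a sketch of the arithmetic; `addressable_bytes` and `TiB` are names made up for the example, and the sizes are binary (2^40 bytes per TiB).

```cpp
#include <cstdint>

// Bytes addressable with a virtual address of the given width.
// Only valid for widths below 64 bits.
constexpr std::uint64_t addressable_bytes(unsigned bits) {
    return std::uint64_t{1} << bits;
}

constexpr std::uint64_t TiB = std::uint64_t{1} << 40;

// 40-bit address space: 1 TiB.
static_assert(addressable_bytes(40) == 1 * TiB, "40-bit space is 1 TiB");

// 49-bit address space: 512 TiB, i.e. 512 times the 40-bit space.
static_assert(addressable_bytes(49) == 512 * TiB, "49-bit space is 512 TiB");
```

So the 40-bit limit leaves only 1/512 of the virtual address range that a 49-bit GPU can map.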
until WDDM ships with at least the same capabilities then there's no room for ROCm on Windows.
WDDM 2.7 and Windows 10 version 2004 introduce a lightweight compute-only driver model (MCDM) and a new feature level (1_0_CORE) which support only the Direct3D 12 compute pipeline, not display or rendering pipeline capabilities. This is similar to Tesla Compute Cluster (TCC) mode for Tesla / Quadro drivers.