Intel Gen Architecture Discussion

  • The OpenCL compiler still has nasty bugs. The worst are on Broadwell. (A bug and repro were reported months ago, but apparently the Intel OpenCL team doesn't fix bugs in the summertime.)

Summer must be over since, as of yesterday, the showstopper bugs I was seeing in Broadwell's OpenCL 2.x driver are fixed in Windows 10 (driver 4256) and there is a new Code Builder. Huzzah!

I think I'm seeing the result of Broadwell Gen8's doubled integer throughput too.

FWIW, GPU-Z reports a 950 MHz HD 6000 using 5.5-6.0 Watts when continuously sorting 131K 64-bit keys and only 4 Watts for 262K 32-bit keys.

We're really at the dawn of the golden age of GPUs where even power-sensitive GPUs have significant FLOPS.
 
Interesting. Haswell turns out to be even more limited in hardware than I imagined. Intel just upgraded Broadwell and Skylake to OpenGL 4.4, but Haswell is staying on 4.3.
Now I'm not saying this is a bad thing, as there were certainly far more important features to include or improve when Haswell was being developed. I had just expected, since Fermi got OpenGL 4.5, that the OpenGL 4.4 extensions wouldn't require any actual new hardware support.
The missing extension seems to be GL_ARB_bindless_texture – any connection to DX12 resource binding?
 
GL_ARB_indirect_parameters is also problematic. However, DX12's ExecuteIndirect is a superset of it, and Haswell can emulate that just fine.
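The emulation mentioned above is straightforward in principle: the feature lets a GPU-written buffer supply the draw count, and hardware without a native "count from buffer" command can have the driver read that count and replay the indirect draws in a loop. A minimal sketch of that idea, with a hypothetical callback standing in for the real draw submission (struct layout follows the GL indirect draw command; everything else here is illustrative, not actual driver code):

```c
#include <stdint.h>
#include <stddef.h>

/* Layout of a single indirect draw record, as in GL's
 * DrawArraysIndirectCommand. */
typedef struct {
    uint32_t count;
    uint32_t instanceCount;
    uint32_t first;
    uint32_t baseInstance;
} DrawArraysIndirectCommand;

/* Hypothetical hook standing in for real command submission. */
typedef void (*draw_fn)(const DrawArraysIndirectCommand *cmd);

/* Emulate a "multi-draw indirect with count from buffer" call by
 * reading the GPU-written count and looping over plain indirect
 * draws. Returns the number of draws actually issued. The count is
 * clamped to max_draws, mirroring the extension's maxdrawcount
 * clamping behavior. */
uint32_t multi_draw_indirect_count_emulated(
    const DrawArraysIndirectCommand *cmds,
    const uint32_t *count_buffer,   /* draw count written by the GPU */
    uint32_t max_draws,
    draw_fn draw)
{
    uint32_t n = *count_buffer;
    if (n > max_draws)
        n = max_draws;              /* clamp to the caller's ceiling */
    for (uint32_t i = 0; i < n; i++)
        if (draw)
            draw(&cmds[i]);
    return n;
}
```

The cost of this approach is a CPU readback (or a command-stream self-modification trick), which is why native support is preferable, but functionally it covers what ExecuteIndirect asks for.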
 
The only real hardware functionality differences between Haswell and Broadwell are likely related to ARB_sparse_texture. Haswell's limited virtual address space makes a useful implementation of sparse textures difficult (i.e., the virtual texture size limitations are typically too severe to be very useful). ARB_bindless_texture is likely not supported on either of them, but I don't think bindless textures are required for GL 4.4 (Fermi wouldn't support 4.4 then) or even 4.5? For that matter, I don't think sparse textures are required by any current GL either.
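The VA-size argument above can be made concrete with back-of-the-envelope arithmetic. A sparse texture reserves virtual address space for its full mip chain (roughly 4/3 of the base level) even though only resident pages are backed by memory. The sketch below uses assumed address-space figures, not vendor-confirmed ones: roughly a 2 GiB per-process GPU VA for Haswell versus a 48-bit VA for Broadwell.

```c
#include <stdint.h>

/* Assumed per-process GPU virtual address space sizes, for
 * illustration only. */
#define HSW_GPU_VA (1ull << 31)   /* ~2 GiB (assumed for Haswell)   */
#define BDW_GPU_VA (1ull << 48)   /* 48-bit VA (assumed for Gen8)   */

/* Virtual footprint of a sparse texture with a full mip chain:
 * the mip pyramid sums to ~4/3 of the base level (geometric series). */
uint64_t sparse_virtual_bytes(uint64_t w, uint64_t h, uint64_t bpp)
{
    uint64_t base = w * h * bpp;
    return base + base / 3;
}
```

For example, a 128K x 128K RGBA8 virtual texture needs roughly 85 GiB of address space: trivial within a 48-bit VA, impossible within ~2 GiB. The largest virtual textures that do fit in ~2 GiB (around 16K x 16K at 4 bytes/texel, ~1.3 GiB) barely exceed what is addressable without sparse textures at all, which is what makes the feature of limited use there.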

The difference here may not even be related to hardware architecture.
 
Does anyone know if Intel GPUs have dedicated frame and depth buffer caches? Or do frame and depth buffer reads/writes go through the standard cache?
 
Both - they each have small dedicated caches which are themselves backed by the GPU's L3, then the SoC's LLC/eLLC.
 