Have you tried Quake 4 on a GTX 1080? I have, and it's atrocious.
Page flip, normally. I.e., the front buffer becomes the back buffer and vice versa.
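To make the page flip concrete, here is a minimal sketch. The `SwapChain` class and its members are hypothetical names for illustration, not any real graphics API. The point is only that presenting exchanges the roles of the two buffers instead of copying pixels.

```python
class SwapChain:
    """Hypothetical two-buffer swap chain, for illustration only."""

    def __init__(self, width, height):
        # Two equally sized buffers; one is displayed, the other is drawn into.
        self.buffers = [bytearray(width * height), bytearray(width * height)]
        self.front = 0  # index of the buffer currently being displayed

    @property
    def back(self):
        # The back buffer is whichever buffer is not the front buffer.
        return 1 - self.front

    def present(self):
        # Page flip: no pixel data is copied, only the roles are exchanged.
        self.front = 1 - self.front


chain = SwapChain(4, 4)
chain.buffers[chain.back][0] = 255  # render into the back buffer
chain.present()                     # the rendered buffer is now the front buffer
```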
Since Vista, all graphics APIs have been primarily user-mode. Prior to Vista, OpenGL was always user-mode except for "swap buffers" which was...
Note that those "16"s are not related to each other at all, since if you had a workgroup with 16 wavefronts, you couldn't store 16 workgroups in a...
GCN has 64KB of registers per SIMD, so 256KB per CU: 4 SIMDs per CU * 64 threads per SIMD * 256 registers per thread per SIMD * 4 bytes per...
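The arithmetic above can be checked in a few lines. This is just a sketch using the figures quoted in the post; the constant names are mine, and I am assuming the "4 bytes" refers to one 32-bit register.

```python
SIMDS_PER_CU = 4
THREADS_PER_SIMD = 64        # lanes of one wavefront
REGISTERS_PER_THREAD = 256   # per thread, per SIMD
BYTES_PER_REGISTER = 4       # assumption: one 32-bit register

def register_bytes_per_simd():
    # Register file size of a single SIMD, in bytes.
    return THREADS_PER_SIMD * REGISTERS_PER_THREAD * BYTES_PER_REGISTER

def register_bytes_per_cu():
    # Register file size of a whole CU, in bytes.
    return SIMDS_PER_CU * register_bytes_per_simd()

per_simd_kb = register_bytes_per_simd() // 1024
per_cu_kb = register_bytes_per_cu() // 1024
print(per_simd_kb, "KB per SIMD,", per_cu_kb, "KB per CU")
```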
No problem getting over 230 GB/s. About 277 GB/s pure read bandwidth and 275 GB/s pure write bandwidth on a GTX 1080 Founders Edition. Getting over 300 GB/s would be...
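For context, here is a rough sketch of how those measurements compare with the theoretical peak. The bus width and per-pin data rate are not from the post; they are assumptions taken from the public GTX 1080 specification (256-bit bus, 10 Gbps GDDR5X), and the function names are made up.

```python
BUS_WIDTH_BITS = 256         # assumption: GTX 1080 memory bus width
DATA_RATE_GBPS = 10.0        # assumption: GDDR5X data rate per pin

def peak_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    # Bytes moved per transfer across the bus, times transfers per second.
    return bus_width_bits / 8 * data_rate_gbps

def efficiency(measured_gb_s, peak_gb_s):
    # Fraction of the theoretical peak actually achieved.
    return measured_gb_s / peak_gb_s

peak = peak_bandwidth_gb_s(BUS_WIDTH_BITS, DATA_RATE_GBPS)
read_eff = efficiency(277.0, peak)    # measured read figure from the post
write_eff = efficiency(275.0, peak)   # measured write figure from the post
print(f"peak {peak:.0f} GB/s, read {read_eff:.1%}, write {write_eff:.1%}")
```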
I assume you mean that GPU A is supposed to be Nvidia Maxwell or Pascal. You should note that Maxwell takes at least 4 warps per SMM to get peak...
sebbbi is correct if referring to how a CU works. On GCN, there are 4 SIMDs per CU. Each SIMD executes a typical instruction in 4 clocks as there...
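A quick sketch of where the 4 clocks come from. The 16 lanes per SIMD is my assumption here (it matches AMD's public GCN documentation); the wavefront size of 64 and the 4 SIMDs per CU are the figures used in this thread. Function names are hypothetical.

```python
WAVEFRONT_SIZE = 64    # threads per wavefront
LANES_PER_SIMD = 16    # assumption: physical ALU lanes per SIMD
SIMDS_PER_CU = 4

def clocks_per_instruction(wavefront_size, lanes):
    # A wavefront wider than the SIMD is processed in several passes;
    # ceiling division gives the number of passes (clocks).
    return -(-wavefront_size // lanes)

def lane_ops_per_clock_per_cu(simds, lanes):
    # With every SIMD busy, the CU processes this many threads' worth
    # of a typical instruction each clock.
    return simds * lanes

clocks = clocks_per_instruction(WAVEFRONT_SIZE, LANES_PER_SIMD)
throughput = lane_ops_per_clock_per_cu(SIMDS_PER_CU, LANES_PER_SIMD)
print(clocks, "clocks per instruction,", throughput, "lane-ops per clock per CU")
```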
I agree that PC tools are lacking in this regard. You only need to worry about the bandwidth of spilling if your kernel is largely memory-bound....
Right, but, as I stated, you are free to spill. This is not always the case and I will leave it at that. Also, how would the compiler report...
A kernel is free to use as many registers as it needs; it's the compiler that has to work within the limits of the hardware. If the kernel uses...
Demonstrably false. DirectCompute requires support for work group sizes of at least 1024 threads, for example. This has been recommended on GPU...
An OpenCL 2.0 platform does not mean all of its devices are OpenCL 2.0. End of discussion.
You don't know that. There is more to performance than shader optimizations. It doesn't matter how many there are, what matters is how important...
So suppose AMD does these optimizations. What then? Do you think they have absolutely no CPU overhead? Many people complain about the apparent...