Since Vista, all graphics APIs have been primarily user-mode. Prior to Vista, OpenGL was always user-mode except for "swap buffers", which was handled by the KMD. There is still a KMD component for memory management, display management, etc.
Note that those "16"s are not related to each other at all: if you had a workgroup with 16 wavefronts, you couldn't store 16 workgroups in a CU :)
GCN has 64KB of registers per SIMD, so 256KB per CU:
4 SIMDs per CU * 64 threads per SIMD * 256 registers per thread * 4 bytes per register = 256 KB per CU.
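To make the arithmetic explicit, here's a quick Python sketch using the numbers above:

```python
# GCN register-file arithmetic (numbers from the post above).
simds_per_cu = 4
threads_per_wavefront = 64   # one lane per thread across the SIMD
max_vgprs_per_thread = 256
bytes_per_register = 4

regfile_per_simd = threads_per_wavefront * max_vgprs_per_thread * bytes_per_register
regfile_per_cu = simds_per_cu * regfile_per_simd

print(regfile_per_simd // 1024)  # 64  (KB per SIMD)
print(regfile_per_cu // 1024)    # 256 (KB per CU)
```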
No problem getting over 230 GB/s. About 277 GB/s pure read b/w and 275 GB/s pure write b/w on a FE GTX 1080. Getting over 300 GB/s would be pretty incredible as that would be well over 90% memory utilization, which is tough on any GPU I've seen.
Also, you can easily check the memory clock in...
I assume you mean that GPU A is supposed to be Nvidia Maxwell or Pascal. You should note that Maxwell takes at least 4 warps per SMM to get peak ALU rate since there are 4 vector units per SMM. A single warp per SMM can only harness, at best, 1/4 of the SMM's ALU horsepower, and at worst 1/24th.
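A rough sketch of where the 1/4 and 1/24 figures come from, assuming 4 vector units per SMM and a roughly 6-cycle dependent-instruction issue latency (the latency value is my assumption for illustration):

```python
# Single-warp ALU utilization on a Maxwell SMM, back-of-envelope.
vector_units_per_smm = 4
dependent_issue_latency = 6  # approx. cycles between dependent instructions (assumption)

# Best case: the one warp keeps exactly one of the four vector units busy.
best_case = 1 / vector_units_per_smm                 # 0.25 -> 1/4
# Worst case: a fully dependent instruction stream also stalls that one unit.
worst_case = best_case / dependent_issue_latency     # ~0.0417 -> 1/24

print(best_case)            # 0.25
print(round(1 / worst_case))  # 24
```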
sebbbi is correct if referring to how a CU works. On GCN, there are 4 SIMDs per CU. Each SIMD executes a typical instruction in 4 clocks as there are 16 ALUs per SIMD so a 64-thread wavefront takes 4 clocks to process.
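The cadence described above is just wavefront width divided by SIMD width:

```python
# GCN issue cadence: a 64-thread wavefront on a 16-lane SIMD.
wavefront_size = 64
alus_per_simd = 16
simds_per_cu = 4

clocks_per_instruction = wavefront_size // alus_per_simd
print(clocks_per_instruction)  # 4 clocks per wavefront instruction

# With the 4 SIMDs round-robining, the CU as a whole still averages
# one wavefront instruction completed per clock.
print(simds_per_cu / clocks_per_instruction)  # 1.0
```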
I agree that PC tools are lacking in this regard.
You only need to worry about the bandwidth of spilling if your kernel is largely memory-bound. If you are compute-bound, then you might have enough work to hide the latency and you likely have bandwidth to spare. This is why it's crucial to...
Right, but, as I stated, you are free to spill.
This is not always the case and I will leave it at that. Also, how would the compiler report warnings about spilling? Sure, this is possible in OpenCL where there is a log for the compilation, but what about other APIs?
Spilling is not always...
A kernel is free to use as many registers as it needs, it's the compiler that has to work within the limits of the hardware. If the kernel uses more registers than are available, then spilling will occur. With a work group of 1024 threads, you will get up to 64 registers per work-item on GCN...
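The 64-register figure falls straight out of the CU's register-file size, since the whole work group must fit on one CU:

```python
# Max VGPRs per work-item for a 1024-thread work group on a GCN CU.
register_file_per_cu = 256 * 1024  # bytes (4 SIMDs * 64 KB)
work_group_size = 1024
bytes_per_register = 4

max_regs_per_item = register_file_per_cu // (work_group_size * bytes_per_register)
print(max_regs_per_item)  # 64
```

Use more than that per work-item and the compiler either reduces occupancy or spills.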
Demonstrably false. DirectCompute requires support for work group sizes of at least 1024 threads, for example.
This has been recommended on GPUs for ages. See the huge progress made with LuxRenderer for another example.
You don't know that. There is more to performance than shader optimizations.
It doesn't matter how many there are; what matters is how important they are to reviews and gamers, wouldn't you agree?
Windows API? What is that? Do you mean the DDI layer? If so, how would one correlate the DDI...
So suppose AMD does these optimizations. What then? Do you think they have absolutely no CPU overhead? Many people complain about the apparent increased CPU overhead of AMD's drivers relative to Nvidia, yet they never stop to consider why that might be true.
Regarding the closed source library...