To add further context: http://gpuopen.com/dcc-overview/ describes the addition of delta compression to the color block.
"Can you post the kernel source as a reference to that chart?"
I cannot. AFAIK its source is not open, and I don't have access to it.
"Ahh, that's the old GPCBenchmark. Carsten, can you post some numbers from the local memory sub-test with Fiji and Hawaii?"
That test IMHO is quite erratic, so take this with another extra dose of salt. Erratic in the sense that the results can vary a couple of hundred GB/s from run to run. I took the best out of ~10 tries, so here you go.
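For anyone repeating this, a minimal sketch of the "best out of ~10 tries" approach that also records the run-to-run spread. The `measure` callable is hypothetical: it stands in for whatever launches one run of the sub-test and returns its bandwidth in GB/s. It is not part of GPCBenchmark.

```python
def summarize_runs(measure, tries=10):
    """Call `measure` `tries` times and report best, worst and spread.

    `measure` is a hypothetical callable returning one bandwidth
    sample in GB/s per call.
    """
    samples = [measure() for _ in range(tries)]
    best = max(samples)
    worst = min(samples)
    return {"best": best, "worst": worst, "spread": best - worst}
```

Reporting the spread next to the best value makes it easier to judge how much salt a given number needs.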
"By the way, Fiji doubles the L2 size because of the doubled count of the memory controllers -- 32*64KB partitions = 2048KB, the bandwidth should also scale proportionally."
The Fiji block diagram seems to imply otherwise.
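The arithmetic in the quoted claim can be checked with a quick sketch. The 32 partitions of 64 KB are the figures from the quote. The 16-partition Hawaii value is an assumption inferred from the word "doubled", and none of this settles whether the block diagram agrees with that layout.

```python
def l2_size_kb(partitions, kb_per_partition=64):
    """Total L2 size in KB for a given number of equal partitions."""
    return partitions * kb_per_partition

fiji_kb = l2_size_kb(32)    # figures from the quoted post
hawaii_kb = l2_size_kb(16)  # assumption: half of Fiji's partition count
print(fiji_kb, hawaii_kb, fiji_kb / hawaii_kb)
```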
Thanks.
Indeed, it is erratic. I noticed that the application doesn't trigger the highest P-state or boost clock on my 980Ti. I have to find a way to run the tests with power management off somehow.
"want more leaked slides Q_Q"
We could do with the slides from some of the other presentations I linked; they also seemed to have some very interesting real-world experiences.
DX11 drivers are able to circumvent HW pitfalls. We’re matching DX11 GPU perf on Maxwell + AMD.
CPU perf: Sure, DX12 can be much faster, but if your engine design is such that you don’t swamp the API with draw calls, the actual API overhead might not be significant in your overall CPU cost. We saved ~10% overall renderer time.
From MS presentation
Faster porting between Xbox and PC?