Say you are rendering a rectangle. That's 2 triangles. So how many vertices do you use? 4 or 6? With indices you use 4 vertices + 6 indices and without indices you have 6 vertices.
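To make the 4-vs-6 count concrete, here is a minimal sketch in plain Python with no graphics API. The corner coordinates, the 2-float position layout, and the 16-bit index size are all assumptions picked for illustration; real vertex formats are usually much fatter, which is where indexing pays off.

```python
# Four unique corners of a rectangle as (x, y); the values are illustrative.
quad_vertices = [
    (0.0, 0.0),  # 0: bottom-left
    (1.0, 0.0),  # 1: bottom-right
    (1.0, 1.0),  # 2: top-right
    (0.0, 1.0),  # 3: top-left
]

# Two triangles that share the 0-2 diagonal: 6 indices into 4 vertices.
quad_indices = [0, 1, 2, 0, 2, 3]


def expand_indexed(vertices, indices):
    """Build the non-indexed vertex stream by duplicating shared corners."""
    return [vertices[i] for i in indices]


non_indexed_vertices = expand_indexed(quad_vertices, quad_indices)

# Assumed sizes: 2 floats of 4 bytes per vertex, 2 bytes per index.
FLOATS_PER_VERTEX = 2
FLOAT_BYTES = 4
INDEX_BYTES = 2

vertex_bytes = FLOATS_PER_VERTEX * FLOAT_BYTES
indexed_bytes = len(quad_vertices) * vertex_bytes + len(quad_indices) * INDEX_BYTES
non_indexed_bytes = len(non_indexed_vertices) * vertex_bytes

print(len(quad_vertices), len(quad_indices), len(non_indexed_vertices))
print(indexed_bytes, non_indexed_bytes)
```

With a position-only vertex the saving is small; the gap grows as normals, UVs and so on are added per vertex.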
Depends. Multi draw indirect is also quite powerful in combination with compute shaders. Mesh shaders...
SMs can issue 4 warps per clock, and a warp is 32 threads. There are 4 fp32 SIMDs that are 16 wide and 4 fp32+int32 SIMDs that are also 16 wide. A warp issued to a SIMD takes 2 clocks to consume. So other combinations are possible if warps are available.
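The issue-rate arithmetic above can be written out as a rough sketch. It only uses the numbers from the post and assumes an idealized steady state: no stalls, every issued warp targets one of these SIMDs, and other execution units are ignored. The variable names are made up for the example.

```python
# Numbers taken from the post above.
WARPS_ISSUED_PER_CLOCK = 4
THREADS_PER_WARP = 32
SIMD_WIDTH = 16
FP32_SIMDS = 4
FP32_INT32_SIMDS = 4

# A 32-thread warp on a 16-wide SIMD occupies it for 2 clocks.
clocks_per_warp = THREADS_PER_WARP // SIMD_WIDTH

# Idealized steady state: each clock 4 new warps start while the 4 warps
# issued on the previous clock are still executing.
simds_kept_busy = WARPS_ISSUED_PER_CLOCK * clocks_per_warp

total_simds = FP32_SIMDS + FP32_INT32_SIMDS
peak_lanes_per_clock = total_simds * SIMD_WIDTH

print(clocks_per_warp, simds_kept_busy, total_simds, peak_lanes_per_clock)
```

Under these assumptions, 4 issues per clock is just enough to keep all 8 SIMDs occupied, which is why the mix of fp32 and int32 work depends on which warps are ready.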
That said it much better, yes. :)
The implication from my post might be that there's a ton of RF bandwidth available when not using tensor cores, which is not the case. Tensor cores could also raise the number of active registers in a kernel, thus hurting occupancy a bit.
So if I understand you correctly, you mean that general purpose FP16 should be even higher than x4 (78 Tflops), but it's not due to RF usage/bandwidth? The 312 Tflops figure does burn pretty much all the RF bandwidth and the 78 Tflops does not. But TCs are special pieces of hardware. They for example...
FP16 is 2x faster, so no. Bandwidth for matrix multiplications depends a lot on how large the chunks of both matrices are that you can keep as close to the ALUs as possible. In the context of tensor cores this basically means the register file directly. If I remember correctly there were some investigations around...
No. Data leaving SMs can be compressed prior to being written to L2 or memory. Afterwards, if compute accesses that data again, it will be read in compressed form into L2. So you can save bandwidth on the way out of the GPU and on the way back into the GPU, as well as increase available L2 cache...
It sort of does reduce footprint. It can keep data compressed in L2, so there is more cache available. It can't reduce the footprint in main memory, as you don't know in advance whether the output can be compressed or not.
So does Turing (finally). But I think the point of wtf was that neither exposes them directly through D3D. You have to use both IHVs' custom hacks/extensions.
I meant GPU only. But I missed that there's the option for AIBs to get a bundle with memory included. So yeah, my guesswork was off and would be hard to refine.
According to JPR, 9.9M AIB cards were shipped in Q3 2019, down 36% year over year.
Some quick math: ETH hash rate went from ~90 TH/s in August 2017 to ~290 TH/s in August 2018. At 20 MH/s per GTX 1060, that's about 10M GPUs added in a year. Reported NV gaming revenue for Q3 FY 2019 is $1764M. If the average NV price per GPU is $100, that makes it about 17M GPUs per quarter.
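The same back-of-the-envelope numbers as a small script, in case anyone wants to plug in different inputs. Everything here is the post's own rough figures; the 20 MH/s per card and the $100 average price are assumptions, not measured data.

```python
# Rough inputs from the post, kept as integers to avoid float noise.
HASH_RATE_AUG_2017 = 90 * 10**12   # ~90 TH/s
HASH_RATE_AUG_2018 = 290 * 10**12  # ~290 TH/s
HASH_RATE_PER_GPU = 20 * 10**6     # assumed 20 MH/s per GTX 1060

GAMING_REVENUE_USD = 1764 * 10**6  # reported NV gaming revenue, Q3 FY 2019
ASSUMED_PRICE_PER_GPU_USD = 100    # assumed average NV price per GPU

# GTX 1060-equivalent GPUs needed to explain the added hash rate.
gpus_added_in_year = (HASH_RATE_AUG_2018 - HASH_RATE_AUG_2017) // HASH_RATE_PER_GPU

# GPUs per quarter implied by the revenue at the assumed price.
gpus_sold_per_quarter = GAMING_REVENUE_USD // ASSUMED_PRICE_PER_GPU_USD

print(f"{gpus_added_in_year / 1e6:.1f}M GPUs added to mining in a year")
print(f"{gpus_sold_per_quarter / 1e6:.2f}M GPUs per quarter")
```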
So what am I...