Recent content by MDolenc

  1. IHV Business strategies and consumer choice

    Remind me, when did the GeForce 3 (aka NV20) come out? OlegSH is right.
  2. Investigation into different Polygon pipeline performance *spawn

    Say you are rendering a rectangle. That's 2 triangles. So how many vertices do you use? 4 or 6? Depends: with indices you use 4 vertices + 6 indices, and without indices you have 6 vertices (see the quad sketch after this list). Multi-draw indirect is also quite powerful in combination with compute shaders. Mesh shaders...
  3. Nvidia Ampere Discussion [2020-05-14]

    SMs can issue 4 warps per clock, and a warp is 32 threads. There are 4 fp32 SIMDs that are 16 wide and 4 fp32+int32 SIMDs that are also 16 wide. A warp issued to a SIMD takes 2 clocks to consume, so other combinations are possible if warps are available (the arithmetic is sketched after this list).
  4. Nvidia Ampere Discussion [2020-05-14]

    That said it much better, yes. :) The implication from my post might be that there's a ton of RF bandwidth available if not using tensor cores, which is not the case. Tensor cores could also raise the number of active registers of a kernel, thus hurting occupancy a bit.
  5. Nvidia Ampere Discussion [2020-05-14]

    So if I understand you correctly, you mean that general-purpose FP16 should be even higher than x4 (78 Tflops), but it's not due to RF usage/bandwidth? The 312 Tflops figure does burn pretty much all the RF bandwidth and the 78 Tflops does not. But TCs are special pieces of hardware. They for example...
  6. Nvidia Ampere Discussion [2020-05-14]

    FP16 is 2x faster, so no. Bandwidth for matrix multiplications depends a lot on how large a chunk of both matrices you can keep as close to the ALUs as possible. In the context of tensor cores this basically means the register file directly. If I remember correctly there were some investigations around...
  7. Nvidia Ampere Discussion [2020-05-14]

    No. Data leaving the SMs can be compressed prior to being written to L2 or memory. Afterwards, if compute accesses that data again, it will be read into L2 in compressed form. So you can save bandwidth on the way out of the GPU and on the way back in, as well as increase the available L2 cache...
  8. Nvidia Ampere Discussion [2020-05-14]

    It sort of does reduce footprint. It can keep data compressed in L2, so there is more cache available. It can't reduce the footprint in main memory, as you don't know in advance whether the output can be compressed or not.
  9. Direct3D feature levels discussion

    Cool, so NV supports this with standard D3D now.
  10. Direct3D feature levels discussion

    So does Turing (finally). But I think the point of the "wtf" was that neither exposes them directly through D3D. You have to use both IHVs' custom hacks/extensions.
  11. Nvidia shows signs in [2019]

    I meant GPU only. But I missed that there's the option for AIBs to get a bundle with memory included. So yeah, my guesswork was off and would be hard to refine. According to JPR, 9.9M AIB cards were shipped in Q3 2019, down 36% year over year.
  12. Nvidia shows signs in [2019]

    Some quick math: the ETH hash rate went from ~90 TH/s in August 2017 to ~290 TH/s in August 2018. At 20 MH/s for a GTX 1060, that's about 10M GPUs added in a year. Reported NV gaming revenue for Q3 FY 2019 is $1764M. If the average NV price per GPU is $100, that makes it 17M GPUs per quarter (the arithmetic is laid out in a sketch after this list). So what am I...
  13. AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

    Well, that's just a no. Unified memory is already here, and I must have missed the news about the huge efficiency of sticking, say, 2 Vegas in CF.
  14. Nvidia Turing Architecture [2018]

    Still, mesh shaders are on-chip and integrated into the graphics pipeline. The compute approach needs a round trip to memory.
  15. AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

    Vega 64 with 1:2 double precision, additional deep learning instructions and an interconnect. None of this is particularly gamey.
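
A minimal sketch of the vertex-count point in item 2, assuming a unit quad with purely illustrative positions: drawn as two triangles it needs 6 vertices without indexing, or 4 unique vertices plus 6 indices with an index buffer.

    # Quad as two triangles, non-indexed: each triangle lists its own 3 corners,
    # so the two shared corners are duplicated -> 6 vertices total.
    non_indexed = [
        (0, 0), (1, 0), (1, 1),   # triangle 1
        (0, 0), (1, 1), (0, 1),   # triangle 2
    ]

    # Indexed: 4 unique vertices plus 6 indices referencing them.
    vertices = [(0, 0), (1, 0), (1, 1), (0, 1)]
    indices = [0, 1, 2, 0, 2, 3]

    print(len(non_indexed))             # 6 vertices
    print(len(vertices), len(indices))  # 4 vertices, 6 indices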
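A worked version of the Ampere SM figures in item 3, using only the numbers quoted there (4 fp32 SIMDs plus 4 fp32+int32 SIMDs, 16 lanes each, 32-thread warps, 4 warps issued per clock); treat it as arithmetic on those figures, not a definitive hardware description.

    warp_size = 32         # threads per warp
    simd_width = 16        # lanes per SIMD
    fp32_simds = 4         # fp32-only SIMDs per SM
    fp32_int32_simds = 4   # shared fp32/int32 SIMDs per SM
    issue_rate = 4         # warps issued per SM per clock

    # A 32-thread warp on a 16-wide SIMD occupies it for 2 clocks.
    clocks_per_warp = warp_size // simd_width
    print(clocks_per_warp)  # 2

    # With all 8 SIMDs running fp32, peak fp32 lanes per SM per clock:
    print((fp32_simds + fp32_int32_simds) * simd_width)  # 128

    # Issuing 4 warps per clock, each busy for 2 clocks, keeps all 8 SIMDs fed:
    print(issue_rate * clocks_per_warp)  # 8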
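The "quick math" in item 12, laid out step by step with the post's own figures (hash rates, 20 MH/s per GTX 1060, $1764M gaming revenue, an assumed $100 average price per GPU):

    hashrate_2017 = 90e12    # ~90 TH/s, August 2017
    hashrate_2018 = 290e12   # ~290 TH/s, August 2018
    per_gpu = 20e6           # ~20 MH/s for a GTX 1060

    gpus_added = (hashrate_2018 - hashrate_2017) / per_gpu
    print(gpus_added / 1e6)  # ~10 million GPUs added in a year

    revenue_q3 = 1764e6      # NV gaming revenue, Q3 FY2019
    avg_gpu_price = 100      # assumed average price per GPU
    print(revenue_q3 / avg_gpu_price / 1e6)  # ~17.6 million GPUs per quarter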