Recent content by MDolenc

  1. IHV Business strategies and consumer choice

    Remind me, when did the GeForce 3 (aka NV20) come out? OlegSH is right.
  2. Investigation into different Polygon pipeline performance *spawn

    Say you are rendering a rectangle. That's 2 triangles. So how many vertices do you use? 4 or 6? Depends: with indices you use 4 vertices + 6 indices, and without indices you have 6 vertices (see the quad sketch after this list). Multi-draw indirect is also quite powerful in combination with compute shaders. Mesh shaders...
  3. Nvidia Ampere Discussion [2020-05-14]

    SMs can issue 4 warps per clock, and a warp is 32 threads. There are 4 fp32 SIMDs that are 16 wide and 4 fp32+int32 SIMDs that are also 16 wide. A warp issued to a SIMD takes 2 clocks to consume, so other combinations are possible if warps are available (the arithmetic is sketched after this list).
  4. Nvidia Ampere Discussion [2020-05-14]

    That said it much better, yes. :) The implication from my post might be that there's a ton of RF bandwidth available if not using tensor cores, which is not the case. Tensor cores could also raise the number of active registers of a kernel, thus hurting occupancy a bit.
  5. Nvidia Ampere Discussion [2020-05-14]

    So if I understand you correctly, you mean that general-purpose FP16 should be even higher than x4 (78 Tflops), but it's not due to RF usage/bandwidth? The 312 Tflops figure does burn pretty much all the RF bandwidth and the 78 Tflops does not. But TCs are special pieces of hardware. They for example...
  6. Nvidia Ampere Discussion [2020-05-14]

    FP16 is 2x faster, so no. Bandwidth for matrix multiplications depends a lot on how large a chunk of both matrices you can keep as close to the ALUs as possible. In the context of tensor cores this basically means the register file directly. If I remember correctly there were some investigations around...
  7. Nvidia Ampere Discussion [2020-05-14]

    No. Data leaving the SMs can be compressed prior to being written to L2 or memory. Afterwards, if compute accesses that data again, it will be read into L2 in compressed form. So you can save bandwidth on the way out of the GPU and on the way back in, as well as increase the available L2 cache...
  8. Nvidia Ampere Discussion [2020-05-14]

    It sort of does reduce footprint. It can keep data compressed in L2, so there is more cache available. It can't reduce the footprint in main memory, as you don't know in advance whether the output can be compressed or not.
  9. Direct3D feature levels discussion

    Cool, so NV supports this with standard D3D now.
  10. Direct3D feature levels discussion

    So does Turing (finally). But I think the point of the "wtf" was that neither exposes them directly through D3D. You have to use both IHVs' custom hacks/extensions.
  11. Nvidia shows signs in [2019]

    I meant GPU only. But I missed that there's the option for AIBs to get a bundle with memory included. So yeah, my guesswork was off and would be hard to refine. According to JPR, 9.9M AIB cards were shipped in Q3 2019, down 36% year over year.
  12. Nvidia shows signs in [2019]

    Some quick math: the ETH hash rate went from ~90 TH/s in August 2017 to ~290 TH/s in August 2018. At 20 MH/s for a GTX 1060, that's about 10M GPUs added in a year. Reported NV gaming revenue for Q3 FY 2019 is $1764M. If the average NV price per GPU is $100, that makes it 17M GPUs per quarter (the arithmetic is laid out in a sketch after this list). So what am I...
  13. AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

    Well, that's just a no. Unified memory is already here, and I must have missed the news about the huge efficiency of sticking, say, 2 Vegas in CF.
  14. Nvidia Turing Architecture [2018]

    Still, mesh shaders are on-chip and integrated into the graphics pipeline. The compute approach needs a round trip to memory.
  15. AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

    Vega 64 with 1:2 double precision, additional deep learning instructions and an interconnect. None of this is particularly gamey.
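
A minimal sketch of the vertex-count point in item 2, assuming a unit quad with purely illustrative positions: drawn as two triangles it needs 6 vertices without indexing, or 4 unique vertices plus 6 indices with an index buffer.

    # Quad as two triangles, non-indexed: each triangle lists its own 3 corners,
    # so the two shared corners are duplicated -> 6 vertices total.
    non_indexed = [
        (0, 0), (1, 0), (1, 1),   # triangle 1
        (0, 0), (1, 1), (0, 1),   # triangle 2
    ]

    # Indexed: 4 unique vertices plus 6 indices referencing them.
    vertices = [(0, 0), (1, 0), (1, 1), (0, 1)]
    indices = [0, 1, 2, 0, 2, 3]

    print(len(non_indexed))             # 6 vertices
    print(len(vertices), len(indices))  # 4 vertices, 6 indices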
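A worked version of the Ampere SM figures in item 3, using only the numbers quoted there (4 fp32 SIMDs plus 4 fp32+int32 SIMDs, 16 lanes each, 32-thread warps, 4 warps issued per clock); treat it as arithmetic on those figures, not a definitive hardware description.

    warp_size = 32         # threads per warp
    simd_width = 16        # lanes per SIMD
    fp32_simds = 4         # fp32-only SIMDs per SM
    fp32_int32_simds = 4   # shared fp32/int32 SIMDs per SM
    issue_rate = 4         # warps issued per SM per clock

    # A 32-thread warp on a 16-wide SIMD occupies it for 2 clocks.
    clocks_per_warp = warp_size // simd_width
    print(clocks_per_warp)  # 2

    # With all 8 SIMDs running fp32, peak fp32 lanes per SM per clock:
    print((fp32_simds + fp32_int32_simds) * simd_width)  # 128

    # Issuing 4 warps per clock, each busy for 2 clocks, keeps all 8 SIMDs fed:
    print(issue_rate * clocks_per_warp)  # 8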
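The "quick math" in item 12, laid out step by step with the post's own figures (hash rates, 20 MH/s per GTX 1060, $1764M gaming revenue, an assumed $100 average price per GPU):

    hashrate_2017 = 90e12    # ~90 TH/s, August 2017
    hashrate_2018 = 290e12   # ~290 TH/s, August 2018
    per_gpu = 20e6           # ~20 MH/s for a GTX 1060

    gpus_added = (hashrate_2018 - hashrate_2017) / per_gpu
    print(gpus_added / 1e6)  # ~10 million GPUs added in a year

    revenue_q3 = 1764e6      # NV gaming revenue, Q3 FY2019
    avg_gpu_price = 100      # assumed average price per GPU
    print(revenue_q3 / avg_gpu_price / 1e6)  # ~17.6 million GPUs per quarter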