Recent content by RecessionCone

  1. NVidia Ada Speculation, Rumours and Discussion

    People forget that A100 has a big cache already.
  2. CryptoCurrency Mining with GPUs *spawn*

    Ethereum mining doesn’t need TOPS. It needs GB/s of random memory accesses. Performance depends on the memory subsystem, not so much on the cores.
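
    A rough back-of-the-envelope for why bandwidth, not compute, sets the hashrate; it assumes Ethash’s usual pattern of roughly 64 random 128-byte DAG reads per hash, and the 500 GB/s figure is just an example:

    ```latex
    64 \times 128\,\mathrm{B} = 8\,\mathrm{KB\ of\ random\ DAG\ reads\ per\ hash}
    \quad\Rightarrow\quad
    \mathrm{hashrate} \lesssim \frac{\mathrm{bandwidth}}{8\,\mathrm{KB}}
    \approx \frac{500\,\mathrm{GB/s}}{8\,\mathrm{KB}} \approx 60\,\mathrm{MH/s}
    ```

    More TOPS do nothing to that bound; more effective random-access bandwidth raises it almost linearly.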
  3. Nvidia Ampere Discussion [2020-05-14]

    CarstenS: tensor cores lower register file bandwidth per math operation because they work on tensors rather than scalars (N^3 math ops against N^2 operands). Nvidia actually shows this phenomenon in their animated tensor core cartoons in their keynotes. So it’s likely they are close to peak RF bandwidth both at 78...
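
    A sketch of that N^3/N^2 scaling (illustrative, not from the original post): an N×N×N matrix multiply-accumulate does N^3 FMAs while touching only 3N^2 operand values, so register-file traffic per math op falls as 1/N:

    ```latex
    \frac{\text{RF traffic}}{\text{math op}} \propto \frac{3N^2}{N^3} = \frac{3}{N}
    ```

    A scalar FMA (N = 1) needs about three operand reads per math op; a 4×4×4 tensor-core op needs roughly a quarter of that.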
  4. Huge Explosion in port of Beirut [2020-08]

    2.7M grams. 2.7M kg would be a thousand times more.
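
    The unit conversion, written out:

    ```latex
    2.7\times10^{6}\,\mathrm{g} = 2{,}700\,\mathrm{kg} = 2.7\,\mathrm{t}
    \qquad\text{vs.}\qquad
    2.7\times10^{6}\,\mathrm{kg} = 2{,}700\,\mathrm{t} = 1000 \times 2.7\,\mathrm{t}
    ```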
  5. Digital Foundry Article Technical Discussion [2020]

    VGTech just does FPS benchmarking, right? I’ve never seen an analysis as in-depth or insightful as DF’s anywhere else. FPS benchmarks are a dime a dozen but don’t inform about the broader issues.
  6. Nvidia DLSS 1 and 2 antialiasing discussion *spawn*

    DLSS makes the game faster only when the frame rate is low. If the frame rate is high, the cost of running the neural network, even with tensor cores, will dominate the rendering time, meaning you won’t see a performance improvement.
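
    A toy frame-time model makes the point; the 1.5 ms network cost and the render times below are made-up numbers, purely for illustration:

    ```latex
    t_{\mathrm{DLSS}} = t_{\mathrm{render}}(\text{lower internal res}) + t_{\mathrm{network}}
    ```

    With t_network ≈ 1.5 ms: a 33 ms (30 fps) native frame that renders in 12 ms at the lower resolution becomes 13.5 ms (~2.4×), while a 5 ms (200 fps) native frame that renders in 2.5 ms becomes 4 ms (~1.25×), and the fixed network cost takes an ever larger share as frame rates climb.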
  7. AMD: Navi Speculation, Rumours and Discussion [2017-2018]

    They didn’t say anything about cost, though, did they? A lot of product decisions hinge on cost, not technology.
  8. Nvidia Volta Speculation Thread

    I believe that was 36 DGX-2H systems, not one. They chose 36 because that’s the number of ports on a standard InfiniBand switch.
  9. Nvidia Volta Speculation Thread

    The full Volta memory model extends to all remote GPU memories connected by NVSwitch. You can dereference any pointer without doing any work in software to figure out where in the system that pointer points. You can use atomics. It’s not transparent to the GPUs themselves - obviously...
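
    A minimal sketch of what that looks like from CUDA, assuming two peer GPUs on the same NVLink/NVSwitch fabric (standard CUDA runtime calls; error checking omitted; the kernel and variable names are made up):

    ```cuda
    // GPU 0 atomically increments a counter that physically lives in GPU 1's memory.
    // With unified virtual addressing and peer access, the kernel just dereferences
    // the pointer; no software routing is needed.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void bump(int *remote_counter) {
        atomicAdd(remote_counter, 1);   // plain atomic on another GPU's memory
    }

    int main() {
        int *counter = nullptr;

        cudaSetDevice(1);                           // allocate on GPU 1
        cudaMalloc((void **)&counter, sizeof(int));
        cudaMemset(counter, 0, sizeof(int));

        cudaSetDevice(0);                           // run the kernel on GPU 0
        cudaDeviceEnablePeerAccess(1, 0);           // map GPU 1's memory into GPU 0's view
        bump<<<1, 256>>>(counter);                  // ordinary pointer, atomics over the fabric
        cudaDeviceSynchronize();

        int result = 0;
        cudaMemcpy(&result, counter, sizeof(int), cudaMemcpyDeviceToHost);
        printf("counter = %d\n", result);           // expect 256
        return 0;
    }
    ```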
  10. Nvidia Post-Volta (Ampere?) Rumor and Speculation Thread

    You can train with the quoted TFLOPS on Volta. For example, large LSTM models do quite well.
  11. Nvidia Volta Speculation Thread

    The intrinsic is fine. The missing performance is because the CUDA compiler can’t optimally schedule and register allocate the code that uses the intrinsic. Hopefully that will improve with time. Getting 100% utilization of the tensor cores requires the whole chip to work at full tilt, doing...
  12. Nvidia Volta Speculation Thread

    The CUDA example is using WMMA, the CUDA abstraction for tensor cores. 50 TFLOPS is about right for the WMMA interface with current CUDA. To get full performance, use cuBLAS.
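
    A hedged sketch of the cuBLAS path (Volta-era cublasGemmEx; the sizes are arbitrary and the inputs are left uninitialized, so this only shows the calls):

    ```cuda
    // FP16 GEMM through cuBLAS with tensor cores enabled: C = alpha*A*B + beta*C,
    // FP16 inputs, FP32 accumulation/output.
    #include <cublas_v2.h>
    #include <cuda_fp16.h>
    #include <cuda_runtime.h>

    int main() {
        const int M = 4096, N = 4096, K = 4096;
        half  *A, *B;     // FP16 inputs
        float *C;         // FP32 output
        cudaMalloc((void **)&A, sizeof(half)  * M * K);
        cudaMalloc((void **)&B, sizeof(half)  * K * N);
        cudaMalloc((void **)&C, sizeof(float) * M * N);

        cublasHandle_t handle;
        cublasCreate(&handle);
        cublasSetMathMode(handle, CUBLAS_TENSOR_OP_MATH);   // allow tensor cores

        const float alpha = 1.0f, beta = 0.0f;
        cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                     M, N, K,
                     &alpha,
                     A, CUDA_R_16F, M,
                     B, CUDA_R_16F, K,
                     &beta,
                     C, CUDA_R_32F, M,
                     CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP);
        cudaDeviceSynchronize();

        cublasDestroy(handle);
        cudaFree(A); cudaFree(B); cudaFree(C);
        return 0;
    }
    ```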
  13. Tensors! *spawn*

    Don’t forget that the tensor cores produce FP32 outputs.
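
    An illustrative WMMA kernel (hypothetical; one warp computes a single 16×16×16 tile, compiled for sm_70 or newer) makes the types explicit: the A and B fragments hold half values, the accumulator fragment is float:

    ```cuda
    #include <cuda_fp16.h>
    #include <mma.h>
    using namespace nvcuda;

    // Launch with one warp, e.g. tile_mma<<<1, 32>>>(dA, dB, dC);
    __global__ void tile_mma(const half *A, const half *B, float *C) {
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;  // FP32 accumulator

        wmma::fill_fragment(c_frag, 0.0f);
        wmma::load_matrix_sync(a_frag, A, 16);   // leading dimension 16
        wmma::load_matrix_sync(b_frag, B, 16);
        wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
        wmma::store_matrix_sync(C, c_frag, 16, wmma::mem_row_major);  // FP32 results out
    }
    ```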
  14. AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

    GP100 has a completely different SM than GP102. The ratio of scheduling to math hardware and on-chip memory is quite different. So this comparison is not as straightforward as you'd like to make it.