CarstenS: tensor cores lower register file bandwidth per math operation because they operate on whole matrices rather than scalars: an N×N×N matrix multiply performs O(N^3) math operations while reading only O(N^2) operands.
Nvidia actually shows this phenomenon in their animated tensor core cartoons in their keynotes.
So it’s likely they are close to peak RF bandwidth both at 78...
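To spell out the N^3/N^2 point (back-of-envelope arithmetic using Volta's documented 4x4x4 tile shape; the scalar comparison is my own framing):

```latex
% Multiplying two N x N matrices costs 2N^3 FLOPs but reads only
% 3N^2 operands (A, B and the accumulator C):
\frac{\text{FLOPs}}{\text{operands read}} = \frac{2N^3}{3N^2} = \frac{2N}{3}
% A scalar FMA (d = a*b + c) gets 2 FLOPs per 3 operands, so the
% tensor op needs N times less register-file traffic per FLOP.
% For Volta's 4x4x4 HMMA: 128 FLOPs vs 48 operands, i.e. 4x less.
```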
VGTech just does FPS benchmarking, right? I’ve never seen an analysis as in-depth or insightful as DF anywhere else. FPS benchmarks are a dime a dozen, but they don’t inform about the broader issues.
DLSS makes the game faster only when the frame rate is low. If the frame rate is high, the roughly fixed per-frame cost of running the neural network, even with tensor cores, dominates the frame time, so you won’t see a performance improvement.
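To put rough numbers on that (a toy model; the ~1.5 ms network cost is an assumed figure for illustration, not a measurement):

```latex
% DLSS renders at a lower internal resolution (t_low < t_native),
% then pays a roughly fixed network cost t_nn per frame. It wins iff:
t_{\mathrm{DLSS}} = t_{\mathrm{low}} + t_{\mathrm{nn}} < t_{\mathrm{native}}
\;\iff\; t_{\mathrm{native}} - t_{\mathrm{low}} > t_{\mathrm{nn}}
% At 30 fps native (33 ms/frame) the resolution saving is ~10+ ms,
% dwarfing t_nn ~ 1.5 ms. At 250 fps native (4 ms/frame) the saving
% is at most a couple of ms, so the fixed t_nn eats the gain.
```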
The full Volta memory model extends to all remote GPU memory connected by NVSwitch. You can dereference any pointer without doing any work in software to figure out where in the system that pointer points. You can use atomics.
It’s not transparent to the GPUs themselves - obviously...
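For the CUDA-level view, this is the standard peer-access pattern; NVSwitch is what lets it scale past directly-linked pairs. Device IDs and counts below are made up for illustration:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Runs on GPU 0 but atomically increments a counter that physically
// lives in GPU 1's memory. The pointer needs no software translation;
// the hardware routes the access over NVLink/NVSwitch.
__global__ void remote_atomic(int* remote_counter) {
    atomicAdd(remote_counter, 1);
}

int main() {
    int* counter = nullptr;

    cudaSetDevice(1);                    // allocate on GPU 1
    cudaMalloc(&counter, sizeof(int));
    cudaMemset(counter, 0, sizeof(int));

    cudaSetDevice(0);                    // execute on GPU 0
    cudaDeviceEnablePeerAccess(1, 0);    // map GPU 1's memory into GPU 0
    remote_atomic<<<1, 256>>>(counter);  // plain pointer, remote memory
    cudaDeviceSynchronize();

    int result = 0;                      // error checking omitted for brevity
    cudaMemcpy(&result, counter, sizeof(int), cudaMemcpyDeviceToHost);
    printf("remote counter = %d\n", result);  // expect 256
    return 0;
}
```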
The intrinsic is fine. The missing performance is because the CUDA compiler can’t optimally schedule and register allocate the code that uses the intrinsic. Hopefully that will improve with time. Getting 100% utilization of the tensor cores requires the whole chip to work at full tilt, doing...
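For reference, the interface in question: a minimal one-warp, one-tile WMMA sketch (16x16x16 fp16, the shape current CUDA exposes), not a tuned kernel. The performance discussed above depends on what the compiler emits around these calls:

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// One warp computes one 16x16 tile of D = A*B on a tensor core.
__global__ void wmma_tile(const half* a, const half* b, float* d) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;

    wmma::fill_fragment(acc, 0.0f);
    wmma::load_matrix_sync(a_frag, a, 16);  // leading dimension = 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(acc, a_frag, b_frag, acc);
    wmma::store_matrix_sync(d, acc, 16, wmma::mem_row_major);
}
```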
The CUDA example is using WMMA, the CUDA abstraction for tensor cores. 50 TFLOPS is about right for the WMMA interface with current CUDA. To get full performance, use cuBLAS.
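The cuBLAS route looks roughly like this (signature as of CUDA 9/10; allocation and error checks omitted):

```cuda
#include <cublas_v2.h>
#include <cuda_fp16.h>

// Mixed-precision GEMM routed to tensor cores: fp16 inputs, fp32
// accumulation. A, B, C are device pointers prepared elsewhere.
void tensor_core_gemm(cublasHandle_t handle, int n,
                      const half* A, const half* B, float* C) {
    const float alpha = 1.0f, beta = 0.0f;
    cublasSetMathMode(handle, CUBLAS_TENSOR_OP_MATH);  // opt in to tensor cores
    cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                 n, n, n,
                 &alpha,
                 A, CUDA_R_16F, n,
                 B, CUDA_R_16F, n,
                 &beta,
                 C, CUDA_R_32F, n,
                 CUDA_R_32F,                      // accumulate in fp32
                 CUBLAS_GEMM_DEFAULT_TENSOR_OP);  // tensor-core algorithms
}
```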
GP100 has a completely different SM than GP102. The ratio of scheduling hardware to math units, and the amount of on-chip memory, are quite different. So this comparison is not as straightforward as you'd like to make it.