Nvidia Blackwell Architecture Speculation

Every result where the gain is more than +15% (half of the FLOPS gain) is highly likely to be a result helped by the additional memory bandwidth.
So in a hypothetical world where the bandwidth had increased by the same amount as the FLOPS gain (~32%) to maintain the same ratio as the 4090, what performance difference would we see?
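Putting rough numbers on that hypothetical (back-of-the-envelope only: the bandwidth figures are the published 1008 GB/s and 1792 GB/s specs, and the ~32% FLOPS gain is the number quoted above):

```python
# Rough sketch of the "same FLOPS-per-byte ratio as the 4090" hypothetical.
# Bandwidth figures are published specs; the FLOPS gain is the ~32% quoted above.

bw_4090 = 1008   # GB/s (384-bit GDDR6X @ 21 Gbps)
bw_5090 = 1792   # GB/s (512-bit GDDR7 @ 28 Gbps)
flops_gain = 0.32

bw_gain = bw_5090 / bw_4090 - 1                # ~ +78% in reality
bw_hypothetical = bw_4090 * (1 + flops_gain)   # ~ 1330 GB/s to keep the 4090 ratio

print(f"Actual bandwidth gain:   {bw_gain:+.0%}")
print(f"Ratio-matched bandwidth: {bw_hypothetical:.0f} GB/s (vs. {bw_5090} GB/s actual)")

# Roofline-style intuition: a purely compute-bound game scales with the FLOPS gain
# (~+32%), a purely bandwidth-bound one with the bandwidth gain (~+78%), and real
# games land somewhere in between, which is where the +15%..+50% spread comes from.
```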
 
I'm sure someone reading this will get a 5090. It would be very interesting to downclock the memory and see how performance scales. But it's only interesting if you post the results here :yes:
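For whoever does try it, a rough sketch of what such a sweep could look like, assuming the driver actually honors nvidia-smi's memory clock locking on this card (not a given on consumer SKUs) and with the benchmark itself left as a placeholder:

```python
# Sketch of a memory-clock scaling sweep. Assumes `nvidia-smi --lock-memory-clocks`
# is supported on this card/driver (requires admin rights; not guaranteed on
# consumer SKUs) and that run_benchmark() is replaced with a real, repeatable test.
import subprocess

def lock_memory_clock(mhz: int) -> None:
    # Pin the memory clock to a single value.
    subprocess.run(["nvidia-smi", f"--lock-memory-clocks={mhz},{mhz}"], check=True)

def reset_memory_clock() -> None:
    subprocess.run(["nvidia-smi", "--reset-memory-clocks"], check=True)

def run_benchmark() -> float:
    # Placeholder: run your benchmark of choice and return an average FPS.
    raise NotImplementedError("plug in a real benchmark here")

# Hypothetical clock steps; check `nvidia-smi -q -d SUPPORTED_CLOCKS` for real values.
for mhz in (1750, 1500, 1250, 1000):
    lock_memory_clock(mhz)
    fps = run_benchmark()
    print(f"{mhz} MHz -> {fps:.1f} fps")

reset_memory_clock()
```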
 
On average? I doubt the results would be much different. Some games which are showing +50% now, though, would start showing something closer to +30%.
 
Ok, so the question is whether on average the 5070 Ti and 5070 will see the ~20% improvements suggested by the slides, or whether Nvidia just cherry-picked games that benefit from the extra bandwidth.
 
We'll know the answer to that in several days, since what the 5080 shows will likely be very similar to what the 5070 Ti will show, and probably the 5070 too.
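To illustrate why the sample matters (every per-game number below is made up purely for illustration, not a measurement):

```python
# Illustration only: how picking bandwidth-hungry titles can skew the headline number.
from statistics import geometric_mean

compute_bound   = [1.08, 1.10, 1.12]   # games limited mostly by the FLOPS gain
bandwidth_bound = [1.20, 1.25]         # games that love the extra bandwidth

slide_sample = bandwidth_bound                  # what a cherry-picked slide might show
broad_sample = compute_bound + bandwidth_bound  # what a wide review suite might show

print(f"Slide-style average: {geometric_mean(slide_sample) - 1:+.0%}")   # ~+22%
print(f"Broad average:       {geometric_mean(broad_sample) - 1:+.0%}")   # ~+15%
```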
 
Just purely speculating here, but I wonder if GB202 has some experimentation in terms of how to approach an MCM design. This would be akin to how they approached preceding GPUs before actually moving to MCM with GB200.
 
It's interesting that the CNN model seems to use less VRAM in the above? I'm wondering if the transformer models leverage FP8 (or INT8) more, which wouldn't benefit Ampere (and presumably Turing as well).
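If that guess is right, the weight footprint alone would scale with precision; a toy calculation (the parameter count is a made-up placeholder, since the DLSS model sizes aren't public):

```python
# Toy calculation: VRAM footprint of model weights at different precisions.
# The parameter count is a made-up placeholder; DLSS model sizes aren't public.
params = 50_000_000   # hypothetical parameter count

bytes_per_param = {"FP16": 2, "FP8/INT8": 1}
for fmt, size in bytes_per_param.items():
    print(f"{fmt}: {params * size / 2**20:.0f} MiB of weights")

# GPUs without FP8 tensor-core support (Ampere, Turing) would have to keep those
# weights in a wider format, roughly doubling that part of the footprint.
```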
 