NVidia Ada Speculation, Rumours and Discussion

Government grants?
I always thought top-10 HPC was all about dick-measuring on a global scale, no?
The one who promises the most FP64 FLOPS at the smallest energy footprint wins.
Then why are universities like ANU, which have only ever had Nvidia GPUs in their HPC clusters, also going MI200? I guess not only is big government stupid, but so is academia.

Which is something nice to brag about in the press, but kind of useless even for the scientists who will use this supercomputer. DGEMM has little to do with real HPC tasks, which are mostly bandwidth- and scaling-bound.
But also, what's your point? MI200 does packed FP32 and has the most HBM capacity and bandwidth of any GPU.
 
Is the theoretical FP64 peak rate ~5x higher compared against A100 using its tensor cores (~20 TFLOPS) or without them (~10 TFLOPS)?

Also, what's MI200's TDP?
The rumors I know of are based on a slide for the Australian supercomputer Setonix, as reported on by hpcwire.com; the data itself has been around for much longer, though. If you do the math on that one, an MI200 will land at about 55 FP64 TFLOPS, so 5x A100 without tensor cores.
 
The rumors I know of are based on a slide for the Australian supercomputer Setonix, as reported on by hpcwire.com; the data itself has been around for much longer, though. If you do the math on that one, an MI200 will land at about 55 FP64 TFLOPS, so 5x A100 without tensor cores.
Which makes it all the more mysterious, since AMD did not release official specs when MI200 was announced as shipping to customers on July 28, so there's still a lot of speculation. Unless something went wrong, what's the point in keeping this from the public or potential customers?
 
50 PFLOPS is probably CPU+GPU combined though.
Yes, sure. At 200k Milan cores, that's roughly 9 PFLOPS for the CPU (with 2.75 TFLOPS for a single 64-core Epyc 7763), which leaves 41 for the GPUs.
(50 - 9) / 750 = 0.05466 PFLOPS, or about 54.66 TFLOPS per GPU
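For anyone who wants to redo that napkin math, here's a quick sketch. All inputs are the slide-derived figures used in this thread (not official MI200 specs); the 9.7 and 19.5 TFLOPS comparison points are Nvidia's published A100 FP64 vector and tensor peaks.

```python
# Back-of-envelope Setonix math from the posts above.
system_pflops = 50.0        # quoted combined FP64 peak for Setonix
milan_cores   = 200_000     # CPU partition size
epyc_7763_tf  = 2.75        # FP64 TFLOPS per 64-core Epyc 7763
gpu_count     = 750         # GPU count used in the estimate above

cpu_pflops = milan_cores / 64 * epyc_7763_tf / 1000           # ~8.6 PFLOPS
gpu_tflops = (system_pflops - cpu_pflops) / gpu_count * 1000  # per-GPU share

print(f"per-GPU FP64 peak ~ {gpu_tflops:.1f} TFLOPS")             # ~55
print(f"vs A100 FP64 vector (9.7 TF):  {gpu_tflops / 9.7:.1f}x")  # ~5.7x
print(f"vs A100 FP64 tensor (19.5 TF): {gpu_tflops / 19.5:.1f}x") # ~2.8x
```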
 
Yes, sure. At 200k Milan cores, that's roughly 9 PFLOPS for the CPU (with 2.75 TFLOPS for a single 64-core Epyc 7763), which leaves 41 for the GPUs.
(50 - 9) / 750 = 0.05466 PFLOPS, or about 54.66 TFLOPS per GPU

I see. I just ran the numbers myself and see you already subtracted the CPU contribution out.
Isn't MI200 still going to be on 7nm, though? I really doubt MI200 would achieve almost 5 times the FP64 performance of MI100, even as an MCM of 2 dies, if it is still on 7nm.
 
I see. I just ran the numbers myself and see you already subtracted the CPU contribution out.
Isn't MI200 still going to be on 7nm, though? I really doubt MI200 would achieve almost 5 times the FP64 performance of MI100, even as an MCM of 2 dies, if it is still on 7nm.
MI200 does full-rate FP64, which already quadruples the performance at the same clocks.
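One way to read that scaling argument, as a hedged sketch: MI100's 11.5 TFLOPS FP64 peak is a published spec, while the full-rate FP64 and two-die MCM factors are this thread's speculation, not confirmed MI200 numbers.

```python
# Hedged scaling sketch: start from MI100's published FP64 peak and apply the
# two multipliers speculated in this thread (half-rate -> full-rate FP64, and
# a two-die MCM package), assuming the same clocks and CU count per die.
mi100_fp64_tflops = 11.5   # public MI100 spec (half its FP32 vector rate)
full_rate_gain    = 2.0    # assumption: FP64 promoted to full rate
mcm_dies          = 2      # assumption: two dies per package

speculated = mi100_fp64_tflops * full_rate_gain * mcm_dies
print(f"~{speculated:.0f} FP64 TFLOPS per package, i.e. ~4x MI100")  # ~46
```

That lands in the same ballpark as the ~55 TFLOPS Setonix estimate; the remaining gap would have to come from clocks or per-die CU count.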
 
So, all current-gen games will look shitty?
Will? No, I really hope not.

What? How does RT force or even help devs understand/optimize HW?
My belief is that the step change in per-pixel quality and consistency that ray-tracing-based techniques can bring will split devs into two camps: those who will continue as if nothing happened, and those who are willing to go back to the drawing board. I expect that traditional fixed-function hardware will underlie this new perspective, but it will be plugged in to support ray-traced rendering. There'll be a lot of navel-gazing focused on what that hardware can really do when used properly.

I think we're exiting the "ray tracing is tacked on" mode of graphics development. I'm hopeful that in a couple of years we'll get to play the fruits of this.

I've been working through my Steam backlog, mostly PS3-era games, and just started getting into the PS4 generation. There are clear improvements, but the visual upgrade is nowhere near the increase in shading horsepower over the same timeframe. I doubt it will be any different in the PS5 generation. Games will not take full advantage of PC hardware.
When I've finished the three Witchers, I have a few 2007 games I want to play: Bioshock, Call of Juarez, Crysis and STALKER. And then it'll be 2009...
 
I see. I just ran the numbers myself and see you already subtracted the CPU contribution out.
Isn't MI200 still going to be on 7nm, though? I really doubt MI200 would achieve almost 5 times the FP64 performance of MI100, even as an MCM of 2 dies, if it is still on 7nm.
Almost 5 times compared to A100, not MI100. The exact number depends on final clocks.
 
It isn't 5x higher. AMD is using their matrix engines for FP64, the same way Nvidia is using tensor cores. 5x higher would be nearly 100 TFLOPS.
 
Maxing out RT just to bring it to its knees is not efficient either, I guess (we'll see if/how they'll improve).
I don't know what you find inefficient about RT. RT inefficiency is a myth. It's way more efficient than rasterization in many cases.
As for stochastic sampling, there have been great improvements in this area - importance sampling (quick sketch below), shading caches, coherency sorting - all of this has improved a lot in just 3 years.
Looking at how fast path tracing (and multi-bounce tracing) is right now in scenes with hundreds of millions of polygons in UE5, Omniverse, etc., it feels as if it's already here: just mix in all the improvements mentioned above, add denoising, and push it to production.
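To make the importance-sampling point concrete, here's a toy sketch (not from any of the engines mentioned): a Monte Carlo estimate of hemisphere irradiance under constant incoming light, once with uniform directions and once with cosine-weighted ones. The latter kills the variance for the same sample count, which is the kind of win being described.

```python
# Toy example: estimate E = integral of cos(theta) d(omega) over the hemisphere
# (exact value: pi), i.e. irradiance from a constant unit-radiance environment.
# The azimuth angle doesn't affect this integrand, so the samplers only return
# cos(theta) and the pdf of the drawn direction.
import math
import random

def uniform_sample():
    """Uniform hemisphere direction; pdf = 1 / (2*pi)."""
    cos_theta = random.random()              # z is uniform in [0, 1]
    return cos_theta, 1.0 / (2.0 * math.pi)

def cosine_weighted_sample():
    """Cosine-weighted hemisphere direction; pdf = cos(theta) / pi."""
    cos_theta = math.sqrt(random.random())
    return cos_theta, cos_theta / math.pi

def estimate(sampler, n=10_000):
    """Standard Monte Carlo estimator: average of integrand / pdf."""
    return sum(c / pdf for c, pdf in (sampler() for _ in range(n))) / n

random.seed(1)
print("uniform        :", estimate(uniform_sample))          # noisy, around pi
print("cosine-weighted:", estimate(cosine_weighted_sample))  # pi, zero variance
```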

So we need to add something new not present in the console game we aim to port.
Honestly, the problem is that your average console developer treats PC as a third-tier platform, and unless there is an IHV involved to help with the workloads you've mentioned, the developer won't do anything for PC.

Which could be (summing up my previous proposals) volumetric stuff (fog simulation, lighting), a layered framebuffer to address screen-space hacks' shortcomings, fancy SM-based area-shadow techniques. And of course GI, if compute can do it better than RT. What else?
I don't think that's the way to go. Those SS hacks, fancy SM-based area-shadow techniques, compute GI, etc. are just more hacks with tons of drawbacks, layered on top of energy-inefficient computations on general multiprocessors at a time when Dennard scaling is long dead, hence the 500W monsters on the horizon.
Why bother making more of those fragile and unmaintainable systems when people can't even use the existing ones properly in most cases, because there are millions of tweakable parameters (apparently many devs can't even set up DLSS properly with just a few parameters)?
I'd rather have devs and IHVs put more time and HW effort into something far more general/unified and maintainable, which requires minimal tweaking and has proven to beat any hacks in the CG industry.
Beyond that, a general/unified algorithm can live on and scale with far more efficient specialized hardware, which is the way to go in the future, at least if you want to avoid 1000W monster GPUs in the near future, lol.
 