NVidia Ada Speculation, Rumours and Discussion

Government grants?
I always thought top-10 HPC was all about dick-measuring on a global scale, no?
The one who promises the most FP64 FLOPS at the smallest energy footprint wins.
Then why are universities like ANU, which have only ever had Nvidia GPUs in their HPC clusters, also going MI200? I guess not only is big government stupid, but so is academia.

Which is something nice to brag about in the press, but kind of useless even for the scientists who will use this supercomputer. DGEMM has little to do with real HPC tasks, which are mostly bandwidth- and scaling-bound.
But also, what's your point? MI200 does packed FP32 and has the most HBM capacity and bandwidth of any GPU.
 
Is the theoretical FP64 peak rate ~5x higher compared against A100 using its tensor cores (~20 TFLOPS) or without them (~10 TFLOPS)?

Also, what's MI200's TDP?
The rumors I know of are based on a slide for the Australian supercomputer Setonix, as reported on by hpcwire.com; the data itself has been around for much longer, though. If you do the math on that one, an MI200 will land at about 55 FP64 TFLOPS, so 5x A100 without tensor cores.
 
The rumors I know of are based on a slide for the Australian supercomputer Setonix, as reported on by hpcwire.com; the data itself has been around for much longer, though. If you do the math on that one, an MI200 will land at about 55 FP64 TFLOPS, so 5x A100 without tensor cores.
Which makes it all the more mysterious, since AMD did not release official specs when MI200 was announced as shipping to customers on July 28, so there's still a lot of speculation. Unless something went wrong, what's the point in keeping this from the public or potential customers?
 
50 PFLOPS is probably CPU+GPU combined though.
Yes, sure. At 200k Milan cores, that's roughly 9 PFLOPS for the CPU (with 2.75 TFLOPS for a single 64-core Epyc 7763), which leaves 41 for the GPUs.
(50 - 9) / 750 = 0.05466 PFLOPS, or about 54.66 TFLOPS per GPU
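For anyone who wants to redo that napkin math, here's a quick sketch. All inputs are the slide-derived figures used in this thread (not official MI200 specs); the 9.7 and 19.5 TFLOPS comparison points are Nvidia's published A100 FP64 vector and tensor peaks.

```python
# Back-of-envelope Setonix math from the posts above.
system_pflops = 50.0        # quoted combined FP64 peak for Setonix
milan_cores   = 200_000     # CPU partition size
epyc_7763_tf  = 2.75        # FP64 TFLOPS per 64-core Epyc 7763
gpu_count     = 750         # GPU count used in the estimate above

cpu_pflops = milan_cores / 64 * epyc_7763_tf / 1000           # ~8.6 PFLOPS
gpu_tflops = (system_pflops - cpu_pflops) / gpu_count * 1000  # per-GPU share

print(f"per-GPU FP64 peak ~ {gpu_tflops:.1f} TFLOPS")             # ~55
print(f"vs A100 FP64 vector (9.7 TF):  {gpu_tflops / 9.7:.1f}x")  # ~5.7x
print(f"vs A100 FP64 tensor (19.5 TF): {gpu_tflops / 19.5:.1f}x") # ~2.8x
```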
 
Yes, sure. At 200k Milan cores, that's roughly 9 PFLOPS for the CPU (with 2.75 TFLOPS for a single 64-core Epyc 7763), which leaves 41 for the GPUs.
(50 - 9) / 750 = 0.05466 PFLOPS, or about 54.66 TFLOPS per GPU

I see. I just ran the numbers myself and see you already subtracted the CPU contribution out.
Isn't MI200 still going to be on 7nm, though? I really doubt MI200 would achieve almost 5 times the FP64 performance of MI100, even as an MCM of 2 dies, if it is still on 7nm.
 
I see. I just ran the numbers myself and see you already subtracted the CPU contribution out.
Isn't MI200 still going to be on 7nm, though? I really doubt MI200 would achieve almost 5 times the FP64 performance of MI100, even as an MCM of 2 dies, if it is still on 7nm.
MI200 does full-rate FP64, which already quadruples the performance at the same clocks.
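One way to read that scaling argument, as a hedged sketch: MI100's 11.5 TFLOPS FP64 peak is a published spec, while the full-rate FP64 and two-die MCM factors are this thread's speculation, not confirmed MI200 numbers.

```python
# Hedged scaling sketch: start from MI100's published FP64 peak and apply the
# two multipliers speculated in this thread (half-rate -> full-rate FP64, and
# a two-die MCM package), assuming the same clocks and CU count per die.
mi100_fp64_tflops = 11.5   # public MI100 spec (half its FP32 vector rate)
full_rate_gain    = 2.0    # assumption: FP64 promoted to full rate
mcm_dies          = 2      # assumption: two dies per package

speculated = mi100_fp64_tflops * full_rate_gain * mcm_dies
print(f"~{speculated:.0f} FP64 TFLOPS per package, i.e. ~4x MI100")  # ~46
```

That lands in the same ballpark as the ~55 TFLOPS Setonix estimate; the remaining gap would have to come from clocks or per-die CU count.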
 
So, all current-gen games will look shitty?
Will? No, I really hope not.

What? How does RT force or even help devs understand/optimize HW?
My belief is that the step change in per-pixel quality and consistency that ray-tracing-based techniques can bring will split devs into two camps: those who will continue as if nothing happened, and those who are willing to go back to the drawing board. I expect that traditional fixed-function hardware will underlie this new perspective, but it will be plugged in to support ray-traced rendering. There'll be a lot of navel-gazing focused on what that hardware can really do when used properly.

I think we're exiting the "ray tracing is tacked on" mode of graphics development. I'm hopeful that in a couple of years we'll get to play the fruits of this.

I've been working through my Steam backlog, mostly PS3-era games, and just started getting into the PS4 generation. There are clear improvements, but the visual upgrade is nowhere near the increase in shading horsepower over the same timeframe. I doubt it will be any different in the PS5 generation. Games will not take full advantage of PC hardware.
When I've finished the three Witchers, I have a few 2007 games I want to play: Bioshock, Call of Juarez, Crysis and STALKER. And then it'll be 2009...
 
I see. I just ran the numbers myself and see you already subtracted the CPU contribution out.
Isn't MI200 still going to be on 7nm, though? I really doubt MI200 would achieve almost 5 times the FP64 performance of MI100, even as an MCM of 2 dies, if it is still on 7nm.
Almost 5 times compared to A100, not MI100. The exact number depends on final clocks.
 
It isn't 5x higher. AMD is using their matrix engines for FP64, the same way Nvidia is using tensor cores. 5x higher would be nearly 100 TFLOPS.
 
Maxing out RT just to bring it to its knees is not efficient either, I guess (we'll see if/how they'll improve).
I don't know what you find inefficient about RT. RT inefficiency is a myth. It's way more efficient than rasterization in many cases.
As for stochastic sampling, there have been great improvements in this area - importance sampling (quick sketch below), shading caches, coherency sorting - all of this has improved a lot in just 3 years.
Looking at how fast path tracing (and multi-bounce tracing) is right now in scenes with hundreds of millions of polygons in UE5, Omniverse, etc., it feels as if it's already here: just mix in all the improvements mentioned above, add denoising, and push it to production.
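To make the importance-sampling point concrete, here's a toy sketch (not from any of the engines mentioned): a Monte Carlo estimate of hemisphere irradiance under constant incoming light, once with uniform directions and once with cosine-weighted ones. The latter kills the variance for the same sample count, which is the kind of win being described.

```python
# Toy example: estimate E = integral of cos(theta) d(omega) over the hemisphere
# (exact value: pi), i.e. irradiance from a constant unit-radiance environment.
# The azimuth angle doesn't affect this integrand, so the samplers only return
# cos(theta) and the pdf of the drawn direction.
import math
import random

def uniform_sample():
    """Uniform hemisphere direction; pdf = 1 / (2*pi)."""
    cos_theta = random.random()              # z is uniform in [0, 1]
    return cos_theta, 1.0 / (2.0 * math.pi)

def cosine_weighted_sample():
    """Cosine-weighted hemisphere direction; pdf = cos(theta) / pi."""
    cos_theta = math.sqrt(random.random())
    return cos_theta, cos_theta / math.pi

def estimate(sampler, n=10_000):
    """Standard Monte Carlo estimator: average of integrand / pdf."""
    return sum(c / pdf for c, pdf in (sampler() for _ in range(n))) / n

random.seed(1)
print("uniform        :", estimate(uniform_sample))          # noisy, around pi
print("cosine-weighted:", estimate(cosine_weighted_sample))  # pi, zero variance
```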

So we need to add something new not present in the console game we aim to port.
Honestly, the problem is that your average console developer treats PC as a third-tier platform, and unless there is an IHV involved to help with the workloads you've mentioned, the developer won't do anything for PC.

Which could be (summing up my previous proposals) volumetric stuff (fog simulation, lighting), a layered framebuffer to address screen-space hacks' shortcomings, fancy SM-based area-shadow techniques. And of course GI, if compute can do it better than RT. What else?
I don't think that's the way to go. Those SS hacks, fancy SM-based area-shadow techniques, compute GI, etc. are just more hacks with tons of drawbacks, layered on top of energy-inefficient computations on general multiprocessors at a time when Dennard scaling is long dead, hence the 500W monsters on the horizon.
Why bother making more of those fragile and unmaintainable systems when people can't even use the existing ones properly in most cases, because there are millions of tweakable parameters (apparently many devs can't even set up DLSS properly with just a few parameters)?
I'd rather have devs and IHVs put more time and HW effort into something far more general/unified and maintainable, which requires minimal tweaking and has proven to beat any hacks in the CG industry.
Beyond that, a general/unified algorithm can live on and scale with far more efficient specialized hardware, which is the way to go in the future, at least if you want to avoid 1000W monster GPUs in the near future, lol.
 