Nvidia Post-Volta (Ampere?) Rumor and Speculation Thread

CarstenS · Nov 18, 2017

Kaotik said:
They've had this before without any such distinction of 2 families or some such (FP32/64 CUDA-core configuration completely different within family)

Neither of those two families had yet another execution resource coupled to the data fetch and delivery. At some point you will certainly cross the threshold where the functional overhead in the command-and-control section costs more to carry than the effort of designing a distinct one for less-featured mass-market products. Maybe that point has now been reached?

Infinisearch · Nov 18, 2017

Kaotik said:
They've had this before without any such distinction of 2 families or some such (FP32/64 CUDA-core configuration completely different within family)

That's true, but weren't the FP32/64 units still present in both products? It's just the ratio that was different. Having to no longer route data to an execution unit that is no longer there is different than just changing ratios. Well we'll find out eventually. FWIW on this page: https://devblogs.nvidia.com/parallelforall/inside-volta/ in the diagram for an SM the tensor cores take up a lot of space and seem to count 64 'squares'... don't know if it translates to large die size. Finally, why throw out all that R&D money by not basing a product on the basic premise of Volta? I mean, you've already invested the R&D and claim gains over your previous architecture, so why skip a product based on it?

CSI PC · Nov 18, 2017

Also need to put into context that Tesla is made up of more than just the P100 and V100, both of which are much more distinct from the rest of the family, which otherwise shows a closer resemblance across Tesla/Quadro/GeForce, the exception being the Quadro GP100.

McHuj · Nov 18, 2017

Bondrewd said:
A11X/A12.
Yours, Apple.

I doubt it. I expect both Apple chips in 2018 to be 10nm.

Bondrewd · Nov 18, 2017

CSI PC said:
Also need to put into context that Tesla is made up of more than just the P100 and V100, both of which are much more distinct from the rest of the family, which otherwise shows a closer resemblance across Tesla/Quadro/GeForce, the exception being the Quadro GP100.

Yeah, there's also P40 and P4, but people forget about them.

McHuj said:
I doubt it. I expect both Apple chips in 2018 to be 10nm.

N7SOC goes HVM H1 2018.
So yeah, your expectations are weird.

Alexko · Nov 18, 2017

Bondrewd said:
It's not about die size, it's about performance.
10LPE is okay for server clocks, totally not okay for high-performance speed demon consumer GPUs.

2.2/2.6 GHz base/turbo on the Centriq 2460 also sounds (more than) adequate for a high-end GPU.

Bondrewd · Nov 18, 2017

Alexko said:
2.2/2.6 GHz base/turbo on the Centriq 2460 also sounds (more than) adequate for a high-end GPU.

That's a CPU with an unknown number of pipeline stages (QC is pretty cagey on details of Falkor (or not, I forgot the HotChips slide deck)).
You need to fab something high-performance like a desktop CPU to find the fmax.

Deleted member 87499 · Nov 18, 2017

Alexko said:
2.2/2.6 GHz base/turbo on the Centriq 2460 also sounds (more than) adequate for a high-end GPU.

But big GPUs/FPGAs and phone SoC CPU cores are very different. Bigger GPUs/FPGAs beg for a true HP node.

Alexko · Nov 18, 2017

Bondrewd said:
That's a CPU with an unknown number of pipeline stages (QC is pretty cagey on details of Falkor (or not, I forgot the HotChips slide deck)).
You need to fab something high-performance like a desktop CPU to find the fmax.

el etro said:
But big GPUs/FPGAs and phone SoC CPU cores are very different. Bigger GPUs/FPGAs beg for a true HP node.

Apparently Falkor has 10 to 15 stages. That's admittedly slightly vague, but Qualcomm provides some latency figures for the various caches, and they're quite tight.

I'm not saying the 10nm process (Samsung's, in this case), is appropriate for desktop GPUs, I'm saying I have yet to see some justification as to why it isn't that amounts to more than hand-waving.

silent_guy · Nov 18, 2017

Infinisearch said:
That's true, but weren't the FP32/64 units still present in both products? It's just the ratio that was different.

A ratio of 1:2 vs 1:32 still means that the SM layout will be very different.

Having to no longer route data to an execution unit that is no longer there is different than just changing ratios.

It’s only a tiny little bit different, not very different.

Well we'll find out eventually. FWIW on this page: https://devblogs.nvidia.com/parallelforall/inside-volta/ in the diagram for an SM the tensor cores take up a lot of space and seem to count 64 'squares'... don't know if it translates to large die size.

The size of the multipliers is not linear. FP64 is not just 2x FP32 either.

Finally, why throw out all that R&D money by not basing a product on the basic premise of Volta? I mean, you've already invested the R&D and claim gains over your previous architecture, so why skip a product based on it?

Agreed.

It’d be kinda fun to see a bunch of people lose their minds about a 7nm chip still just being Maxwell based (Ampwell!) with some minor changes to address a few flaws and still see it beat the competition. The Maxwell design was really very good.

But I hope we’ll see something based on Volta. It’s just more interesting.

Infinisearch · Nov 18, 2017

silent_guy said:
A ratio of 1:2 vs 1:32 still means that the SM layout will be very different.

Fair enough.

silent_guy said:
It’s only a tiny little bit different, not very different.

I just meant that having a huge gap of unused real estate in the middle of an SM would be quite silly, and moving everything around to get maximum occupancy can be quite tricky.

silent_guy said:
The size of the multipliers is not linear. FP64 is not just 2x FP32 either.

I know it's not linear, although I haven't got around to studying Dadda multipliers yet.

silent_guy said:
It’d be kinda fun to see a bunch of people lose their minds about a 7nm chip still just being Maxwell based (Ampwell!) with some minor changes to address a few flaws and still see it beat the competition. The Maxwell design was really very good.

Were the higher clocks in Pascal purely due to process differences? If not, I'd say Maxwell was good and Pascal was very good.

Bondrewd · Nov 18, 2017

Infinisearch said:
Were the higher clocks in Pascal purely due to process differences?

Ye. Moving from planar to FinFETs was surely nice.

silent_guy · Nov 18, 2017

Infinisearch said:
I know it's not linear, although I haven't got around to studying Dadda multipliers yet.

There are a lot of different multiplier architectures, but if you’re not computing the product iteratively, their number of gates is quadratic in the size of the mantissa.
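
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes a plain array-style multiplier where an n-bit by n-bit product needs n*n partial-product bits, and it uses the IEEE 754 significand widths (including the implicit leading 1). The helper names are made up for illustration, and real designs (Booth encoding, Wallace/Dadda reduction trees) change the constants but not the quadratic trend.

```python
# Significand widths in bits, including the implicit leading 1 (IEEE 754).
SIGNIFICAND_BITS = {"FP16": 11, "FP32": 24, "FP64": 53}


def partial_products(bits: int) -> int:
    """Partial-product bits (one AND gate each) for a bits x bits multiply.

    Assumption: a simple non-iterative array multiplier, no Booth recoding.
    """
    return bits * bits


def relative_cost(fmt: str, baseline: str = "FP32") -> float:
    """Partial-product count of `fmt` relative to `baseline`."""
    return partial_products(SIGNIFICAND_BITS[fmt]) / partial_products(
        SIGNIFICAND_BITS[baseline]
    )


if __name__ == "__main__":
    for fmt, bits in SIGNIFICAND_BITS.items():
        print(
            f"{fmt}: {bits}-bit significand, "
            f"{partial_products(bits)} partial products, "
            f"{relative_cost(fmt):.2f}x FP32"
        )
```

By this crude measure an FP64 mantissa multiplier comes out at roughly 4.9x an FP32 one (2809 vs 576 partial products), which is why FP64 is not just 2x FP32.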

Were the higher clocks in Pascal purely due to process differences? If not, I'd say Maxwell was good and Pascal was very good.

Process is the biggest part of it. And, per Nvidia, “path optimization”. (See https://www.anandtech.com/show/10325/the-nvidia-geforce-gtx-1080-and-1070-founders-edition-review/6 )

Deleted member 87499 · Nov 18, 2017

silent_guy said:
But I hope we’ll see something based on Volta. It’s just more interesting.

I expect Ampere to be based on Volta.

Infinisearch · Nov 19, 2017

BTW I remember reading that the Drive PX Pegasus was shown at GTC Europe 2017. It was said to be based on a post-Volta architecture... so does anyone think that is, or is related to, Ampere?

iMacmatician · Dec 21, 2017

According to Fudzilla, the Volta successor is for machine learning while the GeForce line will be targeted by a separate architecture, both set for 2018.

Fudzilla said:
There have been many reports that Nvidia’s next generation technology will be codenamed Ampere and we want to set one thing straight. The successor of Volta is an AI/ML chip and not a GPU - just like Volta never was.

Fudzilla said:
Nvidia will have a GPU that will come in 2018 to replace the Pascal based GTX 1000 series and won’t be a cut down version of Volta. It will be a brand new unit designed from scratch. Jensen already announced the next generation Post Volta GPU in European GTC in October 2017.

Arun · Dec 22, 2017

Irrespective of codenames, I wonder how many new GPUs we're getting on 12nm, versus parts of the line-up waiting for 7nm? And whether 7nm will start on a monster GPU (like 16nm started on P100 and 12nm on V100) or if NVIDIA will handle that differently this time?

I think they believe deep learning and HPC will be a more competitive market than gaming, and more power sensitive (rather than area sensitive), which gives a greater incentive for 7nm on the monster GPU first. But I don't know whether it's worth refreshing the entire line-up on 12nm when 7nm is coming relatively soon?

CSI PC · Dec 22, 2017

I think Nvidia could soon be caught between nodes if they delay 12nm for their next gen much longer *shrug*.
The V100 and Titan needed launching within a specific window and still early enough to make sense; it could be that Nvidia does not want to get caught splitting a model range between nodes across the whole Tesla/Quadro/GeForce line, which has synergy.
But tbh Volta was always presented primarily as an HPC-DL model, in the presentations I have anyway, and from a quick search, all the roadmaps on tech sites that include Volta show it alongside, or as part of, the SGEMM/DP context slides.

It may be semantics: one could argue the Tesla P100 and Quadro GP100 are a distinct architecture from the rest of the Pascal line, so the talk of a different architecture to Volta may be this kind of differentiation and/or possibly a FinFET node change. There is the looming shadow of the new architecture as well, but that feels too early to me (I have not seen any notable reference in Tesla presentations).

And any successor to Volta in the HPC/scaled-out DL space will IMO still be a collaboration with IBM, albeit implemented in distinct platforms such as Tegra, just as we see now.
Another change could be the launch cycle for the platforms: in the recent past, Tegra followed the accelerator/GPU by 6 months, and that was usually development-kit sampling with select clients.

CarstenS · Dec 22, 2017

Given the competitive landscape, a GP102 shrunk to 12nm (and possibly outfitted with GDDR6, depending on the rework necessary for memory controllers) should carry them just fine from a business perspective until 7nm is at a yield level viable for price sensitive consumer markets.

Bondrewd · Dec 22, 2017

CarstenS said:
until 7nm is at a yield level viable for price sensitive consumer markets.

I have a weird feeling that won't happen until EUV insertion.
