Nvidia Post-Volta (Ampere?) Rumor and Speculation Thread

Status
Not open for further replies.
They've had this before without any such distinction of 2 families or some such (FP32/64 CUDA-core configuration completely different within family)
Neither of those two families had yet another execution resource coupled to the data fetch and delivery. At some point you will certainly cross the threshold where the functional overhead in the command-and-control section costs more to carry than the effort of designing a distinct one for less-featured mass products. Maybe that point has just been reached?
 
They've had this before without any such distinction of 2 families or some such (FP32/64 CUDA-core configuration completely different within family)
That's true but weren't the FP32/64 units still present in both products? It's just the ratio that was different. Having to no longer route data to an execution unit that is no longer there is different than just changing ratios. Well we'll find out eventually. FWIW on this page: https://devblogs.nvidia.com/parallelforall/inside-volta/ in the diagram for an SM the tensor cores take up a lot of space and seem to count 64 'squares'... don't know if it translates to large die size. Finally why throw out all that R&D money by not basing a product on the basic premise of Volta? I mean you already invested R&D, and claim gains over your previous architecture why skip a product based on it?
 
Also need to put into context that Tesla is made up of more than just the P100 and V100, both of which are far more distinct from the rest of the family, which otherwise shares a close resemblance across Tesla/Quadro/GeForce, the exception being the Quadro GP100.
 
Also need to put into context that Tesla is made up of more than just the P100 and V100, both of which are far more distinct from the rest of the family, which otherwise shares a close resemblance across Tesla/Quadro/GeForce, the exception being the Quadro GP100.
Yeah, there's also P40 and P4, but people forget about them.
I doubt it. I expect both Apple chips in 2018 to be 10nm.
N7SOC goes HVM H1 2018.
So yeah, your expectations are weird.
 
It's not about die size, it's about performance.
10LPE is okay for server clocks, totally not okay for high-performance speed demon consumer GPUs.

2.2/2.6 GHz base/turbo on the Centriq 2460 also sounds (more than) adequate for a high-end GPU.
 
2.2/2.6 GHz base/turbo on the Centriq 2460 also sounds (more than) adequate for a high-end GPU.
That's a CPU with unknown number of pipeline stages (QC is pretty cagey on details of Falkor (or not, I forgot the HotChips slide deck)).
You need to fab something high-performance like a desktop CPU to find the fmax.
 
2.2/2.6 GHz base/turbo on the Centriq 2460 also sounds (more than) adequate for a high-end GPU.

But big GPUs/FPGAs and phone SoC CPU cores are very different. Bigger GPUs/FPGAs beg for a true HP node.
 
That's a CPU with unknown number of pipeline stages (QC is pretty cagey on details of Falkor (or not, I forgot the HotChips slide deck)).
You need to fab something high-performance like a desktop CPU to find the fmax.

But big GPUs/FPGAs and phone SoC CPU cores are very different. Bigger GPUs/FPGAs beg for a true HP node.

Apparently Falkor has 10 to 15 stages. That's admittedly slightly vague, but Qualcomm provides some latency figures for the various caches, and they're quite tight.

I'm not saying the 10nm process (Samsung's, in this case) is appropriate for desktop GPUs. I'm saying I have yet to see a justification for why it isn't that amounts to more than hand-waving.
 
That's true but weren't the FP32/64 units still present in both products? It's just the ratio that was different.
A ratio of 1:2 vs 1:32 still means that the SM layout will be very different.

Having to no longer route data to an execution unit that is no longer there is different than just changing ratios.
It’s only a tiny little bit different, not very different. :)

Well we'll find out eventually. FWIW on this page: https://devblogs.nvidia.com/parallelforall/inside-volta/ in the diagram for an SM the tensor cores take up a lot of space and seem to count 64 'squares'... don't know if it translates to large die size.
The size of a multiplier does not scale linearly with operand width. FP64 is not just 2x FP32 either.

Finally why throw out all that R&D money by not basing a product on the basic premise of Volta? I mean you already invested R&D, and claim gains over your previous architecture why skip a product based on it?
Agreed.

It’d be kinda fun to see a bunch of people lose their minds about a 7nm chip still just being Maxwell based (Ampwell!), with some minor changes to address a few flaws, and still see it beat the competition. The Maxwell design was really very good.

But I hope we’ll see something based on Volta. It’s just more interesting.
 
A ratio of 1:2 vs 1:32 still means that the SM layout will be very different.
Fair enough.
It’s only a tiny little bit different, not very different. :)
I just meant having a huge gap of unused real estate in the middle of an SM would be quite silly, and moving everything around to get maximum occupancy can be quite tricky.
The size of a multiplier does not scale linearly with operand width. FP64 is not just 2x FP32 either.
I know it's not linear, although I haven't got around to studying Dadda multipliers yet.
It’d be kinda fun to see a bunch of people lose their minds about a 7nm chip still just being Maxwell based (Ampwell!), with some minor changes to address a few flaws, and still see it beat the competition. The Maxwell design was really very good.
Were the higher clocks in Pascal purely due to process differences? If not, I'd say Maxwell was good and Pascal was very good.
 
I know it's not linear, although I haven't got around to studying Dadda multipliers yet.
There are a lot of different multiplier architectures, but if you’re not doing it iteratively, the number of gates grows quadratically with the size of the mantissa.
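
To put rough numbers on that quadratic scaling, here is a minimal sketch. It assumes a plain n x n array multiplier whose cost is approximated by its partial-product count (one AND gate per bit pair), and uses the IEEE 754 significand widths including the implicit leading bit. The function names are made up for illustration, and real designs (Booth encoding, Wallace/Dadda trees, shared datapaths) will shift the constants.

```python
# IEEE 754 significand widths in bits, including the implicit leading 1.
SIGNIFICAND_BITS = {"FP16": 11, "FP32": 24, "FP64": 53}


def partial_products(bits: int) -> int:
    """Partial-product bits of an n x n array multiplier (n * n).

    This is only a first-order proxy for gate count; it ignores the
    adder tree, rounding, and exponent logic.
    """
    return bits * bits


def relative_cost(fmt: str, baseline: str = "FP32") -> float:
    """Partial-product count of `fmt` relative to `baseline`."""
    return partial_products(SIGNIFICAND_BITS[fmt]) / partial_products(
        SIGNIFICAND_BITS[baseline]
    )


if __name__ == "__main__":
    for fmt, bits in SIGNIFICAND_BITS.items():
        print(
            f"{fmt}: {bits}-bit significand, "
            f"{partial_products(bits)} partial products, "
            f"{relative_cost(fmt):.2f}x FP32"
        )
```

Under these assumptions the FP64 mantissa multiplier comes out at 2809 partial products against 576 for FP32, so closer to 5x than 2x.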

Were the higher clocks in Pascal purely due to process differences? If not, I'd say Maxwell was good and Pascal was very good.
Process is the biggest part of it. And, per Nvidia, “path optimization”. (See https://www.anandtech.com/show/10325/the-nvidia-geforce-gtx-1080-and-1070-founders-edition-review/6 )
 
BTW I remember reading that the Drive PX Pegasus was shown at GTC Europe 2017. It was said to be based on a post-Volta architecture... so does anyone think that is, or is related to, Ampere?
 
According to Fudzilla, the Volta successor is for machine learning while the GeForce line will be targeted by a separate architecture, both set for 2018.

Fudzilla said:
There have been many reports that Nvidia’s next generation technology will be codenamed Ampere and we want to set one thing straight. The successor of Volta is an AI/ML chip and not a GPU - just like Volta never was.
Fudzilla said:
Nvidia will have a GPU that will come in 2018 to replace the Pascal based GTX 1000 series and won’t be a cut down version of Volta. It will be a brand new unit designed from scratch. Jensen already announced the next generation Post Volta GPU in European GTC in October 2017.
 
Irrespective of codenames, I wonder how many new GPUs we're getting on 12nm, versus parts of the line-up waiting for 7nm? And whether 7nm will start on a monster GPU (like 16nm started on P100 and 12nm on V100) or if NVIDIA will handle that differently this time?

I think they believe deep learning and HPC will be a more competitive market than gaming, and more power sensitive (rather than area sensitive), which gives a greater incentive for 7nm on the monster GPU first. But I don't know whether it's worth refreshing the entire line-up on 12nm when 7nm is coming relatively soon?
 
I think Nvidia may soon be caught between nodes if they delay much longer with 12nm for their next gen *shrug*.
The V100 and Titan needed launching within a specific window, still early enough to make sense; it could be that Nvidia does not want to get caught splitting a model range between nodes across the whole Tesla/Quadro/GeForce line-up, which has synergy.
But tbh Volta was always presented primarily as an HPC/DL part, at least in the presentations I have, and from a quick search every roadmap on tech sites that includes Volta shows it on, or as part of, the SGEMM/DP context slides.

It may be semantics: one could argue the Tesla P100 and Quadro GP100 are a distinct architecture from the rest of the Pascal line, so the talk of a different architecture to Volta may just be this differentiation and/or possibly a FinFET node change. There is also the looming shadow of the new architecture, but that feels too early to me (I have not seen any notable reference in Tesla presentations).

And any successor to Volta in the HPC/scaled-out DL space will IMO still be a collaboration with IBM, albeit also implemented in distinct platforms such as Tegra, just as we see now.
Another change could be the launch cycle for the platforms: in the recent past Tegra followed the accelerator/GPU by 6 months, and even then usually as development-kit sampling with select clients.
 
Given the competitive landscape, a GP102 shrunk to 12nm (and possibly outfitted with GDDR6, depending on the rework necessary for memory controllers) should carry them just fine from a business perspective until 7nm is at a yield level viable for price sensitive consumer markets.
 