Nvidia Post-Volta (Ampere?) Rumor and Speculation Thread

Status
Not open for further replies.
They went to Samsung and that ended up about as well as you'd expect. Not Intel bad, but still.
Not Intel bad, but still.

Interesting. I suppose it’s fortunate for Nvidia that Turing is keeping pace on 12nm so far.

A quick google turns up a few recent rumors of disastrous 7nm yields at Samsung but nothing Nvidia specific.
 
GTX 480 slightly faster than HD 5870

Don't forget that GTX 480 also came out half a year after Radeon 5870. When 5870 came out it was competing against the GTX 285. And even then, the GTX 480 and Radeon 5870 were basically even, trading leadership depending on which games were tested.

Man, looking back at prior generations, GPU progress has slowed down a LOT for both companies.

Prior generations: ~6-9 months per generation (counting mid-gen refreshes as generations, since their performance jumps were generally similar to modern-day new generations)
200 series to 400 series: ~1.75 years
400 series to 500 series: ~7 months
500 series to 600 series: ~1.25 years
600 series to 700 series: ~1 year
700 series to 900 series: ~1.5 years
900 series to 1000 series: ~1.66 years
1000 series to 2000 series: ~2.33 years

Looking at that it also becomes quite evident that GTX 480 was significantly delayed. NV had problems getting the chip to where they wanted it to be. And even then it was a HOT running chip that consumed a lot of electricity, especially when compared to the 5870.

Also, while NV were making really large chips, ATI/AMD were making relatively speaking much smaller chips. So while NV had the performance crown, AMD had the perf/mm and perf/watt crown up until Maxwell. Radeon 2900 really scarred ATI/AMD WRT building large chips for a long time.

AMD's progress is even slower. And I don't expect things to get better from here on out. Then again, as old as I am now, 2 years feels like how 1 year used to feel. :D

Regards,
SB
 
Don't forget that GTX 480 also came out half a year after Radeon 5870. When 5870 came out it was competing against the GTX 285.
The comparison is gen vs. gen; the HD 2900 XT also came out 6 months late, and the same goes for the R9 290X, which came out 10 months late.
And even then, the GTX 480 and Radeon 5870 were basically even, trading leadership depending on which games were tested.
They were in several games, but the GTX 480 came out on top more often; it was 10~15% faster overall according to AnandTech and TPU.

Also, while NV were making really large chips, ATI/AMD were making relatively speaking much smaller chips. So while NV had the performance crown, AMD had the perf/mm and perf/watt crown up until Maxwell. Radeon 2900 really scarred ATI/AMD WRT building large chips for a long time.
That's true. IMO, AMD could have out-muscled NVIDIA during the HD 4870/GTX 280 era with a bigger chip. The problem is the HD 4870 was only 30~40 W lower than the GTX 280 in average power consumption, it was also hotter in operating temps, and it was on a smaller node (55nm) vs NVIDIA's 65nm. So AMD might have feared pushing the chip too far, or the new process didn't allow them much headroom in die size early in the life cycle of the node.

AMD then corrected course with the HD 4890 and tried pushing the chip harder with a clock uplift, but they ended up jacking power consumption higher. Meanwhile NVIDIA migrated to 55nm quickly and released the GTX 285 with lower power consumption, which allowed them to increase clocks and maintain the lead. In the end it was a close call between the two in power consumption, which probably explains why AMD was wary of a bigger die.

However, the real wasted opportunity for AMD was the HD 5870, which was miles ahead of the power-drunk Fermi arch (GTX 480). Fermi failed hard to capitalize on the 40nm node and was released with cut cores and reduced clocks. AMD most likely didn't expect the GTX 480 to fail so spectacularly in efficiency and to be castrated by that much, so they played their hand conservatively. They were also planning an architectural migration from VLIW5 to VLIW4 (as they were under pressure to correct their compute deficiency), but that didn't turn out to be good against the fixed Fermi (GTX 580 with the full die and originally planned clocks). Thinking back, the fixed Fermi stood its ground (performance-wise) against two arch variations from AMD (HD 5870 and HD 6970).
 
With the numbers for Orin published, we have the first official performance numbers for Ampere, so let's have some fun with the numbers to see what we can expect of Ampere and whether the different infos fit together.
What do we know?
Official info: the Orin platform with 2 Orin is 400 INT8 TOPS. The 2-GPU version with 2 Orin is 2000 TOPS. So one Ampere GPU is 800 INT8 TOPS at 300 W. With 300 W per GPU it's pretty clear they're using a big GPU in the car this time.
Then we have that one Twitter user who posted all the Super infos and the codename Hopper first; it's pretty certain he has some real insight. According to him, in HPC it should be 8 GPCs with 8 TPCs each, with doubled TCs per SM, and 6 HBM stacks.
That's 128 SMs, 8192 shaders. Each SM has 64 FP32 CUDA cores and 16 TCs (vs 8 TCs per SM before).

New process, big chip -> ~120 SMs enabled in the end product, i.e. 1920 TCs. One TC does 256 INT8 ops per clock, which leads us to ~1627 MHz for 800 TOPS, about 11% higher clock than V100.
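
For anyone who wants to poke at the arithmetic, here's a minimal sketch of that clock estimate. The 256 INT8 ops per TC per clock is the Volta/Turing tensor core rate, and the 120 SM / 16 TC counts are just the rumored config above, nothing confirmed:

```python
# Back-of-envelope check of the clock implied by 800 INT8 TOPS.
# Assumptions: Turing-style tensor cores (256 INT8 ops/clock each),
# 120 active SMs x 16 TCs per SM, all straight from the rumor above.
sms = 120
tcs_per_sm = 16
int8_ops_per_tc_per_clock = 256

target_tops = 800                                  # INT8 TOPS per GPU on the DRIVE board
total_tcs = sms * tcs_per_sm                       # 1920 tensor cores
ops_per_clock = total_tcs * int8_ops_per_tc_per_clock

clock_hz = target_tops * 1e12 / ops_per_clock
print(f"required clock ≈ {clock_hz / 1e6:.1f} MHz")  # ≈ 1627.6 MHz
```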

Sounds pretty reasonable to me and fits together, with just one problem: 2 big HPC chips in one car computer? So far they've used the consumer line, as they only need inference speed, so I'm sceptical in this case.

So let's think of an AM102 configuration that would fit these numbers.
A smaller chip with 8 GPCs of 7 TPCs each: 7168 shaders. Cut off 256 and you're at 6912 shaders / 54 TPCs. With 1728 TCs at 1800 MHz we're at 800 TOPS. That's ~11% higher clock than the TU102 FE, which seems alright and might fit in a car.
The 2080 Ti FE is 14.2 TFLOPS FP32. The speculated AM102 configuration: ~24.8 TFLOPS, maybe 24 if you cut more SMs for the consumer product. At 24 TFLOPS we would have 69% more shader power than a 2080 Ti. That should hopefully translate to 50-60% more speed in games.
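
Same exercise for the FP32 numbers, as a rough sketch; every AM102 figure is the speculation above, and the 2080 Ti values are the FE spec (4352 shaders, ~1635 MHz boost):

```python
# FP32 throughput of the speculated AM102 cut vs. the RTX 2080 Ti FE.
# Every AM102 number is from the post above (pure speculation).
am102_shaders = 6912          # 108 SMs x 64 FP32 cores (2 TPCs cut)
am102_clock_ghz = 1.8
tu102_shaders = 4352          # RTX 2080 Ti FE
tu102_clock_ghz = 1.635       # FE boost clock

am102_tflops = am102_shaders * 2 * am102_clock_ghz / 1000   # FMA counts as 2 FLOPs
tu102_tflops = tu102_shaders * 2 * tu102_clock_ghz / 1000

print(f"AM102 ≈ {am102_tflops:.2f} TFLOPS")       # ≈ 24.88, i.e. the ~24.8 above
print(f"2080 Ti FE ≈ {tu102_tflops:.2f} TFLOPS")  # ≈ 14.23

# Assume a further-cut consumer SKU around 24 TFLOPS, as the post does:
print(f"consumer uplift ≈ {24.0 / tu102_tflops - 1:.0%}")   # ≈ +69% shader throughput
```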

7168 shaders are 55% more than TU102. Add more tensor cores, small architecture improvements, more for RT (I think more like Maxwell->Pascal). But with the number of ROPs and the memory interface not growing, just 2 more GPCs, and the caches also not growing much, everything besides the SMs would increase in size by less than 55%. So it might work out to roughly 55% more transistors than TU102.
This GPU is 7nm EUV; I hope we get more like a 2x transistor density improvement vs 16nm, unlike the DUV process. 754 mm² x 1.55 / 2 ≈ 584 mm².
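
And the die-size guess as a sketch, assuming ~55% more transistors than TU102 and a clean 2x density jump (both of which are assumptions, not data):

```python
# Rough die-size estimate: scale TU102's area by the assumed transistor
# growth, then divide by the hoped-for density gain of the new node.
tu102_area_mm2 = 754        # TU102 on TSMC 12FFN
transistor_growth = 1.55    # ~55% more transistors (from the SM count above)
density_gain = 2.0          # hoped-for 16nm -> 7nm EUV density improvement

am102_area_mm2 = tu102_area_mm2 * transistor_growth / density_gain
print(f"estimated die size ≈ {am102_area_mm2:.0f} mm²")   # ≈ 584 mm²
```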

So that's my educated guess at how a possible AM102 might look. Feel free to discuss and destroy my assumptions :)
 
With the numbers for Orin published, we have the first official performance numbers for Ampere, so let's have some fun with the numbers to see what we can expect of Ampere and whether the different infos fit together.
What do we know?
Official info: the Orin platform with 2 Orin is 400 INT8 TOPS. The 2-GPU version with 2 Orin is 2000 TOPS. So one Ampere GPU is 800 INT8 TOPS at 300 W. With 300 W per GPU it's pretty clear they're using a big GPU in the car this time.
Then we have that one Twitter user who posted all the Super infos and the codename Hopper first; it's pretty certain he has some real insight. According to him, in HPC it should be 8 GPCs with 8 TPCs each, with doubled TCs per SM, and 6 HBM stacks.
That's 128 SMs, 8192 shaders. Each SM has 64 FP32 CUDA cores and 16 TCs (vs 8 TCs per SM before).
Even though NVIDIA says Orin is using a "next gen GPU architecture", it doesn't mean it's Ampere (assuming here that Ampere is the next gen desktop, which isn't a given). In fact, the little they've told us suggests it's a heavy-lifting tensor GPU more than a traditional GPU, and I doubt that will carry over to desktop.
 
Xavier is coupled with Turing, and as Orin will come 2 years after Xavier, I'm pretty sure it'll be next year's architecture, whatever it's called.

It's funny, as we had the exact same discussion with tensor cores and Turing. So many people were sure that Turing wouldn't have tensor cores. I wouldn't be surprised by more TCs in the desktop lineup either. We have EGX servers based on GPUs that also show up in desktops for inference, Adobe and other software makers are exploring the possibilities of inference in their software (so Quadros might profit from it), and we have WinML/DirectML, which might also lead to DL inferencing in games (besides DLSS). The whole software world is researching the possibilities of DL, so it's not such a bad bet to try to be the best at it in the consumer space as well.
 
Don't forget that GTX 480 also came out half a year after Radeon 5870. When 5870 came out it was competing against the GTX 285. And even then, the GTX 480 and Radeon 5870 were basically even, trading leadership depending on which games were tested.

Man, looking back at prior generations, GPU progress has slowed down a LOT for both companies.

Prior generations: ~6-9 months per generation (counting mid-gen refreshes as generations, since their performance jumps were generally similar to modern-day new generations)
200 series to 400 series: ~1.75 years
400 series to 500 series: ~7 months
500 series to 600 series: ~1.25 years
600 series to 700 series: ~1 year
700 series to 900 series: ~1.5 years
900 series to 1000 series: ~1.66 years
1000 series to 2000 series: ~2.33 years

Looking at that it also becomes quite evident that GTX 480 was significantly delayed. NV had problems getting the chip to where they wanted it to be. And even then it was a HOT running chip that consumed a lot of electricity, especially when compared to the 5870.

Also, while NV were making really large chips, ATI/AMD were making relatively speaking much smaller chips. So while NV had the performance crown, AMD had the perf/mm and perf/watt crown up until Maxwell. Radeon 2900 really scarred ATI/AMD WRT building large chips for a long time.

AMD's progress is even slower. And I don't expect things to get better from here on out. Then again, as old as I am now, 2 years feels like how 1 year used to feel. :D

Regards,
SB
400 to 500 and 600 to 700 shouldn't really be considered generation jumps because the architecture was the same. If you do count them, then for consistency you should also consider the 2000 Super a new generation.
 
50% more performance at half the power would mean that Ampere is 3x more power efficient than Turing, which is almost unheard of in the industry; it's hard to believe that is true outside of some special cases, maybe in ray tracing for example.

Not to mention that NVIDIA will push this to the limit, so we could have a monstrous 7nm die that is 150% faster than a 2080 Ti, which still sounds hard to believe.

More plausible would be 50% more performance OR half the power.
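
Just to make the arithmetic behind that explicit, a tiny sketch of the perf/W implied by each reading of the rumor (Turing normalized to 1.0; nothing here is measured):

```python
# Perf-per-watt implied by each reading of the rumor, relative to Turing.
# Pure arithmetic on the rumored numbers.
scenarios = {
    "50% faster AND half the power":  (1.5, 0.5),
    "50% faster at the same power":   (1.5, 1.0),
    "same performance at half power": (1.0, 0.5),
}
for name, (perf, power) in scenarios.items():
    gain = perf / power                      # baseline is 1.0 perf at 1.0 power
    print(f"{name}: {gain:.1f}x perf/W")     # 3.0x, 1.5x, 2.0x
```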
 
I don't find it outside the realm of plausibility tbh, considering the 2 node jumps between the two architectures. We've never had that before, AFAIR.

EDIT: Also, by leveraging that (possible) advantage they may be able to set the chips much lower on the power curve.
 
I don't find it outside the realm of plausibility tbh, considering the 2 node jumps between the two architectures. We've never had that before, AFAIR.

EDIT: Also, by leveraging that (possible) advantage they may be able to set the chips much lower on the power curve.

You think a GPU 50% faster than a 2080ti at 125-150 watts is plausible?
 
You think a GPU 50% faster than a 2080ti at 125-150 watts is plausible?

Yes, if anything I would need to ask why not. I don't consider it a given, but it's completely plausible imho.

With 7nm they have over 3x the transistor density and 40% more performance or 60% lower power. They have no competition in the higher end, so there's no pressure to offer a greater performance advantage, which makes it a great opportunity to clock it very low on the performance curve (also leaving a margin for later on). If you told me 100% performance uplift and the same 3x power efficiency gain, I would say that's definitely less plausible, but a measly 50% increase, all things considered? I don't see any problem in considering it plausible.
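
As a very rough sanity check, here's what those quoted node-level figures alone would do to a 2080 Ti-class ~260 W baseline. These are marketing-style scaling claims treated as independent extremes; real designs land somewhere in between and add architecture and wider dies on top:

```python
# Crude bracket of the quoted node-level figures applied to a 2080 Ti-class chip.
# The 1.4x speed / 0.4x power numbers are the claims quoted above, and the
# 260 W baseline is roughly a 2080 Ti FE board.
baseline_power_w = 260
speed_gain = 1.40      # +40% performance at the same power
power_scale = 0.40     # -60% power at the same performance

print(f"same 260 W: ~{speed_gain:.1f}x the performance from the node alone")
print(f"same performance: ~{baseline_power_w * power_scale:.0f} W")  # ~104 W
```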
 
I don't find it outside the realm of plausibility tbh, considering the 2 node jumps between the two architectures. We've never had that before, AFAIR.

EDIT: Also, by leveraging that (possible) advantage they may be able to set the chips much lower on the power curve.
2 node jumps? AFAIK we don't know whether NVIDIA will use 7nm or 7nm+, and regardless of which they pick, it's really only 1 node jump (I wouldn't call 7nm > 7nm+ a node jump).
 
2 node jumps? AFAIK we don't know whether NVIDIA will use 7nm or 7nm+, and regardless of which they pick, it's really only 1 node jump (I wouldn't call 7nm > 7nm+ a node jump).

It's pretty certain they'll use 7nm+. But yes, of course it's more like 1 node jump, and 7nm DUV at least isn't even a full node jump, as you can see in AMD's product density improvement of ~1.6x.
 
It's pretty certain they'll use 7nm+. But yes, of course it's more like 1 node jump, and 7nm DUV at least isn't even a full node jump, as you can see in AMD's product density improvement of ~1.6x.
You can't compare different manufacturers' nodes directly like that; we don't know what kind of density AMD would have achieved on TSMC's 16/12nm process, it could have been lower or higher than on GloFo 14/12nm.
 