Nvidia Post-Volta (Ampere?) Rumor and Speculation Thread

So, why would Sony and Microsoft support raytracing (and, at least in Microsoft's case, VRS) if these features "are [not] 'essential hardware features' at this point in time"? Wouldn't it make more sense for them to just use Navi as we know it? Raytracing is essential, because it will be used more and more from this point on. I just found this free-to-play game through Twitter with DXR support: https://davethefreak.itch.io/beyond-evolution

Tensor cores are like the first iPhone: the introduction of something new that will change the way graphics are generated. Hardware has always come first.
Because they will be essential in the future, and consoles have long life cycles.
Tensor cores are there for AI workloads (and by no means exclusive to NVIDIA). Also, while "hardware always comes first", the hardware doesn't come if you don't know what it's coming for - if the same chips weren't used for AI workloads, there wouldn't be Tensor cores in them.
 
Tensor cores are there for AI workloads (and by no means exclusive to NVIDIA). Also, while "hardware always comes first", the hardware doesn't come if you don't know what it's coming for - if the same chips weren't used for AI workloads, there wouldn't be Tensor cores in them.
Nothing wrong with repurposing one's strengths onto another segment and carving out a new frontier in the process. If NVIDIA succeeds in making DLSS widely available (and builds on the success of the latest version), it will change the industry forever. That will happen by iterating on the process and trying different things, seeing what sticks and what doesn't, instead of slacking off and waiting for the chips to fall into place - that's generally how you become a leader in your field. It's how NVIDIA established itself with CUDA and AI anyway.

Also, Tensor cores can be used for accelerating DirectML if that ever takes off.
 
Yeah, gaming was certainly not the primary target for Nvidia's AI efforts, but give them credit for finding a way to make the hardware relevant to gamers anyway.

They will have serious competition very soon though. JHH keeps touting the need for flexible AI accelerators, which makes sense, but that flexibility doesn't need to be bundled with billions of useless graphics-focused transistors. Someone will eventually build a cheaper and faster AI accelerator with the software to back it up. Only a matter of time.

One thing in their favor is that for AI applications where visualization or image processing is also important, their products provide the total package. It's a niche within a niche though.
 
They will have serious competition very soon though.
Right now, their greatest threat comes from Xilinx's Alveo lineup; however, FPGAs have a much steeper programming curve and some potential latency problems, and compile times are also significantly longer.

Other potential competition is of course Intel and its multi-front effort in AI: FPGAs, ASICs, and GPUs, but they seem to be lagging behind a bit.

Intel canned their Xeon Phi in favor of solutions from Nervana, then canned Nervana's solutions in favor of Habana's; their Altera FPGAs are also behind Xilinx, and their GPU initiative is yet to be proven. They have also yet to present a reliable software stack that goes toe to toe with CUDA, but they do remain the sleeping giant that can be awakened at any moment.

Google seems uninterested in commercializing their ASIC AI approach (TPUs) for unknown reasons. And AMD seems to be spread across too many fronts at the moment and has a very weak footing in AI markets compared to even Intel.

There is also the occasional promising startup, but the big ones seem to be snatched up quickly by either Google or Intel, or maybe even NVIDIA?
 
Someone will eventually build a cheaper and faster AI accelerator with the software to back it up. Only a matter of time.
We might have had those for some time already, but Google doesn't sell their TPUs outright, just leases them in the cloud, I think? (At least https://mlperf.org/training-results-0-6/ suggests the performance is about where it needs to be, and they're not carrying the extra transistor load from the GPU parts. Of course, Google hasn't said how many transistors they're using for this either, but it's certainly fewer than NVIDIA, since there's no GPU in the same chip.)
 
The halo effect is important, for one: NVIDIA has four GPUs above AMD's highest Navi option - the 2070S, 2080, 2080S and 2080 Ti - not counting the Titan RTX, of course.

Secondly, looking at the market right now, that's not true. AMD still doesn't provide competition when their GPUs lack essential hardware features; NVIDIA has way more options than them at every price point and is selling way more GPUs. The 5500 XT had a poor reception, same for the 5600 XT, and the 5700 series is being outmatched in sales by the Super series, especially with the current driver woes. The 5700 XT is the only successful Navi choice for AMD right now, but the recent driver problems have cast a big shadow over it.

Thirdly, on the process front AMD is a node behind NVIDIA as well. That doesn't matter to consumers right now, but it matters a hell of a lot more next gen. It gives NVIDIA headroom to experiment and push their advantage further.

All of that^ is about to pass.

AMD provides enough of a price/performance gap to make Radeon the best mainstream choice. And Nvidia will never be able to compete with Navi until they are on 7nm. AMD is making mega profits because of the margins on these Navi chips. They have room to go lower in price, where Nvidia can't compete.


Secondly, we know that Nvidia is years behind AMD.

AMD itself has been using 7nm for over 2 years and is TSMC's largest 7nm partner. Nvidia was essentially locked out of 7nm and still does not have a 7nm product. And we all know that RDNA2 is on 7nm+, so how do you even suggest that Nvidia is a node ahead of AMD? It's confusing.

Ampere itself will undoubtedly be a great success, but Nvidia's dominance at high-end gaming (2070S/2080S/2080 Ti, etc.) is about to come to an end, because Nvidia cannot keep using the hand-me-down/trickle-down business model of selling low-binned AI cards to gamers. Nvidia is going to have to come out with their own gaming architecture to combat RDNA2.

I suspect it will take a few years for Nvidia/Huang to scheme something up. Going by the whitepapers: RDNA1 whetted our appetite, and the RDNA2 whitepapers might require a whole rebalancing of the gaming industry...


...since we are all speculating.
 
All of that^ is about to pass.

AMD provides enough of a price/performance gap to make Radeon the best mainstream choice. And Nvidia will never be able to compete with Navi until they are on 7nm.
lol
 
AMD provides enough of a price/performance gap to make Radeon the best mainstream choice. And Nvidia will never be able to compete with Navi until they are on 7nm. AMD is making mega profits because of the margins on these Navi chips. They have room to go lower in price, where Nvidia can't compete.
I wonder if RTG gets different pricing on TSMC 7nm wafers than AMD's CPU group. This is fresh from ISSCC:
https://pc.watch.impress.co.jp/img/pcw/docs/1236/258/html/photo018_o.jpg.html
From fiddling with lines in that diagram, it looks to me as if a yielded mm² is about 70% more expensive in 7 nm. And going from the densities given in TPU.com's database, Navi 10 achieves about 64-68% higher density in 7 nm than TU104/TU106 do in 12 nm. Now, of course, that doesn't take into account the respective rebates each company is getting, the possible pricing difference between 16 and 12 nm at TSMC, the yield recovery in each chip, etc.
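
To make that back-of-the-envelope comparison concrete, here's a small sketch. The die sizes and transistor counts are the commonly cited figures for Navi 10, TU104 and TU106 (treat them as assumptions), and the ~1.7x cost per yielded mm² is just the estimate eyeballed from that ISSCC slide, not actual pricing.

Code:
# Rough density / cost-per-transistor comparison (toy numbers, not official pricing).
chips = {
    # name: (transistors in billions, die size in mm^2, relative cost per yielded mm^2)
    "Navi 10 (7 nm)": (10.3, 251, 1.70),
    "TU106 (12 nm)":  (10.8, 445, 1.00),
    "TU104 (12 nm)":  (13.6, 545, 1.00),
}

for name, (btr, area, cost_per_mm2) in chips.items():
    density = btr * 1000 / area                   # million transistors per mm^2
    rel_cost_per_btr = area * cost_per_mm2 / btr  # relative die cost per billion transistors
    print(f"{name:16s} {density:5.1f} MTr/mm^2  {rel_cost_per_btr:5.1f} cost units per BTr")

With those inputs the density gap comes out around 64-69%, and the relative cost per transistor ends up roughly equal for the 7 nm and 12 nm parts - which fits the conclusion below that Nvidia isn't at a massive financial disadvantage right now.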

But generally, I'm not inclined to see Nvidia at a massive financial disadvantage right now.

And when they can switch to 7 or 5 nm successfully, they probably will have a larger window for improving frequency or power consumption – even without taking into account a new µarch.

Possibly – since we're all speculating – they could announce Ampere as a Volta successor while keeping Hopper back for a later date, and just shrink Turing to 7nm to exploit said process. That would go along with the backdrilling rumors about improving signal integrity on the PCBs, which is needed for even higher frequencies.
 
AMD provides enough of a price/performance gap to make Radeon the best mainstream choice. And Nvidia will never be able to compete with Navi until they are on 7nm. AMD is making mega profits because of the margins on these Navi chips. They have room to go lower in price, where Nvidia can't compete.
Well, going by the financial results, it's totally the opposite: AMD barely breaks even on their GPU business, whereas Nvidia has a 65% gross margin...
 
RUMORS: NVIDIA Ampere GPU Massive Die Size, Specifications, Architecture and More!

https://wccftech.com/rumors-nvidia-ampere-gpu-massive-die-size-specifications-architecture-and-more

Could be fake, could be real, or somewhere in between the two.

Fun nonetheless.
hm...

Turing Whitepaper said:
an average of 36 INT-pipe instructions per 100 instructions

So... would it follow for them to double up the FP32 cores vs. INT, or would that just be coincidental?

Also, wouldn't they need to increase the register file to match the increase in ALUs?
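
For what it's worth, a quick arithmetic sketch of what that ratio implies for the ALU split, reading the whitepaper figure as roughly 36 INT-pipe instructions for every 100 FP32 instructions:

Code:
# Instruction mix from the Turing whitepaper figure quoted above:
# ~36 INT-pipe instructions for every 100 FP32 instructions (assumption: typical shader mix).
fp_per_100 = 100
int_per_100 = 36

fp_share = fp_per_100 / (fp_per_100 + int_per_100)   # ~74% of issued instructions are FP
print(f"FP share of the mix: {fp_share:.0%}")
print(f"FP:INT instruction ratio: {fp_per_100 / int_per_100:.1f} : 1")  # ~2.8 : 1

# Turing provisions FP32 and INT32 ALUs 1:1; a 2:1 split (doubled FP32)
# would sit closer to that ~2.8:1 instruction mix.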
 
hm...

So... would it follow for them to double up the FP32 cores vs. INT, or would that just be coincidental?

Also, wouldn't they need to increase the register file to match the increase in ALUs?

Do they really need to increase the register file? They doubled the register files for Turing compared to Pascal because of its concurrent FP32+INT32 execution.
Each sub-SM has 16 FP32 and 16 INT32 cores it can use concurrently with a 16kb register file. Pascal was 32 FP32 cores with 16kb.

Now add 16 FP32 cores for Ampere, and you can use either 32 FP32 cores or 16 FP32 + 16 INT32 cores? Maybe adding Ld/St and SFU units would be enough? Of course, not everyone would be happy to go back to the same register-file-size-to-FP32 ratio as in Pascal, but maybe that's acceptable if you can pack in many more units and just lose some efficiency.

Just pure speculation, but maybe someone with more knowledge could explain what would be needed if they double FP32 per SM.
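
One way to frame that ratio question: here's a small sketch using the per-partition register file sizes quoted from the whitepapers later in this thread (GP100 Pascal: 128 KB per 32-FP32 block, Maxwell: 64 KB per 32, Volta/Turing: 64 KB per 16). The doubled-FP32 "Ampere?" row is purely hypothetical.

Code:
# Register file bytes per FP32 lane for the per-partition figures quoted in the thread,
# plus a hypothetical doubled-FP32 Ampere partition that keeps Turing's 64 KB.
configs = {
    "Maxwell (64 KB / 32 FP32)":       (64 * 1024, 32),
    "Pascal GP100 (128 KB / 32 FP32)": (128 * 1024, 32),
    "Volta/Turing (64 KB / 16 FP32)":  (64 * 1024, 16),
    "Ampere? (64 KB / 32 FP32)":       (64 * 1024, 32),  # hypothetical, RF unchanged
}

for name, (rf_bytes, fp32_lanes) in configs.items():
    print(f"{name:33s} {rf_bytes // fp32_lanes:5d} bytes of RF per FP32 lane")

If those figures are right, doubling FP32 without growing the register file lands back at Maxwell's bytes-per-lane ratio rather than GP100's - roughly the trade-off described in the post above.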

With Mesh Shaders in DX12, it's at least clear that future chips should be more compute/RT/TC focused. Nvidia doesn't need many more GPCs with more rasterization/tessellation speed, as the importance of that stuff should go down.
 
Do they really need to increase the register file? They doubled the register files for Turing compared to Pascal because of its concurrent FP32+INT32 execution.
Each sub-SM has 16 FP32 and 16 INT32 cores it can use concurrently with a 16kb register file. Pascal was 32 FP32 cores with 16kb.

Oh, I was looking through the Pascal whitepaper, and it showed 128 KB per sub-ALU grouping (32768 × 4 B) for 32 FP32. Maxwell has 64 KB with 32 FP32 in the sub-SM, but I figured the Pascal amount was the progression. Volta/Turing both show 16 FP32 with a 64 KB RF in the groupings.

idk, hence the question :p
 
So... would it follow for them to double up the FP32 cores vs. INT, or would that just be coincidental?

Volta was built for full-speed INT+FP at the cost of maximum FP throughput. If this random Twitter rumor is true, then it's just Nvidia prioritizing raw FP throughput.

Also, wouldn't they need to increase the register file to match the increase in ALUs?

Register file size or bandwidth? Register file size doesn't matter, but bandwidth does. If they don't increase bandwidth, then the scheduler can't gather all the operands for 32 FP32 FMAs + 16 INT32 ops in one cycle.

So issuing an INT operation will cause single-cycle bubbles in the FP32 execution pipeline. That's no worse than Pascal. Would be interesting to know how much a 16-wide INT32 pipeline costs vs. the dual-purpose FP32/INT32 pipes in Pascal and Navi.
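
To put a rough number on that bubble argument, here's a toy model of my own (not anything from the rumor): assume one instruction issued per SM partition per cycle, the whitepaper's 36-INT-per-100-FP mix, and pipes that take 32/width cycles per warp instruction. It ignores memory, SFU, latency and dependency effects entirely.

Code:
# Toy issue model for one SM partition: 1 instruction issued per cycle, 32-thread warps,
# instruction mix of 100 FP32 : 36 INT32 (the Turing whitepaper figure quoted earlier).
WARP = 32
FP_INSTR, INT_INSTR = 100, 36

def cycles(fp_width, int_width):
    # Each pipe is busy WARP/width cycles per instruction it receives;
    # the issue port can hand out at most one instruction per cycle.
    fp_busy = FP_INSTR * WARP // fp_width
    int_busy = INT_INSTR * WARP // int_width
    issue_limit = FP_INSTR + INT_INSTR
    return max(fp_busy, int_busy, issue_limit)

turing = cycles(fp_width=16, int_width=16)    # 16 FP32 + 16 INT32 lanes
doubled = cycles(fp_width=32, int_width=16)   # hypothetical 32 FP32 + 16 INT32 lanes
print(f"Turing-style partition: {turing} cycles per 136 instructions")
print(f"Doubled-FP32 partition: {doubled} cycles per 136 instructions")
print(f"Speedup: {turing / doubled:.2f}x")

Under those (very crude) assumptions the doubled-FP32 partition becomes issue-limited rather than FP-pipe-limited, so the gain on this mix is about 1.5x rather than 2x - which is one way of reading the single-cycle-bubble point above.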
 