Nvidia Post-Volta (Ampere?) Rumor and Speculation Thread

CSI PC · Mar 20, 2018

CarstenS said:
He is not inferring anything, but giving a marketing answer that's within the scope of his briefing while at the same time giving the impression of having addressed the question asked.

Well, going down that path, when did Nvidia state "There’s definitely functionality in Volta that accelerates" for Amber?
I have posted the results in the past for FP32 Amber and they are quite shockingly good relative to Pascal, partially due to cache and V100 structure.
They do state it accelerates Amber.
But they do not say functionality in Volta.

CarstenS · Mar 20, 2018

CSI PC said:
Well, going down that path, when did Nvidia state "There’s definitely functionality in Volta that accelerates" for Amber?

While having nothing to do with a broader press briefing, in the context of which Nvidia's representative said the part in question, but apparently was not asked about Amber and Volta....
edit: That sentence made no sense. What I meant was: Amber is not related to the question being asked and debated. Tamasi said this in a broader press briefing (maybe not exclusively, but also) in order to answer the question without being specific. I was merely pointing out that the quote would already be satisfied by such a very basic bit of information that it does not move the discussion forward.

CSI PC · Mar 20, 2018

CarstenS said:
While having nothing to do with a broader press briefing, in the context of which Nvidia's representative said the part in question, but apparently was not asked about Amber and Volta....

OK, maybe I am missing something.
You stated the quote should be ignored because, as marketing, when a technical VP says "functionality in Volta to accelerate Raytracing" he probably means cache.
However, in the various marketing of V100 with Amber and some other applications that also benefit strongly from the V100, they only say "accelerated".
There is no reference to functionality in Volta in that instance, and that is marketing documentation and also Volta related documentation.

So from what I understand marketing to date has not used the term "functionality in Volta" in the context of cache and acceleration to application/solutions.

CarstenS · Mar 20, 2018

Just to end this from my side: When an IHV talks to press, it's always marketing.

CSI PC · Mar 20, 2018

Well, that would mean we could not take anything Timothy Prickett Morgan at TheNextPlatform says as worthwhile, considering his contacts and sources at IHVs when he has spoken to them about their launch technologies; I could name other reputable tech journalists/analysts as well.
Seems a flippant response Carsten.

CSI PC · Mar 20, 2018

Last post, looking at this objectively:
The gains from a DGX-1 P100 GPU to a DGX-1 V100 GPU in their early raytracing demo that I posted (without AI) are comparable to the gains seen with Amber going from DGX-1 P100 to Titan V; the gain is around 1.8x for both tests/demos.
Amber is not the only application well suited to the acceleration the V100 design provides, in part due to its cache/SM/register architecture, but in all the various Nvidia launch snippets/interviews I have looked at, I cannot find any marketing putting this in the context of "functionality in Volta".

The context is with the AI-tensor disabled for raytracing.
If it is a marketing gimmick in terms of "functionality in Volta", then it is one they have not done in the past for Volta.

CarstenS · Mar 20, 2018

CSI PC said:
Well, that would mean we could not take anything Timothy Prickett Morgan at TheNextPlatform says as worthwhile, considering his contacts and sources at IHVs when he has spoken to them about their launch technologies; I could name other reputable tech journalists/analysts as well.
Seems a flippant response Carsten.

Well, since you seem to like to go word by word on everything, let's start with this:

CSI PC said:
If it was Cache, then they would not be so hesitant to comment on that being the functionality as it is already a known factor.

He is dodging the question, filling in something that is cleared to state publicly while giving the impression to have positively answered the question. Media training, single-digit lesson (I guess).

CSI PC said:
Well, that would mean we could not take anything Timothy Prickett Morgan at TheNextPlatform says as worthwhile, considering his contacts and sources at IHVs when he has spoken to them about their launch technologies, […]

I seem not to have made it clear enough, so: I was referring to broader press briefings in the context of this discussion, which I wrote in the post before. Tamasi said as much in the one I was listening in on as well. Further, "an IHV" was meant in the sense of official communication via the aforementioned broad press briefings and the like, not individuals talking over a beer, maybe revealing confidential information. The latter, I would strongly assume, is not the case here, since a) no good journalist would expose his sources in this way were they unofficial, and b) no friendly contact at an IHV would give you such an obvious marketing line for an answer if he wants to keep you as a friend.

Apart from that, I don't know Timothy Prickett Morgan, so I cannot comment on what he says, has said or will say in any qualified manner.

edit: spelling.

CSI PC · Apr 4, 2018

So "functionality in Volta" seems to refer to a recent update with regards to raytracing, and to the originally shown chart, with the context now around OptiX for CUDA, which was not part of the earlier slides (nor, for those interested, were the ray tracing extensions for Vulkan):

The reason it fits the earlier narrative is that Nvidia states:

The core of OptiX is a domain-specific just-in-time compiler. The compiler generates custom ray-tracing kernels by combining user-supplied programs for ray generation, material shading, object intersection, and scene traversal. High performance is achieved by using a compact object model and ray-tracing compiler optimizations that map efficiently to the new RTX Technology and Volta GPUs.

So this is an evolution of OptiX that, while working with older architectures, Nvidia says works best with Volta, not just in performance but also in other enhancements, including traversal being integral and controlled by OptiX.
https://devblogs.nvidia.com/nvidia-optix-ray-tracing-powered-rtx/

Separately, with the ray tracing extension for Vulkan coming soon, it will be interesting to see if any games look to use it with that API, albeit with tempered expectations as it is early days.

Edit:
This was the original slide presented back in mid-March, when there was mention of further functionality within Volta.
It makes it a bit clearer that this narrative is about OptiX in the April slides, with the detail about compiler optimisation/traversal aligning strongly with Volta.

Voxilla · Sep 16, 2018

Was about to start a new thread for "Next HPC Nvidia GPU speculation (Ampere?)", but this existing thread could serve just as well.
The Ampere name would not be illogical, as volt and ampere are closely related, being the units of electric potential and current respectively.
Some speculation
- 7 nm (not much doubt about that)
- huge increase in tensor cores, say 2x (hardware for AI NN training is still very much in demand)
- 6 stacks of HBM2, for 48 GB (it cannot be less than the maximum for Turing, can it?)
- no RT cores (not much use beyond raytracing)
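A quick sanity check of the HBM2 capacity arithmetic in the list above. The 48 GB target comes from the post; the 8 GB per stack figure is an assumption (8-Hi stacks of 8 Gb dies, as used on the 32 GB V100), and the function names are hypothetical.

```python
import math


def stacks_needed(target_gb: int, gb_per_stack: int = 8) -> int:
    """Number of HBM2 stacks required to reach at least target_gb.

    gb_per_stack defaults to 8 GB, an assumed 8-Hi stack of 8 Gb dies.
    """
    return math.ceil(target_gb / gb_per_stack)


def capacity_gb(stacks: int, gb_per_stack: int = 8) -> int:
    """Total memory capacity for a given number of identical stacks."""
    return stacks * gb_per_stack


print(capacity_gb(6))        # 6 stacks at the assumed 8 GB each
print(stacks_needed(48))     # stacks needed to match a 48 GB Turing board
print(stacks_needed(48, 4))  # same target with 4 GB (4-Hi) stacks
```

Rounding up with `ceil` only matters for targets that are not a multiple of the stack size; with 8 GB stacks, 48 GB lands exactly on 6.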

trinibwoy · Sep 16, 2018

I expect nVidia’s first foray into 7nm will be a straight shrink of Volta/Turing + higher clocks, similar to Maxwell -> Pascal. 12nm Volta and Turing chips are huge and they’ll want to scale that back on an expensive new process, especially for gaming SKUs.

The monster compute chip will likely still be very large though as the target market supports the required pricing.

Voxilla · Sep 16, 2018

trinibwoy said:
I expect nVidia’s first foray into 7nm will be a straight shrink of Volta/Turing + higher clocks, similar to Maxwell -> Pascal. 12nm Volta and Turing chips are huge and they’ll want to scale that back on an expensive new process, especially for gaming SKUs.

The monster compute chip will likely still be very large though as the target market supports the required pricing.

It looks like there is not much room for higher clocks this time; the jump in clocks from 28nm to 16nm was unusual, partly due to the move to FinFETs.
A 600mm2, 7nm GPU would be reasonable; that was also roughly the size of the P100.
7nm SoCs like the Kirin 980 have 7 billion transistors on ~100mm2.
So that would be ~42 billion transistors for the new HPC GPU, twice that of the V100.
(Suddenly the big Turing TU102, with 18.6 billion transistors on 754mm2, does not look that 'big'.)
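The estimate above is just density times die area. A minimal sketch reproducing it, assuming the post's ~7 billion transistors per ~100mm2 (70 MTr/mm2) and a 600mm2 die; V100's 21.1 billion transistors and TU102's 18.6 billion on 754mm2 are used for comparison. The function names are illustrative only.

```python
def transistors_billion(density_mtr_per_mm2: float, area_mm2: float) -> float:
    """Transistor count in billions from density (millions per mm^2) and die area (mm^2)."""
    return density_mtr_per_mm2 * area_mm2 / 1000.0


def density_mtr_per_mm2(transistors_b: float, area_mm2: float) -> float:
    """Density in millions of transistors per mm^2 from a count in billions."""
    return transistors_b * 1000.0 / area_mm2


# Assumed 7nm density from the post: ~7 billion transistors per ~100 mm^2.
soc_density = density_mtr_per_mm2(7.0, 100.0)
hpc_estimate = transistors_billion(soc_density, 600.0)
v100_ratio = hpc_estimate / 21.1  # V100: 21.1 B transistors
tu102_density = density_mtr_per_mm2(18.6, 754.0)

print(f"600 mm^2 at {soc_density:.0f} MTr/mm^2: {hpc_estimate:.1f} B transistors")
print(f"ratio to V100: {v100_ratio:.2f}x")
print(f"TU102 density: {tu102_density:.1f} MTr/mm^2")
```

With these inputs the 600mm2 estimate comes out at 42.0 B, just under 2x V100, while TU102 sits at roughly 24.7 MTr/mm2.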

Bondrewd · Sep 16, 2018

Voxilla said:
7nm SoCs like the Kirin 980 have 7 billion transistors on ~100mm2
So that would be ~42 billion transistors for the new HPC GPU, twice that of the V100.

N7 HPC is considerably less dense than the SoC variant.

Voxilla · Sep 18, 2018

Bondrewd said:
N7 HPC is considerably less dense than the SoC variant.

Any more hard data/references to back up that statement?
The best I could find:
"N7 PPA (versus 16) is 3X density improvement, 35% speed gain, 65% power reduction. The N7 HPC track provides a further 13+% speed gain over N7 mobile."

Bondrewd · Sep 18, 2018

Voxilla said:
Any more hard data/references to back up that statement?
The best I could find:
"N7 PPA (versus 16) is 3X density improvement, 35% speed gain, 65% power reduction. The N7 HPC track provides a further 13+% speed gain over N7 mobile."

https://twitter.com/x/status/1035716476836229120

Voxilla · Sep 28, 2018

Bondrewd said:
https://twitter.com/x/status/1035716476836229120

67 MTr/mm2 (for HPC, as you believe) equates to 6.7 B transistors per 100mm2; that is close to my previous statement of 7nm GPUs having 7 billion transistors on ~100mm2.
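The conversion in this post, sketched with the same convention (MTr/mm2 times area in mm2, divided by 1000 to get billions). The 67 MTr/mm2 figure is the one read from the linked tweet in the post; extending it to a 600mm2 die, the size floated earlier in the thread, is an added illustration rather than a claim from the post.

```python
def billions_per_area(density_mtr_per_mm2: float, area_mm2: float) -> float:
    """Convert a density in MTr/mm^2 into billions of transistors over area_mm2."""
    return density_mtr_per_mm2 * area_mm2 / 1000.0


N7_HPC_DENSITY = 67.0  # MTr/mm^2, figure quoted in the post

for area in (100.0, 600.0):
    total = billions_per_area(N7_HPC_DENSITY, area)
    print(f"{area:.0f} mm^2 -> {total:.1f} B transistors")
```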

Samwell · Sep 28, 2018

Voxilla said:
67 MTr/mm2 (for HPC, as you believe) equates to 6.7 B transistors per 100mm2; that is close to my previous statement of 7nm GPUs having 7 billion transistors on ~100mm2.

You can't take it this way. These are marketing numbers for special cases: of the great 96 MTr/mm² for N7 Mobile, just 70 MTr/mm² are reached with real products (Huawei Kirin). If we apply the same marketing-to-real-world scaling, we get just 4.9 billion transistors on 100mm². We shouldn't calculate with much more at the moment. 7nm HPC is just a doubling of transistors/mm² vs 16nm.
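The derating in this post can be written as one ratio: real-product density divided by marketing density for N7 mobile (70/96), applied to the 67 MTr/mm² HPC figure. A minimal sketch; it assumes, as the post does, that the same marketing-to-real ratio carries over from the mobile library to the HPC one. The function name is hypothetical.

```python
def derated_density(marketing_density: float, ref_marketing: float, ref_real: float) -> float:
    """Scale a marketing density by the real/marketing ratio seen on a reference library."""
    return marketing_density * (ref_real / ref_marketing)


# N7 mobile: 96 MTr/mm^2 claimed vs ~70 MTr/mm^2 in shipping Kirin parts (from the post).
hpc_real = derated_density(67.0, ref_marketing=96.0, ref_real=70.0)
per_100mm2 = hpc_real * 100.0 / 1000.0  # billions of transistors per 100 mm^2

print(f"derated N7 HPC density: {hpc_real:.1f} MTr/mm^2")
print(f"transistors per 100 mm^2: {per_100mm2:.1f} B")
```

The ratio of about 0.73 brings 67 MTr/mm² down to roughly 48.9 MTr/mm², which is where the 4.9 billion per 100mm² figure comes from.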

Voxilla · Sep 28, 2018

Samwell said:
You can't take it this way. These are marketing numbers for special cases: of the great 96 MTr/mm² for N7 Mobile, just 70 MTr/mm² are reached with real products (Huawei Kirin). If we apply the same marketing-to-real-world scaling, we get just 4.9 billion transistors on 100mm². We shouldn't calculate with much more at the moment. 7nm HPC is just a doubling of transistors/mm² vs 16nm.

Apple A12 is 6.9 B transistors on 83.27 mm2, or 8.3 B transistors / 100mm2, or 83 MTr/mm2.
Turing is only 2.47 B transistors / 100mm2, for a 7 nm GPU 6 B tr/100mm2 should not be too far off.
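The densities in this post follow from transistor count divided by die area. A sketch using the post's A12 numbers (6.9 B on 83.27 mm2) and TU102's 18.6 B on 754 mm2; the 60 MTr/mm2 value encodes the post's "6 B per 100mm2" guess for a 7nm GPU and is speculation, not a measured figure.

```python
def density(transistors_b: float, area_mm2: float) -> float:
    """Millions of transistors per mm^2 from a count in billions and an area in mm^2."""
    return transistors_b * 1000.0 / area_mm2


a12 = density(6.9, 83.27)
tu102 = density(18.6, 754.0)
guess_7nm_gpu = 60.0  # MTr/mm^2, i.e. the post's ~6 B per 100 mm^2 guess (speculative)

print(f"A12:   {a12:.1f} MTr/mm^2")
print(f"TU102: {tu102:.2f} MTr/mm^2")
print(f"7nm GPU guess vs TU102: {guess_7nm_gpu / tu102:.2f}x")
```

Under that guess a 7nm GPU would be roughly 2.4x as dense as TU102, still well below the A12.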

ECH · Sep 28, 2018

Have there been any benchmarks done between Turing and Volta?
How would one gauge a new architecture that replaces Volta if no recent tests have been done?

del42sa · Jan 5, 2019

https://www.pcgamesn.com/nvidia/nvidia-7nm-euv-graphics-card-samsung

Next gen will use 7nm EUV from Samsung Foundries. Why did they not choose TSMC, despite it having free capacity?

Rootax · Jan 5, 2019

del42sa said:
https://www.pcgamesn.com/nvidia/nvidia-7nm-euv-graphics-card-samsung

Next gen will use 7nm EUV from Samsung Foundries. Why did they not choose TSMC, despite it having free capacity?

Maybe they got a better deal? I guess Samsung and TSMC are fighting to get big clients... EDIT: I mean, from each other.
