Nvidia Turing Speculation thread [2018]

Discussion in 'Architecture and Products' started by Voxilla, Apr 22, 2018.

Thread Status:
Not open for further replies.
  1. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    751
    Likes Received:
    320
    We are speculating here that Turing is the next gaming GPU architecture (Ampere has already more or less been declared a false or changed rumour).
    If Turing is the successor to Pascal, it doesn't need hardware for NN training, just inferencing.

    What the next generation of HPC GPU (the next Volta, i.e. NN training, double precision, HBM2, NVLink etc.) will be called, we don't know and haven't speculated.
    It might also be called Turing or something else.

    If you think the next generation of Volta HPC GPU will be called Ampere, how do you get to that, and why do you think it needs higher-precision training hardware (Volta is FP16/FP32 mixed precision)?
     
    #61 Voxilla, Jun 3, 2018
    Last edited: Jun 3, 2018
  2. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    That raises the question of whether the INT-based operations are done purely on INT CUDA cores, or whether the CUDA cores support both FP and INT; I appreciate some would break it down to ALUs, but I'm keeping it within Nvidia's context.
    I tend to think it is the latter, with the "CUDA core" supporting both, depending on the SM (such as SM_60 or SM_61)/compute capability version.
     
  3. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,944
    Likes Received:
    2,288
    Location:
    Germany
    I am aware of that; I was working from the assumption that they are going with the split you propose. About that, I am not as adamant as you seem to be.
    Additionally, Xavier being INT8-only is reinforced by Nvidia stating „5 TFLOPS FP16 10 TOPS INT8“ and „5 FP16/10 INT8 TOPS DL“ for the DLA.

    So, the next question is: is INT8 enough for the tasks scheduled in DXR/RTX for the GameWorks modules, and possibly more to come? After all, there has to be a gaming-oriented architecture. Or will this be a third one besides Turing and Ampere? ;)

    IIRC it was mentioned somewhere (here on the forums) that INT8 was done on separate units.
     
  4. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,380
    That wouldn’t make a lot of sense. They seem to be just another operation of the integer ALU.
     
  5. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Yeah, INT8 is part of cublasGemmEx, which builds on the 32-bit path.
    From what can be seen, dp4a kind of falls into this set, albeit reliant upon the CUDA compute capability version.
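    For reference, `__dp4a` computes a 4-way INT8 dot product accumulated into a 32-bit integer, which is why it fits naturally on a 32-bit integer datapath. A minimal Python sketch of its semantics (my emulation of the documented behavior, not Nvidia's implementation):

```python
def dp4a(a_bytes, b_bytes, c):
    """Emulate CUDA's __dp4a (signed variant): a 4-way INT8 dot
    product accumulated into a 32-bit integer c."""
    assert len(a_bytes) == len(b_bytes) == 4
    acc = c
    for a, b in zip(a_bytes, b_bytes):
        # each operand is one signed 8-bit lane of a 32-bit register
        assert -128 <= a <= 127 and -128 <= b <= 127
        acc += a * b
    # wrap to 32-bit two's complement, as the hardware register would
    acc &= 0xFFFFFFFF
    return acc - 0x100000000 if acc >= 0x80000000 else acc

# dot([1,2,3,4], [5,6,7,8]) + 10 = 5 + 12 + 21 + 32 + 10 = 80
print(dp4a([1, 2, 3, 4], [5, 6, 7, 8], 10))  # 80
```

    One instruction thus does four INT8 multiplies plus accumulation per clock, which is where the quadrupled INT8 throughput on Pascal-class parts comes from.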
     
  6. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,944
    Likes Received:
    2,288
    Location:
    Germany
    Gonna take a look at it in the evening, maybe I remembered incorrectly.
     
  7. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    OK, just found this: yep, INT8 and dp4a is just another operation, part of the 32-bit CUDA core/ALU (back to the interpretation and design of a CUDA core and its ALUs, and whether an individual CUDA core supports both operations or requires a separate CUDA core for INT; personally I feel a single CUDA core supports both, but what is achievable depends upon the compute capability version).
    https://devtalk.nvidia.com/default/...ilizing-__dp4a-instruction-on-nvidia-1080ti-/
    Read TXBob's (a moderator with a lot of knowledge and technical experience pertaining to CUDA and the Tesla accelerators) responses lower down:
    OP said
    TXBob responds
     
    CarstenS likes this.
  8. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    11,088
    Likes Received:
    5,634
    I don't really know where to put this, because there are now 2 different threads about post-Volta: rumors have been pointing out that the next consumer GPUs won't be Volta, but Nvidia didn't show or hint at new GPUs at Computex.
    Jen-Hsun Huang did answer a few questions about the next GeForce line:

    By "lunch" I assume "lAunch".

    I don't think all those rumors about Nvidia bringing post-Pascal cards during Q2 with a hard launch in the summer will come true.
     
    iMacmatician and Ike Turner like this.
  9. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,315
    Likes Received:
    145
    Location:
    On the path to wisdom
    According to https://developer.nvidia.com/embedded/jetson-xavier-faq, "The 512-core Volta GPU with support for Tensor Cores and mixed-precision compute is capable of up to 10 TFLOPS FP16 and 20 TOPS INT8."

    These numbers only make sense for Tensor Cores that support both FP16 and INT8 precision. It's also interesting that this is 8x the stated FP32 performance (standard multiply-add ALU) at FP16 precision. What's left unanswered is whether the accumulator precision is FP16 or FP32.
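    The 8x relationship can be sanity-checked with back-of-envelope arithmetic; the clock here is my assumption (the FAQ quoted above does not state one), counting an FMA as 2 ops:

```python
# Rough check of the Jetson Xavier FAQ numbers.
# clock_ghz is an assumed value, not an Nvidia-stated spec.
cuda_cores = 512
clock_ghz = 1.3

fp32_tflops = cuda_cores * 2 * clock_ghz / 1000   # standard FMA ALUs
fp16_tensor_tflops = fp32_tflops * 8              # 8x FP32, per the post
int8_tops = fp16_tensor_tflops * 2                # double-rate INT8

print(round(fp32_tflops, 2))         # ~1.33 TFLOPS FP32
print(round(fp16_tensor_tflops, 1))  # ~10.6 TFLOPS FP16 (vs. stated 10)
print(round(int8_tops, 1))           # ~21.3 TOPS INT8 (vs. stated 20)
```

    At an assumed ~1.3 GHz the 8x/2x ratios land close to the stated "10 TFLOPS FP16 and 20 TOPS INT8", consistent with Tensor Cores handling both precisions rather than a separate INT8 unit.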
     
    CarstenS likes this.
  10. Samwell

    Newcomer

    Joined:
    Dec 23, 2011
    Messages:
    127
    Likes Received:
    154
    OK, somehow I missed that and was wrong. So it seems Xavier hasn't got lower-precision tensor cores; rather, the tensor cores have more functionality and can do INT8 at double speed.

    Because people like Tom's Hardware were pretty sure about it, and Erinyes also mentioned it. It's not a false rumour, just a different product, it seems.

    Yeah, probably I was wrong there; as posted above, it's double-speed INT8, not lower-precision tensor cores. As for RTX, INT8 shouldn't be a problem for that. AI denoising is also just inference.
     
    pharma likes this.
  11. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,944
    Likes Received:
    2,288
    Location:
    Germany
    Interestingly, it's now dual DLA engines with a combined 5 FP16 / 10 INT8 TOPS, not a single one as in the slides above. Also, I completely missed the apparent fact that Xavier has PCIe 4.
     
    pharma likes this.
  12. Geeforcer

    Geeforcer Harmlessly Evil
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,297
    Likes Received:
    465
    Smugly sitting on the same architecture for the 3rd year while thinking that whatever you have in the pipeline will be sufficient to deal with emerging competitive threats?

    You want to get R300ed/Maxwelled? Because that’s how you get R300ed/Maxwelled.
     
  13. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    11,088
    Likes Received:
    5,634
    They have one competitor, whose public-ish roadmaps show no new consumer cards either; their 2-year-old graphics cards are now starting to sell at MSRP, and gamers are happy about it.
    Traditionally, after 1.5-2 years the cards would be selling for 50-60% of MSRP. Mining gave them a huge opportunity to just scrap a new architecture using larger dies on 16/12FF (like the 28nm Kepler -> Maxwell transition) and keep selling the now-cheaper-to-produce GPUs at their initial price.

    It's not really a surprise that ~16 months of graphics card drought and a lack of high-end competition from AMD would change Nvidia's traditional release schedule.
     
  14. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,380
    I think it’s still a better strategy than AMD’s: for the past half decade, theirs has been to be very open about the roadmap and then not deliver. :)
     
  15. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    751
    Likes Received:
    320
    Good remark, so we can expect gen 4 PCIe on next-gen NV GPUs too, for ~32 GB/s transfer on a 16-lane link.
    pcie4.png
    source
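    That ~32 GB/s figure follows directly from the spec: PCIe 4.0 runs at 16 GT/s per lane with 128b/130b encoding, so an x16 link delivers just under 32 GB/s per direction. A quick check:

```python
# Per-direction bandwidth of a PCIe x16 link.
# PCIe 3.0 and later use 128b/130b encoding; gen 3 = 8 GT/s,
# gen 4 = 16 GT/s per lane.
def pcie_x16_gbps(gt_per_s):
    lanes = 16
    return gt_per_s * lanes * (128 / 130) / 8  # GB/s, one direction

print(round(pcie_x16_gbps(8), 2))   # gen 3: ~15.75 GB/s
print(round(pcie_x16_gbps(16), 2))  # gen 4: ~31.51 GB/s
```

    So "~32 GB/s" is the rounded per-direction number; bidirectional it doubles.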
     
  16. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Well, since Jensen had to keep fielding the question of when the next GeForce model will launch, and said what he did, they have now changed the schedule listing at Hot Chips to TBD.
    In other words, Nvidia does not want the spotlight on the next architecture while they are still selling the current one and, just as importantly, while building the narrative for certain solutions/suites available, or pushing to raise their tech profile. No surprises there, as Nvidia did the same previously; one cannot really conclude either way from anything public recently what will happen in the next 3-6 or 9-12 months, beyond certain core foundations.

    https://www.hotchips.org/program/
    Look at Day1 11:30.
     
    #76 CSI PC, Jun 5, 2018
    Last edited: Jun 5, 2018
  17. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,137
    Likes Received:
    3,037
    Location:
    Finland
    Actually, they changed it before the Computex event, and Jensen was asked about that change in particular, to which he responded just: "Live in the present. We will invite you to our l(a)unch events."
     
  18. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Ah, OK.
    Still, that reinforces my point, given his response "Live in the present". The fact is they removed it for the reason I gave, and unfortunately it is difficult to reach any conclusions about Nvidia's strategy beyond certain core foundations. A clear narrative (product-tech-solution, rather than other aspects people can be critical of) is one fundamental approach Nvidia pushes, and it is why they always try to lessen news/information overlap between generations or product tech/solutions. I would not like to predict when GeForce is launching given everything said so far in public; it could be 2-3 months (an FE-type launch) or it could be 6-9 months.
     
    #78 CSI PC, Jun 5, 2018
    Last edited: Jun 5, 2018
  19. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    751
    Likes Received:
    320
    Of course Nvidia wants to avoid at all costs people holding off on buying GPUs in anticipation of next-generation GPUs, while it is still able to sell 2-year-old GPUs at a premium price in the absence of serious competition.
     
  20. entity279

    Veteran Regular Subscriber

    Joined:
    May 12, 2008
    Messages:
    1,264
    Likes Received:
    447
    Location:
    Romania
    Yes, but they wouldn't be so cheap as to say "it's a long time" and then launch in the next 2-3 months.
     