Nvidia Turing Product Reviews and Previews (Super, Ti, 2080, 2070, 2060, 1660, etc.)

So what exactly is the point of TU116 over GP104+GDDR5 (1070 Ti)?
It's a roughly 10% smaller chip that consumes about 10% less power and performs anywhere from the same to about 10% worse.

Why not just reduce the price of the 1070 Ti instead of making a whole new chip with practically the same power, area and performance characteristics?
 

All the other new Turing features besides ray tracing and tensor cores? Mesh shaders, FP16, better compute performance, coarse shading, VR rendering improvements, etc.?
 
Which, as we see in reviews, doesn't translate into better performance, new features, or better power efficiency? At best, the 1660 Ti performs close to a 1070 Ti in games where those features are used (Far Cry 5 with FP16, Ashes of the Singularity with async compute). In DX11 games like The Witcher 3 it's over 20% slower.
Why not just "shrink" GP104 to 12FFN?

It can't even be considered a pipe cleaner, since this is the fourth Turing chip and the fifth 12FFN chip.

Either this TU116 runs great on laptops, or developing new chips had better be super cheap for Nvidia.
 

Fine wine effect once the features get used? Turing seems to be a very forward-looking architecture.
 
But it does have honest-to-goodness dedicated FP16 CUDA cores, something that TU102/104/106 don't have. :eek:

With the TU102/104/106 the Tensor Cores are used for FP16 so there is no need to duplicate them as dedicated ones.

The Curious Case of FP16: Tensor Cores vs. Dedicated Cores

Something that escaped my attention with the original TU102 GPU and the RTX 2080 Ti was that for Turing, NVIDIA changed how standard FP16 operations were handled. Rather than processing it through their FP32 CUDA cores, as was the case for GP100 Pascal and GV100 Volta, NVIDIA instead started routing FP16 operations through their tensor cores.

https://www.anandtech.com/show/13973/nvidia-gtx-1660-ti-review-feat-evga-xc-gaming/2
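As an aside, here is a minimal CUDA sketch of what "standard FP16 operations" look like from the software side (kernel name and shapes are illustrative, not from the article). The source is identical on every Turing part; whether the math lands on the tensor cores (TU102/104/106) or on the dedicated FP16 units (TU116) is a hardware routing detail:

Code:
#include <cuda_fp16.h>

// Packed-FP16 "saxpy": each half2 register carries two FP16 values,
// so every __hfma2 issues two FP16 fused multiply-adds. Which
// execution units service them depends on the GPU (tensor cores on
// big Turing, dedicated FP16 cores on TU116).
__global__ void saxpy_fp16(int n2, __half2 a, const __half2* x, __half2* y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n2)                        // n2 = number of half2 elements
        y[i] = __hfma2(a, x[i], y[i]); // two FP16 FMAs per instruction
}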
 
Yeah, I'm really not liking this 6GB thing they did with the 2060 and the 1660. If I'm buying a GPU right now and spending $400-600 CAD on it, I'm not buying something with only 6GB. 8GB is probably the minimum I want.
 

Dude, did you just comment on a quote from @Ryan Smith using a quote from @Ryan Smith?


They're doing it because GDDR6 is more expensive and apparently a lot harder to implement on a PCB than GDDR5.
My question is whether this will be a good compromise for the end user in the long run, instead of just being good for Nvidia.
 
I presume the function of dedicated FP16 cores is to double the rate of FP16, right? But big Turing is capable of doing double rate FP16 without those dedicated cores, so what gives?
Normal Turing uses Tensor cores for double rate FP16.
 
I seem to lose myself in this, so allow me to explain:

-Vega has Rapid Packed Math, which is essentially running two FP16 ops on a single FP32 ALU. All Turing GPUs can do the same thing too. However, there are two additional caveats:

-Big Turing has Tensor Cores, which allow it to run a single FP16 op on a single Tensor Core while the FP32 ALUs do something else.
-Small Turing has dedicated FP16 cores, which allow it to run a single FP16 op on a single FP16 core while the FP32 ALUs do something else.

Did I get all of these right? Which is the better overall implementation?
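For what it's worth, a small CUDA sketch of the packed-FP16 idea from the first bullet (hypothetical kernel, assuming CUDA's half2 type, not code from any vendor):

Code:
#include <cuda_fp16.h>

// Rapid-Packed-Math-style execution: one 32-bit register holds two
// FP16 lanes, and a single SIMD instruction applies the same
// operation to both lanes, i.e. two FP16 adds for one issue slot.
__global__ void packed_add(const __half2* a, const __half2* b,
                           __half2* out, int n2)  // n2 = half2 count
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n2)
        out[i] = __hadd2(a[i], b[i]);  // two FP16 additions at once
}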
 


Vega has Rapid Packed Math, which is essentially running two FP16 ops on a single FP32 ALU. All Turing GPUs can do the same thing too

No I don't believe that Turing GPUs can run two FP16 ops on a single FP32 ALU.

Bug Turing has Tensor Cores

I assume you meant Big, not Bug?
 
FP16 output is 2x FP32, so there must be a connection. Anandtech says the same thing:

Like all other Turing parts, TU116 get NVIDIA’s fast FP16 path. This means that these GPUs can process FP16 operations at twice the rate of FP32 operations,
https://www.anandtech.com/show/13973/nvidia-gtx-1660-ti-review-feat-evga-xc-gaming/2


I assume you meant Big not Bug?
Yeah, corrected. Thanks.
 
Peak FP16 TFLOPS for Turing are double FP32 according to Nvidia's Turing whitepaper. That's from the SMs, not the tensor cores.
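As a rough sanity check against the GTX 1660 Ti's commonly cited specs (1536 CUDA cores, about a 1770 MHz boost clock; approximate figures, quoted from memory):

$$1536 \times 2\,\tfrac{\text{FLOPs}}{\text{clock}} \times 1.77\,\text{GHz} \approx 5.4\ \text{TFLOPS FP32} \;\Rightarrow\; 2 \times 5.4 \approx 10.9\ \text{TFLOPS FP16}$$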
 
I'm guessing the advantage of having dedicated FP16 units instead of having the FP32 ALUs do RPM is that, IIRC, Vega and GP100 (and maybe GV100?) only get 2x FP16 throughput if the two FP16 calculations going to that ALU use the same operation. With dedicated FP16 ALUs they don't have that dependence, so real-life throughput should be higher. A hypothetical CUDA kernel below makes that pairing constraint concrete.
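The two FP16 operations in this sketch differ, so on a packed-only design they can't be fused into one two-wide instruction, whereas standalone FP16 ALUs have no such restriction (illustrative code only):

Code:
#include <cuda_fp16.h>

// Mixed FP16 work: an add on even elements, a multiply on odd ones.
// Packed (half2) execution only doubles throughput when both lanes
// run the SAME operation, so this pattern can't be paired up;
// dedicated scalar FP16 units don't have that dependence.
__global__ void mixed_fp16(const __half* a, const __half* b,
                           __half* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    out[i] = (i & 1) ? __hmul(a[i], b[i]) : __hadd(a[i], b[i]);
}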

The article's authors are Nate Oh & Ryan Smith, and it is unclear who the quote belongs to.

My post was to elaborate that FP16 is done on the Tensor Cores on the big Turing GPUs and on dedicated FP16 cores on the Turing GTX GPUs.

So you didn't assume the co-author of the article you took that information from was aware of it?
 