"So it has no Tensor or RT cores, so no RTX or DLSS."

But it does have honest-to-goodness dedicated FP16 CUDA cores, something that TU102/104/106 don't have.
So what exactly is the point of TU116 over GP104+GDDR5 (1070 Ti)?
It's a 10% smaller chip that consumes 10% less power and performs about the same, or up to 10% worse.
Why not just reduce the price of the 1070 Ti instead of making a whole new chip with practically the same power, area and performance characteristics?
"All the other new Turing features besides ray tracing and tensors? Mesh shaders, FP16, better compute perf, coarse shading, VR rendering improvements, etc.?"

Which, as we see in reviews, does not translate into better performance, new features, or better power efficiency. At best, the 1660 Ti performs close to a 1070 Ti in games where those features are used (Far Cry 5 with FP16, Ashes with async compute). In DX11 games like The Witcher 3 it's over 20% slower.
Why not just "shrink" GP104 to 12FFN?
It can't even be considered a pipe cleaner, since this is the 4th Turing chip and the 5th 12FFN chip.
Either this TU116 runs great on laptops, or developing new chips had better be super cheap for NVIDIA.
"Fine wine effect once the features get used? Turing seems to be a very forward-looking architecture."

Fine wine with less VRAM?
"But it does have honest-to-goodness dedicated FP16 CUDA cores, something that TU102/104/106 don't have."

From "The Curious Case of FP16: Tensor Cores vs. Dedicated Cores" in the review:

"Something that escaped my attention with the original TU102 GPU and the RTX 2080 Ti was that for Turing, NVIDIA changed how standard FP16 operations were handled. Rather than processing it through their FP32 CUDA cores, as was the case for GP100 Pascal and GV100 Volta, NVIDIA instead started routing FP16 operations through their tensor cores."

https://www.anandtech.com/show/13973/nvidia-gtx-1660-ti-review-feat-evga-xc-gaming/2
With TU102/104/106 the Tensor Cores are used for FP16, so there is no need to duplicate them with dedicated ones.
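To make the quoted point concrete, here is a minimal WMMA sketch (my own illustration, not from the review) that drives the tensor cores explicitly; the review's claim is that on TU102/104/106 even plain FP16 arithmetic is routed to these same units:

```cuda
// One warp computes a 16x16x16 FP16 matrix multiply-accumulate on the
// tensor cores via CUDA's WMMA API. Build: nvcc -arch=sm_75 wmma_fp16.cu
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

__global__ void wmma_16x16x16(const half* a, const half* b, float* c) {
    // Fragments are opaque, warp-wide register tiles.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> fc;

    wmma::fill_fragment(fc, 0.0f);
    wmma::load_matrix_sync(fa, a, 16);   // leading dimension = 16
    wmma::load_matrix_sync(fb, b, 16);
    wmma::mma_sync(fc, fa, fb, fc);      // executes on the tensor cores
    wmma::store_matrix_sync(c, fc, 16, wmma::mem_row_major);
}
```

Launch it with at least one full warp, e.g. wmma_16x16x16<<<1, 32>>>(a, b, c), on an sm_70 or newer part.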
"Yah, I'm really not liking this 6GB thing they did with the 2060 and the 1660. To buy a GPU right now and spend like $400-600 CAD on it, I'm not buying something that's only 6GB. 8GB is probably the minimum I want."

They're doing it because GDDR6 is more expensive and apparently a lot harder to implement on a PCB than GDDR5.
"But it does have honest-to-goodness dedicated FP16 CUDA cores, something that TU102/104/106 don't have."

I presume the function of dedicated FP16 cores is to double the rate of FP16, right? But big Turing is capable of doing double-rate FP16 without those dedicated cores, so what gives?
"I presume the function of dedicated FP16 cores is to double the rate of FP16, right? But big Turing is capable of doing double-rate FP16 without those dedicated cores, so what gives?"

Normal Turing uses Tensor cores for double-rate FP16.
"Normal Turing uses Tensor cores for double-rate FP16."

I seem to lose myself in this, allow me to explain:
-Vega has Rapid Packed Math, which is essentially running two FP16 ops on a single FP32 ALU. All Turing GPUs can do the same thing too. However, there are two additional caveats:
-Big Turing has Tensor Cores, which allow it to run a single FP16 op on a single Tensor Core, while the FP32 ALUs do something else.
-Small Turing has dedicated FP16 cores, which allow it to run a single FP16 op on a single FP16 core, while the FP32 ALUs do something else.
Did I get all of these right? Which is the better overall implementation?
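For concreteness, here is what the packed-FP16 case from the first bullet looks like in CUDA. This is a minimal sketch, not anyone's posted code; the kernel names and sizes are arbitrary, and each __hfma2 issues two FP16 fused multiply-adds through a single 32-bit lane:

```cuda
// Packed FP16 (what Vega markets as Rapid Packed Math): two FP16 values
// travel in one 32-bit register (half2), and one instruction operates on both.
// Build: nvcc -arch=sm_60 half2_saxpy.cu   (half2 arithmetic needs sm_53+)
#include <cuda_fp16.h>

// y = a*x + y over half2 pairs; each __hfma2 is two FP16 FMAs per thread.
__global__ void saxpy_half2(int n2, float a, const half2* x, half2* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    half2 a2 = __float2half2_rn(a);        // broadcast a into both halves
    if (i < n2) y[i] = __hfma2(a2, x[i], y[i]);
}

// Device-side init so no host half-precision conversions are needed.
__global__ void init(int n2, half2* x, half2* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n2) {
        x[i] = __float2half2_rn(1.0f);
        y[i] = __float2half2_rn(2.0f);
    }
}

int main() {
    const int n2 = 1 << 20;                // 2^20 half2 = 2^21 FP16 elements
    half2 *x, *y;
    cudaMalloc(&x, n2 * sizeof(half2));
    cudaMalloc(&y, n2 * sizeof(half2));
    init<<<(n2 + 255) / 256, 256>>>(n2, x, y);
    saxpy_half2<<<(n2 + 255) / 256, 256>>>(n2, 3.0f, x, y);
    cudaDeviceSynchronize();               // every y element is now 3*1 + 2 = 5
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

Note that this code is identical no matter which NVIDIA chip runs it; whether the two halves land on the tensor cores (big Turing) or on dedicated FP16 cores (TU116) is decided by the hardware, which is part of why the distinction is invisible in software.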
"Vega has Rapid Packed Math, which is essentially running two FP16 ops on a single FP32 ALU. All Turing GPUs can do the same thing too."

No, I don't believe that Turing GPUs can run two FP16 ops on a single FP32 ALU.

"Bug Turing has Tensor Cores"

I assume you meant Big, not Bug?
Dude, did you just comment on a quote from @Ryan Smith using a quote from @Ryan Smith?
"No, I don't believe that Turing GPUs can run two FP16 ops on a single FP32 ALU."

FP16 output is 2X FP32, there must be a connection. AnandTech says the same thing:

"Like all other Turing parts, TU116 gets NVIDIA's fast FP16 path. This means that these GPUs can process FP16 operations at twice the rate of FP32 operations."

https://www.anandtech.com/show/13973/nvidia-gtx-1660-ti-review-feat-evga-xc-gaming/2
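As a back-of-the-envelope check of "twice the rate" (my arithmetic, using the 1660 Ti's published 1,536 FP32 ALUs and ~1,770 MHz boost clock, counting an FMA as two FLOPs):

\[
\text{FP32 peak} = 1536 \times 2 \times 1.77\,\text{GHz} \approx 5.4\,\text{TFLOPS}
\]
\[
\text{FP16 peak} = 2 \times \text{FP32 peak} \approx 10.9\,\text{TFLOPS}
\]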
"I assume you meant Big, not Bug?"

Yeah, corrected. Thanks.
The article's authors are Nate Oh & Ryan Smith and it is unclear who the quote belongs to.
My post was to elaborate that FP16 is done on the Tensor Cores on the big Turing GPUs and on dedicated FP16 cores on the Turing GTX GPUs.