Nvidia Turing Product Reviews and Previews: (Super, Ti, 2080, 2070, 2060, 1660, etc.)

I think TU117 is cheaper/better for NVIDIA than GP106 because 128-bit/4GB is going to be a lot cheaper than 192-bit/6GB, but the pricing doesn't really reflect that right now.
Cheaper? Yes.
A lot cheaper? Nah. Like @Kaotik wrote, the two extra 8Gb GDDR5 chips (six for 6GB/192-bit vs. four for 4GB/128-bit) should cost less than $13 right now, and the price difference in the PCB for the additional traces should be almost negligible.


Nvidia have built a huge brand and don't need to worry about competing heavily on pricing even at the low end. Consumers will buy them anyway even if it's a worse product.
Yes, and this is exactly why nVidia tried to block informed opinions from reaching the consumer, as much as they could.
Monopolies are terrible.
 
What puzzles me the most is that this new chip, TU117, is 200mm^2. That's the exact same size as the GP106 in the GTX 1060, which performs significantly better.
I don't think it has to be THAT bad though. Yes, Turing doesn't quite reach the same perf/area as Pascal on average, but on some compute workloads it can easily exceed it.
And this particular chip, I believe, desperately needs GDDR6 memory; it is just too bandwidth-constrained with GDDR5, marginally better bandwidth savings or not - the GTX 1060 has 50% more bandwidth than the GTX 1650. (The rumors are already saying there's going to be a GTX 1650 Ti with a full chip configuration, but still only with GDDR5 memory - well, if that's the case it's not going anywhere, and I don't know why Nvidia would even bother with this configuration, as that would at most still be barely competitive with the RX 570.)
The GTX 1650 also looks so bad because the RX 570 is very cheap compared with anything else (even from AMD). Against the GTX 1050 Ti it's pretty decent considering price and performance (of course, the 1050 Ti looks like a complete joke against the RX 570 there...).
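For reference, the 50% figure falls straight out of the memory configurations (assuming the standard 8 Gbps GDDR5 both cards ship with):

GTX 1060: 8 Gbps x 192 bit / 8 = 192 GB/s
GTX 1650: 8 Gbps x 128 bit / 8 = 128 GB/s
192 / 128 = 1.5, i.e. 50% more for the 1060.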
 
2) Forward-looking compute features, e.g. the new SIMT model which I genuinely think is brilliant (but useless for existing workloads).
3) Forward-looking graphics features, including mesh shaders and FP16 ALUs. It’s pretty obvious that FP16 is a perf/mm2 loss on existing games but hopefully a gain in future content.
Yet there's no actual proof that those features will ever give the Turing chips better performance than Pascal chips of similar die size.
How many "forward-looking" features have fallen short of expectations so far in the PC space?


The rumors are already saying there's going to be a GTX 1650 Ti with a full chip configuration, but still only with GDDR5 memory - well, if that's the case it's not going anywhere, and I don't know why Nvidia would even bother with this configuration, as that would at most still be barely competitive with the RX 570.
There may never be a fully enabled desktop TU117.
That GPU already exists as the laptop GTX 1650, which is fully enabled but clocked lower than the desktop version.
 
Yet there's no actual proof that those features will ever give the Turing chips better performance than Pascal chips of similar die size.

Wolfenstein II kinda proves it, I think, with the 2070 faster than the 1080 Ti and the 1660 Ti equal to the 1080. That's before using adaptive shading, where Turing gains an additional 5%.

Regardless, absence of evidence is not evidence of absence.

It's far from unrealistic that an architecture with 2x FP16 throughput would benefit from more FP16 math being used, for example.
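To make that concrete, here's roughly what "using more FP16 math" looks like on the CUDA side - a minimal, untested sketch with made-up kernel names, using the packed half2 intrinsics from cuda_fp16.h (needs a GPU with fast FP16, i.e. GP100, Volta or Turing, and a matching -arch flag):

// y = a*x + y on packed __half2: each __hfma2 is two FP16 FMAs, which is
// where the 2x FP16 rate actually shows up; equivalent FP32 code gains nothing.
#include <cuda_fp16.h>
#include <cstdio>

__global__ void init(__half2 *x, __half2 *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        x[i] = __float2half2_rn(1.0f);    // (1.0, 1.0)
        y[i] = __float2half2_rn(0.5f);    // (0.5, 0.5)
    }
}

__global__ void axpy_half2(const __half2 *x, __half2 *y, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = __hfma2(__float2half2_rn(a), x[i], y[i]);   // 2 FP16 FMAs per thread
}

int main()
{
    const int n = 1 << 20;                                 // 1M half2 = 2M FP16 values
    __half2 *x, *y;
    cudaMallocManaged(&x, n * sizeof(__half2));
    cudaMallocManaged(&y, n * sizeof(__half2));
    init<<<(n + 255) / 256, 256>>>(x, y, n);
    axpy_half2<<<(n + 255) / 256, 256>>>(x, y, 2.0f, n);
    cudaDeviceSynchronize();
    printf("y[0].lo = %f (expect 2.5)\n", __half2float(y[0].x));
    cudaFree(x);
    cudaFree(y);
    return 0;
}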

Finally, why is perf/mm2 so important? It's clear that Nvidia went for perf/watt with Turing and looking at future nodes which come with greater density benefits than efficiency gains, I can't say I disagree with the move.

How many "forward-looking" features fell flat on expected results so far, in the PC space?

Idk, how many succeeded? And how does past failure or success affect future developments, exactly?
 
Finally, why is perf/mm2 so important? It's clear that Nvidia went for perf/watt with Turing and looking at future nodes which come with greater density benefits than efficiency gains, I can't say I disagree with the move.
I'm really wondering, though, if the better perf/watt isn't mostly from the tweaked 12nm tech. (This is definitely unlike Maxwell, which had both much better perf/watt and much better perf/area on the same node.)
Overall, I don't think Turing has really raised the bar there much (that's not to say the architecture isn't quite different), Nvidia claiming it's their most innovative change since the G80 or not (if you ask me, that's not even close). This is not necessarily a bad thing though: considering the GCN iterations AMD did in the timeframe when Nvidia went from Kepler to Turing, none of them was really all that much of an improvement either.
 
When comparing the 1660 Ti to the 1060, the efficiency is something like 30% higher according to the reviews I've seen. That seems too much for just the node, especially when it's mostly a refinement with a name change. Anyway, I was speaking more in the sense of having separate FP32, FP16 and INT ALUs. That seems the kind of move that reduces power at the expense of area.

Regarding innovation, I don't know the context in which that was said, but in my opinion it is the most innovative since G80. I mean, in terms of features and changes to the SM/TPC, which other architecture since G80 makes Turing "not even close" to being "the most innovative since G80"?
 
Regarding innovation, I don't know the context in which that was said, but in my opinion it is the most innovative since G80. I mean, in terms of features and changes to the SM/TPC, which other architecture since G80 makes Turing "not even close" to being "the most innovative since G80"?
Fermi. Yes, the actually released chips had, to put it mildly, some issues, so the cards weren't all that great, but from an architecture point of view it was huge imho. Forget the SMs: the whole workload distribution (with multiple rasterizers etc., all backed by a fully unified L2 including the ROPs, something AMD didn't achieve until Vega) was quite innovative and imho very important in retrospect (I kind of thought it was a bit overengineered at the time), although of course it's not really a user-visible feature as such.
Kepler and Maxwell both did more for perf/W and perf/area than Turing (difficult to say for sure for Kepler due to the 40->28nm transition; I'm excluding Pascal here since it is mostly Maxwell (very successfully) tuned for higher frequencies, and without 14nm it probably wouldn't really improve things much, if it would even have been possible at 28nm).
Yes, Turing is architecture-wise quite a change from Pascal, but the SMs are mostly borrowed from Volta anyway, with some more features bolted on (so if you count it as two steps from Pascal, Volta is imho the much bigger change than Turing). What counts is the end result, and other than the new features (which the low-end Turings don't get) it just isn't that impressive imho compared to Pascal. Not that it's bad, mind you; AMD still has a lot of catching up to do...
 
When comparing the 1660 Ti to the 1060, the efficiency is something like 30% higher according to the reviews I've seen. That seems too much for just the node, especially when it's mostly a refinement with a name change. Anyway, I was speaking more in the sense of having separate FP32, FP16 and INT ALUs. That seems the kind of move that reduces power at the expense of area.

A 30% efficiency improvement would seem a bit on the high side for a refinement node change. But I think GDDR6 helps a bit there (with the GDDR5 cards, the improvement tends to be less). And as a counterpoint, AMD got about a 20% efficiency improvement with Polaris 30 from just such a node refinement (albeit that's Samsung, not TSMC), even with an otherwise completely unaltered chip (so no tweaked design rules or anything). (Of course, AMD did not actually release a card with 20% higher efficiency; instead they cranked up the clocks some more on the RX 590 until it had the same terrible efficiency as the RX 580, but that's another story...)
And depending on which numbers you look at, I don't think the improvement is really quite 30% in any case (on average): https://www.techpowerup.com/reviews/MSI/GeForce_GTX_1650_Gaming_X/28.html - non-OC cards (without increased TDP) fare significantly better than OC ones there, be they Pascal or Turing. So the GTX 1650 in that review (with an increased TDP) fares just 10% better than the very frugal (never exceeding 60W) GTX 1050 Ti the site used, but roughly 30% better than the GTX 1060 (which ranks worst in that metric of all the Pascal cards in this particular test).
So I'm thinking it could indeed be mostly the process refinement that helps with efficiency. But indeed it will have higher efficiency if you can put FP16 to good use. (As for concurrent INT/FP, I'm not entirely convinced, since apps are already using that - yes, if you've got just the right mix of instructions you should see more gains, but clearly this is also already contributing to the SMs being faster (but bigger) per clock than on Pascal.)
 
Fermi. Yes, the actually released chips had, to put it mildly, some issues, so the cards weren't all that great, but from an architecture point of view it was huge imho. Forget the SMs: the whole workload distribution (with multiple rasterizers etc., all backed by a fully unified L2 including the ROPs, something AMD didn't achieve until Vega) was quite innovative and imho very important in retrospect (I kind of thought it was a bit overengineered at the time), although of course it's not really a user-visible feature as such.
Kepler and Maxwell both did more for perf/W and perf/area than Turing (difficult to say for sure for Kepler due to the 40->28nm transition; I'm excluding Pascal here since it is mostly Maxwell (very successfully) tuned for higher frequencies, and without 14nm it probably wouldn't really improve things much, if it would even have been possible at 28nm).
Yes, Turing is architecture-wise quite a change from Pascal, but the SMs are mostly borrowed from Volta anyway, with some more features bolted on (so if you count it as two steps from Pascal, Volta is imho the much bigger change than Turing). What counts is the end result, and other than the new features (which the low-end Turings don't get) it just isn't that impressive imho compared to Pascal. Not that it's bad, mind you; AMD still has a lot of catching up to do...

It's true that a lot of the SM changes came with Volta; I just tend to put Volta and Turing together, since I see them more as a parallel to P100 and the rest of Pascal than as a completely different architecture.

Also it's true that Fermi was quite innovative, and a good candidate, but even if I picked it above Turing, I wouldn't say that Turing is not even close, which is why I asked for your opinion. I guess that it depends a little bit on what you focus on.

However, the fact that you said "new features (which the low-end Turings don't get)" makes me question whether you're really appreciating all the actual new features that Turing brings. Tensor cores and ray tracing are the flashy ones, but at least IMO Mesh Shaders and Texture Space Shading, along with all the under-the-hood changes that were required to make them possible, are far more relevant and the reason I believe Turing wins. I also believe that the architectural changes required to bring those features could also enable some other future things.
 
A 30% efficiency improvement would seem a bit on the high side for a refinement node change. But I think GDDR6 helps a bit there (with the GDDR5 cards, the improvement tends to be less). And as a counterpoint, AMD got about a 20% efficiency improvement with Polaris 30 from just such a node refinement (albeit that's Samsung, not TSMC), even with an otherwise completely unaltered chip (so no tweaked design rules or anything). (Of course, AMD did not actually release a card with 20% higher efficiency; instead they cranked up the clocks some more on the RX 590 until it had the same terrible efficiency as the RX 580, but that's another story...)
And depending on which numbers you look at, I don't think the improvement is really quite 30% in any case (on average): https://www.techpowerup.com/reviews/MSI/GeForce_GTX_1650_Gaming_X/28.html - non-OC cards (without increased TDP) fare significantly better than OC ones there, be they Pascal or Turing. So the GTX 1650 in that review (with an increased TDP) fares just 10% better than the very frugal (never exceeding 60W) GTX 1050 Ti the site used, but roughly 30% better than the GTX 1060 (which ranks worst in that metric of all the Pascal cards in this particular test).

FWIW I don't think Polaris' gains all came from just a node change either, and since it's all speculation, we might as well leave it at that.

(As for concurrent INT/FP, I'm not entirely convinced, since apps are already using that - yes, if you've got just the right mix of instructions you should see more gains, but clearly this is also already contributing to the SMs being faster (but bigger) per clock than on Pascal.)

The power efficiency gains arguably come from the separate, simplified ALUs requiring less power than a "fat" do-it-all ALU, not from the concurrent execution itself. When concurrent execution occurs, the combined power is probably higher, but since INT ops are far less common than FP ones, the average power is likely lower (plus there's the extra performance, of course).
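As a rough illustration of the instruction mix being discussed - a toy, untested sketch with made-up names - the integer hashing below competes with the FP32 FMAs for the same ALUs on Pascal, while Turing's separate INT32 pipe lets the two streams issue concurrently:

#include <cstdio>

// Each iteration has one FP32 FMA and one INT32 multiply-add (an LCG step,
// standing in for the address/index/hash work real shaders do). On Pascal both
// go through the same ALUs; on Turing the INT work can issue on its own pipe.
__global__ void mixed_int_fp(float *out, unsigned *hashes, int iters)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float acc = tid * 0.001f;
    unsigned h = tid;
    for (int i = 0; i < iters; ++i) {
        acc = fmaf(acc, 1.0001f, 0.5f);     // FP32 pipe
        h = h * 1664525u + 1013904223u;     // INT32 pipe
    }
    out[tid] = acc;
    hashes[tid] = h;
}

int main()
{
    const int n = 1 << 16;
    float *out;
    unsigned *hashes;
    cudaMallocManaged(&out, n * sizeof(float));
    cudaMallocManaged(&hashes, n * sizeof(unsigned));
    mixed_int_fp<<<n / 256, 256>>>(out, hashes, 10000);
    cudaDeviceSynchronize();
    printf("out[0] = %f, hash[0] = %u\n", out[0], hashes[0]);
    cudaFree(out);
    cudaFree(hashes);
    return 0;
}

Whether typical game shaders actually hit a mix where that pays off is, as noted above, a separate question.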
 
However, the fact that you said "new features (which the low-end Turings don't get)" makes me question whether you're really appreciating all the actual new features that Turing brings. Tensor cores and ray tracing are the flashy ones, but at least IMO Mesh Shaders and Texture Space Shading, along with all the under-the-hood changes that were required to make them possible, are far more relevant and the reason I believe Turing wins. I also believe that the architectural changes required to bring those features could also enable some other future things.
Alright, I agree mesh and task shaders could be an important new feature. I'm not entirely sure how much of an architectural change this actually required (clearly it seems to go in a similar direction as Vega's Primitive Shaders, though more fleshed out).
And indeed, if you take Volta and Turing as one step, I could agree it's definitely a big change overall - Volta got the revamped SMs (including Tensor Cores), Turing Mesh/Task Shaders, RTX, variable rate shading and some more. Obviously, though, Fermi also brought a ton of new features (everything required for D3D11); I'd still pick that as more innovative, but nevertheless I guess I exaggerated a bit with the "not even close" part :).
 
Alright, I agree mesh and task shaders could be an important new feature. I'm not entirely sure how much of an architectural change this actually required (clearly it seems to go in a similar direction as Vega's Primitive Shaders, though more fleshed out).
And indeed, if you take Volta and Turing as one step, I could agree it's definitely a big change overall - Volta got the revamped SMs (including Tensor Cores), Turing Mesh/Task Shaders, RTX, variable rate shading and some more. Obviously, though, Fermi also brought a ton of new features (everything required for D3D11); I'd still pick that as more innovative, but nevertheless I guess I exaggerated a bit with the "not even close" part :).

Yeah it was definitely the "not even close" part that I found curious.
 
I agree Fermi was a much bigger change architecturally than Volta/Turing (although Volta/Turing arguably have the potential to change what kind of software can be and will be commonly executed on GPUs more than Fermi did, despite the hardware changes being smaller).

AFAIK, Fermi was a full RTL rewrite - obviously the engineers had access to G80/GT200's RTL to base things on; they weren't left on a desert island and asked to design a GPU from scratch (as fun as that might sound), but it was still a full rewrite rather than "just" significant changes.

And AFAIK (but I could be wrong), NVIDIA hasn't done a full rewrite since Fermi, just one big incremental change after another; some modules were fully rewritten, but not the whole thing. So in that sense, it's clear that Turing isn't the biggest change since G80 RTL-wise, since Fermi was a much much bigger change in terms of the hardware codebase.

Raytracing is a big deal in terms of software, but in terms of hardware, the changes for what NVIDIA is doing don't seem anywhere near as complex as the changes they made in Fermi. What they're doing HW-wise also seems simpler than what we did at PowerVR - that's possibly a good thing as their solution probably takes a lot less area, but I guess we'll never know.
 
Another point about the 1650 that some reviewers have brought up is that it's using the Volta NVENC, not the Turing NVENC... for reasons?
 
According to AnandTech's review, the main difference between Volta's and Turing's NVENC is support for HEVC B-frames. I'm not familiar enough with HEVC to know how complex it is to support its B-frames, so it's possible that cost is one of the concerns, though I suspect that royalty cost is probably the more likely one.
 
According to AnandTech's review, the main difference between Volta's and Turing's NVENC is support for HEVC B-frames. I'm not familiar enough with HEVC to know how complex it is to support its B-frames, so it's possible that cost is one of the concerns, though I suspect that royalty cost is probably the more likely one.
The article mentioned that Nvidia stated die size impact was the main factor in the decision.
 
The article mentioned that Nvidia stated die size impact was the main factor in the decision.

While it's possible that die size was the main factor, apparently Intel Quick Sync Video supports encoding HEVC B-frames, though I'm not sure if it's a "real" B-frame. If it is, then the die size cost can't be too large if Intel was able to fit it into an iGPU.
 
There are two rumors circulating about the Nvidia "SUPER" video. It's either a $100 price reduction for the current RTX 2060, RTX 2070 and RTX 2080, or a refreshed RTX 20-series with higher clock rates and faster GDDR6 memory.

Rumor: NVIDIA Super is a Refreshed GeForce RTX 2060 ($249), 2070 ($399), 2080 ($599)
Word right now is that NVIDIA will be releasing alternatives for the RTX 2060, RTX 2070 and RTX 2080. The cards would get faster GDDR6 memory, but also higher clock rates thanks to binning of the GPUs. Naturally, when things get a notch faster you expect them to be priced higher; however, another rumor contradicts that claim, as NVIDIA would be planning to undercut the prices by $100. If you tally things up, the MSRPs would be $249, $399 and $599 respectively.

A $399 price would at the very least bring the 2070 SUPER into the Navi playing field, pricing-wise and likely performance-wise. The timing of all this is just silly though, right in the middle of summer vacation time. It is now suggested that NVIDIA will present the new Super (super-charged) products at E3 2019, trying to steal some thunder from AMD, which is expected to make announcements regarding Navi at E3.
https://www.guru3d.com/news-story/r...force-rtx-2060-(249)2070-(399)2080-(599).html
 