Nvidia Turing Product Reviews and Previews: (2080TI, 2080, 2070, 2060, 1660, etc)

Discussion in 'Architecture and Products' started by Ike Turner, Aug 21, 2018.

  1. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,544
    Likes Received:
    4,203
    Cheaper? Yes.
    A lot cheaper? Nah. Like @Kaotik wrote, 2x 8Gb GDDR5 chips should cost less than $13 right now, and the extra PCB cost for the additional traces should be almost negligible.


    Yes, and this is exactly why nVidia tried to block informed opinions from reaching the consumer, as much as they could.
    Monopolies are terrible.
     
    Kaotik likes this.
  2. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,004
    Likes Received:
    109
    I don't think it has to be THAT bad, though. Yes, Turing doesn't quite reach the same perf/area as Pascal on average, but on some compute workloads it can easily exceed it.
    And this particular chip desperately needs GDDR6 memory; it is just too bandwidth-constrained with GDDR5, marginally better bandwidth savings or not: the GTX 1060 has 50% more bandwidth than the GTX 1650. (Rumors already point to a GTX 1650 Ti with a full chip configuration but still only GDDR5 memory. If that's the case, it's not going anywhere, and I don't know why Nvidia would even bother with this configuration, as it would at most be barely competitive with the RX 570.)
    The GTX 1650 also looks so bad because the RX 570 is very cheap compared with anything else (even from AMD). Against the GTX 1050 Ti it's pretty decent considering price and performance (of course, the 1050 Ti looks like a complete joke against the RX 570 there...).
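The 50% bandwidth gap mentioned above can be sanity-checked from the reference memory configurations (assuming both cards run 8 Gbps GDDR5, with a 192-bit bus on the 1060 versus 128-bit on the 1650):

```python
# Rough check of the memory bandwidth gap between GTX 1060 and GTX 1650,
# assuming reference configs: 8 Gbps GDDR5 on both, 192-bit vs 128-bit bus.

def bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bus width (bits) * per-pin rate / 8."""
    return bus_width_bits * data_rate_gbps / 8

gtx_1060 = bandwidth_gbs(192, 8.0)  # 192.0 GB/s
gtx_1650 = bandwidth_gbs(128, 8.0)  # 128.0 GB/s

print(f"GTX 1060: {gtx_1060:.0f} GB/s, GTX 1650: {gtx_1650:.0f} GB/s")
print(f"1060 advantage: {gtx_1060 / gtx_1650 - 1:.0%}")  # 50%
```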
     
    Jozape likes this.
  3. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,544
    Likes Received:
    4,203
    Yet there's no actual proof that those features will ever give the Turing chips better performance than Pascal chips of similar die size.
    How many "forward-looking" features have fallen flat of expectations so far in the PC space?


    There may never be a fully enabled desktop TU117.
    That GPU already exists as the laptop GTX 1650, which is fully enabled but clocks lower than the desktop version.
     
    #783 ToTTenTranz, Apr 24, 2019
    Last edited by a moderator: Apr 24, 2019
  4. Benetanegia

    Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    216
    Likes Received:
    131
    Wolfenstein II kinda proves it, I think, with the 2070 faster than the 1080 Ti and the 1660 Ti equal to the 1080. That's before using adaptive shading, where Turing gains an additional 5%.

    Regardless, absence of evidence is not evidence of absence.

    It's far from unrealistic that an architecture with 2x FP16 throughput would benefit from more FP16 math being used, for example.
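That benefit scales with how much of a shader's math can actually move to half precision. A toy Amdahl-style model (the workload fractions below are invented for illustration, not measurements of any real game):

```python
# Illustrative only: effective speedup when a fraction of a shader's math
# runs on double-rate FP16 hardware. The fractions are made-up examples.

def effective_speedup(fp16_fraction: float, fp16_rate: float = 2.0) -> float:
    """Amdahl-style speedup when fp16_fraction of the math runs at fp16_rate."""
    return 1.0 / ((1.0 - fp16_fraction) + fp16_fraction / fp16_rate)

for frac in (0.0, 0.25, 0.5, 0.75):
    print(f"{frac:.0%} of math in FP16 -> {effective_speedup(frac):.2f}x")
```

Even moving half of the math to FP16 only yields about 1.33x in this model, which is why real-world gains from FP16 tend to be modest.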

    Finally, why is perf/mm2 so important? It's clear that Nvidia went for perf/watt with Turing, and looking at future nodes, which bring greater density benefits than efficiency gains, I can't say I disagree with the move.


    Idk, how many succeeded? And how exactly does past failure or success affect future developments?
     
    Florin and pharma like this.
  5. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,004
    Likes Received:
    109
    I'm really wondering, though, if the better perf/watt isn't mostly from the tweaked 12nm tech. (This is definitely unlike Maxwell, which had both much better perf/watt and much better perf/area on the same node.)
    Overall, I don't think Turing has really raised the bar there much (that's not to say the architecture isn't quite different), whether or not Nvidia claims it's their most innovative change since the G80 (if you ask me, that's not even close). This isn't necessarily a bad thing, though: of the GCN iterations AMD did in the timeframe when Nvidia went Kepler->Turing, none was really all that much of an improvement either.
     
  6. Benetanegia

    Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    216
    Likes Received:
    131
    Comparing the 1660 Ti to the 1060, the efficiency is like 30% higher according to the reviews I've seen. That seems too much for just the node, especially when it's mostly a refinement with a name change. Anyway, I was speaking more in the sense of having separate FP32, FP16 and INT ALUs. That seems like the kind of move that reduces power at the expense of area.

    Regarding innovation, I don't know the context in which that was said, but in my opinion it is the most innovative since G80. I mean, since G80, in terms of features and changes to the SM/TPC, which other architecture makes Turing "not even close" to being "the most innovative since G80"?
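The ~30% efficiency figure above is easy to reconstruct: both cards carry the same 120 W official TDP, so at equal board power the perf/W gain simply equals the performance gain. The 1.30x relative-performance number below is a placeholder for whatever a given review measured, not a quoted result:

```python
# Back-of-envelope check of the ~30% perf/W claim for 1660 Ti vs 1060 6GB.
# Both TDPs are the official 120 W figures; the 1.30x perf ratio is a
# hypothetical stand-in for a review's measured average.

gtx_1060_tdp_w = 120.0
gtx_1660ti_tdp_w = 120.0
relative_perf = 1.30  # assumed: 1660 Ti performance relative to 1060 6GB

perf_per_watt_gain = (relative_perf / gtx_1660ti_tdp_w) / (1.0 / gtx_1060_tdp_w) - 1.0
print(f"perf/W gain: {perf_per_watt_gain:.0%}")  # 30% at equal board power
```

Measured board power differs from TDP in practice, which is one reason different reviews land on somewhat different efficiency numbers.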
     
  7. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,004
    Likes Received:
    109
    Fermi. Yes, the actually released chips had, to put it mildly, some issues, so the cards weren't all that great, but from an architecture point of view it was huge imho. Forget SMs: the whole workload distribution (with multiple rasterizers etc., all backed by a fully unified L2 including the ROPs, something AMD didn't achieve until Vega) was quite innovative and imho very important in retrospect (at the time I thought it was a bit overengineered), although of course it's not really a user-visible feature as such.
    Kepler and Maxwell both did more for perf/W and perf/area than Turing (difficult to say for sure for Kepler due to the 40nm->28nm node change; I'm excluding Pascal here since it is mostly a (very successfully) tuned-for-higher-frequencies Maxwell, and without the new node it probably wouldn't really improve things much, if it would even have been possible at 28nm).
    Yes, Turing is architecture-wise quite a change from Pascal, but the SMs are mostly borrowed from Volta anyway, with some more features bolted on (so if you count it as two steps from Pascal, Volta is imho the much bigger change than Turing). What counts is the end result, and other than the new features (which the low-end Turings don't get), it just isn't that impressive imho compared to Pascal. Not that it's bad, mind you; AMD still has a lot of catching up to do...
     
    A1xLLcqAgt0qc2RyMz0y likes this.
  8. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,004
    Likes Received:
    109
    A 30% efficiency improvement would seem a bit on the high side for a refinement node change. But I think GDDR6 helps a bit there (with the GDDR5 cards, the improvement tends to be smaller). And as a counterpoint, AMD got about a 20% efficiency improvement with Polaris 30 from such a node refinement (albeit at Samsung, not TSMC), even with an otherwise completely unaltered chip (so no tweaked design rules or anything). (Of course, AMD did not actually release a card with 20% higher efficiency; instead they cranked up the clocks some more on the RX 590 until it had the same terrible efficiency as the RX 580, but that's another story...)
    And depending on which numbers you look at, I don't think the improvement is really quite 30% in any case (on average): https://www.techpowerup.com/reviews/MSI/GeForce_GTX_1650_Gaming_X/28.html - non-OC cards (without increased TDP) fare significantly better than OC ones, be they Pascal or Turing. So the GTX 1650 there (with an increased TDP) fares just 10% better than the very frugal (never exceeding 60W) GTX 1050 Ti this site used, but roughly 30% better than the GTX 1060 (which ranks worst in that metric of all the Pascal cards in this particular test).
    So I'm thinking it could indeed be mostly process refinement that helps with efficiency. But it will indeed have higher efficiency if you can put FP16 to good use. (As for concurrent INT/FP, I'm not entirely convinced, since apps are already using that; yes, if you've got just the right mix of instructions you should see more gains, but clearly this is also already contributing to the SMs being faster (but bigger) per clock than on Pascal.)
     
    #788 mczak, Apr 25, 2019
    Last edited: Apr 25, 2019
  9. Benetanegia

    Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    216
    Likes Received:
    131
    It's true that a lot of the SM changes came with Volta; I just tend to lump Volta and Turing together, since I see them more as a parallel to P100 vs. the rest of Pascal than as completely different architectures.

    Also, it's true that Fermi was quite innovative, and a good candidate, but even if I picked it above Turing, I wouldn't say Turing is "not even close", which is why I asked for your opinion. I guess it depends a little on what you focus on.

    However, the fact that you said "new features (which the low end Turings don't get)" makes me question whether you really appreciate all the actual new features Turing brings. Tensor cores and ray tracing are the flashy ones, but IMO mesh shaders and texture-space shading, along with all the under-the-hood changes required to make them possible, are far more relevant and the reason I believe Turing wins. I also believe the architectural changes those features required could enable other things in the future.
     
    OCASM, pharma and DavidGraham like this.
  10. Benetanegia

    Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    216
    Likes Received:
    131
    FWIW, I don't think Polaris's gains came all from just a node change either, and since it's all speculation, we might as well leave it at that.

    The power efficiency gains arguably come from the separate, simplified ALUs requiring less power than a "fat" do-it-all ALU, not from the concurrent execution itself. When concurrent execution occurs, the combined power is probably higher, but since INT ops are far less common than FP ones, the average power is likely lower (plus there's extra performance, of course).
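That argument can be sketched numerically. All the per-op energy numbers below are invented for illustration; the instruction mix of roughly 36 integer ops per 100 FP ops echoes Nvidia's own Turing marketing figure, but nothing here is a measurement:

```python
# Toy model: separate, simpler INT and FP ALUs can lower *average* energy
# versus one do-it-all ALU, because integer ops are a minority of the stream.
# All per-op energy values are invented for illustration.

int_per_100_fp = 36   # assumed instruction mix (~36 INT per 100 FP ops)
p_fat_alu = 1.00      # normalized energy per op on a do-it-all ALU
p_fp_alu = 0.90       # slightly simpler dedicated FP ALU
p_int_alu = 0.40      # much simpler dedicated INT ALU

total_ops = 100 + int_per_100_fp
fat = total_ops * p_fat_alu
split = 100 * p_fp_alu + int_per_100_fp * p_int_alu

print(f"fat ALU energy: {fat:.0f}, split ALUs: {split:.1f} "
      f"({1 - split / fat:.0%} lower)")
```

With these made-up numbers the split design spends about a quarter less energy on the same instruction stream, even though issuing INT and FP concurrently raises the instantaneous peak.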
     
  11. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,004
    Likes Received:
    109
    Alright, I agree mesh and task shaders could be an important new feature. I'm not entirely sure how much of an architectural change this actually required (it clearly seems to go in a similar direction to Vega's primitive shaders, though more fleshed out).
    And indeed, if you take Volta and Turing as one step, I could agree it's definitely a big change overall: Volta got the revamped SMs (including tensor cores), Turing got mesh/task shaders, RTX, variable rate shading and more. Obviously, though, Fermi also brought a ton of new features (everything required for D3D11), and I'd still pick that as more innovative, but I guess I exaggerated a bit with the "not even close" part :).
     
  12. Benetanegia

    Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    216
    Likes Received:
    131
    Yeah it was definitely the "not even close" part that I found curious.
     
  13. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    299
    Location:
    UK
    I agree Fermi was a much bigger change architecturally than Volta/Turing (although Volta/Turing arguably have the potential to change what kind of software can be and will be commonly executed on GPUs more than Fermi did, despite the hardware changes being smaller).

    AFAIK, Fermi was a full RTL rewrite - obviously the engineers had access to G80/GT200's RTL to base things on, they weren't left on a desert island and asked to design a GPU from scratch (as fun as that might sound), but it was still a full rewrite rather than "just" significant changes.

    And AFAIK (but I could be wrong), NVIDIA hasn't done a full rewrite since Fermi, just one big incremental change after another; some modules were fully rewritten, but not the whole thing. So in that sense, it's clear that Turing isn't the biggest change since G80 RTL-wise, since Fermi was a much much bigger change in terms of the hardware codebase.

    Raytracing is a big deal in terms of software, but in terms of hardware, the changes for what NVIDIA is doing don't seem anywhere near as complex as the changes they made in Fermi. What they're doing HW-wise also seems simpler than what we did at PowerVR - that's possibly a good thing as their solution probably takes a lot less area, but I guess we'll never know.
     
    Ike Turner and fuboi like this.
  14. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    6,771
    Likes Received:
    2,819
    Location:
    Pennsylvania
    Another fact about the 1650 that some reviewers have brought up is that it uses the Volta NVENC, not the Turing NVENC... for reasons?
     
  15. Benetanegia

    Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    216
    Likes Received:
    131
    Size, I guess? Maybe some not-so-obvious cost, e.g. royalties?
     
  16. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    6,771
    Likes Received:
    2,819
    Location:
    Pennsylvania
    That could be a possibility if the Turing NVENC is using some 3rd-party tech.
     
  17. pcchen

    pcchen Moderator
    Moderator Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    2,731
    Likes Received:
    99
    Location:
    Taiwan
    According to AnandTech's review, the main difference between Volta's and Turing's NVENC is support for HEVC B-frames. I'm not familiar enough with HEVC to know how complex supporting its B-frames is, so it's possible that cost is one of the concerns, though I suspect royalty cost is probably much more likely.
     
  18. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    6,771
    Likes Received:
    2,819
    Location:
    Pennsylvania
    The article mentioned that Nvidia stated die size impact was the main factor in the decision.
     
  19. pcchen

    pcchen Moderator
    Moderator Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    2,731
    Likes Received:
    99
    Location:
    Taiwan
    While it's possible that die size was the main factor, Intel's Quick Sync Video apparently supports encoding HEVC B-frames, though I'm not sure if it's a "real" B-frame. If it is, the die-size cost can't be too large if Intel was able to fit it into an iGPU.
     