Nvidia Post-Volta (Ampere?) Rumor and Speculation Thread

Discussion in 'Architecture and Products' started by Geeforcer, Nov 12, 2017.

Thread Status:
Not open for further replies.
  1. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,606
    Likes Received:
    654
    Location:
    New York
    Interesting. I suppose it’s fortunate for Nvidia that Turing is keeping pace on 12nm so far.

    A quick google turns up a few recent rumors of disastrous 7nm yields at Samsung but nothing Nvidia specific.
     
  2. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    17,096
    Likes Received:
    6,443
    Don't forget that the GTX 480 also came out half a year after the Radeon 5870. When the 5870 came out it was competing against the GTX 285. And even then, the GTX 480 and Radeon 5870 were basically even, trading leadership depending on which games were tested.

    Man, looking back at prior generations, GPU progress has slowed down a LOT for both companies.

    Prior generations: ~6-9 months per generation (counting mid-gen refreshes as generations, since their performance jumps were generally similar to modern full generations)
    200 series to 400 series: ~1.75 years
    400 series to 500 series: ~7 months
    500 series to 600 series: ~1.25 years
    600 series to 700 series: ~1 year
    700 series to 900 series: ~1.5 years
    900 series to 1000 series: ~1.66 years
    1000 series to 2000 series: ~2.33 years

    Looking at that, it also becomes quite evident that the GTX 480 was significantly delayed. NV had problems getting the chip to where they wanted it to be. And even then it was a hot-running chip that consumed a lot of electricity, especially compared to the 5870.

    Also, while NV were making really large chips, ATI/AMD were making relatively much smaller ones. So while NV had the performance crown, AMD had the perf/mm² and perf/watt crowns up until Maxwell. The Radeon 2900 really scarred ATI/AMD with regard to building large chips for a long time.

    AMD's progress is even slower. And I don't expect things to get better from here on out. Then again, as old as I am now, 2 years feels like how 1 year used to feel. :D

    Regards,
    SB
     
    #342 Silent_Buddha, Dec 25, 2019
    Last edited: Dec 25, 2019
    Pixel, Cuthalu and no-X like this.
  3. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,289
    Likes Received:
    3,550
    The comparison is gen vs. gen: the HD 2900 XT also came out 6 months late, same for the R9 290X, which came out 10 months late.
    They traded blows in several games, but the GTX 480 came out on top more often; it was 10-15% faster according to AnandTech and TPU.

    That's true. IMO, AMD could have out-muscled NVIDIA during the HD 4870/GTX 280 era with a bigger chip. The problem is the HD 4870 was only 30-40 W lower than the GTX 280 in average power consumption, it was also hotter in operating temps, and it was on a smaller node (55nm) vs NVIDIA's 65nm. So AMD might have feared pushing the chip too far, or the new process didn't allow them much headroom in die size early in the node's life cycle.

    AMD then corrected course with the HD 4890 and tried pushing the chip harder with a clock uplift, but ended up jacking power consumption higher. Meanwhile NVIDIA migrated to 55nm quickly and released the GTX 285 with lower power consumption, which allowed them to increase clocks and maintain the lead. It was a close call between the two in power consumption in the end, which probably explains why AMD was wary of a bigger die.

    However, the real wasted opportunity for AMD was the HD 5870, which was miles ahead of the power-drunk Fermi arch (GTX 480) that failed hard to capitalize on the 40nm node and was released with cut cores and reduced clocks. AMD most likely didn't expect the GTX 480 to fail so spectacularly in efficiency and to be cut down by that much, so they played their hand conservatively. They were also planning an architectural migration from VLIW5 to VLIW4 (as they were under pressure to correct their compute deficiency), but that didn't turn out well against the fixed Fermi (GTX 580 with the full die and the originally planned clocks). Thinking back, the fixed Fermi stood its ground (performance-wise) against two arch variations from AMD (HD 5870 and HD 6970).
     
    #343 DavidGraham, Dec 25, 2019
    Last edited: Dec 25, 2019
    yuri, pharma and nnunn like this.
  4. Samwell

    Newcomer

    Joined:
    Dec 23, 2011
    Messages:
    127
    Likes Received:
    154
    With the numbers for Orin published we have the first official performance numbers for Ampere, so let's have some fun with numbers to see what we can expect of Ampere and whether the different infos fit together.
    What do we know?
    Official info: the Orin platform with 2 Orin SoCs is 400 INT8 TOPS. The 2-GPU version with 2 Orin is 2000 TOPS. So one Ampere GPU is 800 INT8 TOPS at 300 W. At 300 W per GPU it's certain they're using a big GPU in the car this time.
    Then we have the one Twitter user who posted all the Super infos and the Hopper codename first, so it's pretty certain he has some real insight. According to him, the HPC part should be 8 GPCs with 8 TPCs each, doubled TCs per SM, and 6 HBM stacks:
    128 SMs, 8192 shaders. Each SM has 64 FP32 CUDA cores and 16 TCs (8 TCs per SM before).

    New process, big chip -> ~120 SMs enabled in the end product, so 1920 TCs. One TC does 256 INT8 OPs per clock, which leads us to 1627 MHz for 800 TOPS, an 11% higher clock than V100.
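    As a sanity check, the implied clock can be worked out directly (a sketch only; every input below is a rumored/speculative figure from this post, not a confirmed spec):

```python
# All inputs are rumored/speculative figures from the post, not confirmed specs.
enabled_sms = 120                    # ~120 SMs enabled in the shipping part
tcs_per_sm = 16                      # doubled tensor cores per SM (rumor)
tensor_cores = enabled_sms * tcs_per_sm        # 1920 TCs
int8_ops_per_tc_per_clock = 256

target_ops = 800e12                  # 800 INT8 TOPS per GPU (from the Orin numbers)
clock_hz = target_ops / (tensor_cores * int8_ops_per_tc_per_clock)
print(f"required clock: {clock_hz / 1e6:.0f} MHz")   # ~1628 MHz, ~11% above V100's ~1455 MHz boost
```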

    Sounds pretty reasonable to me and fits together, with just one problem: 2 big HPC chips in one car computer? So far they've used the consumer line, as they only need inference speed. Therefore I'm skeptical in this case.

    So let's think of an AM102 configuration that would fit these numbers.
    A smaller chip with 8 GPCs of 7 TPCs each: 7168 shaders, cut off 256 = 6912 shaders, 54 TPCs. With 1728 TCs at 1800 MHz we're at 800 TOPS. That clock is 11% more than the TU102 FE, which seems alright and might fit in a car.
    2080 Ti FE: 14.2 TFLOPS FP32. Speculated AM102 configuration: 24.8 TFLOPS, maybe 24 if you cut more SMs for the consumer product. At 24 TFLOPS we would have 69% more shader power than a 2080 Ti, which should hopefully translate to 50-60% more speed in games.

    7168 shaders is 55% more than TU102. Add more tensor cores, small architecture improvements, and more RT hardware (I expect something more like Maxwell -> Pascal). But with the ROP count and memory interface not growing, just 2 more GPCs, and the caches also not growing much, everything besides the SMs would increase in size by less than 55%. So it might work out to ~55% more transistors than TU102.
    This GPU is on 7nm EUV; I hope we get closer to 2x the transistor density vs 16nm, unlike the DUV process. 754 mm² × 1.55 / 2 ≈ 584 mm².
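    For what it's worth, the whole AM102 back-of-envelope above can be checked in a few lines (a sketch; every figure is this post's speculation, not a confirmed spec):

```python
# Speculated AM102: 8 GPCs x 7 TPCs, 2 SMs per TPC, 64 FP32 cores and 16 TCs per SM (all rumor)
full_shaders = 8 * 7 * 2 * 64             # 7168 shaders in the full chip
cut_shaders = full_shaders - 256           # 6912 after disabling one TPC's worth
tpcs = cut_shaders // (2 * 64)             # 54 TPCs
tensor_cores = tpcs * 2 * 16               # 1728 TCs
clock_hz = 1.8e9                           # assumed 1800 MHz

int8_tops = tensor_cores * 256 * clock_hz / 1e12    # ~796, i.e. roughly the 800 TOPS target
fp32_tflops = cut_shaders * 2 * clock_hz / 1e12     # ~24.9 TFLOPS (2 FLOPs per core per clock)
die_mm2 = 754 * 1.55 / 2                   # TU102 area x 1.55x transistors / 2x density

print(f"{int8_tops:.0f} TOPS, {fp32_tflops:.1f} TFLOPS, ~{die_mm2:.0f} mm²")
```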

    So that's my educated guess at how a possible AM102 might look. Feel free to discuss and destroy my assumptions :)
     
    Man from Atlantis likes this.
  5. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,137
    Likes Received:
    3,037
    Location:
    Finland
    Even though NVIDIA says Orin is using a "next gen GPU architecture", that doesn't mean it's Ampere (assuming here that Ampere is the next-gen desktop part, which isn't a given). In fact, the little they've told us suggests it's a heavy-lifting tensor GPU more than a traditional GPU, and I doubt that will carry over to desktop.
     
  6. Samwell

    Newcomer

    Joined:
    Dec 23, 2011
    Messages:
    127
    Likes Received:
    154
    Xavier is coupled with Turing; as Orin will come 2 years after Xavier, I'm pretty sure it'll be next year's architecture, whatever it's called.

    It's funny, as we had the exact same discussion about tensor cores with Turing. So many people were sure that Turing wouldn't have tensor cores. I wouldn't be surprised by more TCs in the desktop lineup as well. We have EGX servers based on GPUs also found in desktops for inference, Adobe and other software makers are exploring the possibilities of inference in their software (so Quadros might benefit from it), and we have WinML/DirectML, which might also lead to DL inferencing in games (beside DLSS). The whole software world is researching the possibilities of DL, so it's not such a bad bet to try to be the best at it in the consumer space too.
     
    pharma and DavidGraham like this.
  7. Adonisds

    Newcomer

    Joined:
    Feb 10, 2019
    Messages:
    57
    Likes Received:
    31
    400 to 500 and 600 to 700 shouldn't really be considered generation jumps because the architecture was the same. If you count them, then for consistency you should also consider the 2000 Super line a new generation.
     
    Picao84 likes this.
  8. w0lfram

    Newcomer

    Joined:
    Aug 7, 2017
    Messages:
    219
    Likes Received:
    39
    What separates Hopper from Ampere?
     
  9. ionutkkk

    Newcomer

    Joined:
    Mar 25, 2012
    Messages:
    2
    Likes Received:
    1
    I think the MCM approach.
     
  10. Shortbread

    Shortbread Island Hopper
    Veteran

    Joined:
    Jul 1, 2013
    Messages:
    4,108
    Likes Received:
    2,333
  11. PSman1700

    Veteran Newcomer

    Joined:
    Mar 22, 2019
    Messages:
    2,718
    Likes Received:
    905
    They have a real monster waiting.
     
  12. w0lfram

    Newcomer

    Joined:
    Aug 7, 2017
    Messages:
    219
    Likes Received:
    39
    I doubt that is true for gaming Ampere.
     
    Lightman likes this.
  13. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,289
    Likes Received:
    3,550
    50% more performance at half the power would mean that Ampere is 3x more power efficient than Turing, which is almost unheard of in the industry. It's hard to believe that's true outside of some special cases, maybe in ray tracing for example.

    Not to mention that NVIDIA will push this to the limit, so we could have a monstrous 7nm die that is 150% faster than a 2080 Ti; that still sounds hard to believe.

    More plausible would be 50% more performance OR half the power.
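    The arithmetic behind that 3x figure is just the ratio of the two claims (a trivial sketch of the reasoning above):

```python
# "50% faster at half the power" expressed as a perf/W ratio vs Turing
perf_ratio = 1.5       # +50% performance
power_ratio = 0.5      # half the power
efficiency_gain = perf_ratio / power_ratio
print(f"{efficiency_gain:.1f}x perf/W")   # 3.0x -- the near-unheard-of figure questioned above
```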
     
    Kej, PSman1700, milk and 4 others like this.
  14. Benetanegia

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    288
    Likes Received:
    189
    I don't find it outside the realm of plausibility tbh, considering the 2 node jumps between the two architectures. We've never had that before, afair.

    EDIT: Also, by leveraging that (possible) advantage they may be able to set the chips much lower on the power curve.
     
  15. techuse

    Regular Newcomer

    Joined:
    Feb 19, 2013
    Messages:
    280
    Likes Received:
    163
    You think a GPU 50% faster than a 2080ti at 125-150 watts is plausible?
     
  16. Benetanegia

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    288
    Likes Received:
    189
    Yes; if anything I would ask why not. I don't consider it a given, but it's completely plausible imho.

    With 7nm they have over 3x the transistor density and 40% more performance or 60% lower power. They have no competition at the high end, so there's no pressure to offer a greater performance advantage, which is a great opportunity to clock it very low on the performance curve (also leaving margin for later on). If you told me 100% performance uplift with the same 3x power efficiency gain, I would say that's definitely less plausible, but a measly 50% increase, all things considered? I don't see any problem in calling it plausible.
     
  17. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,137
    Likes Received:
    3,037
    Location:
    Finland
    2 node jumps? AFAIK we don't know whether NVIDIA will use 7nm or 7nm+, and regardless of which they pick, it's really only 1 node jump (I wouldn't call 7nm -> 7nm+ a node jump).
     
  18. Samwell

    Newcomer

    Joined:
    Dec 23, 2011
    Messages:
    127
    Likes Received:
    154
    It's pretty certain they'll use 7nm+. But yes, of course it's more like 1 node jump, and 7nm DUV isn't even a full node jump, as you can see in AMD's product density improvement of ~1.6x.
     
  19. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,137
    Likes Received:
    3,037
    Location:
    Finland
    You can't compare different manufacturers' nodes directly like that. We don't know what kind of density AMD would have achieved on TSMC's 16/12nm process; it could have been lower or higher than on GloFo 14/12nm.
     
  20. Benetanegia

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    288
    Likes Received:
    189
    1- 16/12nm -> 10nm
    2- 10nm -> 7nm
     