Nvidia Post-Volta (Ampere?) Rumor and Speculation Thread

Discussion in 'Architecture and Products' started by Geeforcer, Nov 12, 2017.

Thread Status:
Not open for further replies.
  1. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,130
    Likes Received:
    3,019
    Location:
    Finland
    Because they will be essential in the future, and consoles have long life cycles.
    Tensor cores are there for AI workloads (and by no means exclusive to NVIDIA). Also, while "hardware always comes first", the hardware doesn't come if you don't know what it's coming for - if the same chips weren't used for AI workloads, there wouldn't be tensor cores in them.
     
    w0lfram likes this.
  2. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,286
    Likes Received:
    3,543
    Nothing wrong with repurposing one's strengths onto another segment and carving out a new frontier in the process. If NVIDIA succeeds in making DLSS widely available (and continues the success of the latest version), it will change the industry forever. That will happen by iterating on the process and trying different things, seeing what sticks and what doesn't, instead of slacking off and waiting for the chips to fall into place - that's generally how you become a leader in your field. It's how NVIDIA established itself with CUDA and AI anyway.

    Also, tensor cores can be used to accelerate DirectML, if that ever takes off.
     
  3. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,602
    Likes Received:
    643
    Location:
    New York
    Yeah, gaming was certainly not the primary target for Nvidia's AI efforts, but give them credit for finding a way to make the hardware relevant to gamers anyway.

    They will have serious competition very soon, though. JHH keeps touting the need for flexible AI accelerators, which makes sense, but that flexibility doesn't need to be bundled with billions of useless graphics-focused transistors. Someone will eventually build a cheaper and faster AI accelerator with the software to back it up. It's only a matter of time.

    One thing in their favor is that, for AI applications where visualization or image processing is also important, their products provide the total package. It's a niche within a niche, though.
     
    w0lfram and Lightman like this.
  4. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,286
    Likes Received:
    3,543
    Right now, their greatest threat comes from the Alveo lineup from Xilinx; however, FPGAs have a much steeper programming curve and some potential latency problems, and compile times are also significantly longer.

    The other potential competition is of course Intel and its multi-front effort in AI: FPGAs, ASICs, and GPUs, but they seem to be lagging behind a bit.

    Intel canned Xeon Phi in favor of solutions from Nervana, then canned Nervana's solutions in favor of Habana's; their Altera FPGAs are also behind Xilinx, and their GPU initiative is yet to be proven. They have also yet to present a reliable software stack that goes toe to toe with CUDA, but they remain the sleeping giant that can awaken at any moment.

    Google seems uninterested in commercializing their ASIC AI approach (TPUs) for unknown reasons. And AMD seems to be spread across too many fronts at the moment and has a very weak footing in AI markets compared to even Intel.

    There is also the occasional startup, but the big ones seem to get snatched up quickly by Google, Intel, or maybe even NVIDIA?
     
    Rootax, pharma and Lightman like this.
  5. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,130
    Likes Received:
    3,019
    Location:
    Finland
    We might have had those for some time already, but Google doesn't sell their TPUs outright, they just lease them in the cloud, I think? At least https://mlperf.org/training-results-0-6/ suggests the performance is about where it needs to be, and they're not carrying the extra transistor load from the GPU parts. Of course, Google hasn't said how many transistors they're using for this either, but it's certainly fewer than NVIDIA's, since there's no GPU on the same chip.
     
  6. w0lfram

    Newcomer

    Joined:
    Aug 7, 2017
    Messages:
    217
    Likes Received:
    38
    All of that^ is about to pass.

    AMD provides enough of a price/performance gap to make Radeon the best mainstream choice, and Nvidia won't be able to compete with Navi until they are on 7nm. AMD is making mega profits because of the margins on these Navi chips. They have room to go lower in price, where Nvidia can't compete.


    Secondly, we know that Nvidia is years behind AMD.

    AMD itself has been using 7nm for over two years and is TSMC's largest 7nm partner. Nvidia was essentially locked out of 7nm and still doesn't have a 7nm product. And we all know that RDNA2 is on 7nm+, so how do you even suggest that Nvidia is a node ahead of AMD? It's confusing.

    Ampere itself will undoubtedly be a great success, but Nvidia's dominance at high-end gaming (2070S/2080S/2080 Ti, etc.) is about to come to an end, because Nvidia can't keep using the hand-me-down/trickle-down business model of selling low-binned AI cards to gamers. Nvidia is going to have to come out with their own gaming architecture to combat RDNA2.

    I suggest that it will take a few years for Nvidia/Huang to scheme something up. Going by the whitepapers, RDNA1 whet our lips; the RDNA2 whitepapers might require a whole rebalancing of the gaming industry...


    ...since we are all speculating.
     
  7. Remij

    Newcomer

    Joined:
    May 3, 2008
    Messages:
    114
    Likes Received:
    138
    lol
     
  8. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,943
    Likes Received:
    2,286
    Location:
    Germany
    I wonder if RTG gets different pricing on TSMC 7nm wafers than AMD's CPU group. This is fresh from ISSCC:
    https://pc.watch.impress.co.jp/img/pcw/docs/1236/258/html/photo018_o.jpg.html
    From fiddling with lines in that diagram, it looks to me as if a yielded mm² is about 70% more expensive in 7 nm. And going by the densities given in TPU.com's database, Navi 10 achieves about 64-68% higher density in 7 nm than TU104/TU106 do in 12 nm. Now, of course, that doesn't take into account the respective rebates each company is getting, the possible pricing difference between 16 and 12 nm at TSMC, the yield recovery for each chip, etc.
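
    A quick back-of-the-envelope check in Python (purely illustrative; the ~1.70x cost-per-mm² and ~1.66x density factors are the rough estimates above, not official figures):

    # Rough cost-per-transistor comparison: 7 nm (Navi 10) vs 12 nm (TU104/TU106).
    cost_per_mm2_ratio = 1.70   # yielded 7 nm mm² vs 12/16 nm mm² (estimate from the ISSCC slide)
    density_ratio      = 1.66   # Navi 10 density vs TU104/TU106 (~64-68% higher, per TPU.com)

    cost_per_transistor_ratio = cost_per_mm2_ratio / density_ratio
    print(f"7 nm cost per transistor vs 12 nm: {cost_per_transistor_ratio:.2f}x")
    # -> ~1.02x, i.e. roughly the same cost per transistor, before rebates and yield differences.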

    But generally, I'm not inclined to see Nvidia at a massive financial disadvantage right now.

    And when they can switch to 7 or 5 nm successfully, they probably will have a larger window for improving frequency or power consumption – even without taking into account a new µarch.

    Possibly – since we're all speculating – they could announce Ampere as a Volta successor while keeping Hopper back for a later date, and just shrink Turing to 7nm to exploit said process. That would go along with the back-drilling rumors for improving signal integrity on the PCBs, which is needed for even higher frequencies.
     
    DavidGraham likes this.
  9. xpea

    Regular Newcomer

    Joined:
    Jun 4, 2013
    Messages:
    405
    Likes Received:
    431
    Well, going by the financial results, it's totally the opposite. AMD barely breaks even on their GPU business, whereas Nvidia has a 65% gross margin...
     
  10. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    1,512
    Likes Received:
    873
    Location:
    France
    w0lfram was not serious with his post... was he?
     
  11. PSman1700

    Veteran Newcomer

    Joined:
    Mar 22, 2019
    Messages:
    2,686
    Likes Received:
    897
    Obviously not; see his style of writing, too ;)
     
    Lightman likes this.
  12. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    1,499
    Likes Received:
    242
    Location:
    msk.ru/spb.ru
    Turing is already a better "gaming architecture" than RDNA2.
     
    xpea, Remij, pharma and 2 others like this.
  13. A1xLLcqAgt0qc2RyMz0y

    Veteran Regular

    Joined:
    Feb 6, 2010
    Messages:
    1,431
    Likes Received:
    1,108
    PSman1700 and pharma like this.
  14. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    7,683
    Likes Received:
    3,758
    Location:
    Pennsylvania
    Yeah, he lives in a completely deluded world. His posts are full of these fantasies.
     
    PSman1700 likes this.
  15. PSman1700

    Veteran Newcomer

    Joined:
    Mar 22, 2019
    Messages:
    2,686
    Likes Received:
    897
    He would feel right at home in the baseless console sections :D
     
  16. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    21,718
    Likes Received:
    7,357
    Location:
    ಠ_ಠ
    hm...

    So... would it follow for them to double up the FP32 cores vs INT :?: Or would that just be coincidental?

    Also, wouldn't they need to increase the register file to match the increase in ALUs?
     
  17. del42sa

    Newcomer

    Joined:
    Jun 29, 2017
    Messages:
    184
    Likes Received:
    107
    They're just copying AMD with RDNA2. Isn't it obvious?? :wink4:
     
  18. Samwell

    Newcomer

    Joined:
    Dec 23, 2011
    Messages:
    127
    Likes Received:
    154
    Do they really need to increase the register file? They doubled the register file per FP32 core for Turing compared to Pascal because of its concurrent FP32+INT32 execution.
    Each sub-SM has 16 FP32 and 16 INT32 cores it can use concurrently, fed by a 64 KB register file; a Pascal sub-SM was 32 FP32 cores with the same 64 KB.

    Now add 16 FP32 cores for Ampere, and you can use either 32 FP32 cores or 16 FP32 + 16 INT32 cores? Maybe adding LD/ST and SFU units would be enough? Of course, not everyone would be happy to go back to the same register-file-per-FP32 ratio as in Pascal, but it could be worth it if you can pack in many more units and only lose some efficiency.
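
    A tiny sketch of that ratio in Python (register file sizes per SM partition as given in the public whitepapers for consumer Pascal and Turing; the Ampere row is just the hypothetical 32-wide FP32 partition with an unchanged 64 KB file):

    # Register-file bytes per FP32 lane for one SM partition (sub-SM).
    configs = {
        "Pascal (GP104)":        {"rf_kb": 64, "fp32": 32},
        "Turing (TU10x)":        {"rf_kb": 64, "fp32": 16},
        "Ampere? (speculation)": {"rf_kb": 64, "fp32": 32},  # assumed: RF unchanged, FP32 doubled
    }

    for name, c in configs.items():
        print(f"{name}: {c['rf_kb'] * 1024 // c['fp32']} B of register file per FP32 lane")
    # Pascal: 2048 B/lane, Turing: 4096 B/lane, hypothetical Ampere: back to 2048 B/lane.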

    Just pure speculation, but maybe someone with more knowledge could explain what would be needed if they double FP32 per SM.

    With mesh shaders in DX12 it's at least clear that future chips should be more compute/RT/TC focused. Nvidia doesn't need many more GPCs with more rasterization/tessellation speed, as the importance of that stuff should go down.
     
    #498 Samwell, Feb 23, 2020
    Last edited: Feb 23, 2020
    pharma likes this.
  19. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    21,718
    Likes Received:
    7,357
    Location:
    ಠ_ಠ
    Oh, I was looking through the Pascal whitepaper, and it showed 128KB per sub-ALU grouping (32768*4B) for 32 FP32. Maxwell has 64KB w/ 32*FP32 in the sub-SM, but I figured the Pascal amount was the progression. Volta/Turing both show 16*FP32 w/ 64KB RF in the groupings.

    idk, hence the question :p
     
  20. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,602
    Likes Received:
    643
    Location:
    New York
    Volta was built for full speed INT+FP at the cost of maximum FP throughput. If this random twitter rumor is true then it's just Nvidia prioritizing raw FP throughput.

    Register file size or bandwidth? Register file size doesn't matter but bandwidth does. If they don't increase bandwidth then the scheduler can't gather all the operands for 32 FP32 FMAs + 16 INT32 ops in one cycle.

    So issuing an INT operation will cause single-cycle bubbles in the FP32 execution pipeline. That's no worse than Pascal. It would be interesting to know how much a 16-wide INT32 pipeline costs vs the dual-purpose FP32/INT32 pipes in Pascal and Navi.
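
    Rough operand math in Python (illustrative only: it just counts register reads per cycle, assuming 3 source operands per FMA and 2 per INT32 op, and ignores operand reuse or caching):

    # Register-file reads needed per cycle for one SM partition.
    fma_lanes, fma_src = 32, 3     # hypothetical 32-wide FP32 FMA issue
    int_lanes, int_src = 16, 2     # 16-wide INT32 issue alongside it

    reads_fp_only     = fma_lanes * fma_src                        # 96 reads/cycle
    reads_fp_plus_int = fma_lanes * fma_src + int_lanes * int_src  # 128 reads/cycle

    print(reads_fp_only, reads_fp_plus_int)
    # If RF bandwidth only covers the 32 FP32 FMAs (96 reads), co-issuing INT32 work
    # has to steal a cycle from the FP pipe - the single-cycle bubble mentioned above.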
     
    w0lfram, PSman1700, nnunn and 2 others like this.