Nvidia Ampere Discussion [2020-05-14]

Discussion in 'Architecture and Products' started by Man from Atlantis, May 14, 2020.

  1. Bondrewd

    Regular Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    771
    Likes Received:
    361
    Nah, it's due to all the MAC piles required for GEMM going brrrrr at alarming speeds.
    See any other ML part; they all hit very nice density figures for their respective nodes.
    Nah N7+ has all the same options as vanilla N7.
     
  2. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,608
    Likes Received:
    664
    Location:
    New York
    Isn’t cache also relatively dense? A100 has a load of that too.
     
  3. Bondrewd

    Regular Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    771
    Likes Received:
    361
    Kinda, depends on the implementation.
    Zen2 CCD is mostly SRAM, but is nowhere near the peak theoretical density (or what mobile SoCs achieve for that matter).
     
  4. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    751
    Likes Received:
    320
    Tensor cores take only about 10% of the die in Turing. Even if that were 20% for Ampere, it cannot explain the high transistor density.

    If that is the case, N7+ is a more plausible explanation for the high density compared to vanilla N7.
     
    #364 Voxilla, Jul 8, 2020
    Last edited: Jul 8, 2020
  5. Bondrewd

    Regular Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    771
    Likes Received:
    361
    They're xtor-dense MAC piles.
    Nah, see RDNA2 stuffz and Zen3 and Kirin 990 5G.
     
  6. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    751
    Likes Received:
    320
    Just read about this new Graphcore AI processor, eclipsing the A100, with 59.4 billion transistors on 823 mm2.
    That is a transistor density of 72 MT/mm2.
    As this is more than the max of 66.7 MT/mm2 for vanilla 7nm, this must also be using N7+.
    The high density is a result of the 900 MB of on-chip memory this Graphcore part has.
    1 bit of SRAM requires 6 transistors, so the 900 MB translates to 43.2 B transistors...
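The two figures in this post can be reproduced with a short Python sketch. It assumes decimal megabytes (900 MB = 900e6 bytes) and a plain 6T SRAM cell with no redundancy or ECC bits; the function names are made up for illustration.

```python
def density_mt_per_mm2(transistors, area_mm2):
    """Average transistor density in millions of transistors per mm^2."""
    return transistors / area_mm2 / 1e6


def sram_transistors(capacity_bytes, transistors_per_bit=6):
    """Transistor count for an SRAM of the given size, assuming 6T cells."""
    return capacity_bytes * 8 * transistors_per_bit


# Figures quoted in the post for the Graphcore part
total_transistors = 59.4e9
die_area_mm2 = 823.0
sram_bytes = 900e6  # assumption: decimal megabytes

chip_density = density_mt_per_mm2(total_transistors, die_area_mm2)
sram_count = sram_transistors(sram_bytes)

print(f"chip density: {chip_density:.1f} MT/mm2")
print(f"SRAM transistors: {sram_count / 1e9:.1f} B")
print(f"SRAM share of all transistors: {sram_count / total_transistors:.0%}")
```

Under these assumptions the SRAM alone accounts for roughly three quarters of the transistor budget.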
     
    #366 Voxilla, Jul 16, 2020
    Last edited: Jul 16, 2020
    Lightman likes this.
  7. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,144
    Likes Received:
    3,043
    Location:
    Finland
    It doesn't say it's using HP cells, so it doesn't necessarily need to be N7+.
     
    #367 Kaotik, Jul 16, 2020
    Last edited: Jul 16, 2020
  8. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    751
    Likes Received:
    320
    If it used HD cells at 91.2 MT/mm2 (which are only suitable for mobile SoCs AFAIK), the 900 MB would fit in 473 mm2, leaving the remaining 16.2 B non-SRAM transistors on 350 mm2, for a density of 46.3 MT/mm2. I don't think so.
    And in case you still have doubts, SRAM is the densest way you can pack transistors.
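The area split argued here can be sketched in Python as follows. It takes the post's figures at face value (59.4 B transistors, 823 mm2, 900 MB of 6T SRAM, 91.2 MT/mm2 for HD cells) and assumes decimal megabytes; the variable names are illustrative only.

```python
TOTAL_TRANSISTORS = 59.4e9
DIE_AREA_MM2 = 823.0
SRAM_TRANSISTORS = 900e6 * 8 * 6   # 900 MB, 8 bits per byte, 6 transistors per bit
HD_DENSITY = 91.2e6                # transistors per mm^2, as quoted in the post

# Area the SRAM would need if it were packed at the quoted HD density
sram_area = SRAM_TRANSISTORS / HD_DENSITY

# Whatever is left of the die holds the non-SRAM transistors
logic_area = DIE_AREA_MM2 - sram_area
logic_transistors = TOTAL_TRANSISTORS - SRAM_TRANSISTORS
logic_density = logic_transistors / logic_area / 1e6

print(f"SRAM area: {sram_area:.0f} mm2")
print(f"remaining area: {logic_area:.0f} mm2")
print(f"non-SRAM density: {logic_density:.1f} MT/mm2")
```

Without intermediate rounding this gives about 474 mm2 and 46.4 MT/mm2; the post truncates the SRAM area to 473 mm2, so its figures differ slightly.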
     
    #368 Voxilla, Jul 16, 2020
    Last edited: Jul 16, 2020
  9. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,144
    Likes Received:
    3,043
    Location:
    Finland
    Or TSMC's transistor density estimates aren't based on just SRAM cells, but assume a certain proportion of memory vs. logic vs. PHYs and whatnot.
    For what it's worth, Hexus says it's N7:

    https://hexus.net/tech/news/cpu/144154-graphcore-ipu-machine-m2000-1u-blade-capable-1petaflop/
     
    BRiT likes this.
  10. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    751
    Likes Received:
    320
    No, that would be ridiculous. The first thing tried on a new process is always some SRAM, and density is specified based on that.
    Graphcore says: "...Graphcore Colossus™ Mk2 GC200 IPU. Developed using TSMC’s latest 7nm process"
    Though they don't say anywhere that it is N7+, it doesn't take a genius to figure that out.
    If it proves not to be N7+, I'll buy you a beer, or maybe something stronger :)
     
    #370 Voxilla, Jul 16, 2020
    Last edited: Jul 16, 2020
  11. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    11,107
    Likes Received:
    5,645
    900MB of SRAM.
    Wow.. imagine having that as framebuffer.
     
    Krteq likes this.
  12. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,144
    Likes Received:
    3,043
    Location:
    Finland
    I'm not an expert on this at any level, but let's see:
    https://en.wikichip.org/wiki/7_nm_lithography_process#Industry
    N7 with HD cells is supposed to offer around 91-92 MTrans/mm^2. We also know that with HD cells one SRAM cell is 0.027 µm^2, which fits into 1 mm^2 37 million times.
    Since 1 SRAM cell is actually 6 transistors, that 37 million times turns into 222 million transistors.
    So if the density was reported on just SRAM cells, it would be 222 MTrans/mm^2, not 91-92 MTrans/mm^2 for the HD cells, right?
    (For what it's worth, wikichip did feel the need to point out that N7 can offer really dense SRAM cells)
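The cell-count arithmetic above can be checked with a few lines of Python. This is an idealized sketch: it assumes the whole square millimetre is tiled with 0.027 µm^2 bitcells and ignores sense amplifiers, decoders and other array overhead.

```python
UM2_PER_MM2 = 1_000_000      # 1 mm^2 = 1000 um x 1000 um

cell_area_um2 = 0.027        # HD SRAM bitcell size quoted in the post
transistors_per_cell = 6     # 6T SRAM

# How many bitcells fit in 1 mm^2 if nothing else is on the die
cells_per_mm2 = UM2_PER_MM2 / cell_area_um2
transistors_per_mm2 = cells_per_mm2 * transistors_per_cell

print(f"cells per mm2: {cells_per_mm2 / 1e6:.1f} M")
print(f"transistors per mm2: {transistors_per_mm2 / 1e6:.1f} M")
```

That is about 37 million cells and 222 MT/mm2, well above the 91-92 MT/mm2 quoted for HD cells.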
     
    Lodix and Lightman like this.
  13. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    751
    Likes Received:
    320
    Apparently you can't just compute density from SRAM cell size.
    For example, in this table a 14 nm SRAM cell is 0.05 µm^2, which would be 6 x 20 MT/mm2, but as you can see in the table it is actually only 6 x 13.7 MT/mm2.
    In practice it must be even less: for the 7nm Samsung chip with a 0.026 µm^2 cell size on that same page, the 256 Mbit SRAM is 69.3 mm2, which equates to a mere 22 MT/mm2. If anybody has access to this paper, there might be more information there.
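A small Python sketch of the comparison between the ideal density implied by the bitcell size and the figures quoted above. It assumes 6T cells, decimal megabits (256 Mbit = 256e6 bits), and that the 69.3 mm2 figure is the area to divide by; the helper names are hypothetical.

```python
UM2_PER_MM2 = 1_000_000


def ideal_density(cell_um2, transistors_per_cell=6):
    """MT/mm^2 if the area were tiled with bitcells and nothing else."""
    return UM2_PER_MM2 / cell_um2 * transistors_per_cell / 1e6


def measured_density(bits, area_mm2, transistors_per_cell=6):
    """MT/mm^2 from a reported capacity and a reported area."""
    return bits * transistors_per_cell / area_mm2 / 1e6


# 14 nm example: 0.05 um^2 cell versus the 13.7 Mbit/mm^2 from the table
ideal_14 = ideal_density(0.05)
table_14 = 13.7 * 6
print(f"14 nm: ideal {ideal_14:.1f}, table {table_14:.1f} MT/mm2 "
      f"({table_14 / ideal_14:.0%} of ideal)")

# 7 nm Samsung example: 0.026 um^2 cell, 256 Mbit on 69.3 mm^2
ideal_7 = ideal_density(0.026)
measured_7 = measured_density(256e6, 69.3)
print(f"7 nm: ideal {ideal_7:.1f}, measured {measured_7:.1f} MT/mm2 "
      f"({measured_7 / ideal_7:.0%} of ideal)")
```

If the 69.3 mm2 turns out to cover more than the SRAM array itself, the second ratio understates the real array density.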
     
    BRiT likes this.
  14. Megadrive1988

    Veteran

    Joined:
    May 30, 2002
    Messages:
    4,683
    Likes Received:
    196
    Holy shit that's a lot!
     
  15. A1xLLcqAgt0qc2RyMz0y

    Veteran Regular

    Joined:
    Feb 6, 2010
    Messages:
    1,450
    Likes Received:
    1,151
    This is the "Nvidia Ampere Discussion" thread.

    Can a mod please remove the off-topic posts?
     
    pharma, BRiT and PSman1700 like this.
  16. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    3,582
    Likes Received:
    2,309
    disco_, Lightman and PSman1700 like this.
  17. techuse

    Regular Newcomer

    Joined:
    Feb 19, 2013
    Messages:
    286
    Likes Received:
    165
    Is CUDA 11 also used for the Turing results? Nvidia has a history of not using newer versions of CUDA for older GPUs because it would close some of the performance gap.
     
  18. PSman1700

    Veteran Newcomer

    Joined:
    Mar 22, 2019
    Messages:
    2,739
    Likes Received:
    923
    Yes, 50% faster is what I've been reading in many places now. Quite good, I think.
     
  19. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    3,582
    Likes Received:
    2,309
    The OctaneBench results compare the A100 (no RT cores available) vs. the best Turing result (with RT cores).
     
    #379 pharma, Jul 24, 2020
    Last edited: Jul 28, 2020
  20. dorf

    Newcomer

    Joined:
    Dec 21, 2019
    Messages:
    49
    Likes Received:
    123
    Lightman likes this.