NVidia Hopper Speculation, Rumours and Discussion

Discussion in 'Architecture and Products' started by xpea, Sep 21, 2021.

  1. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,240
    Likes Received:
    3,397


    So it's actually a bit smaller than GA100.
     
    Krteq, Man from Atlantis and BRiT like this.
  2. del42sa

    Newcomer

    Joined:
    Jun 29, 2017
    Messages:
    208
    Likes Received:
    137
    no word about clock though ....
     
  3. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,240
    Likes Received:
    3,397
Math gives 1.775 GHz for the 60 TF FP32 SXM5 module.
1.645 GHz for the 350 W PCIe version.
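The arithmetic behind that figure can be sketched as follows. This assumes the SXM5 part enables 132 SMs with 128 FP32 lanes each, and counts an FMA as 2 FLOPs per lane per clock; none of those constants are stated in the post itself.

```python
# Back-of-the-envelope clock estimate from a quoted FP32 throughput.
# Assumed configuration (not from the post): 132 SMs, 128 FP32 lanes
# per SM, FMA counted as 2 FLOPs per lane per clock.
SMS = 132
FP32_LANES_PER_SM = 128
FLOPS_PER_LANE_PER_CLOCK = 2  # fused multiply-add

def required_clock_ghz(tflops: float) -> float:
    """Clock (GHz) needed to reach the given FP32 TFLOPS."""
    flops_per_clock = SMS * FP32_LANES_PER_SM * FLOPS_PER_LANE_PER_CLOCK
    return tflops * 1e12 / flops_per_clock / 1e9

print(round(required_clock_ghz(60), 3))  # ~1.776, i.e. the quoted 1.775 GHz
```

The small discrepancy versus 1.775 GHz is just rounding in the quoted TFLOPS figure.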
     
  4. LordEC911

    Regular

    Joined:
    Nov 25, 2007
    Messages:
    877
    Likes Received:
    208
    Location:
    'Zona
H100 scaling to 700 W on air and/or water.
     
  5. dorf

    Newcomer

    Joined:
    Dec 21, 2019
    Messages:
    126
    Likes Received:
    417
    Whitepaper table 3 says "not finalized" and figure 13 says 1.3x clock speed compared to A100.
     
  6. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
There are footnotes speaking of 1.9 GHz. Edit: Those were given under the impression that the SXM5 model would use 124 SMs, so that may be a bit on the high side for the 132-SM incarnation. Also not finalized though.
     
    #66 CarstenS, Mar 22, 2022
    Last edited: Mar 22, 2022
  7. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,240
    Likes Received:
    3,397
  8. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    p18 of the Whitepaper PDF:
    "Only two TPCs in both the SXM5 and PCIe H100 GPUs are graphics-capable (that is, they can run vertex, geometry, and pixel shaders)."

Note how Nvidia refers to the productized H100 chips rather than the H100 GPU in general. And note that it's not the "1 SM" speculated earlier solely on the basis of a slightly misaligned SM.
     
    Jawed likes this.
  9. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,240
    Likes Received:
    3,397
  10. Man from Atlantis

    Regular

    Joined:
    Jul 31, 2010
    Messages:
    960
    Likes Received:
    853
  11. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,112
    Location:
    New York
    Do the 64 INT32 lanes share data paths with 64 of the FP32 lanes like in Ampere?
     
  12. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
AFAIU it's conceptually the same setup. Just 2x more execution resources per register.
     
  13. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,240
    Likes Received:
    3,397
I see 32 lanes, and I think it's pretty much the same as in Ampere (GA10x, not GA100) - one SIMD is FP32/INT and the other is FP32 only. It doesn't make much sense to build a separate INT32 unit since you wouldn't be able to keep it loaded properly.
     
  14. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
The Whitepaper says each quarter-cluster of an SM has 16 INT32, 2x16 FP32 and 16 FP64 lanes, plus 4 SFUs and 8 L/S units.
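Those per-quadrant counts can be scaled up to a full SM as a quick sanity check. The factor of 4 quadrants per SM is an assumption here, not something stated in the post:

```python
# Per-quadrant unit counts from the whitepaper figures quoted above;
# "FP32": 32 reflects the two 16-wide FP32 blocks per quarter-cluster.
# QUADRANTS_PER_SM = 4 is an assumption about the SM layout.
QUADRANTS_PER_SM = 4
per_quadrant = {"INT32": 16, "FP32": 32, "FP64": 16, "SFU": 4, "LD/ST": 8}

per_sm = {unit: n * QUADRANTS_PER_SM for unit, n in per_quadrant.items()}
print(per_sm)  # FP32 comes out to 128 lanes per SM
```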
     
  15. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,213
    So, the H100 (1 GPU) has the following advantages vs the MI250X (2 GPUs):

    FP32: 60 vs 48
    FP32 Matrix: 500 vs 95 (with sparsity: 1000 vs 95)
    FP16 Matrix: 1000 vs 383 (with sparsity: 2000 vs 383)
    INT8 Matrix: 2000 vs 383 (with sparsity: 4000 vs 383)
    INT4 Matrix: 4000 vs 383 (with sparsity: 8000 vs 383)

And all the reliability of a single monolithic GPU die with 3 TB/s of bandwidth dedicated to it alone, versus the MI250X's two dies sharing 3.2 TB/s of bandwidth and a slow interconnect.

Still, the MI250X enjoys better theoretical FP64 performance than the H100, when it can utilize both GPUs of course:

    FP64: 48 vs 30
    FP64 Matrix: 97 vs 60
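The figures above can be condensed into H100-to-MI250X ratios (dense, no sparsity), taking the numbers exactly as quoted in the post:

```python
# H100 vs MI250X throughput figures as quoted above: (H100, MI250X),
# in TFLOPS for FP formats and TOPS for INT formats, dense (no sparsity).
figures = {
    "FP32":        (60, 48),
    "FP32 Matrix": (500, 95),
    "FP16 Matrix": (1000, 383),
    "INT8 Matrix": (2000, 383),
    "FP64":        (30, 48),
    "FP64 Matrix": (60, 97),
}

for name, (h100, mi250x) in figures.items():
    print(f"{name}: {h100 / mi250x:.2f}x")  # >1 means H100 ahead
```

FP64 is the only row where the ratio drops below 1, matching the post's conclusion.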
     
    #75 DavidGraham, Mar 22, 2022
    Last edited: Mar 23, 2022
  16. troyan

    Regular

    Joined:
    Sep 1, 2015
    Messages:
    604
    Likes Received:
    1,124
MI250X bandwidth is only 400 GB/s when sharing data, or 13.3% of H100's. With Hopper, Nvidia introduces its NVLink Switch System with full speed across up to 32 nodes. So even inter-node communication is now 2.25x faster than the MI200 interconnect...

The loser is the US government, still building its exascale system which will be beaten by a computer in a basement. And next year with Grace, Nvidia will be so far ahead that Frontier looks DOA.
     
    PSman1700 likes this.
  17. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
Funny, it sounds almost as if you think they don't know what they want, which in this case wasn't NVIDIA. Next year there will be new products from every vendor that make "Frontier look DOA". But it's a completely different story as to when anyone could get a government-scale supercomputer up and running on those.
     
  18. Lurkmass

    Regular

    Joined:
    Mar 3, 2020
    Messages:
    565
    Likes Received:
    711
    AFAICT there's still no option to pair NV GPUs with high-end CPUs and get accelerated unified memory as well. There are virtually no high-end CPUs released yet with PCIe 5.0 either ...
     
  19. pharma

    Veteran

    Joined:
    Mar 29, 2004
    Messages:
    4,887
    Likes Received:
    4,534
    “Hopper” GH100 GPUs Are The Heart Of A More Expansive Nvidia System (nextplatform.com)
     
    #79 pharma, Mar 23, 2022
    Last edited: Mar 23, 2022
    PSman1700 and nnunn like this.
  20. xpea

    Regular

    Joined:
    Jun 4, 2013
    Messages:
    551
    Likes Received:
    783
    Location:
    EU-China
A small detour to mention that the biggest hardware announcement of GTC 2022 is the Spectrum-4 switch with its mind-blowing 100 billion transistors, 20 billion more than Hopper :runaway::runaway::runaway:

    Spectrum-4.jpg

Most people don't realize that interconnect speed and the ability to handle massive data transfers are key to high-performance distributed HPC workloads. Now that Nvidia controls the full ecosystem (CPU + DPU + GPU + interconnect + software), it can innovate at its own pace and push its standards more easily (starting with NVLink-C2C).
     