Nvidia Ampere Discussion [2020-05-14]

Discussion in 'Architecture and Products' started by Man from Atlantis, May 14, 2020.

  1. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    So GA102's FP32/INT32s take rather more than a trivial amount of die space compared to the INT32 version... Well, this was always my suspicion.

    If it's for training and nothing else? But it has loads of FP64, which isn't for training. So FP32 doesn't matter? I would tend to agree, NVidia decided that the new tensor core was more important than anything else, but they couldn't sacrifice FP64.

    So could we take this to mean that the tensor core design is how NVidia now names its architectures?

    If we say that Quadro/Titan/Geforce are for "prototyping" (for apps that end up on DGX) then it seems reasonable to conclude that harmonising the tensor core architecture is the most important aspect of a family of GPUs.
     
  2. glow

    Newcomer

    Joined:
    May 6, 2019
    Messages:
    40
    Likes Received:
    31
    In addition to the above, Turing TU102 still retained two FP64 "units" per SM (source for both: Nvidia Turing whitepaper, page 8). Same with GA102 (Ampere whitepaper, page 8). IIRC, the reasoning given for the 2 units per SM was maintaining software compatibility. AFAIK, the smaller chips of each family get 0 FP64 units, though I may be wrong on that.

    Full GV100 has 32 FP64 per 64 FP32 and full GA100 has 32 FP64 per 64FP32+64INT32. All of this without accounting for any of the Tensor core contributions.
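
    To put the per-SM figures side by side, here is a rough sketch that reduces them to FP64:FP32 ratios. The FP64 counts are the ones quoted in this thread; the FP32 counts for TU102 (64 per SM) and GA102 (128 per SM, counting the shared FP32/INT32 lanes) are assumptions taken from the respective whitepapers, and the GP104 figures are the ones from the Pascal Tuning Guide. Tensor cores are ignored, as above.

```python
from fractions import Fraction

# Per-SM lane counts. FP64 values are as quoted in the thread; the FP32
# values for TU102 and GA102 are assumptions taken from the whitepapers.
# GA102's 128 counts 64 dedicated FP32 lanes plus 64 shared FP32/INT32 lanes.
UNITS_PER_SM = {
    "GV100": {"fp64": 32, "fp32": 64},
    "GA100": {"fp64": 32, "fp32": 64},
    "GP104": {"fp64": 4, "fp32": 128},
    "TU102": {"fp64": 2, "fp32": 64},
    "GA102": {"fp64": 2, "fp32": 128},
}


def fp64_ratio(chip):
    """Return the FP64:FP32 lane ratio per SM as an exact fraction."""
    units = UNITS_PER_SM[chip]
    return Fraction(units["fp64"], units["fp32"])


for chip in UNITS_PER_SM:
    print(f"{chip}: FP64:FP32 = {fp64_ratio(chip)}")
```

    Under these assumptions the compute chips sit at 1/2, GP104 and TU102 at 1/32, and GA102 at 1/64, since its FP64 count stayed at 2 while the FP32 count per SM doubled.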

    So I do agree, it's part older design, part different goals (FP64 is a big deal in specific markets!). GA100 lacks RT cores and NVENC, for that matter, though Nvidia's specific wording was only addressing their A100 product, not the GA100.
     
  3. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,240
    Likes Received:
    3,393
    Well, there are fewer similarities between GA100 and GA10x than between GV100 and TU10x so... maybe?
    I almost think that it's more of a "we've made these chips at roughly the same time" thing than any technological reason. Even the same production process isn't cutting it anymore.
     
  4. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    I have no GT 1030 handy, but at least 1060 still had (a few) DP-units.

    Fun fact: DP-throughput on Radeon HD 5870 still beats RTX 3080. 3090 will finally overtake it though.
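
    The arithmetic behind that comparison can be sketched as follows. The SM counts (68 and 82), the reference boost clocks, the 2 FP64 units per SM on GA102, and the HD 5870's 1/5 DP rate are assumptions taken from public spec sheets, not measurements; real cards clock differently.

```python
def fp64_gflops(sm_count, fp64_per_sm, clock_ghz):
    """Peak FP64 GFLOPS, counting one FMA (2 FLOPs) per lane per clock."""
    return sm_count * fp64_per_sm * 2 * clock_ghz


# Assumed reference specs: SM count, FP64 lanes per SM, boost clock in GHz.
rtx_3080 = fp64_gflops(68, 2, 1.710)
rtx_3090 = fp64_gflops(82, 2, 1.695)

# HD 5870 (assumed): 1600 FP32 lanes at 0.85 GHz, DP at 1/5 of the SP rate.
hd_5870 = 1600 * 2 * 0.850 / 5

for name, gflops in [("RTX 3080", rtx_3080),
                     ("HD 5870", hd_5870),
                     ("RTX 3090", rtx_3090)]:
    print(f"{name}: {gflops:.0f} GFLOPS FP64 (peak, on paper)")
```

    With these inputs the 3080 lands at roughly 465 GFLOPS, the HD 5870 at 544, and the 3090 at roughly 556, which is the ordering described above.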
     
    #1784 CarstenS, Sep 22, 2020
    Last edited: Sep 22, 2020
  5. glow

    Newcomer

    Joined:
    May 6, 2019
    Messages:
    40
    Likes Received:
    31
    GP104 (used for highest spec version of the GTX1060, all other GTX1060 versions had the GP106) had 4 FP64 and 128 FP32 per SM (Pascal Tuning Guide)!

    That being said, I only have my laptop right now, so I'm unable to evaluate my RTX2070 (TU106) for FP64 support. IMO, it probably does include it at the same reduced level (2 per SM), though I am also curious if that extends into the TU116/117 family, since those received actual changes to the SM vs the larger Turing chips.
     
  6. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Even GT 1030 and GTX 1650 have it, so...
     

    glow likes this.
  7. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Could you - again - point me to the relevant section? Or are you not talking about this ixbt-review?
     
  8. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    832
    Likes Received:
    505
    No problem:
    "In Ampere, there were also some changes in the TMU, which were modestly written in the slide along with the caching improvements: "New L1 / texture system". According to some reports, Ampere doubled the rate of texture samples (you can read twice as many texels per cycle) for some popular texture formats with point sampling without filtering - such samples are recently very often used in computational tasks, including noise reduction filters and other post-filters that use screen space and other techniques. Together with the doubled L1 cache bandwidth, this will help feed the doubled number of FP32 blocks with data."
     
    nnunn, Cat Merc, pharma and 4 others like this.
  9. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Ah, I see. I was wondering if this was something they derived from their testing, because there are parts where I don't necessarily agree with their conclusions. :)
     
  10. pharma

    Veteran

    Joined:
    Mar 29, 2004
    Messages:
    4,887
    Likes Received:
    4,534
  11. arandomguy

    Regular Newcomer

    Joined:
    Jul 27, 2020
    Messages:
    251
    Likes Received:
    355
    My impression and understanding is that these days external product codenames are as much (if not more so) a part of technical marketing as they are there for practical internal reasons. I'd actually wonder (if given truth serum) what Nvidia (or others) actually does internally.

    The likelihood is that Ampere products have many more differences, at least in terms of internal approach, than the naming suggests. However, Nvidia seems to want to market their GPU designs as a unified line design-wise. Whereas with AMD's CDNA and RDNA it's likely they have more similarities in terms of internal approach, but their new marketing approach wants a clear distinction (likely wanting to put more emphasis on their overall ecosystem, including consoles).
     
  12. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    17,879
    Likes Received:
    5,330
    Davros to the rescue :
    [image]
     
    glow likes this.
  13. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    832
    Likes Received:
    505
    You have a card you can test with and share your findings ?
     
  14. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    It's not that I arrived at different test results, it's about ixbt's conclusion from their own data. Not sure if there's something lost in translation though. For example RM's D3D10 Fire simulation, where they explicitly state it uses one texture fetch and 130 sin/cos instructions. After the results are shown, they derive „So this time, in a purely mathematical test, the new RTX 3080 was ahead of its predecessor RTX 2080 by only 50%, which clearly indicates an emphasis on something else, and not ALU.“ while sin/cos is done by the SFUs, not the FP32-ALUs. Again, not sure if there's something lost in translation.
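
    A back-of-the-envelope model of that point, as a sketch only: if the shader is limited by sin/cos, the expected speedup follows the SFU rate, not the FP32 rate. The SM counts (46 and 68), the 1.71 GHz reference boost clocks, and the figure of 16 special-function results per clock per SM on both architectures are assumptions taken from public specs and Nvidia's published throughput tables; whether the test really is SFU-bound is also an assumption.

```python
SFU_PER_SM = 16  # assumed special-function results per clock per SM, both chips

# Assumed reference specs, not measured values.
CARDS = {
    "RTX 2080": {"sms": 46, "fp32_per_sm": 64, "boost_ghz": 1.710},
    "RTX 3080": {"sms": 68, "fp32_per_sm": 128, "boost_ghz": 1.710},
}


def giga_ops_per_s(card, units_per_sm):
    """Peak issue rate for a unit type: SMs x units per SM x clock."""
    spec = CARDS[card]
    return spec["sms"] * units_per_sm * spec["boost_ghz"]


# Speedup expected if the shader is limited by sin/cos (SFU) throughput.
sfu_ratio = (giga_ops_per_s("RTX 3080", SFU_PER_SM)
             / giga_ops_per_s("RTX 2080", SFU_PER_SM))

# Speedup expected if it were limited by plain FP32 ALU throughput.
fp32_ratio = (giga_ops_per_s("RTX 3080", CARDS["RTX 3080"]["fp32_per_sm"])
              / giga_ops_per_s("RTX 2080", CARDS["RTX 2080"]["fp32_per_sm"]))

print(f"SFU-bound speedup:  {sfu_ratio:.2f}x")
print(f"FP32-bound speedup: {fp32_ratio:.2f}x")
```

    Under these assumptions the SFU-bound estimate comes out at about 1.48x against about 2.96x for FP32, so a 50% gain in a sin/cos-heavy test would be consistent with SFU scaling and says little about the FP32 ALUs.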
     
  15. Cyan

    Cyan orange
    Legend

    Joined:
    Apr 24, 2007
    Messages:
    9,734
    Likes Received:
    3,460
    first ever video showing the 3090 running games at 8k on a 8k TV

     
    Lightman, PSman1700 and pharma like this.
  16. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Two YouTubers given preferential treatment over all other NDA signees.
    Here's the other guy:


    edit: Yes, I fully realize those are not reviews of the card, thanks.
     
    #1797 CarstenS, Sep 23, 2020
    Last edited: Sep 23, 2020
    Cyan, pharma and PSman1700 like this.
  17. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,118
    Likes Received:
    3,088
    Nice reviews, 8k60 is a reality in 2020. Now add DLSS and ray tracing to that mix and see how far you can get? The detail and fidelity that become available in Eternal @8k are impressive, and that at 60fps/HDR.
     
    #1798 PSman1700, Sep 23, 2020
    Last edited: Sep 23, 2020
    pharma likes this.
  18. So it's official:

    https://www.nvidia.com/en-gb/geforce/news/rtx-3090-out-september-24/

    The leaked review was accurate.

    It's also interesting that nVidia is placing the non-Titan Geforce RTX 3090 as a prosumer graphics card, as their marketing material seems to focus heavily on productivity applications.
    I wonder what changed their mind on their taxonomy.
     
    Lightman and Cyan like this.
  19. Dangerman

    Newcomer

    Joined:
    Apr 1, 2014
    Messages:
    43
    Likes Received:
    8
    I think it's because there's a 12GB card in reserve against RDNA 2 (rogame has PCI IDs for a 12GB card). I mean, *maybe* the 20GB 3080s will have an extra two SMs or so enabled, but the 20GB cards will be sold at a large premium and made for prosumers or the gullible who see dat extra VRAM.
     