Nvidia Ampere Discussion [2020-05-14]

Discussion in 'Architecture and Products' started by Man from Atlantis, May 14, 2020.

Tags:
  1. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Personal opinion, your mileage may vary:
    If 3070/3060Ti are on or above the performance level of 2080 Ti and Radeon RX 6800 non-XT is not too far away, the latter's 16-GByte-complement of graphics memory becomes more and more of a compelling pro-argument.

    In other words: A 16 GByte variant of at least 3070 and a 20-GByte-3080 would be much more desireable.
     
  2. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,462
    Location:
    Finland
    According to Anthony at Linus Tech Tips, "RX 6800 (the non-XT) is closer to 3080 than 3070" — it's in the Short Circuit unboxing video.
     
  3. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,109
    Location:
    New York
    Yeah, the 6800 is apparently considerably faster than the 3070. VRAM capacity is the least of its worries.
     
  4. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    I can't do anything but keep it vague for now.
     
    Lightman and Krteq like this.
  5. Exactly this. The numbers in the names are worthless in this day and age. The 970 debuted in 2014 for $330; four years later the RTX 2070 released with a $599 MSRP.
    We should just stick to comparing performance per cost, and within that scope the performance-per-cost evolution of consumer graphics cards over these past 6 years has been really disappointing IMO, especially compared to the 6 years prior.
    AMD's reduced ability to compete has only ever been good for Nvidia's pockets.
     
    Krteq, trinibwoy and BRiT like this.
  6. pharma

    Veteran

    Joined:
    Mar 29, 2004
    Messages:
    4,887
    Likes Received:
    4,534
  7. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    If some of those unlabelled benchmarks were specifically chosen to fit the 80-GByte memory budget, then of course, why wouldn't it scale.

    edit: here are the footnotes from the presentation that I could find for the benchmarks you posted. Obviously, they were rounding to whole numbers.
    1 AI Training running DLRM using Huge CTR framework on a 450 GB Criteo dataset. Normalized speedup ~2.6X
    2 HPC: Quantum Espresso with CNT10POR8 dataset on a 1.6TB dataset. Normalized speedup ~1.8X
    3 Data Analytics: big data benchmark with 10TB dataset, 30 analytical retail queries, ETL, ML, NLP. Normalized speedup ~1.9X
    [my bold]
    From there, it seems pretty natural for larger local memory to allow for much better perf scaling.
     
    #2487 CarstenS, Nov 18, 2020
    Last edited: Nov 18, 2020
  8. pharma

    Veteran

    Joined:
    Mar 29, 2004
    Messages:
    4,887
    Likes Received:
    4,534
    The 10 TB / 30 queries is the standard configuration for this benchmark. Nvidia started using this benchmark a few years ago to showcase AI/Hadoop performance using RAPIDS to data-science/business-analytics types, always with 10 TB / 30 queries (which doesn't fit the 80 GB budget).

    Edit: I'm not familiar with the other benchmarks, so I can't comment on the configurations used. I would venture to guess they used standard configurations as well.
     
    #2488 pharma, Nov 18, 2020
    Last edited: Nov 18, 2020
  9. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    I was just pointing out memory requirements in order to better quantify those (previously unnamed) benchmarks. I only found out later, in my edit, what they were and how much memory they used. If those are standard amounts, fine! :)
     
    sonen and pharma like this.
  10. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    832
    Likes Received:
    505
    Finally managed to get hold of a non-scalped 3090.
    Works quite nicely; just had a try with the bundled CoD game.
    Unfortunately, with RT on and in the heat of battle, the game quickly crashed.
    Might be a bug in the game, or that hardware stability issue again :-(
     
    Lightman, PSman1700, CarstenS and 2 others like this.
  11. pharma

    Veteran

    Joined:
    Mar 29, 2004
    Messages:
    4,887
    Likes Received:
    4,534
    Lightman, Babel-17, PSman1700 and 2 others like this.
  12. Babel-17

    Veteran

    Joined:
    Apr 24, 2002
    Messages:
    1,073
    Likes Received:
    307
    pharma likes this.
  13. chris1515

    Legend

    Joined:
    Jul 24, 2005
    Messages:
    7,157
    Likes Received:
    7,965
    Location:
    Barcelona Spain
    I am curious to see the performance of Ampere GPUs in Unreal Engine 5. There is no raytracing, but the engine seems to be heavy on the compute side: Nanite mixes the hardware rasterizer for bigger triangles with a software rasterizer for small triangles, and Lumen seems to use compute a lot. Ampere is stronger in raytracing, but in an engine that heavily favours compute power without using raytracing, it will probably still perform better than RDNA 2.

    I will not be surprised if Ampere GPUs perform very well in UE5.
     
  14. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    832
    Likes Received:
    505
    Compared to the 2070, the 3090 is only twice as fast for my fluid sim (BlazeFX),
    which is less than I expected.
    I guess I'm limited by memory bandwidth.
    36 TFlop/s is not much of an improvement over 7.5 TFlop/s in this case.
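    A quick back-of-the-envelope check of the bandwidth hypothesis. The spec numbers below are public datasheet values for the two cards; the "purely bound by one resource" model is of course an assumption, not a measurement of BlazeFX:

    ```python
    SPECS = {
        # card: (peak FP32 TFLOP/s, memory bandwidth GB/s) from public datasheets
        "RTX 2070": (7.5, 448.0),
        "RTX 3090": (35.6, 936.0),
    }

    def speedup(metric_index):
        """Ratio of 3090 to 2070 for a given spec column (0=FLOPs, 1=bandwidth)."""
        return SPECS["RTX 3090"][metric_index] / SPECS["RTX 2070"][metric_index]

    compute_speedup = speedup(0)    # what a purely compute-bound kernel would gain
    bandwidth_speedup = speedup(1)  # what a purely bandwidth-bound kernel would gain

    print(f"compute-bound estimate:   {compute_speedup:.1f}x")
    print(f"bandwidth-bound estimate: {bandwidth_speedup:.1f}x")
    ```

    The bandwidth-bound estimate comes out around 2.1x, which matches the observed "only twice as fast" much better than the ~4.7x the FLOP ratio would suggest.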
     
    #2494 Voxilla, Nov 19, 2020
    Last edited: Nov 19, 2020
    nnunn and T2098 like this.
  15. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    Is it possible to get access to the matrix math (tensor cores) in these GPUs? I'm wondering if you programmed this in terms of matrix operations, would you get a useful speedup?
     
  16. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    832
    Likes Received:
    505

    The amount of FP32 computation is relatively low compared to the amount of data read and written.
    This is inherent to this kind of fluid simulation, which involves juggling very large volume textures.

    There is one step that is heavy on FP32 computation, and that is cubic texture sampling.
    Sadly, texture units have stagnated at linear interpolation only.
    It would be nice if TMUs were enhanced with cubic interpolation.
    You can never match TMU cubic interpolation with shader computation, as even then fetching texels is the bottleneck, not the computation.
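    For reference, the standard workaround is to lean on the existing linear TMU hardware: a cubic B-spline sample can be assembled from two linear fetches per axis instead of four point fetches (the Sigg & Hadwiger trick from GPU Gems 2). A 1-D numpy sketch, with `tex` standing in for a texture row and `linear_fetch` emulating the TMU's built-in linear filter:

    ```python
    import numpy as np

    def linear_fetch(tex, x):
        """Emulate the TMU's free linear filtering at coordinate x."""
        i = int(np.floor(x))
        f = x - i
        return (1.0 - f) * tex[i] + f * tex[i + 1]

    def cubic_weights(f):
        """Cubic B-spline weights for fractional position f in [0,1)."""
        f2, f3 = f * f, f * f * f
        w0 = (1 - f) ** 3 / 6.0
        w1 = (3 * f3 - 6 * f2 + 4) / 6.0
        w2 = (-3 * f3 + 3 * f2 + 3 * f + 1) / 6.0
        w3 = f3 / 6.0
        return w0, w1, w2, w3

    def cubic_direct(tex, x):
        """Reference: four point fetches plus the weight math in 'shader' code."""
        i = int(np.floor(x))
        w = cubic_weights(x - i)
        return sum(wk * tex[i - 1 + k] for k, wk in enumerate(w))

    def cubic_two_taps(tex, x):
        """Same result from two *linear* fetches at shifted coordinates."""
        i = int(np.floor(x))
        w0, w1, w2, w3 = cubic_weights(x - i)
        g0, g1 = w0 + w1, w2 + w3
        h0 = (i - 1) + w1 / g0   # lands between texels i-1 and i
        h1 = (i + 1) + w3 / g1   # lands between texels i+1 and i+2
        return g0 * linear_fetch(tex, h0) + g1 * linear_fetch(tex, h1)

    tex = np.random.default_rng(0).random(16)
    print(cubic_direct(tex, 7.3), cubic_two_taps(tex, 7.3))  # identical values
    ```

    In 3-D this cuts 64 point fetches to 8 linear ones, but it also supports the point above: the weight arithmetic is cheap, and the texel traffic is what dominates.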

    I do see similarities between tensor cores and a TMU capable of cubic interpolation.
    This functionality could be efficiently merged into a unified tensor/TMU hardware unit.

    I’ve been speculating whether the RX 6800 would be any good for this kind of fluid simulation.
    It may well not be out of the box, but I’ll only be sure once I can test it.
     
    Lightman likes this.
  17. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    Matrix ALUs are effectively optimised for a high bandwidth/FLOP ratio :)

    But I have no idea what your texel re-use is like...
     
    PSman1700 likes this.
  18. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    832
    Likes Received:
    505
    Tensor cores compute vector dot products, like a0*w0 + a1*w1 + a2*w2 + a3*w3 ...
    That is not what you need for fluid simulation (though you do need it for cubic interpolation).
    The tensor cores can only reach full speed with heavy data reuse, as in convolution kernels, batched computation and matrix*matrix computation.
    That is also unlike fluid simulation, where there is little data reuse and no matrix*matrix.
    (We are getting a bit off topic here.)
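    The data-reuse gap can be made concrete with arithmetic intensity (FLOPs per byte of memory traffic). A rough sketch, assuming FP32 and an idealised cost model where each operand array is moved to or from memory exactly once — not a model of any particular GPU:

    ```python
    def matmul_intensity(n):
        """n x n matrix multiply: 2*n^3 FLOPs over three n x n FP32 arrays moved."""
        flops = 2.0 * n ** 3
        bytes_moved = 3.0 * n * n * 4  # read A, read B, write C, once each
        return flops / bytes_moved

    def stencil_intensity():
        """7-point 3-D stencil update, a typical fluid-sim-style kernel:
        ~8 FLOPs per cell against 7 FP32 reads + 1 FP32 write."""
        flops = 8.0
        bytes_moved = (7 + 1) * 4.0
        return flops / bytes_moved

    print(f"1024x1024 matmul: {matmul_intensity(1024):.0f} FLOP/byte")
    print(f"7-point stencil:  {stencil_intensity():.2f} FLOP/byte")
    ```

    Matmul intensity grows with n (each element is reused n times), while the stencil stays at a fixed fraction of a FLOP per byte — which is why tensor cores feast on the former and idle on the latter.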
     
  19. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    I'm merely suggesting that if you re-formulate the algorithm using matrices (or matrix-vector math), you might make progress.
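    For what it's worth, a simulation step really can be written either way; a 1-D diffusion update as a numpy sketch (illustrative only — and it also shows the catch: the resulting matrix is banded, so a dense tensor-core matmul would spend almost all of its FLOPs multiplying zeros):

    ```python
    import numpy as np

    n = 8
    alpha = 0.1  # diffusion coefficient (illustrative value)
    u = np.random.default_rng(1).random(n)

    # Stencil form: u'[i] = u[i] + alpha * (u[i-1] - 2*u[i] + u[i+1])
    u_stencil = u.copy()
    u_stencil[1:-1] = u[1:-1] + alpha * (u[:-2] - 2 * u[1:-1] + u[2:])

    # Matrix form: u' = A @ u with a tridiagonal A (identity rows at the boundaries)
    A = np.eye(n)
    for i in range(1, n - 1):
        A[i, i - 1] = alpha
        A[i, i] = 1 - 2 * alpha
        A[i, i + 1] = alpha

    u_matrix = A @ u
    print(np.allclose(u_stencil, u_matrix))  # True
    ```

    So the reformulation is always possible; whether the hardware rewards it depends on how much of the matrix is actually non-zero.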
     
    pharma and nnunn like this.
  20. manux

    Veteran

    Joined:
    Sep 7, 2002
    Messages:
    3,034
    Likes Received:
    2,276
    Location:
    Self Imposed Exhile
    One has to wonder whether fluid/smoke/physics could end up partially or fully accelerated by DNNs. That's old stuff from 2016; there has to be newer research available. One more potential use for tensor cores?

     
    pharma likes this.

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.