GPU Ray Tracing Performance Comparisons [2021] *spawn*

Discussion in 'Architecture and Products' started by DavidGraham, Mar 29, 2021.

  1. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,240
    Likes Received:
    3,393
    Ampere is the same as Volta, which is 16-wide in h/w.
     
  2. Qesa

    Newcomer

    Joined:
    Feb 23, 2020
    Messages:
    57
    Likes Received:
    107
    Each SIMD has 16 lanes, but warps are 32 threads, which are executed over two cycles (plus pipelining).
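    A minimal sketch of the arithmetic in this post, assuming a fixed 32-thread warp and ignoring pipeline latency: the number of clocks a SIMD needs to step through one warp instruction is the warp size divided by its lane count. The widths plugged in below (8 for G80, 16 for Volta/Ampere, 32 for Maxwell/Pascal) are the ones quoted later in this thread; the function name is illustrative only.

```python
import math

WARP_SIZE = 32  # threads per warp on every NVIDIA GPU discussed in this thread


def cycles_per_warp_instruction(simd_width: int, warp_size: int = WARP_SIZE) -> int:
    """Clocks a SIMD with `simd_width` lanes needs to cover one warp instruction.

    Pipeline latency is deliberately ignored; this is issue occupancy only.
    """
    return math.ceil(warp_size / simd_width)


for name, width in [
    ("G80 (8-wide)", 8),
    ("Volta/Ampere FP32 pipe (16-wide)", 16),
    ("Maxwell/Pascal (32-wide)", 32),
]:
    print(f"{name}: {cycles_per_warp_instruction(width)} clock(s) per warp instruction")
```

    The warp stays 32 threads in every case; only the number of clocks per instruction changes with the SIMD width.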
     
    Krteq, trinibwoy and OlegSH like this.
  3. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,240
    Likes Received:
    3,393
    Well, yeah, but the h/w is 16-wide, which allows for finer granularity of execution on branches, amongst other things.

    Warp/wave widths are a different topic altogether and they too may become an issue for a pure raytraced future.
     
    PSman1700 likes this.
  4. Qesa

    Newcomer

    Joined:
    Feb 23, 2020
    Messages:
    57
    Likes Received:
    107
    What do you mean by "the h/w"? Instructions need to be done to an entire warp at once, which makes it the smallest unit, not the SIMD size. Otherwise you might as well call GCN waves 16 wide as well.

    If current GPUs were able to pick and choose parts of a warp to execute independently then the whole subwarp interleaving paper being discussed would be irrelevant.
     
  5. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,240
    Likes Received:
    3,393
    They don't need to be done "at once", and in fact they are not, since the h/w needs two cycles to go through a warp. This opens up opportunities for more granular control over how these warps are executed, whether or not current h/w makes full use of that.
     
    PSman1700 likes this.
  6. Qesa

    Newcomer

    Joined:
    Feb 23, 2020
    Messages:
    57
    Likes Received:
    107
    The same instruction has to be executed for those two cycles. There are two SIMDs, and the scheduler can only issue one instruction per clock, alternating between the two (or sending it to the tensor, SFU or MIO units).
     
  7. Qesa

    Newcomer

    Joined:
    Feb 23, 2020
    Messages:
    57
    Likes Received:
    107
    T2098 and Krteq like this.
  8. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,210
    The 3080 12GB is 200% faster than 6900XT in Metro Exodus Enhanced Edition!

     
    Lightman and PSman1700 like this.
  9. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,118
    Likes Received:
    3,088
    I think the 3090 is supposed to take on the 6900XT, flagship vs flagship.
     
  10. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,462
    Location:
    Finland
    Most of us care a lot more about € vs €. Or do you think Intel's upcoming Arc flagship (expected to be around 6700XT/3070 level) should also be compared only to the 6900 XT and 3090 (Ti)?
     
    Wesker, Lightman and Krteq like this.
  11. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,118
    Likes Received:
    3,088
    Well, now I see: the 3090 isn't in the same class of performance. The 6800XT is fighting it out with the 3080/Ti. The 3090 has no direct AMD competitor, yet.
     
  12. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,462
    Location:
    Finland
    The 6900 XT is cheaper than the 3080 Ti and about the same price as the 3080 12GB (European prices, just checked on Geizhals). Why should the 3080 (Ti) be compared to the 6800 XT instead of the 6900 XT?
    As for performance class, outside of RT the 6900 XT is in the same class as the 3090, despite the price difference.
     
  13. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,240
    Likes Received:
    3,393
    It's slower unless you limit the comparison to low resolutions without RT. But why would you do that?
     
    PSman1700 likes this.
  14. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,462
    Location:
    Finland
    I don't. I just pointed out that saying AMD doesn't have a card in the 3090 performance class is false (unless you limit yourself to RT games only). But this is all beside the point, which was that PSman1700 said the 6900 should be compared to the 3090 and not the 3080/Ti, even though the 6900 is priced around the 3080 12GB.
     
  15. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,118
    Likes Received:
    3,088
    I meant mainly in performance, as prices are in fantasy land right now anyway. In performance, the 3090/Ti is in its own class, unless you want to omit RT games, which is nearly impossible these days.
     
  16. TopSpoiler

    Newcomer

    Joined:
    Aug 18, 2020
    Messages:
    74
    Likes Received:
    176
    From the GA102 white paper:
     
    PSman1700 likes this.
  17. Qesa

    Newcomer

    Joined:
    Feb 23, 2020
    Messages:
    57
    Likes Received:
    107
    That doesn't contradict what I've been saying. Each (32-wide) warp is sent to either the INT/FP pipe or the dedicated FP pipe, to be executed over two cycles.

    On clock 0, the scheduler sends an FP instruction from warp 0 to the FP SIMD. 16 of the 32 threads start execution.

    On clock 1, the scheduler sends an instruction from warp 1 to the FP/INT SIMD. The other 16 threads of warp 0 start execution. Depending on whether warp 1 is doing an INT or FP instruction, the sub-core is now doing either 16 FP + 16 INT or 32 FP.

    But it still can't do anything more fine-grained than the 32-thread warp size.
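    The clock-by-clock walkthrough above can be modelled as a toy timeline. This is a simplified sketch built only from the assumptions stated in this post: one warp instruction issued per clock, two 16-lane pipes per sub-core (a dedicated FP pipe and a shared FP/INT pipe), and each instruction occupying its pipe for two clocks. Latency, register-bank conflicts and the tensor/SFU/MIO paths are ignored, and the `Issue`/`simulate` names are hypothetical.

```python
from dataclasses import dataclass

SIMD_WIDTH = 16
WARP_SIZE = 32
CYCLES_PER_INSTR = WARP_SIZE // SIMD_WIDTH  # 2 clocks to cover one 32-thread warp


@dataclass
class Issue:
    clock: int  # clock on which the scheduler issues the instruction
    warp: int   # warp id (informational only)
    pipe: str   # "fp" (dedicated FP32) or "fp_int" (shared FP32/INT32)
    op: str     # "fp" or "int"


def simulate(issues, n_clocks):
    """Return per-clock (fp_lanes, int_lanes) busy in this toy sub-core model."""
    busy_until = {"fp": 0, "fp_int": 0}
    timeline = [[0, 0] for _ in range(n_clocks)]
    last_issue_clock = None
    for ins in sorted(issues, key=lambda i: i.clock):
        if ins.clock == last_issue_clock:
            raise ValueError("scheduler can issue only one instruction per clock")
        if ins.clock < busy_until[ins.pipe]:
            raise ValueError(f"pipe {ins.pipe} still busy at clock {ins.clock}")
        if ins.pipe == "fp" and ins.op != "fp":
            raise ValueError("the dedicated FP pipe cannot run INT instructions")
        last_issue_clock = ins.clock
        busy_until[ins.pipe] = ins.clock + CYCLES_PER_INSTR
        # 16 threads of the warp execute on each of the next two clocks
        for c in range(ins.clock, min(ins.clock + CYCLES_PER_INSTR, n_clocks)):
            timeline[c][0 if ins.op == "fp" else 1] += SIMD_WIDTH
    return [tuple(t) for t in timeline]


all_fp = [Issue(0, 0, "fp", "fp"), Issue(1, 1, "fp_int", "fp"),
          Issue(2, 2, "fp", "fp"), Issue(3, 3, "fp_int", "fp")]
mixed = [Issue(0, 0, "fp", "fp"), Issue(1, 1, "fp_int", "int"),
         Issue(2, 2, "fp", "fp"), Issue(3, 3, "fp_int", "int")]
print(simulate(all_fp, 4))
print(simulate(mixed, 4))
```

    In this model the all-FP stream reaches 32 FP lanes per clock from clock 1 onward, while making the odd warps INT gives 16 FP + 16 INT, which is the split described above. Every issue still covers a whole 32-thread warp.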
     
  18. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,109
    Location:
    New York
    Nvidia has been doing this since G80. Hardware was 8-wide but execution / branching granularity was 32 threads. Hasn’t changed since 2006.
     
  19. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,240
    Likes Received:
    3,393
    It has, actually. Maxwell and Pascal were 32-wide in h/w.
     
    PSman1700 likes this.
  20. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,109
    Location:
    New York
    I meant the warp size. It’s always been 32. Hardware width is important for latency but doesn’t help with branching granularity or anything that the software sees.
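    To illustrate the claim that SIMD width changes latency but not branching granularity, here is a toy cost model for an if/else inside one warp. It assumes, as argued in this thread, that every instruction is issued for the full 32-thread warp with inactive threads masked off, so any path taken by at least one thread costs the whole warp. The function names and the 10-instruction path lengths are made up for illustration.

```python
import math

WARP_SIZE = 32


def branch_cycles(takes_if, if_len, else_len, simd_width):
    """Clocks for one warp to run an if/else under warp-granular masking.

    Assumes each instruction is issued for the whole warp (inactive threads
    masked) and ignores pipeline latency. Toy model only.
    """
    if len(takes_if) != WARP_SIZE:
        raise ValueError("expected one flag per thread in the warp")
    per_instr = math.ceil(WARP_SIZE / simd_width)  # clocks to cover 32 threads
    cycles = 0
    if any(takes_if):        # at least one thread needs the 'if' path
        cycles += if_len * per_instr
    if not all(takes_if):    # at least one thread needs the 'else' path
        cycles += else_len * per_instr
    return cycles


def lane_utilization(takes_if, if_len, else_len, simd_width):
    """Fraction of lane-clocks spent on threads that are actually active."""
    n_if = sum(takes_if)
    useful = n_if * if_len + (WARP_SIZE - n_if) * else_len
    spent = branch_cycles(takes_if, if_len, else_len, simd_width) * simd_width
    return useful / spent


half_split = [t < 16 for t in range(WARP_SIZE)]  # threads 0-15 take 'if'
for width in (16, 32):
    print(width, branch_cycles(half_split, 10, 10, width),
          lane_utilization(half_split, 10, 10, width))
```

    With a 16/16 split the 16-wide case takes 40 clocks and the 32-wide case 20, but lane utilization is 50% in both: under this assumption the SIMD width rescales the time without changing the divergence penalty.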
     
    #1480 trinibwoy, Feb 23, 2022
    Last edited: Feb 23, 2022

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.