AMD Radeon RDNA2 Navi (RX 6700 XT, RX 6800, 6800 XT, 6900 XT) [2020-10-28, 2021-03-03]

Discussion in 'Architecture and Products' started by BRiT, Oct 28, 2020.

  1. PSman1700

    Veteran Newcomer

    Joined:
    Mar 22, 2019
    Messages:
    4,547
    Likes Received:
    2,084
    Yes, Ampere still seems to be the better gaming arch. NV needs to come out with 16 GB or more of VRAM, though.
     
  2. CarstenS

    Legend Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,360
    Likes Received:
    3,096
    Location:
    Germany
    I'm really looking forward to that one. It has 80 instead of 72 CUs, so at most 11% more performance, if AMD binned very strictly for power.
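    The 11% ceiling is just the ratio of CU counts. A minimal sketch of that arithmetic, assuming performance scales linearly with CU count at identical clocks and nothing else becomes the bottleneck (the function name is illustrative, not from the post):

```python
def max_cu_scaling(cu_full: int, cu_cut: int) -> float:
    """Upper bound on speedup from extra CUs, assuming equal clocks
    and perfectly linear scaling (an optimistic assumption)."""
    return cu_full / cu_cut


gain = max_cu_scaling(80, 72) - 1.0
print(f"Upper bound from CU count alone: {gain:.1%}")  # prints 11.1%
```

    Any clock deficit from a tighter power budget would eat into that figure, which is why the estimate hinges on binning.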
     
    Lightman and PSman1700 like this.
  3. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    https://www.realworldtech.com/transistor-count-flawed-metric/

    Note the comments around cache densities relative to logic.
     
  4. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,552
    Likes Received:
    4,713
    Location:
    Well within 3d
    This might come down to the improved circuit implementation and better characterization of the silicon. There was often more consistent behavior out of earlier GPUs once the overgenerous voltage levels were pruned, and it seems AMD has put in physical optimization work that had been sorely lacking, to an extent I hadn't considered.

    Wavefront launch seems like it could be handled at the shader engine level, which means the average lifetime of wavefronts needs to be higher to avoid underutilization. Per-shader-array resources would have nearly 50% more contention, so things like the L1 and export bus could be more crowded.


    It stands out how close the minimums and average are. Given how much broader the other implementation is between minimum and average, it feels like there could be some driver or software issue capping performance--or there is a very pervasive bottleneck.

    Some elements of the architecture may not have seen the same improvement. It does seem like the L2 hasn't received much attention, and if we believe it played any role in amplifying bandwidth significantly in prior GPUs, not increasing its bandwidth or concurrency means it's amplifying things significantly less.

    I'm also curious about getting the full slide deck, including footnotes. The memory latency figures are something I'm curious about. The improvement numbers do seem to hint that there's substantial latency prior to the infinity cache, which deadens some of its benefits.
    It's also possible that, if it's functioning as a straightforward victim cache, it's thrashing more due to streaming data. The driver code mentioning controlling allocation would seem to point to more guidance being needed to separate working sets that can work from those that thrash even 128MB caches.

    I'm curious if there's also an effect based on not just incoherence but also how quickly the rays in a wavefront resolve. Incoherence can bring in a lot of extra memory accesses, but being tied together in batches of 32 or so can also lead to a greater number of SIMD-RT block transactions if at least one ray winds up needing measurably more traversal steps than the rest of the wavefront.
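    The straggler effect described here can be put into a toy model. This is only a sketch, not a description of AMD's implementation: it assumes a wavefront keeps issuing traversal iterations until its slowest ray finishes, and the step counts are made-up illustrative values.

```python
def wavefront_cost(steps_per_ray):
    """Return (iterations issued, lane utilization) for one wavefront.

    Toy model: the wavefront iterates until its slowest ray is done,
    so lanes whose rays finished early sit idle for the remainder.
    """
    iterations = max(steps_per_ray)
    useful_lane_steps = sum(steps_per_ray)
    utilization = useful_lane_steps / (iterations * len(steps_per_ray))
    return iterations, utilization


coherent = [20] * 32            # every ray needs 20 traversal steps
one_straggler = [20] * 31 + [60]  # a single ray needs three times as many

for name, rays in (("coherent", coherent), ("one straggler", one_straggler)):
    iters, util = wavefront_cost(rays)
    print(f"{name}: {iters} iterations, {util:.1%} lane utilization")
```

    In this model a single long ray triples the number of iterations for the whole batch of 32 and leaves most lanes idle, even though the memory access pattern of the other 31 rays is unchanged.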
    It looks like AMD does credit the infinity cache for holding much of the BVH working set, and apparently the latency improvement is noted. That does point to there being a greater sensitivity to latency with all the pointer chasing, but that can go back to my question about the actual latency numbers. Even if it's better, it seems like it can be interpreted that the latency figures prior to the infinity cache are still substantial.

    It does seem to show that even at the large capacity offered, there are a lot of accesses that remain very intractable for cache behavior. If the old rule of thumb held, with miss rate falling in proportion to the square root of cache size, the cache would have been more than sufficient to get very high hit rates.
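    For reference, the square-root rule of thumb can be written out as below. This is a sketch under loud assumptions: the 4 MB baseline capacity and the 50% baseline miss rate are hypothetical inputs chosen for illustration, not measured figures from any review or slide.

```python
import math


def scaled_miss_rate(base_miss_rate, base_size_mb, new_size_mb):
    """Square-root rule of thumb: miss rate scales with 1/sqrt(capacity)."""
    return base_miss_rate * math.sqrt(base_size_mb / new_size_mb)


# Hypothetical baseline: a 4 MB cache that misses on 50% of accesses.
predicted = scaled_miss_rate(0.50, 4, 128)
print(f"Predicted miss rate at 128 MB: {predicted:.1%}")  # prints 8.8%
```

    A 32x capacity increase cuts the predicted miss rate by a factor of about 5.7 under this rule, so hit rates that fall well short of that would suggest streaming or otherwise cache-unfriendly traffic.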
     
    fellix, Lightman, jayco and 5 others like this.
  5. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    11,206
    Likes Received:
    1,775
    Location:
    New York
    Is the white paper out yet?
     
  6. Erinyes

    Regular

    Joined:
    Mar 25, 2010
    Messages:
    808
    Likes Received:
    276
    And perhaps there is more to performance than just ray tracing? And they offer more VRAM as well as lower power consumption than the competition.
    Ryzen's success is somewhat making people forget where AMD's graphics performance was before RDNA, just over a year and a half back. To reach where they have in a relatively short period of time is still a commendable achievement. RDNA 2 is more like Zen 2 in that respect, bringing them back into contention. Will RDNA 3 be their Zen 3 for graphics? Apparently AMD have committed to another 50% perf/W improvement for RDNA 3.
    TechPowerUp did have the slides posted here, and they cover some of the memory latency aspects: https://www.techpowerup.com/review/amd-radeon-rx-6800-xt/2.html
     
    Lightman likes this.
  7. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    680
    Likes Received:
    363
    Well, between this and the Apple M1 review I'm convinced "Infinity Cache" was designed primarily for APUs in mobile devices, and was just used for RDNA2 as well because the designs were simple and cheap to scale. The M1's ultra efficient performance, especially for single core stuff, looks to be at least partially due to its extreme bandwidth saturation per core for both main memory and their system wide last level cache. With AMD wanting a piece of that high margin laptop market no doubt they found much the same. Which is good news for their laptop chips next year, bad news for RDNA2's 4K performance. Despite all the "benefits" it's still no substitute for an actual main memory bus, oh well.

    What I'm not worried about, though, is the poor hardware raytracing performance. Hardware raytracing is good for coherent rays and that's about it anyway. In fact it's primarily good for sharp reflections and that's it, period. Even shadows are questionably useful at best since you have so much content restriction with them (severely limited movement, environment detail, etc.). E.g. the Demon's Souls remake doesn't use a whiff of hardware raytracing, and looks better than either Watch Dogs: Legion or Miles Morales. It feels typically weird to watch Digital Foundry do their "the difference is obvious" thing, as at least half the time I'm narrowly squinting at the glass in a game hoping to see what's so obvious about it.

    That being said it is a victory for Nvidia PR wise. This is what AMD gets for letting their rival control the narrative of what's "important" in GPU tech.
     
    #967 Frenetic Pony, Nov 19, 2020
    Last edited: Nov 19, 2020
    Wesker, Lightman and gamervivek like this.
  8. gamervivek

    Regular Newcomer

    Joined:
    Sep 13, 2008
    Messages:
    776
    Likes Received:
    279
    Location:
    india
    Buzzkill :mad2:

    So, 6 Shader Engine, 120CUs, HBM3 version when?

    There were rumors that AMD were targeting GA104 with Navi21, but probably revised their targets once they saw Ampere's underperformance.
    A 256-bit bus is quite the cost-cutting measure. I wonder how many at AMD are now wishing that they could have gone for the jugular to regain the crown, at least till Nvidia sort out their Ampere woes.
     
  9. chris1515

    Legend Regular

    Joined:
    Jul 24, 2005
    Messages:
    6,110
    Likes Received:
    6,389
    Location:
    Barcelona Spain


    [IMG]

    not bad in rasterization
     
  10. Leoneazzurro5

    Newcomer

    Joined:
    Aug 18, 2020
    Messages:
    226
    Likes Received:
    249
    About this point, I'm thinking that the BVH structure retention / discarding by the cache may be a driver and application matter, more than being hardwired. This also explains a part of the need of having specific optimization for the AMD ray tracing implementation.
     
    NightAntilli likes this.
  11. JoeJ

    Veteran Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    1,135
    Likes Received:
    1,290
    Too many reviews, too many pages. But the one thing I want to know still seems unclear.

    TechPowerUp:
    That's what I expected and hoped for. If there is no traversal HW, the BVH might be arbitrary, and the DXR implementation is backed by compute traversal and AMD's software BVH data structures.
    If this is fast enough to be useful, it means full flexibility: a custom BVH can be shared for multiple purposes, and LOD is possible. All my initial complaints about RT might be resolved. (But AMD needs to expose intersection extensions.)

    PCGH:
    Umm... these sentences contradict each other. Not sure what he tried to say (although German is my native language), but it seems he means:
    'AMD has confirmed traversal and shading can run in parallel', which would imply traversal runs on an FF unit like NV's does.

    I guess the truth is somewhat in between, but maybe somebody can clarify?
     
  12. OlegSH

    Regular Newcomer

    Joined:
    Jan 10, 2010
    Messages:
    521
    Likes Received:
    746
    Minimums are very often close in many titles across many architectures and can be caused simply by game logic: chunk generation, shader changes, etc.
    I would not draw any conclusions based on the min FPS results.

    Here you can see exactly what I am talking about, another test and they did not hit the same min fps hiccup
     
    pharma and BRiT like this.
  13. Dictator

    Regular Newcomer

    Joined:
    Feb 11, 2011
    Messages:
    408
    Likes Received:
    2,314
    In this case, based on the Hot Chips presentation and other information I have seen, it means you can run other compute tasks on the CUs while the CUs are also doing traversal.
     
    pharma and PSman1700 like this.
  14. dskneo

    Regular

    Joined:
    Jul 25, 2005
    Messages:
    649
    Likes Received:
    193
    It's a bit of a Zen 2 moment. Almost there but not quite. I'm looking forward to next year's Navi 3, but they could have beaten the 3080 completely if they had used GDDR6X. Infinity cache mitigates, but doesn't match.
     
  15. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    4,203
    Likes Received:
    3,385
    So, a big discrepancy between AMD's internal pre-launch benchmarks and actual performance results from 17 reviews. No surprise that review dates and launch date coincided.
    https://www.3dcenter.org/news/radeo...ltate-zur-ultrahd4k-performance-im-ueberblick
     
    PSman1700 likes this.
  16. Psycho

    Regular

    Joined:
    Jun 7, 2008
    Messages:
    746
    Likes Received:
    41
    Location:
    Copenhagen
    I notice some quite impressive multi-monitor idle power draw (with 7 MHz memory, which only gives a quarter of the required bandwidth). https://www.techpowerup.com/review/amd-radeon-rx-6800-xt/31.html
    I guess that's a strong hint that they are simply presenting the screen from infinity cache, maybe basically running the GPU with the infinity cache as main memory. I wonder how much they can extend this to other "2D" usage scenarios. Obviously they are not doing it for video playback right now, where video memory is running at full(!) speed (see the same page above).

    Also it seems that the (at least) 64 MB requirement for 2x 4K is too much for this mode: https://www.computerbase.de/2020-11..._leistungsaufnahme_desktop_youtube_und_spiele (again probably falling back to full speed instead of some 2D memory clock)
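    The roughly 64 MB figure for two 4K displays follows from simple framebuffer arithmetic. A minimal sketch, assuming 3840x2160 surfaces at 4 bytes per pixel and a 60 Hz refresh (none of these are stated in the post, and real desktops may hold additional surfaces):

```python
def framebuffer_bytes(width, height, bytes_per_pixel=4):
    """Size of one scan-out surface in bytes."""
    return width * height * bytes_per_pixel


def scanout_bytes_per_second(width, height, refresh_hz, bytes_per_pixel=4):
    """Bytes the display engine must read per second for one display."""
    return framebuffer_bytes(width, height, bytes_per_pixel) * refresh_hz


MIB = 1024 * 1024
one_4k = framebuffer_bytes(3840, 2160)
print(f"one 4K surface:  {one_4k / MIB:.1f} MiB")       # prints 31.6 MiB
print(f"two 4K surfaces: {2 * one_4k / MIB:.1f} MiB")   # prints 63.3 MiB
rate = scanout_bytes_per_second(3840, 2160, 60)
print(f"scan-out at 60 Hz: {rate / 1e9:.2f} GB/s per display")  # prints 1.99 GB/s
```

    Two such surfaces already take about half of a 128 MB cache before anything else is resident, which fits the guess that this configuration drops out of the low-clock mode.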
     
    Lightman and Jawed like this.
  17. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,783
    Likes Received:
    3,953
    Location:
    Finland
    I don't speak German, but from your post I'd guess they're trying to say that the traversal isn't reserving any CUs/stream processors. The CU chugs along on shaders while the Ray Accelerator does its thing and queues traversal to be run on that CU next.
     
  18. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,286
    Likes Received:
    1,551
    Location:
    London
    As I understand it, DXR specifically obfuscates the data format of the BVH. It seems the intention here is that each IHV can optimise the data format to match the way the hardware works.

    So, for example, perhaps inline ray tracing (DXR 1.1) is preferred on AMD and the BVH data format is optimised for that.
     
    Lightman and DavidGraham like this.
  19. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    2,047
    Likes Received:
    1,477
    Location:
    France
    The (big) RDNA2 problem is RT imo, and I don't think faster memory would have helped a lot.
     
    PSman1700 and pharma like this.
  20. dskneo

    Regular

    Joined:
    Jul 25, 2005
    Messages:
    649
    Likes Received:
    193
    RT does not even enter the equation for me. I will care about it when it's relevant and a must-have, a few generations from now. Not this year.
     
    entity279 and Silent_Buddha like this.