AMD Radeon RDNA2 Navi (RX 6800, 6800 XT, 6900 XT) [2020-10-28]

Discussion in 'Architecture and Products' started by BRiT, Oct 28, 2020.

  1. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,987
    Likes Received:
    134
    I asked a few times earlier in the thread, but I didn't get clarification. Recall that there are 4 Packers per Scan Converter, so Navi21 has 32 Packers, and 8 Packers per Raster Unit (2 Scan Converters). Up to 4 Packers dispatch to each Shader Array with optimised fragments, arranged as 1x2, 2x1 or 2x2 fragment groups as discussed below for VRS (my speculation). The efficiency gains come from these packed fragments.
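As a quick cross-check of those counts, here is a minimal sketch that derives the number of Scan Converters and Raster Units implied by the figures quoted in the post. The numbers are the poster's (partly speculative) figures, not confirmed specifications, and the function name is made up for illustration.

```python
# Figures quoted in the post above; treat them as the poster's claims,
# not as confirmed hardware specifications.
PACKERS_PER_SCAN_CONVERTER = 4
SCAN_CONVERTERS_PER_RASTER_UNIT = 2
TOTAL_PACKERS_NAVI21 = 32


def derive_counts(total_packers, packers_per_sc, sc_per_raster_unit):
    """Derive the unit counts implied by the packer totals."""
    scan_converters, remainder = divmod(total_packers, packers_per_sc)
    if remainder:
        raise ValueError("total packers is not a multiple of packers per scan converter")
    raster_units, remainder = divmod(scan_converters, sc_per_raster_unit)
    if remainder:
        raise ValueError("scan converters do not divide evenly into raster units")
    return {
        "scan_converters": scan_converters,
        "raster_units": raster_units,
        "packers_per_raster_unit": packers_per_sc * sc_per_raster_unit,
    }


counts = derive_counts(
    TOTAL_PACKERS_NAVI21,
    PACKERS_PER_SCAN_CONVERTER,
    SCAN_CONVERTERS_PER_RASTER_UNIT,
)
print(counts)
```

If the quoted figures hold, this works out to 8 Scan Converters in 4 Raster Units, which matches the 8 Packers per Raster Unit stated in the post.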

     
  2. JoeJ

    Veteran Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    1,053
    Likes Received:
    1,239
    I like the idea of doing RT at a lower res than raster, so only RT needs upscaling.
    On the other hand, if I had the choice between 1 ray per pixel at 1080p and 4rpp at 540p, I would choose the former because I get more spatial information for the same number of rays.
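A quick sanity check on the ray budget, assuming 16:9 frames: the sketch below counts rays per frame for a few resolution and rays-per-pixel combinations. The function name and the list of configurations are made up for illustration.

```python
def rays_per_frame(height, rays_per_pixel, aspect=(16, 9)):
    """Total rays cast for one frame at the given vertical resolution."""
    width = height * aspect[0] // aspect[1]
    return width * height * rays_per_pixel


# (vertical resolution, rays per pixel) pairs to compare.
configs = [(1080, 1), (540, 4), (270, 4), (270, 16)]

baseline = rays_per_frame(1080, 1)
for height, rpp in configs:
    total = rays_per_frame(height, rpp)
    print(f"{height}p @ {rpp} rpp: {total} rays ({total / baseline:.2f}x of 1080p @ 1 rpp)")
```

Halving the resolution in each dimension quarters the pixel count, so 4 rpp at 540p matches the 1080p budget, while 270p needs 16 rpp to match it.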

    So now we know what AMD meant by 'Select Lighting Effects' on that early RDNA2 slide :D
     
    chris1515 and PSman1700 like this.
  3. xEx

    xEx
    Veteran Newcomer

    Joined:
    Feb 2, 2012
    Messages:
    1,054
    Likes Received:
    539
    About the future of graphics, I find this approach more interesting. It's about how, instead of calculating geometry and colors, we could simply "imagine" them with neural networks. Maybe it's a topic worth having its own thread.

    It's in Spanish but has subtitles.

     
    xpea and pharma like this.
  4. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    4,108
    Likes Received:
    3,230
    I did not find a post on AMD's tech demo, released on YouTube on Nov. 19th, so here it is.

    AMD releases RDNA2 technology demo as a 1080p video - VideoCardz.com
     
    Lightman, PSman1700 and Remij like this.
  5. Remij

    Newcomer

    Joined:
    May 3, 2008
    Messages:
    196
    Likes Received:
    323
    Yea, I saw it. It didn't really blow me away. The tech is fine and produces respectable results, although the RT is particularly noisy in many shots (which they also try to cover up with DOF), but IMO it is ultimately let down by lackluster art and presentation.

    These new cards would have been the perfect time to reintroduce Ruby, with crazy good shadows and reflections. I always liked the Ruby demos.

    But the GOAT Radeon tech demo for me personally is.... Pipe Dream



    God I love it!
     
    Kej, eloyc, CarstenS and 5 others like this.
  6. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    4,108
    Likes Received:
    3,230
    Agreed, but the theme was pretty good! :lol2:
     
    Lightman and PSman1700 like this.
  7. Remij

    Newcomer

    Joined:
    May 3, 2008
    Messages:
    196
    Likes Received:
    323
    Yea I have no problem with the theme or anything.. just kinda the scenario they decided on. A robot Ninja runs around a hangar while a robot drone searches for him... except it's not even that exciting and nothing happens.. lol.

    They should have scaled it in. Make that single character far more detailed, have a denser, slightly smaller environment, really zoom in on the detail on him at times.. have some parts of him reflecting the environment.. have him do some cool animations and then fight an enemy at which point the drone comes out, casts beautiful shadows of the two ninja robots fighting and reflecting the environment... and then close with him killing the other robot and then the drone chasing him off.

    Also.. 1080p... no sir. 1440p AT LEAST with youtube, regardless if the native resolution of the demo is 1080p.

    I dunno.. lol :D
     
    PSman1700 and pharma like this.
  8. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    17,555
    Likes Received:
    7,455
    I still have Pipe Dream stored at multiple locations with redundancy so that I'll never risk losing it. It's still the most memorable demo I've ever experienced.

    Regards,
    SB
     
    Kej, Alexko, CarstenS and 3 others like this.
  9. Remij

    Newcomer

    Joined:
    May 3, 2008
    Messages:
    196
    Likes Received:
    323
    Yeah haha.. it blew my mind at the time and of course still holds up as brilliantly today. The 9700 Pro was such a killer GPU too.
     
    Lightman and PSman1700 like this.
  10. LordEC911

    Regular

    Joined:
    Nov 25, 2007
    Messages:
    872
    Likes Received:
    204
    Location:
    'Zona
    I have seen a relatively recent take on Pipe Dream, maybe it was RT... I don't think the newer version was from AMD though.
    Maybe it was an opensource/fanmade remake?

    Edit- 4k PipeDream on youtube
     
  11. Erinyes

    Regular

    Joined:
    Mar 25, 2010
    Messages:
    808
    Likes Received:
    276
    The other point is that Infinity Cache also appears to be a forward-looking feature, which could also be used on CPUs & APUs. And since SRAM traditionally scales better than analog, it should be cheaper in the future than adding more PHYs. SRAM scaling seems to be slowing down, however, with TSMC only promising 1.35x scaling for 5nm and 1.2x for 3nm. For the Apple A14, SRAM scaling was actually found to be only 1.19x, significantly lower than TSMC's claimed 1.35x. Whether this is due to the process not delivering the advertised gains, or to design decisions for power/performance, we won't fully know until we can analyse more 5nm chips. But it would still be better than analog.
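To put those scaling factors in terms of die area, here is a minimal sketch that converts a density gain into the relative area of a fixed-capacity SRAM array. It assumes the 3nm figure is quoted relative to 5nm, which the post does not state explicitly, and the function name is made up for illustration.

```python
# Scaling factors quoted in the post above.
CLAIMED_5NM = 1.35   # TSMC's claimed SRAM scaling for 5nm
MEASURED_A14 = 1.19  # scaling reported for the Apple A14
CLAIMED_3NM = 1.2    # TSMC's claimed scaling for 3nm (assumed relative to 5nm)


def relative_area(*density_gains):
    """Area of a fixed-capacity SRAM array relative to the starting node."""
    area = 1.0
    for gain in density_gains:
        area /= gain
    return area


print(f"5nm, claimed:        {relative_area(CLAIMED_5NM):.3f}x area")
print(f"5nm, A14 measured:   {relative_area(MEASURED_A14):.3f}x area")
print(f"3nm, claimed chain:  {relative_area(CLAIMED_5NM, CLAIMED_3NM):.3f}x area")
print(f"A14 vs claimed area: {relative_area(MEASURED_A14) / relative_area(CLAIMED_5NM):.3f}x")
```

The last line shows how much larger the same cache ends up at the measured A14 scaling compared with the claimed scaling.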
    Yes I had posted this chart a few pages back and commented on the likely position of AMD's mobile gaming platforms next year. Cezanne will be able to make use of the updated 7nm and increased power efficiency of Zen 3 to further increase AMD's CPU lead over Comet Lake, though Tiger Lake H could bring parity. I still predict Cezanne + RDNA2 to be the best selling mobile gaming platform in 2021. This is a significant market btw, and in Nvidia's recent earnings call they specifically mentioned that they've had 11 successive quarters of double digit growth in mobile.
    But the die shot of N21 at least certainly does not seem to have any HBM PHYs, or have I missed something?
    How is it scummy? NV was comparing different power and performance levels, on different processes. But AMD compares the SAME clockspeeds and power, iso process. So an RDNA2 CU at the same clock will consume ~50% of the power of an RDNA1 CU. Unlike desktops, where you can push power and thermals, mobile GPUs are power limited, so this is very relevant.
     
    Lightman likes this.
  12. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,529
    Likes Received:
    477
    Location:
    Varna, Bulgaria
    TSMC should probably consider researching the various EDRAM technologies to resolve the memory scaling issues, particularly for large cache arrays.
    IBM and Intel already employ different integration methods, though these are very tightly related to their particular manufacturing process.
     
  13. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    17,555
    Likes Received:
    7,455
    Just thinking about it, instead of that kind of lackluster robot thing they recently released, it would have been cool if they'd done a 4k (or even 1440p) RT remake of Pipe Dream running in real time. There's lots of opportunities there to showcase some RT effects. Lighting, shadows, reflections (of moving objects), etc. Maybe a dynamic light or two moving around while the scene is playing out. I just feel like it would have been more impressive than that robot demo.

    Regards,
    SB
     
  14. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,536
    Likes Received:
    4,635
    Location:
    Well within 3d
    I'm interested in seeing the endnotes for some of the slides like the memory latency one. It might give some of the base values that go into their percentages. I'm not sure whether the infinity cache's latency improvement is a percentage of the total memory latency (L0,L1,L2,memory total) or it's relative to the latency of the DRAM access.
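To illustrate why the baseline matters, here is a small sketch that applies the same percentage improvement to the two possible reference latencies. All latency values and the percentage are hypothetical placeholders, not AMD figures, and the function name is made up for illustration.

```python
def latency_saved_ns(improvement_pct, on_chip_ns, dram_ns, baseline):
    """Absolute latency saved when the percentage applies to the chosen baseline."""
    if baseline == "total":      # percentage of L0 + L1 + L2 + DRAM combined
        reference = on_chip_ns + dram_ns
    elif baseline == "dram":     # percentage of the DRAM access alone
        reference = dram_ns
    else:
        raise ValueError(f"unknown baseline: {baseline!r}")
    return reference * improvement_pct / 100.0


# Hypothetical numbers, chosen only to show the gap between the two readings.
IMPROVEMENT_PCT = 30
ON_CHIP_NS = 100   # stand-in for the summed L0/L1/L2 latency
DRAM_NS = 200      # stand-in for the DRAM access latency

for baseline in ("total", "dram"):
    saved = latency_saved_ns(IMPROVEMENT_PCT, ON_CHIP_NS, DRAM_NS, baseline)
    print(f"{IMPROVEMENT_PCT}% of {baseline} latency = {saved:.0f} ns saved")
```

The same headline percentage translates into different absolute savings depending on the baseline, which is why the endnote values would help.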

    Driver commits indicate it can happen at page granularity, and there are also flags for specific functionality types. It's not clear BVH fits in that, unless it might hide under the umbrella of some of the metadata related to DCC or HiZ.
    Some of those would seem to be better kept in-cache, since DCC in particular can suffer from thrashing of its metadata cache, injecting a level of latency sensitivity that normal accesses wouldn't have.

    Is there a source for this, or tests that can tell the difference between an SE being inactivated versus an equivalent number of shader arrays disabled across the chip?

    AMD's Sienna Cichlid code introduced a function to track the disabling of formerly per-SE resources like ROPs at a shader array level. This might lead to similar outcomes.

    We do have some comparison in terms of AMD's patent for BVH acceleration versus Nvidia's. There are some potential points of interest, such as the round-trip node traversal must make to the SIMD from the RT block, and the implicit granularity of execution being SIMD-width.
    There are some code commits that give instruction formats for BVH operations that look to be in-line with the patent.
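The round trip described above can be sketched roughly as follows: a shader-side loop owns the traversal stack and hands one node at a time to an intersection routine that stands in for the RT block. This is a simplified illustration, not AMD's implementation or instruction format. The node layout and all names are hypothetical, leaves carry no triangles, and SIMD-width execution is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    box_min: tuple
    box_max: tuple
    children: list = field(default_factory=list)  # child node indices; empty for a leaf


def ray_hits_box(origin, direction, box_min, box_max):
    """Slab test of a ray (t >= 0) against an axis-aligned box."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            if o < lo or o > hi:
                return False  # parallel to this slab and outside it
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def intersect_node(nodes, index, origin, direction):
    """Stand-in for the RT block: test the ray against one node's child boxes."""
    return [c for c in nodes[index].children
            if ray_hits_box(origin, direction, nodes[c].box_min, nodes[c].box_max)]


def traverse(nodes, root, origin, direction):
    """Shader-side loop: each box node costs one round trip to intersect_node."""
    stack, leaves, round_trips = [root], [], 0
    while stack:
        index = stack.pop()
        if not nodes[index].children:
            leaves.append(index)  # a real traversal would test triangles here
            continue
        stack.extend(intersect_node(nodes, index, origin, direction))
        round_trips += 1
    return leaves, round_trips


# Tiny hypothetical tree: the root holds one leaf and one inner node with two leaves.
nodes = [
    Node((0, 0, 0), (3, 1, 1), [1, 2]),
    Node((0, 0, 0), (1, 1, 1)),
    Node((2, 0, 0), (3, 1, 1), [3, 4]),
    Node((2, 0, 0), (2.5, 1, 1)),
    Node((2.5, 0, 0), (3, 1, 1)),
]
print(traverse(nodes, 0, (-1.0, 0.5, 0.5), (1.0, 0.0, 0.0)))
```

The point of the sketch is the control flow: the stack and the loop live on the shader side, so every visited box node implies a hand-off to the intersection unit and back.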

    RBEs are something that can be disabled at a different granularity than SEs, though.

    The packers I am thinking of are related to primitive order processing, which is related to rasterizer ordered views rather than how primitives are translated to wavefronts.

    Perhaps as scaling falters, the pressure will resume to go back to EDRAM despite the cost and complexity penalties.
    Neither IBM nor Intel has that technique available at smaller nodes. IBM's next Power chip dropped the capability since IBM sold off that fab to Globalfoundries--which then gave up scaling to lower nodes, and Power was the standout for having EDRAM.
     
    tinokun, PSman1700 and pjbliverpool like this.
  15. Digidi

    Regular Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    381
    Likes Received:
    205
    If you check mangos and the Linux driver, the Packers come after the Scan Converter. I think the Packers take the pixels from the rasterizer and send them to the shaders?

    https://www.pcgamer.com/a-linux-update-may-have-let-slip-amd-big-navis-mammoth-core/
     
    #1315 Digidi, Nov 23, 2020
    Last edited: Nov 23, 2020
    j^aws likes this.
  16. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    8,446
    Likes Received:
    2,626
    Location:
    Guess...
    It may have invited unwelcome comparisons to Nvidia's Marbles demo.
     
    PSman1700 likes this.
  17. Mat3

    Newcomer

    Joined:
    Nov 15, 2005
    Messages:
    166
    Likes Received:
    10

    From the article, part of the implementation is tracing rays to voxels. Could the ray tracing box testers on these new GPUs possibly be used for that? Voxels are boxes too...

    "Lumen uses ray tracing to solve indirect lighting, but not triangle ray tracing," explains Daniel Wright, technical director of graphics at Epic. "Lumen traces rays against a scene representation consisting of signed distance fields, voxels and height fields. As a result, it requires no special ray tracing hardware."
    To achieve fully dynamic real-time GI, Lumen has a specific hierarchy. "Lumen uses a combination of different techniques to efficiently trace rays," continues Wright. "Screen-space traces handle tiny details, mesh signed distance field traces handle medium-scale light transfer and voxel traces handle large scale light transfer."
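For readers unfamiliar with tracing against signed distance fields, here is a minimal sphere-tracing sketch, one common way to march a ray through an SDF. It is not claimed to be how Lumen does it. The scene is a single hypothetical sphere, and every name and parameter below is made up for illustration.

```python
import math


def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from a point to a sphere's surface (negative inside)."""
    return math.dist(point, center) - radius


def sphere_trace(origin, direction, sdf, max_steps=64, epsilon=1e-4, max_distance=100.0):
    """March along a unit-length direction, stepping by the SDF value each time.

    Returns the hit distance along the ray, or None if nothing was hit.
    """
    t = 0.0
    for _ in range(max_steps):
        point = tuple(o + t * d for o, d in zip(origin, direction))
        distance = sdf(point)
        if distance < epsilon:
            return t          # close enough to the surface to count as a hit
        t += distance         # safe step: nothing is nearer than `distance`
        if t > max_distance:
            break
    return None


print(sphere_trace((0.0, 0.0, -5.0), (0.0, 0.0, 1.0), sphere_sdf))  # ray aimed at the sphere
print(sphere_trace((0.0, 3.0, -5.0), (0.0, 0.0, 1.0), sphere_sdf))  # ray passing above it
```

No triangle or box intersection is involved here, only distance lookups, which fits the quote's point that this kind of tracing needs no special ray tracing hardware.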

     
    PSman1700 and pharma like this.
  18. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,987
    Likes Received:
    134
    The driver leak has changes to SIMD waves, which, combined with the slide about RB+ and the Packers connected to Scan Converters in the driver leak as well, suggest some optimisations after scan conversion and in dispatching to Shader Arrays. The number of Packers per Scan Converter doubled from RDNA1, but triangle rasterisation remains the same at 4 per clock.
     
  19. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,595
    Likes Received:
    3,711
    Location:
    Finland
    There you go
    upload_2020-11-23_22-25-37.png
     
    Lightman likes this.
  20. Digidi

    Regular Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    381
    Likes Received:
    205
    I hope that we get rasterizer results soon from @CarstenS or @Ryan Smith. I'm interested in the values for list and strip polygons at 0% culling, and how they compare to Navi 10 and the RTX 3090.
     