AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

Discussion in 'Architecture and Products' started by BRiT, Oct 28, 2020.

  1. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    It's not DXR 1.0 vs 1.1; it's hybrid rasterization+RT vs RT-only.
     
  2. Rootax

    Veteran

    Joined:
    Jan 2, 2006
    Messages:
    2,400
    Likes Received:
    1,845
    Location:
    France

    How do you define "brute force"? Devs have a lot of work to do to improve their RT solutions too.
     
    PSman1700 likes this.
  3. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,112
    Location:
    New York
    You might be on to something there.

    Yeah, obviously ray tracing in games today is as far from brute force as you can get. The ray counts are ridiculously low, and we’re relying on stochastic accumulation and denoising to get by.

    I don’t even know what brute-force hardware means. Both AMD’s and Nvidia’s RT patents spend a lot of time on techniques to avoid casting unnecessary rays. So again, not brute force.
     
    #1283 trinibwoy, Nov 22, 2020
    Last edited: Nov 22, 2020
    HLJ, tinokun, pjbliverpool and 5 others like this.
  4. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    Unbiased path tracing is effectively the definition of brute force: no limit on the count of bounces and the new rays produced by bounces. It's subtler than that, but that's the basic model. It's what professional rendering solutions aim to produce (though they can make more approximations). When you have seconds or minutes to spend on a single frame, you're more likely to be using (at least some) brute force :)
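To put numbers on "no limit on bounces and the new rays they produce": if every hit spawns several new rays, the ray count grows exponentially with depth. A minimal sketch that only counts rays (it traces nothing); the branching factor and depth are made-up illustrative parameters:

```python
def rays_per_pixel(branching, max_depth):
    """Count rays traced for one camera ray when every hit spawns `branching`
    secondary rays, up to `max_depth` bounces (hypothetical parameters)."""
    total = 0
    rays_at_depth = 1  # the primary (camera) ray
    for _ in range(max_depth + 1):
        total += rays_at_depth
        rays_at_depth *= branching  # each ray at this depth spawns `branching` more
    return total

print(rays_per_pixel(1, 3))  # one path: 4 rays (primary + 3 bounces)
print(rays_per_pixel(4, 3))  # branching tree: 1 + 4 + 16 + 64 = 85 rays
```

Offline path tracers usually avoid the blow-up by following one continuation ray per bounce and ending paths with Russian roulette instead of a hard depth cap, which keeps the estimate unbiased.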

    For real time rendering you have to make careful choices about the counts of rays and the counts of bounces.

    There are other choices to make, such as the level of detail in the geometry used for ray tracing. e.g. you might use low quality trees when trying to decide how trees are reflected.

    Then you can work out which parts of the scene contribute the most to the final visual quality.

    Also you can decide how far rays are allowed to travel.
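A back-of-the-envelope budget shows why ray and bounce counts are the first knobs to turn. A minimal sketch with assumed numbers (1080p, 60 fps); it ignores shadow rays, rays cut short by a maximum distance, and reduced-resolution tracing. In a hybrid renderer the primary hit comes from rasterization, so the first segment here would be, say, the reflection ray:

```python
def rays_per_second(width, height, fps, paths_per_pixel, bounces):
    """Ray budget if each pixel traces `paths_per_pixel` paths, each made of
    one initial segment plus `bounces` bounce segments."""
    return width * height * fps * paths_per_pixel * (1 + bounces)

# 1080p at 60 fps, 1 path per pixel, 1 bounce
print(rays_per_second(1920, 1080, 60, 1, 1))  # 248832000
```

Every extra bounce adds another full screen's worth of rays per frame, while tracing at half resolution in each axis cuts the total by 4x.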

    Luckily, temporal and spatial "averaging" (forms of denoising) helps substantially. At typical gaming framerates, the ray tracing for one frame can substantially help other frames. Similarly, as we see with variable rate shading, which is designed to let developers lower the pixel shading resolution in some parts of the frame (e.g. near the edge or where it's very dark), you can vary the quality of ray tracing across the frame.
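Temporal accumulation is the simplest of these: blend each new noisy frame into a history buffer. A minimal single-pixel sketch using an exponential moving average; the blend factor `alpha` is an assumed value, and a real denoiser also reprojects history with motion vectors and rejects stale samples:

```python
import random
import statistics

def temporal_accumulate(samples, alpha):
    """Blend each new sample into a running history:
    history = (1 - alpha) * history + alpha * sample."""
    history = samples[0]
    out = [history]
    for s in samples[1:]:
        history = (1 - alpha) * history + alpha * s
        out.append(history)
    return out

rng = random.Random(42)
# Noisy one-sample estimates of a pixel whose true value is 0.5.
raw = [rng.random() for _ in range(2000)]
acc = temporal_accumulate(raw, alpha=0.1)

# Skip the first frames while the history warms up.
print(statistics.pstdev(raw), statistics.pstdev(acc[100:]))
```

For independent noise, theory predicts the spread shrinks by about sqrt(alpha / (2 - alpha)), roughly 0.23 here; the price is lag when the scene changes.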

    Just relying upon denoising gives you the poor performance seen in Control when trying to do lots of ray tracing techniques simultaneously.
     
  5. Rootax

    Veteran

    Joined:
    Jan 2, 2006
    Messages:
    2,400
    Likes Received:
    1,845
    Location:
    France

    But from what I gather from DF videos about RT and other media, devs are already optimising where and how RT is done; they are careful about the number of rays and bounces.
     
  6. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    I'm hopeful there are many more steps along the road of real-time ray tracing; in other words, I think devs have barely started to optimise.
     
  7. chris1515

    Legend

    Joined:
    Jul 24, 2005
    Messages:
    7,157
    Likes Received:
    7,965
    Location:
    Barcelona Spain
    With the data we have now, it seems Nvidia is much better at RT. For multiplatform games, PS5 and Xbox Series X|S are very important for publishers and studios, which means requirements will be set around these consoles. Outside of ray tracing, this will probably help AMD, because this time they aren't far behind; we see it with AC Valhalla and Dirt 5. I mean, if RDNA2's rasterization performance were not good, it would not have helped much.

    RT will probably be used for shadows or specular reflections, and maybe in some rare titles for both effects together. GI will be done using other methods.

    Some studios, like 4A Games or Quantic Dream, want to create rendering pipelines centered around RT, but other engines like UE5 will not use RT at all.
     
    #1287 chris1515, Nov 22, 2020
    Last edited: Nov 22, 2020
    BRiT likes this.
  8. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,493
    Likes Received:
    474
    I've provided answers to your questions already. Maybe I was too vague before. It's not unbalanced to have a primitive rasterizer perform coarse rasterization and feed the output to multiple fine rasterizers to output pixels. Also, don't bother comparing to Navi10; you'll only get confused. Certainly, testing Navi21 is the best way to understand the performance characteristics.
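One way to picture a coarse rasterizer feeding several fine rasterizers: the coarse stage only decides which screen tiles a triangle may touch, and each tile goes to the fine rasterizer that owns it. A minimal sketch; the 8-pixel tile size and the checkerboard tile ownership are assumptions for illustration, not RDNA2's actual scheme:

```python
import math

TILE = 8  # hypothetical coarse tile size in pixels

def coarse_rasterize(tri, num_fine):
    """Bin a triangle's screen-space bounding box into tiles and hand each tile
    to a fine rasterizer. Binning is conservative: a fine rasterizer still has
    to test exact coverage inside its tiles."""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    tx0, tx1 = math.floor(min(xs)) // TILE, math.floor(max(xs)) // TILE
    ty0, ty1 = math.floor(min(ys)) // TILE, math.floor(max(ys)) // TILE
    work = {i: [] for i in range(num_fine)}
    for ty in range(ty0, ty1 + 1):
        for tx in range(tx0, tx1 + 1):
            owner = (tx + ty) % num_fine  # checkerboard-style tile ownership
            work[owner].append((tx, ty))
    return work

print(coarse_rasterize([(0, 0), (15, 0), (0, 15)], 2))
# {0: [(0, 0), (1, 1)], 1: [(1, 0), (0, 1)]}
```

Tile (1, 1) is handed out even though this triangle doesn't actually reach it; the owning fine rasterizer finds zero coverage and moves on.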
     
    tinokun, BRiT, Digidi and 3 others like this.
  9. chris1515

    Legend

    Joined:
    Jul 24, 2005
    Messages:
    7,157
    Likes Received:
    7,965
    Location:
    Barcelona Spain
    https://patents.google.com/patent/US10062206B2/en

    Not 100% sure, but I think I understood that reference.

    [Image]
     
  10. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,992
    Likes Received:
    137
    Doesn't seem unbalanced to me. It looks like each Raster Unit has 2 scan converters. Each RDNA1 scan converter rasterised one triangle with up to 16 fragments of coverage. Now with RDNA2, we have 2 scan converters working on 1 triangle, capable of rasterising triangles from small to large, with coverage ranging from 1 to 32 fragments per Raster Unit.
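A toy throughput model of that description, assuming one triangle enters a Raster Unit per clock and a triangle needing more fragments than the per-clock cap takes extra clocks. The caps (16 for an RDNA1 scan converter, 32 for an RDNA2 Raster Unit) are read from the post; the model itself is hypothetical:

```python
import math

def fragments_per_clock(coverage, cap):
    """Average fragments emitted per clock for a triangle covering `coverage`
    fragments when at most `cap` fragments can be emitted per clock."""
    clocks = max(1, math.ceil(coverage / cap))  # even a tiny triangle costs a clock
    return coverage / clocks

for coverage in (1, 16, 32):
    print(coverage, fragments_per_clock(coverage, 16), fragments_per_clock(coverage, 32))
# 1 1.0 1.0
# 16 16.0 16.0
# 32 16.0 32.0
```

Under this model small triangles gain nothing from the wider unit; only triangles above 16 fragments do.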

    Peak triangle throughput has increased with higher GPU clocks, and triangle-size variance is handled better thanks to higher rasterisation efficiency, so geometry throughput is closer to peak throughput.

    Found the slide: the details are under "Geometry Processor" below, where Navi21 has 4 Prim Units, sends 4 triangles to rasterise and culls 8 triangles per cycle. So each Prim Unit still culls 2 triangles and sends 1 to a Raster Unit per cycle, unchanged from RDNA1.

    [Image]
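The per-cycle figures above turn into peak rates once you pick a clock. A minimal sketch; the 2000 MHz clock is an assumed round number, not a quoted spec:

```python
def geometry_rates(prim_units, clock_mhz, raster_per_unit=1, cull_per_unit=2):
    """Peak triangles per second sent to rasterisation and culled, from
    per-Prim-Unit rates per clock."""
    clock_hz = clock_mhz * 1_000_000
    return {
        "rasterised_per_s": prim_units * raster_per_unit * clock_hz,
        "culled_per_s": prim_units * cull_per_unit * clock_hz,
    }

print(geometry_rates(4, 2000))
# {'rasterised_per_s': 8000000000, 'culled_per_s': 16000000000}
```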
     
    pjbliverpool, Digidi and Lightman like this.
  11. Dampf

    Regular

    Joined:
    Nov 21, 2020
    Messages:
    283
    Likes Received:
    474
    Of course UE5 will support DXR.

    You are probably referring to Lumen not using HW-RT, which is a bit disappointing as it leaves fixed-function hardware unused and thus wastes performance. I think it is possible Lumen might get replaced by RTXGI, as it does basically the same thing as Lumen (multi-bounce, dynamic GI) but uses hardware-accelerated RT to update light probes, which makes it very efficient.
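For context on what probe-based GI involves at shading time: rays update irradiance stored at probes on a regular 3D grid, and each shaded point blends the 8 surrounding probes. A minimal sketch of that trilinear lookup with scalar irradiance; this is a generic illustration, not RTXGI's API, and it omits the visibility weighting real probe systems use to limit light leaking:

```python
import math

def lerp(a, b, t):
    return a + (b - a) * t

def sample_probe_grid(grid, spacing, p):
    """Trilinearly interpolate irradiance at point p from the 8 surrounding probes.
    grid[i][j][k] holds the value of the probe at (i, j, k) * spacing; p must lie
    inside the grid so that index + 1 exists on every axis."""
    fx, fy, fz = (c / spacing for c in p)
    i, j, k = math.floor(fx), math.floor(fy), math.floor(fz)
    tx, ty, tz = fx - i, fy - j, fz - k
    # Interpolate along x on the four x-aligned edges of the cell...
    c00 = lerp(grid[i][j][k], grid[i + 1][j][k], tx)
    c10 = lerp(grid[i][j + 1][k], grid[i + 1][j + 1][k], tx)
    c01 = lerp(grid[i][j][k + 1], grid[i + 1][j][k + 1], tx)
    c11 = lerp(grid[i][j + 1][k + 1], grid[i + 1][j + 1][k + 1], tx)
    # ...then along y, then along z.
    return lerp(lerp(c00, c10, ty), lerp(c01, c11, ty), tz)

# 2x2x2 probes whose value equals their x index.
grid = [[[float(i) for k in range(2)] for j in range(2)] for i in range(2)]
print(sample_probe_grid(grid, 2.0, (1.0, 0.5, 0.5)))  # 0.5
```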

    RTXGI works on any DXR-capable GPU, so it should work on Xbox and AMD RDNA2 PC GPUs as well.
     
    tinokun, DegustatoR, Lightman and 3 others like this.
  12. chris1515

    Legend

    Joined:
    Jul 24, 2005
    Messages:
    7,157
    Likes Received:
    7,965
    Location:
    Barcelona Spain
    The Lumen system is very different from RTXGI, and RTXGI is probably not what they have in mind, at least for this console generation. Lumen is probably easier for lighting artists to use because there are no probes at all.

    https://www.eurogamer.net/articles/...eal-engine-5-playstation-5-tech-demo-analysis

    Unreal Engine 4 uses RT, but UE5 does not, although maybe in the future they will use RT for specular reflections. The engine was designed around PS5 and Xbox Series X|S.

    Demon's Souls uses a froxel-based GI system built on probes.
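A froxel is a frustum-aligned voxel: screen tiles in x and y, depth slices in z. A minimal sketch of mapping a pixel and view-space depth to a froxel, assuming a 16x9x64 grid with exponentially distributed depth slices; the grid sizes and slicing are illustrative, not Demon's Souls' actual parameters:

```python
import math

def froxel_index(px, py, view_z, screen_w, screen_h, near, far,
                 tiles_x=16, tiles_y=9, slices=64):
    """Map pixel (px, py) at positive view-space depth view_z to (x, y, slice)."""
    x = min(tiles_x - 1, int(px * tiles_x / screen_w))
    y = min(tiles_y - 1, int(py * tiles_y / screen_h))
    # Exponential slicing: each slice spans the same far/near ratio, so slices
    # are thin close to the camera and thick in the distance.
    z = math.log(view_z / near) / math.log(far / near)
    s = min(slices - 1, max(0, int(z * slices)))
    return x, y, s

print(froxel_index(0, 0, 0.1, 1920, 1080, 0.1, 1000.0))        # (0, 0, 0)
print(froxel_index(1919, 1079, 1000.0, 1920, 1080, 0.1, 1000.0))  # (15, 8, 63)
```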
     
    #1292 chris1515, Nov 22, 2020
    Last edited: Nov 23, 2020
    BRiT and Deleted member 90741 like this.
  13. Digidi

    Regular

    Joined:
    Sep 1, 2015
    Messages:
    428
    Likes Received:
    239
    But this makes no sense. Not many triangles these days are bigger than 16 pixels. The bigger problem is rasterizing small triangles, ones smaller than a pixel.

    For polygons smaller than a pixel, you can now only rasterize 4 polygons in total. That means 4 scan converters have work and the other 4 sit idle. As I see it, this makes no sense for future game titles. If you think about the Nanite engine, with its many small polygons, the new rasterizer will have no advantage...
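That utilisation argument can be written as a toy model: 4 triangles per clock, 2 scan converters per Raster Unit, 16 fragments per scan converter, and each triangle occupying only as many scan converters as its coverage needs. All of these numbers and rules are assumptions taken from the discussion, not confirmed hardware behaviour:

```python
import math

def scan_converter_utilization(coverage, tris_per_clock=4, scs_per_raster_unit=2,
                               fragments_per_sc=16):
    """Fraction of scan converters busy when every triangle covers `coverage` fragments."""
    total_scs = tris_per_clock * scs_per_raster_unit
    # A triangle keeps at least one scan converter busy, at most all of its Raster Unit's.
    scs_per_tri = min(scs_per_raster_unit, max(1, math.ceil(coverage / fragments_per_sc)))
    return tris_per_clock * scs_per_tri / total_scs

print(scan_converter_utilization(1))   # 0.5: sub-pixel triangles leave half idle
print(scan_converter_utilization(32))  # 1.0
```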
     
    #1293 Digidi, Nov 22, 2020
    Last edited: Nov 22, 2020
  14. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    It's an intermediate step; think of it as a sorting mechanism.

    If I get it right, you now have 4 coarse rasterizers at 16 pixels each, which is fine for now; it probably wasn't worth redesigning the coarse rasterizers. This is the hard limit for coarse-grained geometry, and it's what traditional fill-rate tests measure via one full-screen quad (or at least very few very large triangles).

    If the size check concludes that you would waste rasterizing performance because the triangles are too small, it skips the coarse rasterizer and sends the data to the smaller rasterizers, which operate more efficiently on smaller polygons.
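A minimal sketch of such a size check, routing on screen-space bounding-box area; the 16-pixel threshold and the two-way split are assumptions for illustration, since the actual criterion isn't public:

```python
def route_triangle(tri, small_threshold=16.0):
    """Return which path a triangle takes, based on its bounding-box area in pixels."""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    bbox_area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    return "fine" if bbox_area < small_threshold else "coarse"

print(route_triangle([(0, 0), (2, 0), (0, 2)]))    # fine (bbox area 4)
print(route_triangle([(0, 0), (64, 0), (0, 64)]))  # coarse (bbox area 4096)
```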

    edit for spelling
     
    #1294 CarstenS, Nov 22, 2020
    Last edited: Nov 22, 2020
  15. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    Culling is likely to require a small, fixed number of clock cycles, while scan conversion requires a variable amount of work and time. The optimal ratio between these 2 functions likely changes significantly throughout the frame, but on average it might very well be that a 1-to-2 ratio gives better perf than 1-to-1. For instance, it could help reduce pipeline bubbles after scan conversion, thus reducing the time CUs sit idle waiting for work.
     
  16. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,493
    Likes Received:
    474
    That's an interesting patent, but I wasn't referencing it.

    Coarse rasterization before fine rasterization is not a new concept. For example, you can quickly coarse rasterize and perform a hierarchical depth test before performing fine rasterization. I referenced it because the driver define that's causing confusion probably refers to the fine rasterizer and may be related to RB+.
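A minimal sketch of a hierarchical depth test at coarse granularity, assuming a per-tile buffer that stores the farthest depth already written in each tile, smaller depth meaning closer, and a LESS depth test; the tile coordinates and values are made up:

```python
def hiz_cull_tiles(hiz, tiles, tri_min_z):
    """Keep only the tiles where some part of the triangle could pass a LESS
    depth test; the rest never reach fine rasterization."""
    # If even the triangle's nearest depth is not closer than the farthest
    # depth already in the tile, every fragment there would fail.
    return [t for t in tiles if tri_min_z < hiz[t]]

hiz = {(0, 0): 0.3, (1, 0): 1.0}  # tile (0, 0) already holds close geometry
print(hiz_cull_tiles(hiz, [(0, 0), (1, 0)], tri_min_z=0.5))  # [(1, 0)]
```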

    There are still triangles > 16 pixels. Someone would need to analyze some games to see how often they occur.

    Many triangles smaller than a pixel will be thrown out as part of the faster culling process. Nanite is using compute for most small triangles and it remains to be seen if other engines will follow suit. It's likely IHVs will continue to improve rasterization performance while trying to spend as few transistors as possible.
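One common form of small-triangle culling, as done in compute-based culling passes: if a triangle's bounding box straddles no pixel centre in x or in y, it cannot cover any sample and can be dropped before scan conversion. A minimal sketch assuming one sample per pixel at pixel centres (x + 0.5, y + 0.5); whether the hardware culling works this way is not stated here. The test is conservative, so a surviving triangle may still cover nothing, and it ignores exact edge tie-breaking rules:

```python
import math

def covers_no_pixel_center(tri):
    """True if the bounding box of `tri` contains no pixel centre in x or in y."""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    # Pixel centres sit at n + 0.5; check whether any such n fits in [min, max].
    no_x = math.ceil(min(xs) - 0.5) > math.floor(max(xs) - 0.5)
    no_y = math.ceil(min(ys) - 0.5) > math.floor(max(ys) - 0.5)
    return no_x or no_y

print(covers_no_pixel_center([(0.1, 0.1), (0.4, 0.1), (0.1, 0.4)]))  # True: cull
print(covers_no_pixel_center([(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]))  # False: keep
```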
     
    TheAlSpark, tinokun, Digidi and 5 others like this.
  17. Digidi

    Regular

    Joined:
    Sep 1, 2015
    Messages:
    428
    Likes Received:
    239
    Thank you for the answer. But how can a rasterizer be more efficient? Even on GCN, small triangles can be scan-converted in 1 clock, and now you still have only 4 rasterizers, each handling 1 polygon per clock. It's the same as GCN; only the culling engine in front of them is new. So how do you improve rasterization with 2 kinds of rasterizer when both do the same thing?

    I only see an advantage for big polygons larger than 4x4 pixels.

    Edit: It's also strange that each array has its own rasterizer (scan converter), if you look at the Linux driver:
    num_se: 4 (you have 4 shader engines)
    num_sh_per_se: 2 (you have 2 shader arrays per shader engine)
    num_sc_per_sh: 1 (and each shader array has its own scan converter?)
    So for scan converters you get: num_se x num_sh_per_se x num_sc_per_sh = 4x2x1 = 8

    But if I follow the idea that we have 2 rasterizers per shader engine, it should look like:
    num_se: 4
    num_sc_per_se: 2
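The arithmetic above, written out. The field names are the ones quoted from the Linux driver; putting them in a dict is just for illustration. Both readings give 8 scan converters in total, so the open question is where they sit, not how many there are:

```python
# Values as quoted from the driver output above.
driver_cfg = {"num_se": 4, "num_sh_per_se": 2, "num_sc_per_sh": 1}

def scan_converters_per_array_layout(cfg):
    # one scan converter per shader array
    return cfg["num_se"] * cfg["num_sh_per_se"] * cfg["num_sc_per_sh"]

def scan_converters_per_engine_layout(num_se, num_sc_per_se):
    # hypothetical layout: scan converters counted per shader engine
    return num_se * num_sc_per_se

print(scan_converters_per_array_layout(driver_cfg))  # 8
print(scan_converters_per_engine_layout(4, 2))       # 8
```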

    @CarstenS, how does the 6800 XT perform against a 3090, with its 7 GPCs, when 0% of the polygons are culled (lists and strips)? Does AMD have an advantage here?
     
    #1297 Digidi, Nov 22, 2020
    Last edited: Nov 22, 2020
  18. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,541
    Likes Received:
    964
    Perhaps you were joking, but this is far from a stupid idea. With good post-processing AA, and perhaps clever texturing tricks, it might produce very interesting results. For instance, you might render, and especially sample textures, at 4K but path-trace at 320p, and just interpolate the path-tracing results for pixels where you don't path-trace.
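The interpolation step could be as simple as upsampling the low-resolution path-traced buffer to output resolution. A minimal sketch using plain bilinear filtering on a 2D list; a practical version would use depth- and normal-guided (edge-aware) upsampling so lighting doesn't bleed across object edges:

```python
def upsample_bilinear(lo, scale):
    """Upsample a low-res 2D list `lo` by an integer factor with bilinear filtering."""
    h, w = len(lo), len(lo[0])
    out = []
    for y in range(h * scale):
        # Map the high-res pixel centre back into low-res pixel coordinates.
        sy = min(max((y + 0.5) / scale - 0.5, 0.0), h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        ty = sy - y0
        row = []
        for x in range(w * scale):
            sx = min(max((x + 0.5) / scale - 0.5, 0.0), w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            tx = sx - x0
            top = lo[y0][x0] * (1 - tx) + lo[y0][x1] * tx
            bottom = lo[y1][x0] * (1 - tx) + lo[y1][x1] * tx
            row.append(top * (1 - ty) + bottom * ty)
        out.append(row)
    return out

print(upsample_bilinear([[0.0, 1.0], [0.0, 1.0]], 2)[0])  # [0.0, 0.25, 0.75, 1.0]
```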
     
  19. Frenetic Pony

    Regular

    Joined:
    Nov 12, 2011
    Messages:
    807
    Likes Received:
    478
    This is actually an open question. With alternative geometry representations you technically have neither triangle meshes nor their associated UV mapping in memory, and possibly no need for them at all. With a few very recent advancements, you can even get detailed, sharp geometry out of indirect tracing with implicit surfaces, which is the best guess as to what UE5 is using.
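For a sense of how tracing against an implicit surface differs from triangle BVH traversal, here is sphere tracing against a signed distance field (SDF): the SDF value at a point is a safe step size, so the ray marches until the distance is nearly zero. A minimal sketch with a single analytic sphere; that UE5 traces SDFs this way is the poster's guess above, not something this code confirms:

```python
import math

def sdf_sphere(p, center, radius):
    """Signed distance from point p to a sphere's surface (negative inside)."""
    return math.dist(p, center) - radius

def sphere_trace(origin, direction, sdf, max_steps=128, eps=1e-4, max_t=100.0):
    """March along a normalised ray, stepping by the SDF value; return hit distance or None."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < eps:
            return t
        t += dist  # nothing is closer than `dist`, so this step cannot skip a surface
        if t > max_t:
            break
    return None

scene = lambda p: sdf_sphere(p, (0.0, 0.0, 5.0), 1.0)
print(sphere_trace((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), scene))  # 4.0
print(sphere_trace((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), scene))  # None
```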

    Maybe they'll have it for animated meshes. As of the last presentation, those were the only triangle meshes used, and AFAIK implicit surface animation is still a semi-open question. Nevertheless, Dreams manages it somehow. It's certain UE5 has its own top-level acceleration structure using SDFs, and I'd be surprised if they didn't embed any triangle meshes in a BLAS within that top level. Either way, UE5 already supports at least basic animated characters using SDFs, so that's going to be used for diffuse GI and possibly even semi-glossy reflections, at the very least. That means the only likely use of DXR in UE5 will be for sharp reflections, and maybe for caustics if anyone really wants that accuracy.

    As to "wasting performance", it's just the opposite. Alternative tracing structures are so much faster than hardware ray tracing, regardless of hardware vendor, that there's no question which should be used. That demo was doing multi-bounce recursive diffuse GI and glossy reflections at a scale and level of detail that would crush a 3090 running at 720p, while doing so on a sub-3070 equivalent at 1440p. While I do expect them to bake into probes, as that solves a lot of the energy-loss problems they're encountering, even then doing that with Lumen is faster than hardware ray tracing.
     
    #1299 Frenetic Pony, Nov 22, 2020
    Last edited: Nov 22, 2020
  20. chris1515

    Legend

    Joined:
    Jul 24, 2005
    Messages:
    7,157
    Likes Received:
    7,965
    Location:
    Barcelona Spain
    I have no doubt that after PS5 and Xbox Series the situation will change, with RT becoming more important, but I think RT will be used very selectively for the next 5 years.

    https://gfxspeak.com/2020/10/09/unreal-engine-almost/

     
    jgp, Jawed, Lightman and 1 other person like this.

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.