AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

Discussion in 'Architecture and Products' started by BRiT, Oct 28, 2020.

  1. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    The internal particulars may have changed over time. There are still ACEs, as HWS is more concerned with having a processor virtualize the fixed number of queues the ACEs are processing so that an arbitrary number of queues can be swapped in and out.
    Whether HWS is distinct hardware isn't clear to me.
    ACEs, at least since Sea Islands, come in groups of 4 custom processors that share some resources, like the microcode store they use.
    When HWS was introduced, it seemed to come at the expense of ACE resources, which were re-assigned to monitoring and controlling the queues handled by the other ACEs. This may have been why Fury's marketing went from 8 ACEs to 4 ACEs + HWS.
    There are other nuances, like which pipes have dispatch capability, that might distinguish ACEs.
    Whether more modern GPUs would give HWS hardware ACE capabilities once it became standard isn't clear. I think the HWS has since been described as a dual-threaded processor/block, and so this may have become more distinct from whatever it is the ACEs are.

    I believe there were code changes related to the graphics command processor (a cluster of at least 3 cores) that hint at a possible extra processor with functions similar to HWS for graphics, in that it can swap and direct what the hardware graphics queues are linked to.
     
    PSman1700 likes this.
  2. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,112
    Location:
    New York
    No joke, I thought that I had stumbled into a thread from 2010 :)
     
    Alexko, Kej, Krteq and 4 others like this.
  3. xEx

    xEx
    Veteran

    Joined:
    Feb 2, 2012
    Messages:
    1,060
    Likes Received:
    543
    Do we know at what time the NDA will lift?
     
  4. eastmen

    Legend Subscriber

    Joined:
    Mar 17, 2008
    Messages:
    13,878
    Likes Received:
    4,724
    I think everything is 9am EST / 6am PST
     
    xEx likes this.
  5. bridgman

    Newcomer Subscriber

    Joined:
    Dec 1, 2007
    Messages:
    62
    Likes Received:
    123
    Location:
    Toronto-ish
    Yep - each MEC block has 4 processor threads. If I remember correctly each thread can either run "pipe" microcode (multiplexing between 8 queues on the pipe and managing/submitting work from those queues) or "HWS" microcode (a layer above the pipes/queues which dynamically maps queues from a larger set onto the available HW queues). It also multiplexes an unlimited number of processes onto a finite number of VMIDs.

    Each HW queue has a Hardware Queue Descriptor associated with it, while each application queue has a Memory Queue Descriptor. The HWS microcode is passed a runlist with a set of MQDs for each process plus a set of resources (HW queues + VMIDs) plus a few other parameters. At that point HWS takes over and maps (copies) sets of MQDs into HQDs and lets the queues run for a programmable time quantum. At the end of the time quantum it rolls the waves off the hardware, HQD contents are written back into MQDs, and the next set of MQDs is selected.

    If you have trouble sleeping you can pick through amdkfd->kfd_packet_manager.c and explore out from there. "Oversubscription" refers to having more MQDs than available HQDs or more processes than VMIDs, requiring HWS to round-robin multiplex sets of MQDs onto the HW queues.

    If HWS is not being used then driver code maps MQDs to HQDs directly, but we normally run with HWS enabled all the time.
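
    The MQD-to-HQD multiplexing described above can be sketched as a toy simulation. This is only an illustration under simplifying assumptions, not the amdkfd implementation: the names MQD, run_hws and is_oversubscribed are hypothetical, queues are rotated individually rather than in per-process sets, and VMID assignment is reduced to a simple count check.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class MQD:
    """Memory Queue Descriptor: saved state of one application queue (toy model)."""
    process: str
    queue_id: int
    quanta_run: int = 0  # number of time quanta this queue has been resident


def run_hws(runlist, num_hqds, num_quanta):
    """Round-robin MQDs onto a fixed number of hardware queue slots (HQDs).

    Returns one entry per time quantum listing the (process, queue_id)
    pairs that were resident on the hardware during that quantum.
    """
    pending = deque(runlist)
    schedule = []
    for _ in range(num_quanta):
        # Map (copy) the next set of MQDs into the available HQDs.
        resident = [pending.popleft() for _ in range(min(num_hqds, len(pending)))]
        for mqd in resident:
            mqd.quanta_run += 1  # the queue runs for one time quantum
        schedule.append([(m.process, m.queue_id) for m in resident])
        # End of quantum: HQD contents go back into the MQDs, which rejoin the line.
        pending.extend(resident)
    return schedule


def is_oversubscribed(runlist, num_hqds, num_vmids):
    """More MQDs than HQDs, or more processes than VMIDs."""
    processes = {m.process for m in runlist}
    return len(runlist) > num_hqds or len(processes) > num_vmids


if __name__ == "__main__":
    runlist = [MQD("A", 0), MQD("A", 1), MQD("B", 0)]
    for quantum, resident in enumerate(run_hws(runlist, num_hqds=2, num_quanta=3)):
        print(quantum, resident)
```

    With three queues and only two hardware slots the runlist is oversubscribed, so each queue ends up resident for two of the three quanta.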
     
    jgp, Kej, Ethatron and 4 others like this.
  6. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,451
    Likes Received:
    471
    1. May I ask which game developers used image sharpening filters for real-time resizing of the rendered image, and when?
    2. In case you are not talking specifically about real-time image resizing: GPU-accelerated AI-based image resizing was used a long time before DLSS, e.g. for photo resizing for large-print purposes, but also by some game developers for creating high-resolution textures.
     
    NightAntilli and Jawed like this.
  7. Dictator

    Regular

    Joined:
    Feb 11, 2011
    Messages:
    681
    Likes Received:
    3,969
    To me, any game that offered bicubic or Lanczos up- or downscaling counts as using a form of sharpening to a degree. And that goes way back.
    Other than that, any game on PC that offered a subnative resolution option as well as controls for sharpening. That goes back to pre-UE4 games AFAIK.
    The GeDoSaTo tool on PC has offered down- and upscaling with sharpening ever since its inception as well.
     
    tinokun, PSman1700 and pharma like this.
  8. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,235
    Likes Received:
    4,259
    Location:
    Guess...
    You know what, ignore me. I was getting ACEs mixed up with Shader Arrays. I blame Techspot, which does the same in their Navi vs Turing architecture article!

    In terms of shader arrays, what could be going on here though? If a shader array still contains one primitive unit then RDNA2 can't have 2 of them per Shader Engine, can it? Even though the Series X definitely does? Or have I misunderstood something else?
     
    #788 pjbliverpool, Nov 18, 2020
    Last edited: Nov 18, 2020
  9. Leoneazzurro5

    Regular

    Joined:
    Aug 18, 2020
    Messages:
    335
    Likes Received:
    348
    https://videocardz.com/newz/amd-radeon-rx-6800-launch-press-deck-transcript

    it says

    Geometry Processor
    • 8 Pre-Cull Prims/Cycle
    • 4 Post-Cull Prims/Cycle
    So 2 Pre-cull primitives per SE and 1 Post-cull primitive per SE
     
  10. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany

    Attached Files:

    PSman1700 and pharma like this.
  11. Leoneazzurro5

    Regular

    Joined:
    Aug 18, 2020
    Messages:
    335
    Likes Received:
    348
    Well, at 2+GHz it is a considerable amount of triangles/s anyway, and with working mesh shaders and probably better primitive culling it should be more than enough. The RBE seems to be quite reworked compared to Navi10, anyway.
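
    As a rough back-of-the-envelope check of what those figures mean, here is a small sketch. The per-clock rates are the press-deck numbers quoted earlier in the thread; the 2.0 GHz clock and the count of 4 Shader Engines are assumptions for illustration, and prims_per_second is a hypothetical helper.

```python
PRE_CULL_PER_CLOCK = 8    # press-deck figure, whole chip
POST_CULL_PER_CLOCK = 4   # press-deck figure, whole chip
SHADER_ENGINES = 4        # assumed for Navi21
CLOCK_HZ = 2.0e9          # assumed: "2+GHz" rounded down to 2.0 GHz


def prims_per_second(prims_per_clock, clock_hz):
    """Peak primitive rate: primitives per clock times clocks per second."""
    return prims_per_clock * clock_hz


if __name__ == "__main__":
    print("pre-cull per SE per clock: ", PRE_CULL_PER_CLOCK / SHADER_ENGINES)
    print("post-cull per SE per clock:", POST_CULL_PER_CLOCK / SHADER_ENGINES)
    print("pre-cull  Gprims/s:", prims_per_second(PRE_CULL_PER_CLOCK, CLOCK_HZ) / 1e9)
    print("post-cull Gprims/s:", prims_per_second(POST_CULL_PER_CLOCK, CLOCK_HZ) / 1e9)
```

    Under those assumptions the peak works out to 16 billion pre-cull and 8 billion post-cull primitives per second for the whole chip.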
     
  12. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,235
    Likes Received:
    4,259
    Location:
    Guess...
    Indeed but doesn't each primitive unit accept 2 un-culled primitives and output 1 culled primitive per clock? So in RDNA we have:

    • 2 Shader Engines
    • 2 Shader Arrays per Shader Engine
    • 1 Primitive Unit per Shader Array
    • Hence 4 primitives output per clock

    According to Hotchips the XSX is set up the same, albeit with 14 CUs per Shader Array rather than 10 in RDNA.

    I thought it was pretty much confirmed that there were 4 Shader Engines in Navi21, which means that, based on the above, it should have 8 primitive units.

    Also bear in mind we know Navi21 has 128 ROPs, which would also suggest 4 Shader Engines and 8 Shader Arrays (16 ROPs per SA), given that the Series X is configured this way with 2 Shader Engines and 64 ROPs.

    So the only explanation I can think of for Navi21 only outputting 4 primitives per clock is if the overall architecture is drastically changed, i.e. still 4 Shader Arrays with doubled up resources in each, or the Primitive Units in Navi21 only output 1 Primitive every other clock vs one every clock in Navi10. Which sounds strange - especially as the Series X still outputs 1 per clock.
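
    To make the arithmetic behind that reasoning explicit, here is a small sketch. The function name and the idea of modelling throughput as a plain product are assumptions for illustration; the unit counts are the ones discussed in this thread, not confirmed specifications.

```python
def prims_out_per_clock(shader_engines, arrays_per_engine,
                        units_per_array=1, out_per_unit=1.0):
    """Culled primitives output per clock, modelled as a simple product."""
    return shader_engines * arrays_per_engine * units_per_array * out_per_unit


def shader_arrays_from_rops(total_rops, rops_per_array=16):
    """Shader Array count implied by the ROP count at 16 ROPs per array."""
    return total_rops // rops_per_array


if __name__ == "__main__":
    # Navi10 / Series X layout: 2 SEs x 2 SAs x 1 primitive unit
    print("Navi10:", prims_out_per_clock(2, 2))                              # 4.0
    # Navi21 if it simply doubled that layout
    print("Navi21, naive:", prims_out_per_clock(4, 2))                       # 8.0
    # Explanation A: only 4 Shader Arrays in total, with doubled resources
    print("Navi21, 4 wide arrays:", prims_out_per_clock(4, 1))               # 4.0
    # Explanation B: 8 units, each outputting 1 primitive every other clock
    print("Navi21, half rate:", prims_out_per_clock(4, 2, out_per_unit=0.5)) # 4.0
    print("SAs implied by 128 ROPs:", shader_arrays_from_rops(128))          # 8
```

    Both candidate explanations land on the 4 post-cull primitives per clock from the press deck, which is why the figure alone cannot tell them apart.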

    Hopefully we'll find out in about 4 hours!
     
    Lightman, PSman1700 and pharma like this.
  13. Leoneazzurro5

    Regular

    Joined:
    Aug 18, 2020
    Messages:
    335
    Likes Received:
    348
    The figures should be for the whole chip, as all other figures in that section were calculated for Navi21 as a whole. Details are unclear, so it is unknown whether it's one unit with double unculled primitive generation per SE or there are two units with halved culled primitive generation per clock. One thing is that, by pushing clocks so high and by relying on improved culling, they could have less need to improve their geometry throughput.
     
  14. Anyone else here completely shocked by the fact that it's 11 am GMT on launch day and not one review has leaked so far?

    I saw some charts on reddit supposedly from a youtuber but those seemed fake.
     
    Lightman and PSman1700 like this.
  15. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,235
    Likes Received:
    4,259
    Location:
    Guess...
    Yes, could be. I wonder then if the same would apply to the PS5 and XSX if this is true of Navi21? I don't think it's actually been explicitly stated what the primitive throughput is on either of those consoles, has it? We know how many primitive units the XSX has, but if they are half as effective as those in RDNA then the throughput would be half what we currently think it is.
     
    PSman1700 likes this.
  16. Leoneazzurro5

    Regular

    Joined:
    Aug 18, 2020
    Messages:
    335
    Likes Received:
    348
    Well, both PS5 and XBSX are custom chips, so they don't have to be 1:1 with desktop chips; e.g. there is no Infinity Cache on consoles.
     
  17. dskneo

    Regular

    Joined:
    Jul 25, 2005
    Messages:
    816
    Likes Received:
    298
    These popped up. Take with salt

    [​IMG]
     
  18. pharma

    Veteran

    Joined:
    Mar 29, 2004
    Messages:
    4,887
    Likes Received:
    4,534
    More prices ... UK etailer is listing prices for Asus Radeon RX 6800 (XT) cards
    https://www.guru3d.com/news-story/uk-etailer-is-listing-prices-for-asus-radeon-rx-6800-(xt)-cards.html
     
    #798 pharma, Nov 18, 2020
    Last edited: Nov 18, 2020
  19. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Makes you wonder, doesn't it? :)
     
    Lightman likes this.
  20. 2h30m before the embargo lifts? Is that like a record-setting level of embargo respectfulness for a graphics card (or anything non-Apple) from the last 10 years?

    But wow, those tables really turn from Gears 5 onwards.


    Wonder what? What should I wonder about?? I need to know, this wait is taking forever!!!!
     
    Lightman likes this.