AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

3dilettante · Nov 18, 2020

Jawed said:
Doesn't HWS supplant ACE? Or it's another module that provides more finely-controllable use of async compute?

The internal particulars may have changed over time. There are still ACEs, as HWS is more concerned with having a processor virtualize the fixed number of queues the ACEs are processing so that an arbitrary number of queues can be swapped in and out.
Whether HWS is distinct hardware isn't clear to me.
ACEs, at least since Sea Islands, come in groups of 4 custom processors that share some resources, like the microcode store they use.
When HWS was introduced, it seemed to come at the expense of ACE resources being re-assigned to monitoring and controlling the queues used by the other ACEs. This may have been why Fury's marketing went from 8 ACEs to 4 ACEs + HWS.
There are other nuances, like which pipes have dispatch capability, that might distinguish ACEs.
Whether more modern GPUs would give HWS hardware ACE capabilities once it became standard isn't clear. I think the HWS has since been described as a dual-threaded processor/block, and so this may have become more distinct from whatever it is the ACEs are.

I believe there were code changes related to the graphics command processor (cluster of at least 3 cores) that hint at a possible extra processor that has some similar functions as HWS for graphics, in that it can swap and direct what the hardware graphics queues are linked to.

trinibwoy · Nov 18, 2020

Jawed said:
Holy smokes, I was wondering how we could summon you

No joke, I thought that I had stumbled into a thread from 2010

xEx · Nov 18, 2020

Do we know at what time the NDA will lift?

eastmen · Nov 18, 2020

xEx said:
Do we know at what time the NDA will lift?

I think everything is 9am EST / 6am PST

bridgman · Nov 18, 2020

3dilettante said:
Whether HWS is distinct hardware isn't clear to me. ACEs, at least since Sea Islands, come in groups of 4 custom processors that share some resources, like the microcode store they use. When HWS was introduced, it seemed to come at the expense of ACE resources being re-assigned to monitoring and controlling the queues used by the other ACEs. This may have been why Fury's marketing went from 8 ACEs to 4 ACEs + HWS.

Yep - each MEC block has 4 processor threads. If I remember correctly each thread can either run "pipe" microcode (multiplexing between 8 queues on the pipe and managing/submitting work from those queues) or "HWS" microcode (a layer above the pipes/queues which dynamically maps queues from a larger set onto the available HW queues). HWS also multiplexes an unlimited number of processes onto a finite number of VMIDs.

Each HW queue has a Hardware Queue Descriptor associated with it, while each application queue has a Memory Queue Descriptor. The HWS microcode is passed a runlist with a set of MQDs for each process plus a set of resources (HW queues + VMIDs) plus a few other parameters. At that point HWS takes over and maps (copies) sets of MQDs into HQDs and lets the queues run for a programmable time quantum. At the end of the time quantum it rolls the waves off the hardware, HQD contents are written back into MQDs, and the next set of MQDs is selected.

If you have trouble sleeping you can pick through amdkfd->kfd_packet_manager.c and explore out from there. "Oversubscription" refers to having more MQDs than available HQDs or more processes than VMIDs, requiring HWS to round-robin multiplex sets of MQDs onto the HW queues.

If HWS is not being used then driver code maps MQDs to HQDs directly, but we normally run with HWS enabled all the time.
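As a rough illustration of the runlist behaviour described above, here is a toy Python model of round-robin multiplexing. It is only a sketch under stated assumptions, not the amdkfd or microcode logic: the names `MQD`, `oversubscribed` and `schedule` are hypothetical, each process is assumed to need one VMID, and every process is assumed to fit on the hardware queues by itself.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class MQD:
    """Toy Memory Queue Descriptor: saved state of one application queue."""
    process: str
    queue_id: int
    quanta_run: int = 0  # how many time quanta this queue has been resident


def oversubscribed(runlist, num_hqds, num_vmids):
    """True if there are more MQDs than HQDs or more processes than VMIDs."""
    total_mqds = sum(len(mqds) for _, mqds in runlist)
    return total_mqds > num_hqds or len(runlist) > num_vmids


def schedule(runlist, num_hqds, num_vmids, quanta):
    """Return, for each time quantum, the list of queues mapped onto HQDs.

    runlist is a list of (process_name, [MQD, ...]) entries. Assumes every
    process needs one VMID and fits on the hardware queues on its own.
    """
    pending = deque(runlist)
    history = []
    for _ in range(quanta):
        hqds = {}        # hardware queue slot -> MQD currently loaded
        vmids_used = 0
        for _ in range(len(pending)):
            _, mqds = pending[0]
            if vmids_used == num_vmids or len(hqds) + len(mqds) > num_hqds:
                break    # next process does not fit in this quantum
            for mqd in mqds:
                hqds[len(hqds)] = mqd   # "map": copy MQD state into an HQD
            vmids_used += 1
            pending.rotate(-1)          # move this process to the back
        # Quantum expires: waves roll off, HQD state is written back to MQDs.
        for mqd in hqds.values():
            mqd.quanta_run += 1
        history.append([(m.process, m.queue_id) for m in hqds.values()])
    return history


# Three processes with two queues each, but only 4 HW queues and 2 VMIDs.
runlist = [(name, [MQD(name, q) for q in range(2)]) for name in ("A", "B", "C")]
history = schedule(runlist, num_hqds=4, num_vmids=2, quanta=3)
```

With three processes competing for two VMIDs, each quantum hosts two of them and the set rotates, so no queue is starved in this toy model.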

no-X · Nov 18, 2020

pharma said:
Lol ... Image Sharpening filters were in use by developers long before AMD created RIS. Can't say the same about DLSS.

1. May I ask which game developers used image sharpening filters for real-time resizing of the rendered image, and when?
2. In case you are not talking specifically about real-time image resizing, GPU-accelerated AI-based image resizing was used long before DLSS, e.g. for photo resizing for large-print purposes, but also by some game developers for creating high-resolution textures.

Dictator · Nov 18, 2020

no-X said:
1. May I ask which game developers used image sharpening filters for real-time resizing of the rendered image, and when?
2. In case you are not talking specifically about real-time image resizing, GPU-accelerated AI-based image resizing was used long before DLSS, e.g. for photo resizing for large-print purposes, but also by some game developers for creating high-resolution textures.

To me, any game that offered bicubic or Lanczos up- or downscaling counts as a form of sharpening to a degree. And that goes way back.
Other than that, any game on PC that offered a sub-native resolution option and also had controls for sharpening. That goes back to pre-UE4 games AFAIK.
GeDoSaTo tool on PC offered down and upscaling with sharpening ever since its inception as well.
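One way to see why a Lanczos resize can read as mild sharpening: the kernel has negative lobes, so a resampled edge overshoots on both sides, much like a sharpening halo. The sketch below is a plain 1D illustration under my own assumptions (a 3-lobe kernel, clamped borders, hypothetical function names), not any particular game's or tool's scaler.

```python
import math


def lanczos(x, a=3):
    """Lanczos kernel with `a` lobes: sinc(x) * sinc(x / a) for |x| < a."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def upscale(signal, factor, a=3):
    """Resample a 1D signal at `factor` times the rate using Lanczos weights."""
    n = len(signal)
    out = []
    for j in range(n * factor):
        x = j / factor                      # position in source coordinates
        first = math.floor(x) - a + 1       # leftmost source tap
        acc = wsum = 0.0
        for i in range(first, first + 2 * a):
            w = lanczos(x - i, a)
            acc += w * signal[min(max(i, 0), n - 1)]  # clamp at the borders
            wsum += w
        out.append(acc / wsum)              # normalise the weights
    return out


# A hard edge from 0 to 1; the upscaled result dips below 0 and rises above 1.
edge = [0.0] * 8 + [1.0] * 8
scaled = upscale(edge, factor=2)
```

The undershoot and overshoot around the edge are what increase local contrast; a bilinear resize, whose weights are never negative, cannot produce them.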

pjbliverpool · Nov 18, 2020

3dilettante said:
ACEs are front end processors without a fixed relationship with shader engines. The number of ACEs per shader engine has varied over GCN's lifetime. The PS4 had 8 for 2, Hawaii had 8 for 4, Fury had 4 for 4, APUs have had 1/2 for 1, etc.
I haven't seen restrictions in the number of CUs that could be addressed by an ACE in the past. Given how long it took for decent adoption of asynchronous compute, getting one queue to be used would be a likely minimum and it's not like those scenarios couldn't access all CUs. Other marketing for things like rapid response queues seemed to show allocations that spanned shader engines as well.

You know what, ignore me. I was getting ACEs mixed up with Shader Arrays. I blame ~~Tech Report~~ Techspot which does the same in their Navi vs Turing architecture article!

In terms of shader arrays what could be going on here though? If a shader array still contains one primitive unit then RDNA2 can't have 2 of them per Shader Engine can it? Even though the series X definitely does? Or have I misunderstood something else?

Leoneazzurro5 · Nov 18, 2020

pjbliverpool said:
You know what, ignore me. I was getting ACEs mixed up with Shader Arrays. I blame Tech Report which does the same in their Navi vs Turing architecture article!

In terms of shader arrays what could be going on here though? If a shader array still contains one primitive unit then RDNA2 can't have 2 of them per Shader Engine can it? Even though the series X definitely does? Or have I misunderstood something else?

https://videocardz.com/newz/amd-radeon-rx-6800-launch-press-deck-transcript

it says

Geometry Processor

8 Pre-Cull Prims/Cycle
4 Post-Cull Prims/Cycle

So 2 Pre-cull primitives per SE and 1 Post-cull primitive per SE

CarstenS · Nov 18, 2020

Leoneazzurro5 said:
https://videocardz.com/newz/amd-radeon-rx-6800-launch-press-deck-transcript

it says

Geometry Processor

8 Pre-Cull Prims/Cycle

4 Post-Cull Prims/Cycle

So 2 Pre-cull primitives per SE and 1 Post-cull primitive per SE

Sounds a lot like RDNA1.

Leoneazzurro5 · Nov 18, 2020

CarstenS said:
Sounds a lot like RDNA1.

Well, at 2+ GHz it is a considerable number of triangles/s anyway, and with working mesh shaders and probably better primitive culling it should be more than enough. The RBE seems to be quite reworked compared to Navi10, anyway.
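For scale, the press-deck figures can be turned into rates with simple arithmetic. This is only a back-of-the-envelope sketch: the 2.0 GHz clock stands in for "2+ GHz", the count of 4 Shader Engines is an assumption at this point in the thread, and the names are made up for illustration.

```python
PRE_CULL_PRIMS_PER_CLOCK = 8    # from the press deck
POST_CULL_PRIMS_PER_CLOCK = 4   # from the press deck
SHADER_ENGINES = 4              # assumption, not stated in the deck excerpt
CLOCK_GHZ = 2.0                 # assumption standing in for "2+ GHz"


def prims_per_second(prims_per_clock, clock_ghz):
    """Peak primitive rate in primitives per second."""
    return prims_per_clock * clock_ghz * 1e9


pre_cull_per_se = PRE_CULL_PRIMS_PER_CLOCK / SHADER_ENGINES
post_cull_per_se = POST_CULL_PRIMS_PER_CLOCK / SHADER_ENGINES
post_cull_rate = prims_per_second(POST_CULL_PRIMS_PER_CLOCK, CLOCK_GHZ)
pre_cull_rate = prims_per_second(PRE_CULL_PRIMS_PER_CLOCK, CLOCK_GHZ)
```

Under those assumptions the chip would peak at 8 billion post-cull and 16 billion pre-cull primitives per second.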

pjbliverpool · Nov 18, 2020

Leoneazzurro5 said:
https://videocardz.com/newz/amd-radeon-rx-6800-launch-press-deck-transcript

it says

Geometry Processor

8 Pre-Cull Prims/Cycle

4 Post-Cull Prims/Cycle

So 2 Pre-cull primitives per SE and 1 Post-cull primitive per SE

Indeed but doesn't each primitive unit accept 2 un-culled primitives and output 1 culled primitive per clock? So in RDNA we have:

2 Shader Engines
2 Shader Arrays per Shader Engine
1 Primitive Unit per Shader Array
Hence 4 primitives output per clock

According to Hot Chips the XSX is set up the same, albeit with 14 CUs per Shader Array rather than 10 in RDNA.

I thought it was pretty much confirmed that there were 4 Shader Engines in Navi21, which means, based on the above, it should have 8 primitive units.

Also bear in mind we know Navi21 has 128 ROPs, which would also suggest 4 Shader Engines and 8 Shader Arrays (16 ROPs per SA), given that the Series X is configured this way with 2 Shader Engines and 64 ROPs.

So the only explanation I can think of for Navi21 only outputting 4 primitives per clock is if the overall architecture is drastically changed, i.e. still 4 Shader Arrays with doubled up resources in each, or the Primitive Units in Navi21 only output 1 Primitive every other clock vs one every clock in Navi10. Which sounds strange - especially as the Series X still outputs 1 per clock.
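The counting argument can be written out as a small calculation. This is just a sketch of the reasoning in this post; the function name is hypothetical, and the two Navi21 variants are the speculative explanations above, not confirmed configurations.

```python
def post_cull_prims_per_clock(engines, arrays_per_engine,
                              units_per_array, prims_per_unit):
    """Chip-wide culled primitives per clock for a given layout."""
    return engines * arrays_per_engine * units_per_array * prims_per_unit


# Navi10: 2 SEs x 2 SAs x 1 primitive unit, 1 culled primitive per clock each.
navi10 = post_cull_prims_per_clock(2, 2, 1, 1)

# Navi21 if it simply doubled the Shader Engines with the same layout.
navi21_same_layout = post_cull_prims_per_clock(4, 2, 1, 1)

# Speculative option 1: still 4 Shader Arrays, with doubled-up resources each.
navi21_four_arrays = post_cull_prims_per_clock(4, 1, 1, 1)

# Speculative option 2: 8 units that each output 1 primitive every other clock.
navi21_half_rate = post_cull_prims_per_clock(4, 2, 1, 0.5)
```

Only the two speculative layouts land on the 4 post-cull primitives per clock quoted in the press deck; the straightforward doubling gives 8.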

Hopefully we'll find out in about 4 hours!

Leoneazzurro5 · Nov 18, 2020

pjbliverpool said:
Indeed but doesn't each primitive unit accept 2 un-culled primitives and output 1 culled primitive per clock? So in RDNA we have:

2 Shader Engines

2 Shader Arrays per Shader Engine

1 Primitive Unit per Shader Array

Hence 4 primitives output per clock

According to Hot Chips the XSX is set up the same, albeit with 14 CUs per Shader Array rather than 10 in RDNA.

I thought it was pretty much confirmed that there were 4 Shader Engines in Navi21, which means, based on the above, it should have 8 primitive units.

Also bear in mind we know Navi21 has 128 ROPs, which would also suggest 4 Shader Engines and 8 Shader Arrays (16 ROPs per SA), given that the Series X is configured this way with 2 Shader Engines and 64 ROPs.

So the only explanation I can think of for Navi21 only outputting 4 primitives per clock is if the overall architecture is drastically changed, i.e. still 4 Shader Arrays with doubled up resources in each, or the Primitive Units in Navi21 only output 1 Primitive every other clock vs one every clock in Navi10. Which sounds strange - especially as the Series X still outputs 1 per clock.

Hopefully we'll find out in about 4 hours!

The figures should be for the whole chip, as all other figures in that section were calculated for Navi21 as a whole. Details are unclear, so it is unknown whether there is one unit per SE with doubled unculled primitive generation or there are two units with halved culled primitive generation per clock. One point is that, by pushing clocks so high and by relying on improved culling, they may have less need to improve raw geometry throughput.

Deleted member 13524 · Nov 18, 2020

Anyone else here completely shocked by the fact that it's 11 am GMT on launch day and not one review has leaked so far?

I saw some charts on reddit supposedly from a youtuber but those seemed fake.

pjbliverpool · Nov 18, 2020

Leoneazzurro5 said:
The figures should be for the whole chip, as all other figures in that section were calculated for Navi21 as a whole. Details are unclear, so it is unknown whether there is one unit per SE with doubled unculled primitive generation or there are two units with halved culled primitive generation per clock. One point is that, by pushing clocks so high and by relying on improved culling, they may have less need to improve raw geometry throughput.

Yes could be. I wonder then if the same would apply to the PS5 and XSX if this is true of Navi21? I don't think it's actually been explicitly stated what the primitive throughput is on either of those consoles has it? We know how many primitive units the XSX has but if they are half as effective as those in RDNA then the throughput would be half what we currently think it is.

Leoneazzurro5 · Nov 18, 2020

Well, both PS5 and XBSX are custom chips, so they don't have to be 1:1 with desktop chips, e.g. there is no Infinity Cache on the consoles.

dskneo · Nov 18, 2020

These popped up. Take with salt

pharma · Nov 18, 2020

More prices ... UK etailer is listing prices for Asus Radeon RX 6800 (XT) cards

ASUS Radeon RX 6800 XT ROG STRIX LC: £764; £917 with tax; 851 / 1,022 EUR
ASUS Radeon RX 6800 XT TUF GAMING OC: £676; £811 with tax; 753 / 903 EUR
ASUS Radeon RX 6800 TUF GAMING OC: £588; £705 with tax; 655 / 785 EUR
ASUS Radeon RX 6800 ROG STRIX OC: £605; £726 with tax; 674 / 809 EUR

The standard UK Value Added Tax rate is 20 percent. The base prices of the cards are 579 euros for the Radeon RX 6800 and 649 euros for the Radeon RX 6800 XT.

https://www.guru3d.com/news-story/uk-etailer-is-listing-prices-for-asus-radeon-rx-6800-(xt)-cards.html

CarstenS · Nov 18, 2020

ToTTenTranz said:
Anyone else here completely shocked by the fact that it's 11 am GMT on launch day and not one review has leaked so far?

Makes you wonder, doesn't it?

Deleted member 13524 · Nov 18, 2020

dskneo said:
These popped up. Take with salt

2h30m before the embargo lifts? Is that like a record for embargo respectfulness on a graphics card (or anything non-Apple) from the last 10 years?

But wow, those tables really turn from Gears 5 onwards.

CarstenS said:
Makes you wonder, doesn't it?

Wonder what? What should I wonder about?? I need to know, this wait is taking forever!!!!
