AMD RDNA4 Architecture Speculation

Nisaaru · Mar 5, 2025

DegustatoR said:
Not really. XT will end up faster than 5070Ti in games which favor AMD h/w. This is a result where I'd expect the average w/o RT to be.

I meant the TDP.

DegustatoR · Mar 5, 2025

Nisaaru said:
I meant the TDP.

Ah, yeah, power is still a sore point for AMD. The "downclocked"/power limited 9070 does okay though.

trinibwoy · Mar 5, 2025

DegustatoR said:
Ah, yeah, power is still a sore point for AMD. The "downclocked"/power limited 9070 does okay though.

Yeah TPU does 60fps vsync testing and the 9070 looks very good there.

LordEC911 · Mar 6, 2025

RDNA4 Instruction Set Architecture Reference Guide was posted over at Anandtech's forum.
My apologies if it has already been posted in the thread.

rSkip · Mar 9, 2025

Work Graphs not supported or I had my configurations wrong? Weird.

GitHub - GPUOpen-LibrariesAndSDKs/WorkGraphsHelloWorkGraphs

Contribute to GPUOpen-LibrariesAndSDKs/WorkGraphsHelloWorkGraphs development by creating an account on GitHub.

github.com

Lurkmass · Mar 13, 2025

GFX12 register definitions (specifically for controlling the dynamic VGPR allocation settings in shown sample) of interest to anyone and another notable design change is the removal of HiZ support as well ...

I wonder if the removal of HiZ is related to the fact that UE5's Nanite renderer generates it's own HZB (reprojects geometry from last frame) which sort of obsoletes the need for a hardware assisted occulsion culling testing ?

fellix · Mar 13, 2025

Lurkmass said:
I wonder if the removal of HiZ is related to the fact that UE5's Nanite renderer generates it's own HZB (reprojects geometry from last frame) which sort of obsoletes the need for a hardware assisted occulsion culling testing ?

Or simply the benefit of HiZ has already diminished to the point of marginal performance obsolescence? May be framebuffer color compression more than compensates for the loss.

Shifty Geezer · Mar 13, 2025

Lurkmass said:
I wonder if the removal of HiZ is related to the fact that UE5's Nanite renderer generates it's own HZB (reprojects geometry from last frame) which sort of obsoletes the need for a hardware assisted occulsion culling testing ?

I very much hope IHVs aren't designing their GPUs around one engine!

trinibwoy · Mar 13, 2025

Isn’t HiZ even more important in the age of mesh shaders? More objects and more triangles = more potential overdraw right?

DegustatoR · Mar 13, 2025

trinibwoy said:
Isn’t HiZ even more important in the age of mesh shaders? More objects and more triangles = more potential overdraw right?

Most engines handle this themselves in compute these days probably? So h/w capability isn't as important. But also AMD h/w wasn't good at this anyway and you generally would get a speed up doing it in compute instead.

Charlietus · Mar 13, 2025

I wonder how much hardware goes mostly unused in modern GPU's (the MSAA blocks for example). I would accept it getting cut if IHV's emulated that functionality instead, for reasonable performance (not like in Nvidia's phisx debacle) if it meant that chips could get smaller and cheaper.

DegustatoR · Mar 13, 2025

Charlietus said:
chips could get smaller and cheaper

General rule is FF is always smaller and cheaper than handling the same thing via generic compute. The benefits of the latter are mostly down to flexibility and the option to avoid using it at all in some cases.

Charlietus · Mar 13, 2025

DegustatoR said:
General rule is FF is always smaller and cheaper than handling the same thing via generic compute. The benefits of the latter are mostly down to flexibility and the option to avoid using it at all in some cases.

What's FF?

DegustatoR · Mar 13, 2025

Charlietus said:
What's FF?

Fixed function h/w.

raytracingfan · Mar 13, 2025

Charlietus said:
I wonder how much hardware goes mostly unused in modern GPU's (the MSAA blocks for example). I would accept it getting cut if IHV's emulated that functionality instead, for reasonable performance (not like in Nvidia's phisx debacle) if it meant that chips could get smaller and cheaper.

Hardware VRS uses the MSAA blocks. Call of Duty's software VRS also uses the MSAA blocks.

Lurkmass · Mar 13, 2025

fellix said:
Or simply the benefit of HiZ has already diminished to the point of marginal performance obsolescence? May be framebuffer color compression more than compensates for the loss.

HiZ for occlusion culling and DCC for compressed memory bandwidth traffic are both different hardware blocks that implement different technologies with different purposes ...

Shifty Geezer said:
I very much hope IHVs aren't designing their GPUs around one engine!

GFX12 introduced page table entry (PTE) support for DCC to cover the case memory traffic compression for atomic memory operations and we all how this "one engine" likes using 64-bit texture atomics to render it's geometry to a visibility buffer ...

trinibwoy said:
Isn’t HiZ even more important in the age of mesh shaders? More objects and more triangles = more potential overdraw right?

Not if you never use the HW rasterizer to render most of the geometry like Nanite! HiZ is only active (disabled otherwise) when your HW rasterizer is in use ...

raytracingfan said:
Hardware VRS uses the MSAA blocks. Call of Duty's software VRS also uses the MSAA blocks.

HW VRS can interact with MSAA in that you can apply coarse pixel shading for MSAA pixels but elsewise it doesn't really prevent IHVs from removing or emulating other accelerated functionality (fast clears/resolve/compression) for MSAA render targets ...

CoD's software VRS implementation can use FMask hardware (removed since GFX11) whenever available (console APIs) for memory bandwidth reduction optimization but it's not required and they have a slower fallback implementation on PC ...

trinibwoy · Mar 13, 2025

Lurkmass said:
Not if you never use the HW rasterizer to render most of the geometry like Nanite! HiZ is only active (disabled otherwise) when your HW rasterizer is in use ...

Yeah I appreciate that but to Shifty’s point UE is just one engine.

Lurkmass · Mar 14, 2025

trinibwoy said:
Yeah I appreciate that but to Shifty’s point UE is just one engine.

Even if UE is just one engine, Epic Games has enough power to coerce IHVs into changing hardware design ...

It's only a matter of time before other graphically advanced engines implement similar GPU driven rendering systems because the benefits are too big for the rest of the industry to ignore and not doing so just becomes sub-optimal use of both current and future hardware designs as well. Competitive pressure from Epic Games ultimately means that other ISVs will 'converge' their technology some more or risk their middleware fading into irrelevance ...

Just as RAGE's virtual texturing technology and early implementations of deferred renderers in the past paved the way for modern high-end rendering paradigms so too will deferred texturing and GPU driven renderers since a competitive alternative seemingly has yet to emerge (if ever) so hardware vendors are now taking the opportunity to further optimize their designs to win some more benchmarks. The industry will cave into the arms race of ever higher graphical fidelity (both from other ISVs and HW) one way (implement similar technology) or another (ditch their own engines for UE) ...

Ext3h · Mar 14, 2025

DegustatoR said:
Most engines handle this themselves in compute these days probably?

They already used to back when HiZ was introduced - even without compute or geometry shaders. Biggest impact of HiZ was only in titles without any form of occlusion queries and high level culling, that tried to get away with pure Z-sorting only.

And even then it was only saving memory bandwidth, it wasn't actually saving on-chip communication overhead since all had to be routed through a single FF instance. And since memory bandwidth stopped being a concern once the Z buffer effectively started fitting in L3, there's no longer a point in keeping anything but the full resolution buffer around and thus reduce false sharing conflicts that still occur on the effectively immutable but yet shared highest levels of the HiZ pyramid...

fellix · Mar 14, 2025

Yep. Modern GPUs just threw a ton of on-chip cache at the problem.

AMD RDNA4 Architecture Speculation

Nisaaru

DegustatoR

trinibwoy

Meh

LordEC911

rSkip

GitHub - GPUOpen-LibrariesAndSDKs/WorkGraphsHelloWorkGraphs

Lurkmass

fellix

Shifty Geezer

uber-Troll!

trinibwoy

Meh

DegustatoR

Charlietus

DegustatoR

Charlietus

DegustatoR

raytracingfan

Lurkmass

trinibwoy

Meh

Lurkmass

Ext3h

fellix

Similar threads