AMD RDNA4 Architecture Speculation

dr_ribit · Mar 23, 2025

DavidGraham said:
NVIDIA definitely runs tensor and fp32 ops concurrently, especially now with their tensor cores busy almost 100% of the time (doing upscaling, frame generation, denoising, HDR post processing, and in the future neural rendering).

The latest NVIDIA generations have become much better at mixing all 3 workloads (tensor + ray + fp32) concurrently. I read somewhere (I can't find the source now) that ray tracing + tensor are the most common concurrent ops, followed by ray tracing + fp32 and tensor + fp32.

Concurrent execution of CUDA and Tensor cores

Yes, that is what it means. I don’t know where you got that. If the compiler did not schedule tensor core instructions along with other instructions, what else would it be doing? NOP? Empty space? Maybe you are mixing up what the compiler does and what the warp scheduler does. The warp...

forums.developer.nvidia.com

I need help understanding how concurrency of CUDA Cores and Tensor Cores works between Turing and Ampere/Ada?

There isn’t much difference between Turing, Ampere and Ada in this area. This question in various forms comes up from time to time, here is a recent thread. It’s also necessary to have a basic understanding of how instructions are issued and how work is scheduled in CUDA GPUs, unit 3 of this...

forums.developer.nvidia.com

The way I read this, it seems that while the workloads are executed concurrently, they are still not dispatched concurrently (unlike on some other architectures). So some pipes will be underutilized.

fellix · Mar 24, 2025

RDNA 4’s “Out-of-Order” Memory Accesses

AMD’s RDNA 4 brings a variety of memory subsystem enhancements. Among those, one slide stood out because it dealt with out-of-order memory accesses. According to the slide, RDNA 4 allows requ…

old.chipsandcheese.com

trinibwoy · Mar 24, 2025

fellix said:
RDNA 4’s “Out-of-Order” Memory Accesses

AMD’s RDNA 4 brings a variety of memory subsystem enhancements. Among those, one slide stood out because it dealt with out-of-order memory accesses. According to the slide, RDNA 4 allows requ…

old.chipsandcheese.com

Love this guy. Micro benchmarks aren’t dead yet.

Albuquerque · Mar 25, 2025

Mod Mode: I spun off the NVIDIA-specific conversation around multi-dispatch and computation into another thread. This one is for RDNA4.

DegustatoR · Apr 3, 2025

According to ITHome, who are quoting their own sources now, the Radeon RX 9070 GRE is indeed coming. Presumably, this means another Chinese market exclusive; however, there were instances where AMD brought GRE models to the global market (RX 7900 GRE).

The media have no specs on this graphics card yet, but they assume that this means a 192-bit memory bus and, as a result, a 12GB GDDR6 memory configuration. This sounds a lot like an RX 9070-class GPU, except AMD already has one, and the RX 9060 XT is already confirmed to have a 128-bit memory bus.

https://videocardz.com/newz/amd-reportedly-preparing-radeon-rx-9070-gre
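
The step from a 192-bit bus to 12GB in the quoted article can be checked with a quick sketch. It assumes 32-bit-wide GDDR6 chips at 2GB (16Gb) each, which the article does not state; the function name and its parameters are made up for illustration.

```python
def vram_capacity_gb(bus_width_bits, chip_density_gb=2, chip_width_bits=32, clamshell=False):
    """Estimate VRAM capacity from memory bus width.

    Assumptions (not from the article): each GDDR6 chip has a 32-bit
    interface and holds 2GB. Clamshell mode puts two chips on each
    32-bit channel, doubling capacity without widening the bus.
    """
    if bus_width_bits % chip_width_bits != 0:
        raise ValueError("bus width must be a multiple of the chip interface width")
    chips = bus_width_bits // chip_width_bits
    if clamshell:
        chips *= 2
    return chips * chip_density_gb


print(vram_capacity_gb(192))                  # 12 (rumored RX 9070 GRE)
print(vram_capacity_gb(128))                  # 8
print(vram_capacity_gb(128, clamshell=True))  # 16
```

Under the same assumptions, a 128-bit bus gives 8GB, or 16GB in clamshell mode.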

DegustatoR · Apr 15, 2025

Just like the RTX 5060 Ti, the Radeon RX 9060 XT will feature two memory configurations of 8GB and 16GB. The difference compared to GeForce is that AMD is sticking to GDDR6 technology and clocks of 20 Gbps.

Based on the most recent information we have from AMD board partners, the RX 9060 XT will launch with 2048 Stream Processors. This is, of course, nothing surprising, because the card was meant to use the Navi 44 GPU, which has half the core count of Navi 48.

We also have an update on clocks, and it looks very interesting. First, a reminder that the RX 7600 XT, the predecessor to the RX 9060 XT, featuring the Navi 33 XT GPU, had a game clock of 2470 MHz and a boost clock of 2755 MHz. The RDNA4 update will have much higher clocks. According to our information, the RX 9060 XT will ship with a 2620 MHz game clock and a 3230 MHz boost clock. But that’s not all: we also learned that some OC variants will have a 3.3 GHz boost.

https://videocardz.com/newz/amd-radeon-rx-9060-xt-features-2048-cores-boost-clock-of-3-2-ghz
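
To put the quoted numbers in context, here is a rough sketch of the arithmetic. It uses the 128-bit bus mentioned earlier in the thread together with the 20 Gbps GDDR6 and the clocks from the article; the helper names are made up for illustration, and the results are only as reliable as the leaked figures.

```python
def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bus width (bits) x per-pin rate (Gbps) / 8."""
    return bus_width_bits * data_rate_gbps / 8


def uplift_percent(new_mhz, old_mhz):
    """Relative clock increase of new over old, in percent."""
    return (new_mhz / old_mhz - 1) * 100


# RX 9060 XT: 128-bit bus, 20 Gbps GDDR6 (figures quoted in the thread)
print(memory_bandwidth_gbs(128, 20))         # 320.0

# RX 9060 XT vs RX 7600 XT clocks (figures quoted in the article)
print(round(uplift_percent(2620, 2470), 1))  # 6.1  (game clock)
print(round(uplift_percent(3230, 2755), 1))  # 17.2 (boost clock)
```

So the rumored boost clock is about 17% higher than the RX 7600 XT's, while the game clock gain is closer to 6%.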

Sega_Model_4 · Apr 15, 2025

RDNA 4’s Raytracing Improvements

Raytraced effects have gained increasing adoption in AAA titles, adding an extra graphics quality tier beyond traditional “ultra” settings.

chipsandcheese.com

DegustatoR · Apr 17, 2025

https://videocardz.com/newz/amd-radeon-rx-9070-gre-to-feature-3072-cores-and-12gb-memory-further-specs-leaked

trinibwoy · Apr 17, 2025

Guessing $399 GRE and $299 9060 XT.

no-X · Apr 21, 2025

Radeon RX 9060 XT should be like 5% slower than GeForce RTX 5060 Ti. Why should it be $130 cheaper? Also, there's just a $50 difference between Radeon RX 9070 XT and Radeon RX 9070. Why should the Radeon RX 9070 GRE be $150 cheaper? I'd expect $499 for the GRE (maybe $479 if Su's in a good mood), $399 for XT 16GB and $349 for XT 8GB.

trinibwoy · Apr 21, 2025

no-X said:
Radeon RX 9060 XT should be like 5% slower than GeForce RTX 5060 Ti. Why should it be $130 cheaper? Also, there's just a $50 difference between Radeon RX 9070 XT and Radeon RX 9070. Why should the Radeon RX 9070 GRE be $150 cheaper? I'd expect $499 for the GRE (maybe $479 if Su's in a good mood), $399 for XT 16GB and $349 for XT 8GB.

That would be 4 cards in a $200 range. And only $100 between the 9070 GRE, “regular” and XT. That's a tight grouping.

raytracingfan · Apr 22, 2025

Approximately (+/- $30) $449 9070 GRE, $349 9060 XT 16GB, $319 9060 XT 8GB, and $269 9060 is my guess if tariffs weren't a factor. With tariffs...

Sega_Model_4 · Apr 23, 2025

AMD to launch Radeon RX 9060 XT on May 18th, RX 9070 GRE pushed back to Q4

https://videocardz.com/newz/amd-to-launch-radeon-rx-9060-xt-on-may-18th-rx-9070-gre-pushed-back-to-q4

fellix · Apr 28, 2025

RDNA4 White Paper: https://www.amd.com/content/dam/amd...ctures/rdna4-instruction-set-architecture.pdf
