AMD RDNA3 Specifications Discussion Thread


Can anyone test this on Linux? I read it's expected to be around a 5% perf deficit or so.
In practice I doubt a lack of shader prefetch will make much difference. First time a shader runs it may run a tiny bit slower due to cache not being primed. But it would definitely be interesting to force-disable prefetch on RDNA2 and benchmark a few games just to see…
 
Do we have performance per watt figures at 300W for the 7900 XTX?

It'd be interesting if they specifically chose 300W for the figure if [1] that's below the knee of the power curve for 7900 XTX and [2] above the knee of the power curve for 6900 XT. That'd be a classic case of manipulating the numbers while simultaneously using real data which isn't terribly relevant to the shipped product as it's spec'd as a 355W part.

Although I still don't think that'd necessarily change things much. I'm just guessing something went horribly horribly wrong and the shipping product ended up being something other than what they used for the testing.

Regards,
SB
 
Anyone know what this tasty little tidbit of RDNA3 driver code means?
#if VKI_BUILD_GFX11
    bool enableRayTracingHwTraversalStack; ///< Enable using hardware accelerated traversal stack
#endif
 
I found these too:
RtIp2_0 = 0x3, ///< Added more Hardware RayTracing features, such as BoxSort, PointerFlag, etc
supportRayTraversalStack : 1; ///< HW assisted ray tracing traversal stack support
supportPointerFlags : 1; ///< Ray tracing HW supports flags embedded in the node

With all these fancy new RT features, why isn’t RDNA3 any faster per clock at RT than RDNA2? Are AMD’s drivers not making use of them yet, or is something else broken?
 
The MultiDrawIndirect accelerator is also not working well. It's better than the 6900 XT, but not by the 2.3x that was stated in the slides.
 

Attachment: polygons6.jpg

The 7900 XTX is close to two times faster than the 6900 XT. Factoring in the slightly higher clock and shader counts, that still leaves at least 1.5x as a rough estimate.
It's quite different in Cyberpunk 2077, which runs 27 percent faster on RDNA 3 than on RDNA 2 with the same computing power, and there is even a 30 percent increase in percentile FPS. Doom Eternal is up 32 and 40 percent, Spider-Man Remastered is up 25 and 33 percent, and Metro Exodus is still up a good 22 and 21 percent.
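
For anyone who wants to redo the "at least 1.5x" estimate, here is a rough sketch of the normalization. The CU counts (96 vs 80) and boost clocks (2500 vs 2250 MHz) are assumptions taken from the public spec sheets, not clocks measured during the benchmark, and the 2.0x input is just "close to two times" rounded up.

```python
# Back-of-the-envelope normalization of a measured speedup by raw shader resources.
# CU counts and boost clocks are assumed from public spec sheets,
# not measured during the benchmark run.

def normalized_speedup(measured, cu_new, cu_old, clk_new_mhz, clk_old_mhz):
    """Divide the measured speedup by the CU-count ratio times the clock ratio."""
    resource_ratio = (cu_new / cu_old) * (clk_new_mhz / clk_old_mhz)
    return measured / resource_ratio

measured = 2.0  # "close to two times faster", rounded up
result = normalized_speedup(
    measured,
    cu_new=96, cu_old=80,                # 7900 XTX vs 6900 XT compute units
    clk_new_mhz=2500, clk_old_mhz=2250,  # spec-sheet boost clocks
)
print(f"per-CU, per-clock speedup: {result:.2f}x")
```

With real sustained clocks instead of spec-sheet boost clocks the result would shift a little, so treat it as a ballpark only.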
 
It'd be interesting if they specifically chose 300W for the figure if [1] that's below the knee of the power curve for 7900 XTX and [2] above the knee of the power curve for 6900 XT. That'd be a classic case of manipulating the numbers while simultaneously using real data which isn't terribly relevant to the shipped product as it's spec'd as a 355W part.

Although I still don't think that'd necessarily change things much. I'm just guessing something went horribly horribly wrong and the shipping product ended up being something other than what they used for the testing.

Regards,
SB
I think it's above the knee of the power curve for the 6900 XT, since performance per watt degrades on the 6950 XT. Though perhaps it degrades more on the 7900 XTX and that's why they didn't compare to the 6950 XT at 335 Watts.

In any case, without testing at 300W I don't think we can confirm that AMD "lied".
 
One thing I thought was weird was Tomshardware saying, "The GPU shader counts are where things start to get a bit different from other architectures. AMD says there are still 64 Streaming Processors (SP) per CU, but there are now four SIMD32 vector units per CU as well — two of which can only process FP32 or Matrix operations and not INT32."
If you have four SIMDs, wouldn't you want them to be flexible and able to do all types of operations? The excuse is presumably power savings.

Also, I believe it was LTT or GN? that was talking about how the thermistor on the fan is too close to the heatsink. They measured it and it was at least 10C off from ambient. Heatsoak causing poor readings isn't doing it any favors.
 
I think it's above the knee of the power curve for the 6900 XT, since performance per watt degrades on the 6950 XT.
All of this is extremely workload dependent. In most AAA games my 6900 XT (watercooled, limit set at 520W/475A) consumes around 350W (if fully loaded, which is hard to do at 1080p), while in Timespy it is usually well above 390W with peaks of 470W and higher. Clock/fps scales linearly most of the time (provided 100% true GPU load is achievable), but power scaling is mostly unpredictable.

Also, Timespy has a weird interdependence between performance and the number of active monitors: more than one tanks the graphics score by 1K points or so, a very significant loss. That never happens in TSE or most games, though some (SOTTR) show a small loss of about 5-10 fps at 1440p in the benchmark. There's also the case of Furmark, which easily hits the power limit at 720p noAA and causes the GPU core to draw more than 400A.

In any case, testing power efficiency at capped FPS is a very Huang-esque move: by manipulating the limit you can arbitrarily make the difference between more powerful and less powerful GPUs as wide as you want (which was used to prop up Ampere and Ada in their presentations). In this fashion I can also show that my old Vega 56 was 2-3x more efficient than Hawaii in, let's say, Witcher 3, by setting the limit at a point where the Vega was still in a low-load scenario while Hawaii would be 100% loaded at max core voltage.
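
To make the capped-FPS point concrete, here is a toy model. Every number in it is hypothetical (two made-up cards, made-up idle and peak power), and the assumption that power grows with the cube of load is only a crude stand-in for voltage/frequency scaling, not a measured curve.

```python
# Toy model: all numbers are hypothetical and chosen only to illustrate the argument.

def power_at(fps, max_fps, idle_w, peak_w):
    """Board power at a given frame rate, assuming power grows with the cube of load."""
    load = min(fps / max_fps, 1.0)
    return idle_w + (peak_w - idle_w) * load ** 3

def efficiency_ratio(cap_fps, fast, slow):
    """fps-per-watt of the fast GPU divided by that of the slow GPU under an FPS cap."""
    perf_per_watt = []
    for gpu in (fast, slow):
        fps = min(cap_fps, gpu["max_fps"])  # a card cannot exceed its own maximum
        watts = power_at(fps, gpu["max_fps"], gpu["idle_w"], gpu["peak_w"])
        perf_per_watt.append(fps / watts)
    return perf_per_watt[0] / perf_per_watt[1]

# Hypothetical cards: "fast" tops out at 200 fps, "slow" at 100 fps.
fast = {"max_fps": 200, "idle_w": 30, "peak_w": 350}
slow = {"max_fps": 100, "idle_w": 30, "peak_w": 300}

for cap in (50, 100, 200):
    ratio = efficiency_ratio(cap, fast, slow)
    print(f"cap {cap:>3} fps: fast GPU looks {ratio:.2f}x more efficient")
```

In this model the apparent efficiency gap is largest when the cap sits right at the slower card's maximum, which is exactly the kind of operating point a marketing slide would pick.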
 
It looks like the scheduling is totally broken? In the benchmark I did, it is interesting that the compute shader cannot handle the simple "software rasterizer" task. I'm wondering why this simple and clear task is not 2x as fast, given that the 7900 XTX has double the shader performance...
Also, the MultiDrawIndirect accelerator is not working properly. I can't see the 2.3x performance gain.

Attachment: polygons5.jpg
 
Slide numbers are always under specific conditions and should be read "up to xyz", not "xyz". Not seeing 2.3 times the performance in one test doesn't mean anything for any other scenario using said feature.
 
I agree, but this benchmark was written to get the most out of the MultiDrawIndirect feature. The feature is part of the DX12 and Vulkan APIs, so if AMD has an accelerator, it should kick in automatically when somebody uses MultiDrawIndirect through the API. That is confusing.
 