AMD: RDNA 3 Speculation, Rumours and Discussion

Nope, it's exactly the opposite.
Having more threads in flight for more SIMDs won't do any good for burst workloads; burst workloads are usually small ones (few threads) and benefit from higher frequencies rather than from more ALUs.
Wider GPUs suffer from underutilization on narrow burst workloads with a few threads.
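
To put rough numbers on that underutilization point, here's a toy Python sketch (hypothetical wave size and SIMD counts, not any specific GPU) of how little of a wide GPU a small "burst" dispatch can even touch:

```python
import math

# Toy model: what fraction of the SIMDs a single dispatch can cover at all,
# ignoring latency hiding, which makes real utilization lower still.
# All counts below are made up for illustration.
def simd_coverage(num_threads, wave_size=32, num_simds=160):
    waves = math.ceil(num_threads / wave_size)
    return min(waves, num_simds) / num_simds

for threads in (512, 2_048, 1_000_000):
    narrow = simd_coverage(threads)                # hypothetical 160-SIMD GPU
    wide = simd_coverage(threads, num_simds=320)   # hypothetical 2x wider GPU
    print(f"{threads:>9} threads: {narrow:6.1%} of SIMDs busy -> {wide:6.1%} on the wider chip")
```

Doubling the SIMD count halves the coverage of the small dispatches while the million-thread case stays saturated, whereas a clock bump speeds up all three, which is the asymmetry being described.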

You're probably using a different definition of "burst" than I am. Aside from RT there is rarely any single workload that occupies the GPU for more than 10% of total frame time. And during that time utilization is almost always poor.

Good renderers are architected to be math bound, not VRAM bound, because bandwidth is the most scarce resource (especially on consoles) with the worst scaling.

I haven't seen a math bound frame in any game that I've profiled. If you're lucky you'll have one or two workloads during the entire frame that keep the ALUs 50% busy.
 
You're probably using a different definition of "burst" than I am. Aside from RT there is rarely any single workload that occupies the GPU for more than 10% of total frame time. And during that time utilization is almost always poor.



I haven't seen a math bound frame in any game that I've profiled. If you're lucky you'll have one or two workloads during the entire frame that keep the ALUs 50% busy.

How do Doom and Control fare math wise?
 
It won't ever be sane until the big boye launches.

So one thing I'm not seeing people talk about is what features (aside from RT) will demand big boy performance in the next 2-3 years. In the same way that people are skeptical of RT's performance hit I have similar questions about why some games are so demanding. For example what are Borderlands 3, Control and Star Wars doing that's so heavy?

[attached image: 6900xt.png]


How do Doom and Control fare math wise?

No idea. I haven't played more than 30 seconds of Doom and don't have Control yet.
 
In certain games like Cyberpunk prior to the latest patch, RT reflections were faster on Ampere than the "psycho" setting for SSRs.
In Gears 4, the Insane screen-space reflection setting had a huge performance impact, equal to that of adding true RT reflections, despite having little effect on visual quality in comparison.

Similarly, in Gears 5 the software screen-space GI setting also has a massive performance impact despite adding very little to the final image; adding true hardware RT GI or reflections would have yielded much better image quality with a similar or better performance profile.

In Assassin's Creed Odyssey, Watch Dogs 2 and Borderlands 3, setting volumetric clouds/fog to max tanked performance for little image quality improvement.

In Watch Dogs 2, Arma 3 and Crysis Remastered, using draw distance settings at their max values destroyed performance, because draw distance is CPU heavy and current CPUs are not fast enough single-threaded, so we end up with horrific performance at max settings. The same applies to the Flight Simulator games, whether 2010 or 2020.

In Quantum Break, running the game at native resolution destroys performance; the game's advanced lighting was designed to be performant only when upscaled from a lower resolution.

Advanced non-hardware-RT methods for AO always end up costing massive performance; that held true for VXAO (in Final Fantasy 15 and Rise of the Tomb Raider) and Signed Distance Field AO (in ARK: Survival Evolved). Adding special sun-shadowing techniques such as HFTS (The Division, Watch Dogs 2, Battlefront 2) or PCSS (Assassin's Creed Syndicate) also costs massive performance.

All of these (and many others) are examples of effects that reduce performance by a huge amount and that could easily be replaced with real RT effects for a massively better image quality and/or performance gain.
 
So one thing I'm not seeing people talk about is what features (aside from RT) will demand big boy performance in the next 2-3 years. In the same way that people are skeptical of RT's performance hit I have similar questions about why some games are so demanding. For example what are Borderlands 3, Control and Star Wars doing that's so heavy?

[attached image: 6900xt.png]
Three different games doing different things I'm afraid.
BL3 is how a game should not be using the GPU in general. (Aka "bad optimization".)
Control is very shading heavy even without RT; it does a lot of stuff with SDFs and cone tracing in s/w.
SW is a DX11 game and AMD's DX11 driver is still bad.
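
For context on the Control point above, here's a minimal Python sketch of sphere tracing a signed distance field, the basic building block behind that kind of software SDF work (the simple hard-hit case only, not the cone-traced variant the game actually uses):

```python
import math

def sdf_sphere(p, center=(0.0, 0.0, 5.0), radius=1.0):
    # Signed distance from point p to a sphere: negative inside, positive outside.
    return math.dist(p, center) - radius

def sphere_trace(origin, direction, max_steps=64, hit_eps=1e-3, max_dist=100.0):
    # March along the ray; the SDF value is always a safe step size because
    # no surface can be closer than the signed distance.
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        d = sdf_sphere(p)
        if d < hit_eps:
            return t        # hit: distance along the ray
        t += d
        if t > max_dist:
            break
    return None             # miss

print(sphere_trace((0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))   # ~4.0 for the example sphere
```

Every step evaluates the SDF (in a real renderer, many SDFs plus shading), which is pure ALU work, consistent with the "shading heavy even without RT" observation.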

In Quantum Break, running the game at native resolution destroys performance; the game's advanced lighting was designed to be performant only when upscaled from a lower resolution.
Funnily enough, this one seems to gain a huge performance boost on Ampere - but I haven't seen any benchmarks of it on recent GPUs, going only from my own memory of how the game ran on Pascal/Turing.
 
Single-player games are where the RT efforts need to be, like Cyberpunk. But of course all the big publishers just want multiplayer live-service crap everywhere to milk all the money.

It also doesn't help adoption and perception of RT when it's so difficult to actually invest in it. My wife and I would have an Ampere by now if we could buy one.
 
The whole discussion on ray tracing being relevant or how much it's used right now can be had about any new tech really. How many games really make use of NVMe SSDs? It ain't many. But we can guess that more and more will as time progresses.

It also doesn't help adoption and perception of RT when it's so difficult to actually invest in it. My wife and I would have an Ampere by now if we could buy one.

Ye, that's a problem, nothing's available or it's scalped etc. There's one way to get hold of ray tracing capable hw though..... laptops, if you're into that kind of stuff (I'm not). It's possible to get a 3070 laptop for under 1500 dollars, still a lot and too much money, but at least it's in stock lol. With a 115 W (boost to 130 W) 3070 mobile GPU, you'd be looking at 3060 Ti or RTX 2080 dGPU performance. Not bad, especially considering the 1080p/1440p resolutions these tend to run at.

Edit: Ray tracing indeed matters more in SP games, yes..... BFV had it though, but then the question is whether you actually had/have an advantage against people not running with DXR enabled. Nowadays altering normal settings won't give you any advantage at all, unlike 15 years ago with BF2, where having low settings for shadows meant you had a visibility advantage.
 
this one seems to gain a huge performance boost on Ampere - but I haven't seen any benchmarks of it on recent GPUs, going only from my own memory of how the game ran on Pascal/Turing.
Yeah, on a 6900XT it is in the ~50s fps at native 4K, dropping to the ~40s during combat, and this is a five-year-old game.


On a 3090 it is considerably better, sticking to ~60 fps during combat.


Still, the point is that this game represents something like the pinnacle of rasterized lighting the industry can offer, and it runs badly on the monstrous GPUs of today; compare that to something like Metro Exodus, which is doing RT GI + reflections, and the difference is clear.
 
I haven't seen a math bound frame in any game that I've profiled. If you're lucky you'll have one or two workloads during the entire frame that keep the ALUs 50% busy.
I think this is a problematic statement because it's outside of gaming where Ampere ALUs really get a workout. The same games on RDNA 2 (ignoring any ray tracing scenario) would be quite different, I imagine. Similarly, the same games on Turing and Pascal should be showing more.

In the profiling tool you use, isn't there a metric for "shader pipe" utilisation? One level up in the hierarchy from ALU utilisation?
 
I think this is a problematic statement because it's outside of gaming where Ampere ALUs really get a workout. The same games on RDNA 2 (ignoring any ray tracing scenario) would be quite different, I imagine. Similarly, the same games on Turing and Pascal should be showing more.

In the profiling tool you use, isn't there a metric for "shader pipe" utilisation? One level up in the hierarchy from ALU utilisation?

It’s Nvidia’s Nsight profiler and it breaks down FP1/FP2/INT usage. It would be interesting to see a breakdown of a RDNA 2 frame.
 
Honestly, given the semiconductor situation, everything planning for 2022 might as well slide to 2023...
 
I wonder how, games steadily become more and more graphics heavy and gameplay-lite (especially AAA titles on consoles)
I already explained how. Besides, not all games are AAA titles on consoles, and they are not even the most popular or profitable.

Yeah, that's why you have to use a denoiser and other tricks to hide the fact that there's a paltry number of rays in each image.
That's not half as bad as you're trying to imply.
That ray tracing introduces noise is a wrong assumption. Discretization, Monte Carlo integration and stochastic sampling cause noise, and this stuff is required for physically based reflections/shadows/etc. with both ray tracing and rasterisation.
Get rid of these concepts and treat all materials as perfect mirrors or perfectly diffuse and you're done with noise, but that's a derp solution.
As for denoisers, they blur rough surfaces where the noise happens, but that's not that critical, because integrating thousands of samples from all possible directions would still produce a very blurry reflection on rough surfaces (reflections on rough surfaces must be blurry; that's exactly what we are doing mathematically by integrating and averaging many samples).
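
For reference, the math behind that point: the reflection integral and its N-sample Monte Carlo estimator (standard rendering-equation notation, with sample directions drawn from a pdf p):

```latex
L_o(\mathbf{x},\omega_o)
  = \int_{\Omega} f_r(\mathbf{x},\omega_i,\omega_o)\, L_i(\mathbf{x},\omega_i)\,(\mathbf{n}\cdot\omega_i)\,\mathrm{d}\omega_i
  \;\approx\; \frac{1}{N}\sum_{k=1}^{N}
      \frac{f_r(\mathbf{x},\omega_k,\omega_o)\, L_i(\mathbf{x},\omega_k)\,(\mathbf{n}\cdot\omega_k)}{p(\omega_k)}
```

The noise is just the estimator's variance at small N (it falls off as 1/N), and averaging many samples over a rough BRDF lobe is exactly the blur, so a denoiser on a rough surface is approximating what the converged integral would look like anyway.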

you basically say that we have to pay extra so that poor devs will have to do less routine work. A noble endeavour, but I'll pass, thanks.
Another wrong assumption: guess who will pay, one way or another, for ever-increasing development costs?
Businesses will cover the expenses with microtransactions, microservices, loot boxes, DLCs, you name it.

You're probably using a different definition of "burst" than I am. Aside from RT there is rarely any single workload that occupies the GPU for more than 10% of total frame time. And during that time utilization is almost always poor.
I would consider geometry draw calls a burst workload. These are usually way below 1 ms; cache flushes and state changes can be required in between draw calls, and other overheads are possible. These workloads are usually small, bursty and low-utilization (that's why async compute is usually overlapped with them).
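
A toy timing model of that last point (entirely made-up numbers): overlapping an async compute job with short, low-utilization graphics bursts lets the idle ALU time absorb most of it.

```python
# Hypothetical frame fragment: a few sub-millisecond draws that leave most
# SIMD cycles idle, plus an ALU-heavy compute job that can run async.
graphics_passes_ms = [0.4, 0.7, 0.3, 0.6]   # short, bursty draws
graphics_simd_util = 0.35                   # fraction of ALU cycles they actually use
async_compute_ms = 1.0                      # compute job measured in isolation

serial_ms = sum(graphics_passes_ms) + async_compute_ms

# ALU time left idle while graphics runs, usable by the async queue:
idle_ms = sum(graphics_passes_ms) * (1.0 - graphics_simd_util)
overlapped_ms = sum(graphics_passes_ms) + max(0.0, async_compute_ms - idle_ms)

print(f"serial:     {serial_ms:.2f} ms")      # 3.00 ms with these numbers
print(f"overlapped: {overlapped_ms:.2f} ms")  # 2.00 ms with these numbers
```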

I haven't seen a math bound frame in any game that I've profiled. If you're lucky you'll have one or two workloads during the entire frame that keep the ALUs 50% busy.
By math bound I simply mean that frame performance is limited by computations on the GPU die, no matter whether it's fixed-function blocks or SIMDs; obviously, 100% SIMD ALU utilization is a very rare case.
Following the roofline model, what I can say for sure is that most frames are never bound by VRAM bandwidth.
There are plenty of articles on the performance impact of memory frequency; performance never scales linearly with memory frequency, for obvious reasons (the roofline model).
Also, regression models usually converge to low coefficients for the bandwidth metric, way lower than for other metrics (especially when you combine all GPU metrics together), which shows that games are rarely bandwidth bound in general.
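
For anyone unfamiliar, the roofline model referred to here caps attainable throughput at min(peak compute, bandwidth x arithmetic intensity). A small Python sketch with hypothetical GPU numbers shows why a memory-clock bump only helps workloads sitting left of the ridge point:

```python
# Hypothetical GPU for illustration only.
PEAK_TFLOPS = 20.0     # peak FP32 throughput
PEAK_BW_GBS = 500.0    # VRAM bandwidth in GB/s

def attainable_tflops(flops_per_byte, bw_gbs=PEAK_BW_GBS):
    # Roofline: limited by either the compute peak or by how many FLOPs
    # the memory system can feed per second at this arithmetic intensity.
    return min(PEAK_TFLOPS, bw_gbs * flops_per_byte / 1000.0)

for ai in (5, 20, 80, 200):  # arithmetic intensity in FLOPs per byte of VRAM traffic
    base = attainable_tflops(ai)
    fast = attainable_tflops(ai, bw_gbs=PEAK_BW_GBS * 1.2)  # +20% memory clock
    print(f"AI={ai:>4} FLOP/B: {base:5.1f} TFLOPS -> {fast:5.1f} TFLOPS with +20% bandwidth")
```

Only the low-intensity cases move with the extra bandwidth; past the ridge point (40 FLOP/B for these numbers) the curve is flat, which is the non-linear memory-clock scaling mentioned above.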
 
About these leakers' reliability, IDK.
But starting a new architecture with the lower-end card would be quite strange to me.
Assuming the previous RDNA3 leaks were true, the high end uses an expensive packaging tech. This makes it prone to delays, unlike the low-end RDNA3, which uses the classic single-die approach.

Active bridge chip, TSMC's 3D IC stacking, TSMC 5nm node, new microarch debugging, difficulties in TDP management, etc. - any of those can make it slip.
 
Assuming the previous RDNA3 leaks were true, the high end uses an expensive packaging tech. This makes it prone to delays, unlike the low-end RDNA3, which uses the classic single-die approach.

Active bridge chip, TSMC's 3D IC stacking, TSMC 5nm node, new microarch debugging, difficulties in TDP management, etc. - any of those can make it slip.

This I know, but it's also generally true that the bigger dies are the ones usually coming earlier, even though bigger dies are more difficult to manufacture. So the reason they come first is the halo effect for marketing. In this case we would have a cheaper RX 6900 XT, a good feat for a "midrange" card, but without the halo.
 