AMD: RDNA 3 Speculation, Rumours and Discussion

You mean higher-res graphics needs more CPU power, up to the point where the CPU becomes the bottleneck?
Which graphics-related CPU workloads depend on resolution? Animation stays the same, and culling (if it's still done on the CPU) shouldn't have a big effect either?
Games will be mostly CPU limited in anything lower than 4K. But that's hardly new; 1024x768 was "high resolution" some time ago.
 
Raytracing is easily scalable; PCs won't have any issues applying all the performance they have. And once AMD gets their RT up to speed, they'll begin to promote its heavier use in PC versions of multiplatform titles too. So I wouldn't worry about an overabundance of GPU power, really.

NV is ahead in the ray tracing game; it has nothing to do with conspiracy or them 'fiddling with developer input'. Their hardware is better specced, not just in ray tracing but in reconstruction and compute as well.
 
Do they have the resources to pull that off? I imagine it takes time, testing, etc...
I can imagine AMD has variations on traversal. E.g. a shadow ray needs no front-to-back order when visiting children, then there are variants with a short stack, or stackless (which would need an extra pointer in the BVH), etc.
So they might at least test big AAA games and select the fastest variant.
Really not sure if additional ISA-level patching from custom shaders made for console would fit into such a workflow. Likely not. It may depend on other low-level console extensions. (Edit: or just on overall differences from the console implementation, so custom traversal alone would break.)
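To illustrate the shadow-ray point with a rough CPU-side sketch (all names here are invented for the example, this is not AMD's actual traversal code): an occlusion query can pop children in whatever order they sit on the stack and bail out on the first hit, so the front-to-back sorting a closest-hit traversal wants can simply be dropped.

// Rough sketch of an any-hit ("shadow ray") BVH traversal, for illustration
// only. Because the query only asks "is anything hit before tMax?", children
// can be pushed and popped in arbitrary order and traversal stops at the
// first hit; a closest-hit traversal would instead sort children front to back.
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

struct Ray     { float org[3], dir[3], tMax; };
struct AABB    { float lo[3], hi[3]; };
struct BVHNode {
    AABB    bounds;
    int32_t left = -1, right = -1;    // child indices; -1 marks a leaf
    int32_t firstPrim = 0, primCount = 0;
};

// Standard slab test against the node bounds.
bool intersectAABB(const AABB& b, const Ray& r)
{
    float tmin = 0.0f, tmax = r.tMax;
    for (int a = 0; a < 3; ++a) {
        float inv = 1.0f / r.dir[a];
        float t0  = (b.lo[a] - r.org[a]) * inv;
        float t1  = (b.hi[a] - r.org[a]) * inv;
        if (inv < 0.0f) std::swap(t0, t1);
        tmin = std::max(tmin, t0);
        tmax = std::min(tmax, t1);
        if (tmax < tmin) return false;
    }
    return true;
}

// Placeholder primitive test; a real version would intersect a triangle.
bool intersectPrim(int /*primIndex*/, const Ray& /*r*/) { return false; }

bool occluded(const std::vector<BVHNode>& nodes, const Ray& ray)
{
    int32_t stack[64];                 // assumed deep enough for this sketch
    int sp = 0;
    stack[sp++] = 0;                   // root node

    while (sp > 0) {
        const BVHNode& n = nodes[stack[--sp]];
        if (!intersectAABB(n.bounds, ray))
            continue;
        if (n.left < 0) {              // leaf: any primitive hit ends the query
            for (int i = 0; i < n.primCount; ++i)
                if (intersectPrim(n.firstPrim + i, ray))
                    return true;
        } else {                       // interior: order of children is irrelevant
            stack[sp++] = n.left;
            stack[sp++] = n.right;
        }
    }
    return false;
}

A short-stack or stackless variant would replace that array with a few entries plus a restart or parent-pointer scheme, which is the kind of variation I mean.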

Games will be mostly CPU limited in anything lower than 4K. But that's hardly new; 1024x768 was "high resolution" some time ago.
Yes, but I think he meant it the other way around: 4K is too high, so the CPU can't deliver, and thus a monster GPU is pointless. Which does not make much sense to me.
 
CP2077 runs at less than 30 fps @ 4K with all bells and whistles on (including all the RT gimmicks), so I don't see why anyone would be against getting a GPU that will allow you to play at a smooth 120 fps without an IQ-degrading crutch.
But do those 30 fps come from a CPU limit? Why? Is the CPU involved in building the BVH?
 
I for sure could make use of some 2x-3x more graphics performance. There's also DRS/VRS for folks who don't want to buy into larger, higher-res displays: an easy way to burn GPU cycles and get better images in return.
 
On last-gen games, sure. Current-gen console games targeting upscaled 4K at 30fps should take advantage of the extra horsepower on PCs shooting for native 4K at 120fps+.

Yeah, I certainly have no concerns about using up that much performance. Cyberpunk at 4K with Ultra RT gets you 11fps on a 6900 XT, for example. So even on this new monster it would still be unplayable at that resolution without some upscaling in play.

This is an extreme case, but there will be other similar situations where RT is involved. And then, as you say, 120fps at 4K (even upscaled) requires much more than what today's top-end GPUs can output in many titles.

[Chart: Cyberpunk 2077 benchmark results at 2160p, Ultra settings]


https://www.kitguru.net/gaming/dominic-moass/cyberpunk-2077-ray-tracing-on-amd-gpus-benchmarked/
 
I sure as hell wouldn't say the PS5 is 'too powerful'; if I want to play Rift Apart in the performance RT mode, my resolution gets dropped to anywhere between 1080p and 1440p, along with reduced fidelity. And that's for a game that really looks amazing, up there with the best, but it isn't that large of a leap coming from the best the PS4 had to offer in the end. The game is lacking dynamic GI, the water looks... yeah, and there's one ray tracing effect.

Cyberpunk at 4K with Ultra RT gets you 11fps on a 6900 XT, for example. So even on this new monster it would still be unplayable at that resolution without some upscaling in play.

That's if we assume AMD manages to go another GPU generation without dedicated hardware support for ray tracing, which I can't imagine they will.
 
I sure as hell wouldn't say the PS5 is 'too powerful'; if I want to play Rift Apart in the performance RT mode, my resolution gets dropped to anywhere between 1080p and 1440p, along with reduced fidelity. And that's for a game that really looks amazing, up there with the best, but it isn't that large of a leap coming from the best the PS4 had to offer in the end. The game is lacking dynamic GI, the water looks... yeah, and there's one ray tracing effect.



That's if we assume AMD manages to go another GPU generation without dedicated hardware support for ray tracing, which I can't imagine they will.

Yes, fingers crossed that if the 2.7x raster performance is real, it translates into something bigger for RT.
 
Ok, I got you. Why limit the conversation to just pixel shaders though? AMD's description of 64-item "wavefronts" appears to apply to compute workloads as well.
From AMD PowerPoint- White Template (gpuopen.com) page 18:

Compiler makes the decision
  • Compute and vertex shaders usually as Wave32, pixel shaders usually as Wave64
  • Heuristics will continue to be tuned for the foreseeable future
Implies there's at the very least a strong bias towards wave64 being solely for pixel shading.

So the gotcha (for my argument that the hardware really only has 32-work-item hardware threads) is the idea that compute shaders can be issued as 64-work-item hardware threads. And, indeed, that 128-work-item workgroups could be issued as two 64-work-item hardware threads instead of four 32-work-item hardware threads.

I can't think of a time when "Allows higher occupancy (# threads per lane)" would apply to compute and improve performance. This is the only stated benefit (in the list of two benefits) that applies to compute under the Wave64 column of the table.

In compute, work items sharing a SIMD lane is not part of the programming model. The closest you can get is with chip-specific data parallel processing (DPP) instructions that work on subsets of 8 or 16 work items. And those won't share data across the boundary between work items 0:31 and 32:63.
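A toy way to picture that boundary: simulate a DPP-style row shift in plain C++. The 16-lane row grouping below is just for illustration and this is not the real ISA semantics (the actual instructions have more knobs, e.g. bound control for lanes with no valid source); the point is only that nothing moves between rows, so nothing crosses lane 31 / lane 32.

// Toy simulation of a DPP-style "shift right by 1 within a 16-lane row"
// on a wave64, purely to illustrate that data never crosses rows -- and
// therefore never crosses the work item 31 / 32 boundary. Not real ISA
// semantics; lane 0 of each row just keeps its own value here.
#include <array>
#include <cstdio>

constexpr int kWaveSize = 64;
constexpr int kRowSize  = 16;

std::array<int, kWaveSize> rowShiftRight1(const std::array<int, kWaveSize>& src)
{
    std::array<int, kWaveSize> dst{};
    for (int lane = 0; lane < kWaveSize; ++lane) {
        const int laneInRow = lane % kRowSize;
        const int srcLane   = (laneInRow == 0) ? lane : lane - 1;
        dst[lane] = src[srcLane];
    }
    return dst;
}

int main()
{
    std::array<int, kWaveSize> v{};
    for (int i = 0; i < kWaveSize; ++i) v[i] = i;
    const auto shifted = rowShiftRight1(v);
    // Prints "lane 31 <- 30, lane 32 <- 32": lane 32 never sees lane 31's data.
    std::printf("lane 31 <- %d, lane 32 <- %d\n", shifted[31], shifted[32]);
}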

So the only scenario where work items sharing a lane is effectively exposed is pixel shading attribute interpolation. Maybe someone can think of something else?

So how would compute get higher occupancy with Wave64 versus Wave32 and be faster (i.e. worth doing)? Is there a mix of workgroup size combined with VGPR and LDS allocations that does this?
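Here's the back-of-envelope way I'd frame that question; every budget number in this sketch is an assumption for illustration, not an official RDNA figure.

// Back-of-envelope occupancy sketch. Every budget below is an illustrative
// assumption; the point is only the shape of the calculation: resident waves
// per SIMD are capped by whichever of VGPRs, LDS or the hard wave limit bites
// first, and a wave64 carries twice the register footprint of a wave32 with
// the same per-lane VGPR count.
#include <algorithm>
#include <cstdio>

struct Limits {                    // assumed per-SIMD budgets
    int vgprBytes = 128 * 1024;    // vector register file
    int ldsBytes  = 64 * 1024;     // share of the WGP's LDS credited to one SIMD
    int maxWaves  = 16;            // hard cap on resident waves
};

struct Kernel {
    int vgprsPerLane;              // VGPRs allocated per work item
    int workgroupSize;             // work items per workgroup
    int ldsPerGroup;               // LDS bytes per workgroup
};

int wavesPerSimd(const Kernel& k, int waveSize, const Limits& lim = {})
{
    const int wavesPerGroup = (k.workgroupSize + waveSize - 1) / waveSize;
    const int bytesPerWave  = k.vgprsPerLane * 4 * waveSize;   // 4 bytes per VGPR per lane
    const int byVgpr        = lim.vgprBytes / bytesPerWave;
    const int byLds         = k.ldsPerGroup > 0
                            ? (lim.ldsBytes / k.ldsPerGroup) * wavesPerGroup
                            : lim.maxWaves;
    return std::min({byVgpr, byLds, lim.maxWaves});
}

int main()
{
    const Kernel k{ /*vgprsPerLane=*/64, /*workgroupSize=*/128, /*ldsPerGroup=*/16 * 1024 };
    std::printf("wave32: %d waves/SIMD (%d work items)\n", wavesPerSimd(k, 32), wavesPerSimd(k, 32) * 32);
    std::printf("wave64: %d waves/SIMD (%d work items)\n", wavesPerSimd(k, 64), wavesPerSimd(k, 64) * 64);
}

With those made-up numbers both modes end up with the same number of resident work items (16 wave32 vs 8 wave64), which is roughly my puzzlement: it's hard to construct a case where wave64 obviously buys extra occupancy rather than just repackaging the same residency.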

Locality doesn't seem to be a major motivating factor to keep it on the same SIMD, as you get much of the same benefits as long as you're on the same CU. AMD's whitepaper only has this to say on the matter: "While the RDNA architecture is optimized for wave32, the existing wave64 mode can be more effective for some applications." They don't mention which applications benefit from wave64.
Yes, that's why AMD PowerPoint- White Template (gpuopen.com) seems to be more precise (yet remains vague, citing "heuristics"). In truth, in PC gaming we can never really exclude heuristics, because drivers put too much distance between us and the metal.

Locality is explicitly relevant for attribute interpolation (pixels that share a triangle can share parts of the LDS data). And locality affects texture filtering. So both of these relate specifically to pixel shading.

Is it important that it's one hardware thread vs two? I guess my only point is that from a software perspective it doesn't matter.
Developers probably can't access the wave32/64 decision, so it "doesn't matter to them". Well, you could argue that the more dedicated might complain to AMD that the driver stinks for their game and AMD makes a decision in the driver for them.

In trying to understand the hardware, and why "CUs" are still a part of RDNA, wrapped inside a WGP, the count of hardware thread sizes might be informative.

CUs might be present simply to soften the complexity of getting RDNA working. Drivers were very troublesome for quite a while after the 5700 XT released, despite the helping hand of CU mode. And it would perhaps have been worse if there were only a single TMU per WGP, with the LDS being a single array rather than the two we currently have.

CU mode combined with wave64 mode looks intentional as the softest complexity: the backstop when the driver team is struggling to adapt to a new architecture. G-buffer fill might be the perfect use-case, but that's export bandwidth bound these days, isn't it?

If RDNA 3 has no CUs, does it still need wave64 mode? Is wave64 mode more important in that case?
 
I sure as hell wouldn't say the PS5 is 'too powerful'; if I want to play Rift Apart in the performance RT mode, my resolution gets dropped to anywhere between 1080p and 1440p, along with reduced fidelity. And that's for a game that really looks amazing, up there with the best, but it isn't that large of a leap coming from the best the PS4 had to offer in the end. The game is lacking dynamic GI, the water looks... yeah, and there's one ray tracing effect.



That's if we assume AMD manages to go another GPU generation without dedicated hardware support for ray tracing, which I can't imagine they will.

They already have hardware RT... I guess they will try to accelerate more "stages" in the future, for sure, but saying they don't have hardware RT is wrong.
 
Yes, fingers crossed that if the 2.7x raster performance is real, it translates into something bigger for RT.

It would be quite pointless to invest all that monster power into just compensating for RT shortcomings.
So the price remains the big question. Otherwise, so far it looks like a killer arch.

Yeah, I dunno what they're planning. Just increasing the compute by, say, 3 times is impressive, but if the competition is doing that as well, plus dedicated hardware RT, it would be the same situation as today, which is kinda boring I think.

They already have hardware RT... I guess they will try to accelerate more "stages" in the future, for sure, but saying they don't have hardware RT is wrong.

Fully agree, that's why I wrote dedicated.
 
Yeah, I dunno what they're planning. Just increasing the compute by, say, 3 times is impressive, but if the competition is doing that as well, plus dedicated hardware RT, it would be the same situation as today, which is kinda boring I think.



Fully agree, that's why I wrote dedicated.

You mean put the Ray Accelerators outside the CU? My take is they can beef up the RAs and still keep them in the CUs. It's still dedicated hardware.
 
I don't care how they do it, as long as they drive up competition, which hopefully drives prices down as well as putting more pressure on the market to innovate and generally delivering even larger leaps in performance.
Maybe that happens when Apple joins the game, dunno.
 