AMD: RDNA 3 Speculation, Rumours and Discussion

Status
Not open for further replies.
There was a new LDS ds_bvh_stack_rtn instruction added to LLVM. How it would fit into the RT traversal kernel is unclear to me, but if I were to venture a guess based on the patch, it probably:

* hosts the BVH traversal stack on the LDS
* feeds (some of) the ray data to the TCP directly from the LDS-hosted stack
* writes the results from the TMU-hosted ray intersection unit to the LDS-hosted stack directly
* returns the result to the shader/VGPR as an indirect reference to the LDS-hosted stack

... given that it "accesses LDS in a complicated way".

The minimum expectation is that this reduces VGPR pressure (8 VGPRs versus 12-16 VGPRs). The utopian expectation is that the LDS gets a ray traversal engine, so one could potentially offload RT traversal and co-execute other work (i.e., simply waiting on lgkmcnt(0) when you run out of work to co-execute, as with your normal LDS accesses).
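For anyone unsure what "traversal stack" means here, below is a rough sketch of stack-based BVH traversal where the per-ray stack lives in a shared scratch array instead of per-ray local state. It is plain Python and does not model ds_bvh_stack_rtn or any real hardware behaviour. Every name in it (`Node`, `traverse`, the `lds` list, `stack_base`, `stack_size`, the toy `NODES` tree) is made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    lo: Vec3                         # AABB min corner
    hi: Vec3                         # AABB max corner
    children: Tuple[int, ...] = ()   # child node indices; empty for a leaf
    prim: Optional[int] = None       # primitive id, set only on leaves


def ray_hits_box(origin: Vec3, inv_dir: Vec3, lo: Vec3, hi: Vec3) -> bool:
    """Slab test: does the ray (t >= 0) overlap the axis-aligned box?"""
    t_near, t_far = 0.0, float("inf")
    for o, inv, a, b in zip(origin, inv_dir, lo, hi):
        t0, t1 = (a - o) * inv, (b - o) * inv
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
    return t_near <= t_far


def traverse(nodes: List[Node], origin: Vec3, direction: Vec3,
             lds: List[int], stack_base: int, stack_size: int):
    """Walk the BVH with an explicit stack stored in lds[stack_base:].

    Returns (hit primitive ids, deepest stack occupancy seen).
    """
    # Large finite stand-in for 1/0 so the slab test never produces NaN.
    inv_dir = tuple(1.0 / d if d != 0.0 else 1e30 for d in direction)
    hits: List[int] = []
    lds[stack_base] = 0              # push the root node index
    top = 1
    max_depth = 1
    while top > 0:
        top -= 1                     # pop
        node = nodes[lds[stack_base + top]]
        if not ray_hits_box(origin, inv_dir, node.lo, node.hi):
            continue
        if node.prim is not None:    # leaf: record the primitive
            hits.append(node.prim)
            continue
        for child in node.children:  # internal node: push children
            if top == stack_size:
                raise OverflowError("traversal stack slice is full")
            lds[stack_base + top] = child
            top += 1
        max_depth = max(max_depth, top)
    return hits, max_depth


# Toy tree: four unit boxes in a row along x, grouped pairwise.
NODES = [
    Node((0, 0, 0), (4, 1, 1), children=(1, 2)),
    Node((0, 0, 0), (2, 1, 1), children=(3, 4)),
    Node((2, 0, 0), (4, 1, 1), children=(5, 6)),
    Node((0, 0, 0), (1, 1, 1), prim=0),
    Node((1, 0, 0), (2, 1, 1), prim=1),
    Node((2, 0, 0), (3, 1, 1), prim=2),
    Node((3, 0, 0), (4, 1, 1), prim=3),
]

lds = [0] * 16                       # shared scratch, 8 entries per "lane"
print(traverse(NODES, (0.5, 0.5, -1.0), (0.0, 0.0, 1.0), lds, 0, 8))
print(traverse(NODES, (-1.0, 0.5, 0.5), (1.0, 0.0, 0.0), lds, 8, 8))
```

The only point is that the stack is per-ray state that has to live somewhere. Here each ray only carries a base offset and a stack pointer, while the node indices sit in the shared array, which is the kind of register saving the VGPR numbers above hint at.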
 
Maybe I missed the boat, but why don't we expect more offloading this generation, like Nvidia or Intel, or even more? Did they announce somewhere that it would be like RDNA2?
 
That's piss easy.
50% better efficiency than RDNA2 is far from enough to beat Ada in RT, compute and rendering. They can only win in old school pure raster workloads that are now "useless" bc you reach 150+ fps at 4k. But I would love to be wrong for the sake of competition
 
50% better efficiency than RDNA2 is far from enough to beat Ada in RT, compute and rendering
I assume you've profiled N31 over a number of different workloads already?
Also not like anyone's gonna use those for GEMM brrr (it has no meme cores) or rendering (that's not a real market).
They can only win in old school pure raster workloads
Yea shit that matters, especially down the stack where PPA race gets kinda cuhrazee.
They can only win in old school pure raster workloads that are now "useless" bc you reach 150+ fps at 4k
Good news, 4k@240 monitors are on the horizon!
But I would love to be wrong for the sake of competition
rofl
 
50% better efficiency than RDNA2 is far from enough to beat Ada in RT, compute and rendering. They can only win in old school pure raster workloads that are now "useless" bc you reach 150+ fps at 4k. But I would love to be wrong for the sake of competition
Don't you know what ">50%" means? Greater than 50%.
Zen4 should have been only "15%" faster than Zen3 according to some people, because they ignored the greater-than sign before it.
Of course, we don't know the exact %. It could be 60% or even 90%, who knows.

BTW, I seriously don't care how it performs in compute or rendering. I want it for games, and for that only raster and RT are important.
 
There was a new LDS ds_bvh_stack_rtn instruction added to LLVM. How it would fit into the RT traversal kernel is unclear to me, but if I were to venture a guess based on the patch, it probably:

* hosts the BVH traversal stack on the LDS
* feeds (some of) the ray data to the TCP directly from the LDS-hosted stack
* writes the results from the TMU-hosted ray intersection unit to the LDS-hosted stack directly
* returns the result to the shader/VGPR as an indirect reference to the LDS-hosted stack

... given that it "accesses LDS in a complicated way".

The minimum expectation is that this reduces VGPR pressure (8 VGPRs versus 12-16 VGPRs). The utopian expectation is that the LDS gets a ray traversal engine, so one could potentially offload RT traversal and co-execute other work (i.e., simply waiting on lgkmcnt(0) when you run out of work to co-execute, as with your normal LDS accesses).

Co-execute some workloads, which might put it above thermals/power anyway. "Abuse LDS as much as possible" has been a major theme, so it being occupied is going to narrow the scope of workloads available. Of course "abuse memory hierarchy as much as possible" is a general theme everywhere right now. Maybe when one of those SRAM replacements gets commercialized and/or HBM somehow becomes cheap it'll lessen, but not until then.

Side note: I wonder if MWII will be an AMD win. A 4090 reportedly "only" gets 100 fps at 4K max settings on the final build. The 6950 XT and 3090 Ti looked pretty equal in the beta, meaning that according to "leaks" the hypothetical AMD X9XX might be the top performer there. Prepare for insufferable AMD PRing if so; the November 3rd stream will have like 1 benchmark and it'll be shown 100x.
 

That would be impressive for a mobile part, unimpressive if it's a sizeable chunk of an N32.
It depends on how much it would be cut down on shaders, bus and clocks.
If it's a 256 bit part, it would be strange to put it against a possible NV competition with a 192 bit bus.
Another part of the equation is that current notebook designs top out at around 150W.

Edit: if it were an N33, it would be amazing. But Greymon said it's not an N33.
The strange thing here is that for 1080p (the most used resolution on laptops) N33 is already supposed to almost hit those performance levels within a similar power envelope.
It would be strange to have two mobile solutions so close in terms of performance but very different in terms of costs.
Something does not add up.
 
What's a max. TBP for mobile GPUs these days? 150W?
 
150±15W, yea.
If it's a 256 bit part, it would be strange to put it against a possible NV competition with a 192 bit bus.
Dawg 6800M versus 3080 Mobile is literally that right now just in the opposite direction.
Also NV now caps their mobile lineup with Gx103 parts.
GA103 exists, remember?
That would be impressive for a mobile part, unimpressive if it's a sizeable chunk of an N32.
Yea it's mobile N32.
 