AMD: RDNA 3 Speculation, Rumours and Discussion

Jawed · Oct 19, 2022

So, maybe NVidia's unlaunching happened because RDNA 3 arrived at AIBs...

Bondrewd · Oct 19, 2022

Jawed said:
So, maybe NVidia's unlaunching happened because RDNA 3 arrived at AIBs...

No, it's all brand saving exercise.
4080 12GB caused quite a stir.
The bad kind of

fehu · Oct 19, 2022

So 2X the RT performance and still about 2X less than the competition?

Leoneazzurro5 · Oct 19, 2022

fehu said:
So 2X the RT performance and still about 2X less than the competition?

It's >2x, so it may be anything from 2.1x upwards. In reality, I expect them to still be slower than the competition, but by a smaller margin than with RDNA2.

TESKATLIPOKA · Oct 19, 2022

fehu said:
So 2X the RT performance and still about 2X less than the competition?

If it has 2x better raster performance but only 2x better RT performance, then RT performance relative to raster is the same as it was with RDNA2, and no improvement was made there.
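A quick sketch of that arithmetic, assuming the quoted multipliers are both measured against the same RDNA2 baseline. The function name and the sample inputs are made up for illustration; none of them are measured or announced figures.

```python
def relative_rt_gain(raster_gain: float, rt_gain: float) -> float:
    """RT speedup left over once the raster speedup is factored out.

    1.0 means the cost of turning RT on is unchanged versus the
    previous generation; above 1.0 means RT got relatively cheaper.
    """
    return rt_gain / raster_gain


# Hypothetical (raster, RT) multipliers, purely illustrative.
for raster, rt in [(2.0, 2.0), (1.5, 2.0), (2.0, 3.0)]:
    gain = relative_rt_gain(raster, rt)
    print(f"raster x{raster}, RT x{rt} -> relative RT gain x{gain:.2f}")
```

With equal multipliers the ratio is 1.0, which is the "no improvement" case; a real RT improvement only shows up when the RT multiplier is larger than the raster one.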

pTmdfx · Oct 19, 2022

There was a new LDS ds_bvh_stack_rtn instruction added to LLVM. How it would fit into the RT traversal kernel is unclear to me, but if I were to venture a guess based on the patch, it probably:

* hosts the BVH traversal stack on the LDS
* feeds (some of) the ray data to the TCP directly from the LDS-hosted stack
* writes the results from the TMU-hosted ray intersection unit to the LDS-hosted stack directly
* returns the result to the shader/VGPR as an indirect reference to the LDS-hosted stack

... given that it "accesses LDS in a complicated way".

The minimum expectation is that this reduces VGPR pressure (8 VGPRs versus 12-16 VGPRs). The utopian expectation is that the LDS gets a ray traversal engine, so one could potentially offload RT traversal and co-execute other work (i.e., simply waiting on lgkm_cnt(0) when you run out of work to co-execute, as with normal LDS accesses).
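For anyone unsure what the "traversal stack" is: below is a minimal software sketch of stack-based BVH traversal in Python. It is a generic textbook-style loop, not AMD's implementation and not the semantics of ds_bvh_stack_rtn; every name in it (`Node`, `hits_box`, `traverse`) is hypothetical. The only point is that each ray carries a small stack of pending node indices, and that per-ray stack is the state the guess above would move into the LDS.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    lo: tuple                                      # AABB min corner (x, y, z)
    hi: tuple                                      # AABB max corner (x, y, z)
    children: list = field(default_factory=list)   # child indices; empty for a leaf
    prim: int = -1                                 # primitive id, leaves only


def hits_box(origin, inv_dir, lo, hi):
    """Slab test: does the ray (t >= 0) overlap the axis-aligned box?"""
    t_near, t_far = 0.0, float("inf")
    for o, inv, a, b in zip(origin, inv_dir, lo, hi):
        t0, t1 = (a - o) * inv, (b - o) * inv
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
    return t_near <= t_far


def traverse(nodes, origin, direction):
    """Return (leaf primitive ids whose boxes the ray hits, peak stack depth)."""
    inv_dir = tuple(1.0 / d for d in direction)    # assumes no zero components
    stack = [0]                                    # pending node indices, root first
    hits, peak = [], 1
    while stack:
        peak = max(peak, len(stack))
        node = nodes[stack.pop()]
        if not hits_box(origin, inv_dir, node.lo, node.hi):
            continue                               # prune the whole subtree
        if node.children:
            stack.extend(node.children)            # inner node: visit children later
        else:
            hits.append(node.prim)                 # leaf: candidate for a triangle test
    return hits, peak


# Tiny made-up scene: one root with three leaf boxes.
nodes = [
    Node((0, 0, 0), (4, 4, 4), children=[1, 2, 3]),
    Node((0, 0, 0), (1, 1, 1), prim=10),
    Node((3, 0, 0), (4, 1, 1), prim=20),
    Node((0, 3, 3), (1, 4, 4), prim=30),
]
hits, peak = traverse(nodes, (-1.0, 0.4, 0.4), (1.0, 0.1, 0.1))
print(sorted(hits), peak)
```

In a shader the `stack` list would be per-lane storage, so where it lives (registers, scratch or LDS) is exactly what drives the VGPR pressure mentioned above.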

Rootax · Oct 19, 2022

Maybe I missed the boat, but why don't we expect more offloading this generation, like Nvidia or Intel, or even more? Did they announce somewhere that it would be like RDNA2?

Bondrewd · Oct 19, 2022

Rootax said:
Maybe I missed the boat, but why don't we expect more offloading this generation, like Nvidia or Intel, or even more?

No.

Rootax said:
Did they announce somewhere that it would be like RDNA2?

They're yet to say anything besides "it gonna have class-leading perf/W".

TESKATLIPOKA · Oct 20, 2022

Bondrewd said:
They're yet to say anything besides "it gonna have class-leading perf/W".

Even compared to Ada? Does it include RT perf or not?

Bondrewd · Oct 20, 2022

TESKATLIPOKA said:
Even compared to Ada?

That's piss easy.

TESKATLIPOKA said:
Does it include RT perf or not?

Idk ask their marketing team, they be busy shilling real_frames™ now.

xpea · Oct 20, 2022

Bondrewd said:
That's piss easy.

50% better efficiency than RDNA2 is far from enough to beat Ada in RT, compute and rendering. They can only win in old school pure raster workloads that are now "useless" bc you reach 150+ fps at 4k. But I would love to be wrong for the sake of competition

Bondrewd · Oct 20, 2022

xpea said:
50% better efficiency than RDNA2 is far from enough to beat Ada in RT, compute and rendering

I assume you've profiled N31 over a number of different workloads already?
Also not like anyone's gonna use those for GEMM brrr (it has no meme cores) or rendering (that's not a real market).

xpea said:
They can only win in old school pure raster workloads

Yea shit that matters, especially down the stack where PPA race gets kinda cuhrazee.

xpea said:
They can only win in old school pure raster workloads that are now "useless" bc you reach 150+ fps at 4k

Good news, 4k@240 monitors are on the horizon!

xpea said:
But I would love to be wrong for the sake of competition

rofl

TESKATLIPOKA · Oct 20, 2022

xpea said:
50% better efficiency than RDNA2 is far from enough to beat Ada in RT, compute and rendering. They can only win in old school pure raster workloads that are now "useless" bc you reach 150+ fps at 4k. But I would love to be wrong for the sake of competition

Don't you know what ">50%" means? Greater than 50%.
Zen4 should have been only "15%" faster than Zen3 according to some people, because they ignored the greater-than sign in front of it.
Of course, we don't know the exact figure. It could be 60% or even 90%, who knows.

BTW, I seriously don't care how it performs in compute or rendering. I want it for games, and for that only raster and RT are important.

Frenetic Pony · Oct 20, 2022

pTmdfx said:
There was a new LDS ds_bvh_stack_rtn instruction added to LLVM. How it would fit into the RT traversal kernel is unclear to me, but if I were to venture a guess based on the patch, it probably:

* hosts the BVH traversal stack on the LDS
* feeds (some of) the ray data to the TCP directly from the LDS-hosted stack
* writes the results from the TMU-hosted ray intersection unit to the LDS-hosted stack directly
* returns the result to the shader/VGPR as an indirect reference to the LDS-hosted stack

... given that it "accesses LDS in a complicated way".

The minimum expectation is that this reduces VGPR pressure (8 VGPRs versus 12-16 VGPRs). The utopian expectation is that the LDS gets a ray traversal engine, so one could potentially offload RT traversal and co-execute other work (i.e., simply waiting on lgkm_cnt(0) when you run out of work to co-execute, as with normal LDS accesses).

Co-execute some workloads, which might put it above thermals/power anyway. "Abuse LDS as much as possible" has been a major theme, so it being occupied is going to narrow the scope of workloads available. Of course "abuse memory hierarchy as much as possible" is a general theme everywhere right now. Maybe when one of those SRAM replacements gets commercialized and/or HBM somehow becomes cheap it'll lessen, but until then.

Side note: I wonder if MWII will be an AMD win. A 4090 reportedly "only" gets 100 fps at 4k max settings on the final build. 6950xt and a 3090ti looked pretty equal in the beta, meaning according to "leaks" the hypothetical AMD X9XX might be the top performer there. Prepare for insufferable AMD PRing if so, November 3rd stream will have like 1 benchmark and it'll be shown 100x.

Leoneazzurro5 · Oct 21, 2022

https://twitter.com/x/status/1583485036179095556

That would be impressive for a mobile part, unimpressive if it's a sizeable chunk of an N32.
It depends on how much it would be cut down on shaders, bus and clocks.
If it's a 256 bit part, it would be strange to put it against a possible NV competition with a 192 bit bus.
Another point of the equation is the current notebook designs topping at around 150W.

Edit: if it was a N33, it would be amazing. But, Greymon said not a N33.
The strange thing here is that for 1080p (most used resolution on laptops) N33 is already supposed to almost hit those performance levels with similar power envelope.
It would be strange to have two mobile solutions so close in terms of performance but very different in terms of costs.
Something does not add up.

TESKATLIPOKA · Oct 21, 2022

He also said this.

https://twitter.com/x/status/1582599964567314432

Not sure which model he meant, most likely N31.

Krteq · Oct 21, 2022

Leoneazzurro5 said:
https://twitter.com/x/status/1583485036179095556

That would be impressive for a mobile part, unimpressive if it's a sizeable chunk of an N32.
It depends on how much it would be cut down on shaders, bus and clocks.
If it's a 256 bit part, it would be strange to put it against a possible NV competition with a 192 bit bus.
Another point of the equation is the current notebook designs topping at around 150W.

Edit: if it was a N33, it would be amazing. But, Greymon said not a N33.
The strange thing here is that for 1080p (most used resolution on laptops) N33 is already supposed to almost hit those performance levels with similar power envelope.
It would be strange to have two mobile solutions so close in terms of performance but very different in terms of costs.
Something does not add up.

What's a max. TBP for mobile GPUs these days? 150W?

TESKATLIPOKA · Oct 21, 2022

Krteq said:
What's a max. TBP for mobile GPUs these days? 150W?

GPU Power: 145+ W
AMD

Bondrewd · Oct 21, 2022

Krteq said:
150W?

150±15W yea.

Leoneazzurro5 said:
If it's a 256 bit part, it would be strange to put it against a possible NV competition with a 192 bit bus.

Dawg 6800M versus 3080 Mobile is literally that right now just in the opposite direction.
Also NV now caps their mobile lineup with Gx103 parts.
GA103 exists, remember?

Leoneazzurro5 said:
That would be impressive for a mobile part, unimpressive if it's a sizeable chunk of an N32.

Yea it's mobile N32.

PSman1700 · Oct 21, 2022

Krteq said:
What's a max. TBP for mobile GPUs these days? 150W?

Over 175w depending on manufacturer.
