AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

AFAIK, AMD never used TSMC 10nm in any shipping consumer product. They went straight from GloFo 14/12nm to TSMC 7nm.
TSMC's 7nm is close to Intel's enhanced 10nm in density; they are practically the same class of node.

Yeah, this is DX10 and G80 vs R600 again. The hardware, even Ampere, just isn’t ready to say goodbye to rasterisation.
Nobody wants to say goodbye to rasterization yet; we are in the era of hybrid rendering.
 
For the 6xxx series AMD focused on rasterization; next they will focus on RT performance.
To focus on RT perf right now, AMD would do better to expose their intersection instructions directly, so we can bypass DXR.
I assume the resulting flexibility makes AMD's approach better suited to the upcoming 'say goodbye to low poly models' unlimited detail mumbo jumbo.

I would not be happy to see AMD adopt the idea of implementing such involved algorithms and data structures completely in hardware. Developers could then no longer improve them, nor adjust them to their needs.
IMO it's better to offer helpful sub-functionality which adds no constraints. And as we see, intersection instructions alone already compete.
If AMD lags only half a generation behind in DXR benchmarks but can offer full flexibility, it may be the other side that ends up having to update their RT. >:)
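To be clear about what I mean by 'sub-functionality': nothing more than the box/triangle test itself. Below is a minimal CPU-side C++ sketch of a ray vs. AABB slab test, roughly the primitive such an intersection instruction would accelerate (the names and struct layout are my own, not AMD's actual instruction semantics):

Code:
// Minimal ray vs. axis-aligned box (slab) test -- roughly the primitive an
// intersection instruction accelerates. Illustrative layout and names only.
#include <algorithm>
#include <cstdio>
#include <utility>

struct Ray  { float ox, oy, oz, dx, dy, dz; };          // origin, direction
struct AABB { float minx, miny, minz, maxx, maxy, maxz; };

// Returns true and the entry distance tHit if the ray hits the box in [tMin, tMax].
bool intersectAABB(const Ray& r, const AABB& b, float tMin, float tMax, float& tHit)
{
    const float o[3]    = { r.ox, r.oy, r.oz };
    const float d[3]    = { r.dx, r.dy, r.dz };
    const float bmin[3] = { b.minx, b.miny, b.minz };
    const float bmax[3] = { b.maxx, b.maxy, b.maxz };
    for (int axis = 0; axis < 3; ++axis)
    {
        const float inv = 1.0f / d[axis];               // IEEE inf handles d == 0
        float t0 = (bmin[axis] - o[axis]) * inv;
        float t1 = (bmax[axis] - o[axis]) * inv;
        if (inv < 0.0f) std::swap(t0, t1);
        tMin = std::max(tMin, t0);
        tMax = std::min(tMax, t1);
        if (tMax < tMin) return false;                  // slabs don't overlap -> miss
    }
    tHit = tMin;
    return true;
}

int main()
{
    Ray  r = { 0, 0, -5,  0, 0, 1 };                    // shoot down +Z
    AABB b = { -1, -1, -1,  1, 1, 1 };                  // unit box at the origin
    float t;
    if (intersectAABB(r, b, 0.0f, 1e30f, t))
        std::printf("hit at t = %.1f\n", t);            // expected: t = 4.0
}

Everything around it, the BVH layout, the traversal order, and when to stop, stays up to the developer, which is exactly the kind of constraint-free building block I'd like to see exposed.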
 
In Quake 2 RTX tests, the RTX 3080 delivers roughly 1.9x to 2x the performance of the 6800 XT:

https://www.pcgamer.com/amd-rx-6800-xt-vulkan-ray-tracing/
What we now need are Intel's Quake II numbers to get a complete picture of the RT performance landscape.
 
To focus on RT perf right now, AMD would do better to expose their intersection instructions directly, so we can bypass DXR.
I assume the resulting flexibility makes AMD's approach better suited to the upcoming 'say goodbye to low poly models' unlimited detail mumbo jumbo.

I would not be happy to see AMD adopt the idea of implementing such involved algorithms and data structures completely in hardware. Developers could then no longer improve them, nor adjust them to their needs.
IMO it's better to offer helpful sub-functionality which adds no constraints. And as we see, intersection instructions alone already compete.
If AMD lags only half a generation behind in DXR benchmarks but can offer full flexibility, it may be the other side that ends up having to update their RT. >:)

What exactly would devs do differently if they had direct access to the intersection shader? Isn’t that the same as writing a compute shader that takes a node pointer as input?

Or in other words what’s stopping devs from implementing their own raytracing pipeline using compute today?
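To be concrete about what 'their own raytracing pipeline' would mean: a flat node array plus an explicit traversal stack, roughly as in the CPU-side C++ sketch below. A real version would be an HLSL/GLSL compute shader over the same data layout; the node format and names are made up, and leaves hold spheres just to keep it short.

Code:
// Sketch of a hand-rolled BVH "traversal kernel": flat node array + explicit stack.
// CPU-side C++ standing in for a compute shader; leaves hold spheres for brevity.
#include <cmath>
#include <cstdio>
#include <vector>

struct Vec3 { float x, y, z; };
static float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vec3  sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }

struct Ray    { Vec3 o, d; };                  // direction assumed normalized
struct Sphere { Vec3 c; float r; };

struct Node                                    // one node of the flat BVH
{
    Vec3 bmin, bmax;                           // bounding box
    int  left, right;                          // child indices (-1 if leaf)
    int  prim;                                 // sphere index if leaf, else -1
};

static bool hitBox(const Ray& r, Vec3 bmin, Vec3 bmax, float tMax)
{
    float t0 = 0.0f, t1 = tMax;
    const float o[3]  = { r.o.x, r.o.y, r.o.z },   d[3]  = { r.d.x, r.d.y, r.d.z };
    const float lo[3] = { bmin.x, bmin.y, bmin.z }, hi[3] = { bmax.x, bmax.y, bmax.z };
    for (int a = 0; a < 3; ++a)
    {
        float inv = 1.0f / d[a];
        float ta = (lo[a] - o[a]) * inv, tb = (hi[a] - o[a]) * inv;
        if (ta > tb) { float tmp = ta; ta = tb; tb = tmp; }
        if (ta > t0) t0 = ta;
        if (tb < t1) t1 = tb;
        if (t1 < t0) return false;
    }
    return true;
}

static bool hitSphere(const Ray& r, const Sphere& s, float& t)
{
    Vec3  oc = sub(r.o, s.c);
    float b = dot(oc, r.d), c = dot(oc, oc) - s.r * s.r;
    float disc = b * b - c;
    if (disc < 0.0f) return false;
    t = -b - std::sqrt(disc);
    return t > 0.0f;
}

// The part DXR hides: the traversal loop itself. Because it's ours, we can reorder
// it, cull subtrees, or terminate early however we like.
static int traverse(const std::vector<Node>& bvh, const std::vector<Sphere>& prims,
                    const Ray& ray, float& tClosest)
{
    int stack[64], sp = 0, closest = -1;
    tClosest = 1e30f;
    stack[sp++] = 0;                                        // push root
    while (sp > 0)
    {
        const Node& n = bvh[stack[--sp]];
        if (!hitBox(ray, n.bmin, n.bmax, tClosest)) continue;
        if (n.prim >= 0)                                    // leaf: test the primitive
        {
            float t;
            if (hitSphere(ray, prims[n.prim], t) && t < tClosest)
            {
                tClosest = t;
                closest  = n.prim;
            }
        }
        else
        {
            stack[sp++] = n.left;                           // interior: push children
            stack[sp++] = n.right;
        }
    }
    return closest;
}

int main()
{
    std::vector<Sphere> prims = { { { -2, 0, 5 }, 1 }, { { 2, 0, 5 }, 1 } };
    std::vector<Node>   bvh(3);
    bvh[0] = { { -3, -1, 4 }, {  3, 1, 6 },  1,  2, -1 };   // root
    bvh[1] = { { -3, -1, 4 }, { -1, 1, 6 }, -1, -1,  0 };   // leaf holding sphere 0
    bvh[2] = { {  1, -1, 4 }, {  3, 1, 6 }, -1, -1,  1 };   // leaf holding sphere 1
    Ray ray = { { 2, 0, 0 }, { 0, 0, 1 } };
    float t;
    int hit = traverse(bvh, prims, ray, t);
    std::printf("hit prim %d at t = %.1f\n", hit, t);       // expected: prim 1 at t = 4.0
}

The while loop is the part TraceRay hides; if you own it, you can reorder, cache, or cull however you like, at the cost of giving up whatever the fixed-function traversal hardware can do.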
 
Are those numbers averages for the entire frame? I don’t think using those numbers will give an accurate ALU:FP ratio. There are tons of factors that affect instruction throughput during the frame. It’s better to look at instantaneous throughputs at specific points.

Yep, for the whole frame. OK, I'd like to upload the file, but it's bigger than what the forum rules allow.

Random points from key sections:
 
Or in other words what’s stopping devs from implementing their own raytracing pipeline using compute today?
Two things:
* Intersection instructions can only be used indirectly, through DXR.
* If there is a hardware traversal unit, it's likely faster and we want to use it.
What exactly would devs do differently if they had direct access to the intersection shader?
Me: Everything. I would ignore DXR, reuse the BVH I already have if possible, and have an adaptive BVH for LOD.
Industry: No idea if they think it's worth it, or if DXR is the easy and future-proof way to go.
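To make the 'adaptive BVH for LOD' part concrete, here is a toy criterion (my own illustration, not a shipping technique): since we would own the traversal loop, we could stop descending at an interior node whose bounds look small enough from the ray origin and shade it as a coarse proxy, which is something DXR's fixed-function traversal gives us no hook for.

Code:
// Toy LOD cut for a custom BVH traversal: stop descending once a node's bounds
// look small enough from the ray origin and shade the node as a coarse proxy.
#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };
struct Node { Vec3 bmin, bmax; };              // only the bounds matter here

// Approximate angular size of the node as seen from 'eye': box diagonal / distance.
// A real renderer would compare against a ray cone or pixel footprint instead.
static bool stopAtNode(const Node& n, Vec3 eye, float lodAngle)
{
    Vec3 c = { (n.bmin.x + n.bmax.x) * 0.5f,
               (n.bmin.y + n.bmax.y) * 0.5f,
               (n.bmin.z + n.bmax.z) * 0.5f };
    float ex = n.bmax.x - n.bmin.x, ey = n.bmax.y - n.bmin.y, ez = n.bmax.z - n.bmin.z;
    float extent = std::sqrt(ex * ex + ey * ey + ez * ez);
    float dx = c.x - eye.x, dy = c.y - eye.y, dz = c.z - eye.z;
    float dist = std::sqrt(dx * dx + dy * dy + dz * dz);
    return extent / dist < lodAngle;           // small on screen -> treat as proxy
}

int main()
{
    Node cluster = { { 10, 0, 0 }, { 11, 1, 1 } };          // ~1.7 units across
    Vec3 nearEye = { 8, 0, 0 }, farEye = { -90, 0, 0 };
    float lodAngle = 0.05f;                                  // tune per resolution
    std::printf("near: %s, far: %s\n",
                stopAtNode(cluster, nearEye, lodAngle) ? "proxy" : "descend",
                stopAtNode(cluster, farEye,  lodAngle) ? "proxy" : "descend");
    // expected: near -> descend, far -> proxy
}

Whether a cut like this actually beats hardware traversal all the way down to the triangles is exactly the open question.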
 
Two things:
* Intersection instructions can only be used indirectly, through DXR.
* If there is a hardware traversal unit, it's likely faster and we want to use it.

Me: Everything. I would ignore DXR, reuse the BVH I already have if possible, and have an adaptive BVH for LOD.
Industry: No idea if they think it's worth it, or if DXR is the easy and future-proof way to go.

The PC industry probably cares about the cost of supporting all the different hardware. Consoles, as fixed platforms, on the other hand...
 
Two things:
* Intersection instructions can only be used indirectly, through DXR.
* If there is a hardware traversal unit, it's likely faster and we want to use it.

So software intersection and hardware traversal? The opposite of what AMD currently does?

Me: Everything. I would ignore DXR, reuse the BVH I already have if possible, and have an adaptive BVH for LOD.
Industry: No idea if they think it's worth it, or if DXR is the easy and future-proof way to go.

And your thinking is that adaptive BVH in compute will be faster than hardware intersection?
 
Just tested Quake II RTX on my overclocked RX 6800 XT and I'm happy with how it performs, as I had very low expectations to start with.
Good news is, it works and it performs better than the RTX 2060 I had a chance to play on previously.

Full details with Upscaling AA:
1920x1080 = 79 FPS on the Demo1 map
2560x1440 = 48 FPS on the Demo1 map

I took a few photos, as screen grabs were coming out as black screens from fullscreen mode.
[Attached photos: 20201217-001533.jpg, 20201217-001625.jpg, 20201217-001616.jpg, 20201217-001654.jpg, 20201217-001705.jpg]


The first three photos are QHD and the last two are FHD ;)
 
Just tested Quake II RTX on my overclocked RX 6800 XT and I'm happy with how it performs, as I had very low expectations to start with.
Good news is, it works and it performs better than the RTX 2060 I had a chance to play on previously.

The 1.4 version performs quite a bit better than older versions. The 2060 likely gains 15% or so more perf on this version if it gets a similar boost to what other cards are seeing.
 
It would be good to see a proper Ampere vs Turing vs RDNA2 comparison. It's still unclear how RDNA2 compares to Turing in RT specifically. Everything so far suggests the 6800/XT is roughly similar to, or slightly worse than, the standard 2080, but we need actual data.
 
It would be good to see a proper Ampere vs Turing vs RDNA2 comparison. It's still unclear how RDNA2 compares to Turing in RT specifically. Everything so far suggests the 6800/XT is roughly similar to, or slightly worse than, the standard 2080, but we need actual data.

We have this benchmark from PC Gamer. For what it's worth, my 3070 FE gets 54 fps, or 57 fps when overclocked to the max, at 1440p. It's power-draw limited on the 3070 FE; I get 150-200 MHz lower GPU clocks in Q2 RTX versus CP2077.

https://www.pcgamer.com/amd-rx-6800-xt-vulkan-ray-tracing/
 
The 1.4 version performs quite a bit better than older versions. The 2060 likely gains 15% or so more perf on this version if it gets a similar boost to what other cards are seeing.

That's true; I can revisit this game on the mobile RTX 2060 to compare, but the difference is quite big, so I don't think it will change anything in what I said. I would be interested in seeing results from this new version run on a 2080S, 2080 Ti and 3060 Ti for comparison.
What is important to me now is that I can relive Quake II with RT, rendered at QHD resolution, as playing in the 40-60 FPS range brings back the feeling of playing on a K6-2 and Riva TNT, though at a much lower resolution back then, of course ;)
 
That's true; I can revisit this game on the mobile RTX 2060 to compare, but the difference is quite big, so I don't think it will change anything in what I said. I would be interested in seeing results from this new version run on a 2080S, 2080 Ti and 3060 Ti for comparison.
What is important to me now is that I can relive Quake II with RT, rendered at QHD resolution, as playing in the 40-60 FPS range brings back the feeling of playing on a K6-2 and Riva TNT, though at a much lower resolution back then, of course ;)

The 3060 Ti is likely 10-15% slower than my 3070 FE, putting it somewhere around 48 fps at 1440p.
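Rough arithmetic on those numbers: 54 fps x 0.85-0.90 is roughly 46-49 fps, so ~48 fps at 1440p is in the right ballpark.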
 
Me: Everything. I would ignore DXR, reuse the BVH I already have if possible, and have an adaptive BVH for LOD.
Industry: No idea if they think it's worth it, or if DXR is the easy and future-proof way to go.

This is one of the reasons I'm suggesting ditching hw tracing: you can do what you want in compute and forget the API restrictions.

The other is that the baseline isn't Nvidia cards; it isn't even the PS5/Series X. It's the Series S that all high-end titles must support. Core features have to run there, on the console where Watch Dogs Legion looks like a PS2 game thanks to how low-res the raytracing is. Call of Duty doesn't even enable raytracing there.

That's why hw raytracing is potentially too costly even for devs who think it's a good idea, and why I can only see it as an "extra" effects option. But since tracing makes some things much easier in production, it makes sense to me for devs to find faster ways to do tracing: fast enough that it runs on the Series S at 900p or whatever. Even if it takes extra programmer time, there are 20+ times more artists, so it will save time overall.
 
We have this benchmark from PC Gamer. For what it's worth, my 3070 FE gets 54 fps, or 57 fps when overclocked to the max, at 1440p. It's power-draw limited on the 3070 FE; I get 150-200 MHz lower GPU clocks in Q2 RTX versus CP2077.

https://www.pcgamer.com/amd-rx-6800-xt-vulkan-ray-tracing/

Yes, but there is no Turing data. It would be nice to see where RDNA2 lands exactly.

Interesting that Q2 pulls more power than CP2077 on Ampere. I would have thought a more mixed workload like CP would saturate the hardware better, with a mostly RT workload leaving some SMs idle (waiting on returns from RT cores or just general memory contention).
 
Yes, but there is no Turing data. It would be nice to see where RDNA2 lands exactly.

Interesting that Q2 pulls more power than CP2077 on Ampere. I would have thought a more mixed workload like CP would saturate the hardware better, with a mostly RT workload leaving some SMs idle (waiting on returns from RT cores or just general memory contention).

Interesting ...
On RDNA2 it pulls less power than some raster games in heavy scenes: normally my card would clock in the 24xx-25xx MHz range while drawing almost 300 W in games like Doom Eternal or 3DMark Time Spy. Here in Q2 RTX, the card has enough power headroom to hit 2650 MHz+ all the time while drawing around 250-280 W, with the average closer to 260 W.
 
Interesting ...
On RDNA2 it pulls less power than some raster games in heavy scenes: normally my card would clock in the 24xx-25xx MHz range while drawing almost 300 W in games like Doom Eternal or 3DMark Time Spy. Here in Q2 RTX, the card has enough power headroom to hit 2650 MHz+ all the time while drawing around 250-280 W, with the average closer to 260 W.
Shading units are idling on branching code? Or maybe it's memory bandwidth and/or latency limited in path tracing?
 
Shading units are idling on branching code? Or maybe it's memory bandwidth and/or latency limited in path tracing?

It would be nice to see this broken down in a proper profiler, but I lack the knowledge and time to do it quickly. Memory bandwidth sounds plausible, but the cache should be helping here. In raster games, surprisingly, light games stressing mostly the ROPs with little shader code show similar behaviour, where clocks can skyrocket on the GPU with power still in check. At least on old GCN cards (290X), stressing the ROPs was the most power-intensive task I could run, bar Furmark.
 