TechPowerUp did have the slides posted here; they cover some of the memory latency aspects:
https://www.techpowerup.com/review/amd-radeon-rx-6800-xt/2.html
I'm interested in seeing the endnotes for some of the slides, like the memory latency one. They might give some of the base values that go into the percentages. I'm not sure whether the Infinity Cache's latency improvement is a percentage of the total memory latency (L0, L1, L2, memory combined) or whether it's relative to the latency of the DRAM access alone.
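To show why the distinction matters, here's a toy calculation. Every number in it is a placeholder I made up, not a figure from AMD's slide:

```cpp
#include <cstdio>

int main() {
    // Placeholder latencies in ns -- illustrative only, NOT AMD's figures.
    double l0 = 20, l1 = 40, l2 = 80, dram = 250;
    double total = l0 + l1 + l2 + dram;  // full path for a request missing to DRAM
    double cut = 0.34;                   // example reduction; substitute whatever the slide quotes

    // Reading 1: the percentage applies to the whole memory path.
    printf("vs. total path: %.0f ns -> %.0f ns\n", total, total * (1 - cut));

    // Reading 2: the percentage applies only to the DRAM access itself.
    printf("vs. DRAM only:  %.0f ns -> %.0f ns\n",
           total, l0 + l1 + l2 + dram * (1 - cut));
    return 0;
}
```

The two readings give noticeably different effective latencies, which is why the endnotes' base values would be useful.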
On this point, I'm thinking that the retention or discarding of BVH structures by the cache may be a driver and application matter rather than being hardwired. That would also explain part of the need for specific optimization for AMD's ray tracing implementation.
Driver commits indicate it can happen at page granularity, and there are also flags for specific functionality types. It's not clear that BVH data fits into those, unless it hides under the umbrella of some of the metadata related to DCC or HiZ.
Some of those would seem to be better kept in-cache, since DCC in particular can suffer from thrashing of its metadata cache, injecting a level of latency sensitivity that normal accesses wouldn't have.
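To make the idea concrete, here's a purely hypothetical sketch of what per-range retention hints could look like. None of these names come from the actual driver commits; they're invented for this post:

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical illustration only -- these enumerators and the function are
// invented for this post, not taken from the amdgpu driver commits.
enum CacheRetainHint : uint32_t {
    RETAIN_NONE     = 0,
    RETAIN_DCC_META = 1u << 0,  // keep DCC metadata resident to avoid thrashing
    RETAIN_HIZ_META = 1u << 1,  // HiZ/depth metadata
    RETAIN_BVH      = 1u << 2,  // speculative: would BVH get its own class?
};

// Stub: tag a page-aligned range as preferring cache retention for a given
// functionality type (page granularity, per the driver commits).
void set_retention_hint(uint64_t base_page, uint64_t num_pages, uint32_t hints) {
    printf("pages [%llu, %llu): hints 0x%x\n",
           (unsigned long long)base_page,
           (unsigned long long)(base_page + num_pages), hints);
}

int main() {
    // A driver might pin a surface's DCC metadata while leaving the BVH
    // to normal replacement -- exactly the policy question raised above.
    set_retention_hint(0x1000, 16, RETAIN_DCC_META);
    return 0;
}
```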
The vanilla 6800 has one fewer SE, and so one fewer rasterizer. But it has higher clocks, and it's not known whether the pre-cull numbers are the same.
Is there a source for this, or tests that can tell the difference between an entire SE being deactivated versus an equivalent number of shader arrays being disabled across the chip?
I thought the general consensus was that AMD disabled one entire SE.
AMD's Sienna Cichlid code introduced a function to track the disabling of formerly per-SE resources, like ROPs, at shader-array granularity. This might lead to similar outcomes.
When you bring evidence to back up your assumptions, I guess you'll have an argument.
We do have some basis for comparison in AMD's patent for BVH acceleration versus Nvidia's. There are some potential points of interest, such as the round trip that node traversal must make from the RT block back to the SIMD, and the implicit granularity of execution being SIMD-width.
There are some code commits that give instruction formats for BVH operations that look to be in line with the patent.
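A minimal software sketch of the loop structure the patent describes, assuming a shader-managed stack and a fixed-function node test. The node layout and all names here are mine, not AMD's:

```cpp
#include <algorithm>
#include <vector>

struct Ray  { float org[3], inv_dir[3], t_max; };
struct Aabb { float lo[3], hi[3]; };

// Simplified BVH4-style node: up to 4 children, each with a bounding box.
struct Node {
    Aabb box[4];
    int  child[4];   // child node index, or -1 if unused
    bool leaf;       // leaf nodes would hold triangles instead
};

// Slab test -- stands in for the fixed-function box intersection the RT
// block performs (exposed as a BVH-intersect instruction in the commits).
static bool hit_box(const Ray& r, const Aabb& b) {
    float t0 = 0.0f, t1 = r.t_max;
    for (int a = 0; a < 3; ++a) {
        float n = (b.lo[a] - r.org[a]) * r.inv_dir[a];
        float f = (b.hi[a] - r.org[a]) * r.inv_dir[a];
        if (n > f) std::swap(n, f);
        t0 = std::max(t0, n);
        t1 = std::min(t1, f);
    }
    return t0 <= t1;
}

// The loop itself runs on the SIMD: every node test is a round trip to the
// RT block, and the traversal stack stays in shader-managed memory. Each
// lane walks its own ray, so divergent rays pay SIMD-width execution costs.
float trace(const Ray& ray, const std::vector<Node>& bvh, int root) {
    std::vector<int> stack{root};             // shader-side stack
    float closest = ray.t_max;
    while (!stack.empty()) {
        const Node& n = bvh[stack.back()];
        stack.pop_back();
        for (int i = 0; i < 4; ++i) {         // "RT block" tests the children
            if (n.child[i] < 0 || !hit_box(ray, n.box[i])) continue;
            if (bvh[n.child[i]].leaf) { /* intersect triangles, shrink closest */ }
            else stack.push_back(n.child[i]); // shader decides what to visit next
        }
    }
    return closest;
}
```

The point of the sketch is the control flow: unlike a traversal engine that loops internally, every iteration returns to the SIMD, which is one of the potential points of interest noted above.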
Or there is a problem in an RBE, or in the scheduling hardware of that SE.
RBEs are something that can be disabled at a different granularity than SEs, though.
I asked a few times earlier in the thread, but I didn't get clarification. Recall that there are 4 Packers per Scan Converter, so Navi21 has 32 Packers, which works out to 8 Packers per Raster Unit (each Raster Unit pairing 2 Scan Converters). Up to 4 Packers dispatch to each Shader Array with optimised fragments, arranged as 1x2, 2x1 or 2x2 fragment groups as discussed below for VRS (my speculation). The efficiency gains come from these packed fragments.
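A back-of-envelope illustration of where that gain would come from, following my speculation above. With VRS coarse rates, one shader invocation covers a 1x2, 2x1 or 2x2 fragment group, so a packer emitting packed groups fills wavefront lanes with fewer invocations (the tile size is just an example):

```cpp
#include <cstdio>

int main() {
    const int tile_w = 32, tile_h = 32;              // pixels in a screen tile
    const int rates[][2] = {{1,1}, {1,2}, {2,1}, {2,2}};
    for (auto& r : rates) {
        // One invocation shades an r[0] x r[1] fragment group.
        int invocations = (tile_w / r[0]) * (tile_h / r[1]);
        printf("%dx%d rate: %4d invocations (%.1f wave32 waves)\n",
               r[0], r[1], invocations, invocations / 32.0);
    }
    return 0;
}
```

Going from 1x1 to 2x2 groups quarters the invocation count for the same pixel coverage, which is the kind of efficiency gain I'm attributing to the Packers.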
The packers I am thinking of are involved in primitive-order processing, which pertains to rasterizer ordered views rather than to how primitives are translated into wavefronts.
TSMC should probably consider researching the various EDRAM technologies to resolve the memory scaling issues, particularly for large cache arrays.
IBM and Intel already employ different integration methods, though these are very tightly tied to their particular manufacturing processes.
Perhaps as scaling falters, the pressure will resume to go back to EDRAM despite the cost and complexity penalties.
Neither IBM nor Intel has that technique available at smaller nodes. IBM's next Power chip dropped the capability since IBM sold off its fab to GlobalFoundries, which then gave up scaling to smaller nodes, and Power had been the standout for having EDRAM.