AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

Is there a source for this, or tests that can tell the difference between an SE being inactivated versus an equivalent number of shader arrays disabled across the chip?
Pixel fillrate is down by (a little more than) one quarter, with the "little more than" mainly due to lower clocks on the 6800.
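A quick back-of-the-envelope check (a rough sketch; the ROP counts and boost clocks below are published figures, but which clocks actually apply under load is an assumption):

```python
# Rough pixel-fillrate comparison, RX 6900 XT vs RX 6800.
# Assumes peak fillrate = ROPs * boost clock; sustained clocks will differ.
def peak_fillrate_gpix(rops, clock_ghz):
    return rops * clock_ghz  # Gpixels/s

gpix_6900xt = peak_fillrate_gpix(128, 2.250)  # ~288 Gpix/s
gpix_6800   = peak_fillrate_gpix(96, 2.105)   # ~202 Gpix/s

print(gpix_6800 / gpix_6900xt)  # ~0.70 -> down ~30%, i.e. a bit more than a quarter.
# The ROP cut alone (96/128) is exactly -25%; the lower clock supplies the rest.
```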

I hope that we get rasterizer results soon from @CarstenS or @Ryan Smith. I'm interested in the values for 0%-culled list or strip polygons and how they relate to Navi 10 and the RTX 3090.
Need to sort out irregularities first, waiting for answers from AMD right now.
 
Perhaps as scaling falters, the pressure will resume to go back to EDRAM despite the cost and complexity penalties.
Neither IBM nor Intel has that technique available at smaller nodes. IBM's next POWER chip dropped the capability since IBM sold off that fab to GlobalFoundries, which then gave up scaling to smaller nodes; POWER was the standout for having EDRAM.

There are some theoretical replacements for SRAM, such as STT-MRAM, that scale much better than SRAM, but AFAIK they're all still in the "theoretically we could make this mass-manufacturable" stage with no guarantee of getting beyond it.

The other option I see is the same 2.5D/3D memory stacking that's being done for all other types of RAM. There's no reason it shouldn't work for SRAM as long as it's kept under whatever z-height limit there is.

Longer term, it's now theoretically possible for every basic component you need to be built on carbon nanotubes rather than silicon. A limited prototype was even revealed last year. So the "replace silicon" movement has at least some momentum behind it.
 

What amount of space would they need to hit 80%+ hit rates @ 4K? I don't think 8K is going to be an issue for a long time, realistically, as displays are still extremely expensive and even DLSS from the other guys has issues running it.

80% @ 1080p with a 128 MB cache and 58% @ 4K, right? So something between 128 and 256 MB should be enough.
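A toy model of that scaling, purely illustrative: it assumes the per-frame working set grows linearly with pixel count and that hit rate is simply cache size over working set, neither of which AMD has confirmed:

```python
# Toy model: how big would the cache need to be for ~80% hits at 4K?
# Assumes working set scales with pixel count and hit_rate ~= cache / working_set.
PIXELS_1080P = 1920 * 1080
PIXELS_4K    = 3840 * 2160   # exactly 4x the pixels

CACHE_MB  = 128
HIT_1080P = 0.80             # AMD's quoted point

working_set_1080p = CACHE_MB / HIT_1080P                           # ~160 MB implied
working_set_4k    = working_set_1080p * PIXELS_4K / PIXELS_1080P   # ~640 MB

predicted_hit_4k    = min(1.0, CACHE_MB / working_set_4k)  # ~0.20
needed_for_80_at_4k = 0.80 * working_set_4k                # ~512 MB

print(predicted_hit_4k, needed_for_80_at_4k)
# The linear model predicts only ~20% hits at 4K, yet AMD quotes ~58%, so the real
# working set clearly grows much slower than resolution; the size needed for 80% at
# 4K is therefore likely well under 512 MB, plausibly in the 128-256 MB range above.
```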
 

Several gigs, apparently. I thought they were going to run out of room, and they did, but apparently not on the data that's in the cache, at least based on the quoted space in the cache taken up on average by triple-A games. Instead they're hampered in some way by the requirements of assets coming in from main memory. There's just not enough bandwidth to main memory, so the whole GPU sits on its heels waiting for data. I don't know what pass they're waiting on; g-buffer fill maybe, as they read and copy each texel into a buffer.

Which means the problem of feeding the beast, or rather getting the required memory reads into logic fast enough, isn't solved by a huge amount of SRAM. It's not a fundamental silver bullet, and scaling the big cache down in area for cost savings would probably be a better use than scaling it up bigger to get more performance.
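To put numbers on the "not enough bandwidth to main" point, here is a toy effective-bandwidth calculation; the 1664 GB/s Infinity Cache figure and 512 GB/s GDDR6 figure are the commonly quoted peaks, and the time-weighted mix is my own simplification:

```python
# Toy effective-bandwidth model for a hit/miss mix between Infinity Cache and GDDR6.
IC_BW_GBS   = 1664   # quoted peak Infinity Cache bandwidth
DRAM_BW_GBS = 512    # 256-bit GDDR6 at 16 Gbps

def effective_bw(hit_rate):
    # Time-weighted harmonic mix: each byte is served either by the cache or by DRAM.
    return 1.0 / (hit_rate / IC_BW_GBS + (1.0 - hit_rate) / DRAM_BW_GBS)

for hr in (0.80, 0.58, 0.30):
    print(hr, round(effective_bw(hr)))
# ~1150 GB/s at 80% hits, ~855 GB/s at 58%, ~645 GB/s at 30%: once the working set
# spills (streamed assets, large BVHs), effective bandwidth collapses toward the
# 512 GB/s of the GDDR6 bus, which is exactly the kind of stall described above.
```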
 
Err, extra post, but here's a question: what on earth is with the following? CoD Black Ops runs on the consoles, with raytraced shadows enabled, at a near-locked 60, above 1440p. Meanwhile a 6800 XT can't even hit a 50 fps average at 1440p locked.

Are the settings so massively different, are the drivers that terrible, or does the BVH overrun the cache so the whole thing stalls while main memory is accessed? Like... wtf?
 

Unconstrained access to the hardware is likely the reason. The API(s) on the PC are too abstract to allow reaching the hardware's peak performance.
 
If you are checking mangos and the Linux driver, the packers come after the scan converter. I think the packers take the pixels from the rasterizer and send them to the shaders?

[Attached image: linux.png]
https://www.pcgamer.com/a-linux-update-may-have-let-slip-amd-big-navis-mammoth-core/
I don't know how far back a historical comparison can go to see how the counts for packers have varied. Ideally, I'd like to see a reference for something pre-Vega.
The order things show up in the list may sometimes give an idea of a relative hierarchy, but there's enough variation that I wouldn't guarantee a specific workflow or function based on where things are in the table.

The driver leak has changes to SIMD waves, which, combined with the slide in the driver leak showing RB+ and Packers connected to Scan Converters, suggests some optimisations after scan conversion and dispatch to the Shader Arrays. The number of Packers per Scan Converter doubled from RDNA1, but per-clock triangle rasterisation remains the same at 4 per clock.
The number of waves per SIMD is something that can be modified by the architecture, although I don't see a direct link from a change like that to other functions. The number of waves per SIMD doesn't directly inform what each shader does, and could also be a side effect of optimizing for the higher clock range of the architecture.
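On the 4-triangles-per-clock figure above, a quick sketch of what that implies for peak primitive rate (assuming, as the leak suggests, that the scan-converter count rather than the packer count sets the rasteriser rate):

```python
# Peak rasterised-primitive rate if both generations stay at 4 triangles/clock.
def tris_per_second_g(tris_per_clock, clock_ghz):
    return tris_per_clock * clock_ghz  # Gtris/s

print(tris_per_second_g(4, 1.905))  # Navi 10, RX 5700 XT boost: ~7.6 Gtris/s
print(tris_per_second_g(4, 2.250))  # Navi 21, RX 6900 XT boost: ~9.0 Gtris/s
# Same per-clock rate, so any raw rasterisation gain would come from clocks alone;
# the doubled packers would then matter for how pixels are packed into waves, not
# for triangle throughput.
```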

That might mean the half-latency is relative to an L2 miss in prior GPUs. GCN had ~350 cycles for an L2 miss, which, if carried over to RDNA2, would put an Infinity Cache hit roughly equal to an L2 hit. However, that comparison may not account for the clock increases since GCN, or other factors like the Infinity Fabric or just higher latencies from memory.
It'd be nice if the internal caches were sped up, but that's not clear. There does seem to be a slide mentioning that the L2 cache now provides an aggregate of 2048 B/cycle, which means it's supplying double the bandwidth it did before. That seems nice, since the Infinity Cache on its own was unusually close to the L2 bandwidth.
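Putting rough numbers on both claims (the 2 GHz clock is an assumption; the 350-cycle figure is the GCN ballpark mentioned above):

```python
# Rough conversions for the L2 bandwidth slide and the "half latency" claim.
CLOCK_GHZ = 2.0                       # assumed sustained game clock for Navi 21

l2_bytes_per_cycle = 2048
l2_bw_tb_s = l2_bytes_per_cycle * CLOCK_GHZ * 1e9 / 1e12
print(l2_bw_tb_s)                     # ~4.1 TB/s aggregate L2 bandwidth

gcn_l2_miss_cycles = 350              # ballpark GCN L2-miss latency
ic_hit_cycles = gcn_l2_miss_cycles / 2
print(ic_hit_cycles, ic_hit_cycles / CLOCK_GHZ)
# ~175 cycles, ~88 ns at 2 GHz -- if "half" really is measured against that old
# baseline. At RDNA2 clocks the GCN figure itself would shift, so treat this as
# a rough bound rather than a measured latency.
```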

Pixel fillrate is down by (a little more than) one quarter, with the "little more than" mainly due to lower clocks on the 6800.
My question would go to the earlier code changes about disabling RBEs at a different granularity than per-SE. Perhaps with the geometry engine there can be a more flexible way of performing an initial coarse rasterization step rather than counting on a more static screen space assignment for each SE.
 
Unconstrained access to the hardware is likely the reason. The API(s) on the PC are too abstract to allow reaching the hardware's peak performance.
This early in the gen I doubt any multiplatform games are doing anything too low level on consoles. It's most likely settings.
 
According to the DF WDL piece, the XBSX runs the game with relatively low RT settings. Wouldn't be surprised if that's the case with CoD also.
This early in the gen I doubt any multiplatform games are doing anything too low level on consoles. It's most likely settings.

It isn't though, take a look. Those shadows are damned smooth, at least as good as Modern Warfare's before it. All at 60 and often at 1800p or above on the consoles.

Could the API difference be as huge as that? If you scaled by the relative compute power between a 6800 XT and a PS5, the former should be averaging a hundred fps at 1440p, or more.
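The naive scaling behind that estimate, spelled out (it assumes fps scales linearly with peak FP32 throughput, which it never quite does, and uses published peak figures):

```python
# Naive compute scaling: PS5 vs RX 6800 XT peak FP32, applied to the console frame rate.
TFLOPS_PS5    = 10.3   # 36 CUs * 2.23 GHz * 128 FLOPs/clock/CU
TFLOPS_6800XT = 20.7   # 72 CUs * ~2.25 GHz boost

console_fps = 60       # near-locked console target, often at ~1800p
scaled_fps = console_fps * TFLOPS_6800XT / TFLOPS_PS5
print(scaled_fps)      # ~120 fps "expected", so an under-50 fps average at 1440p
                       # is a large gap unless settings differ far more than the
                       # footage suggests.
```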
 
COD games always run very well on AMD GPUs relative to consoles. We should wait for some comparisons.
 
This early in the gen I doubt any multiplatform games are doing anything too low level on consoles. It's most likely settings.

I can say from my own development work that the console API extensions on Xbox, and the PS API in general, allow you to intuitively reach creative constructions which are afterwards a big PITA to somehow recreate on the PC APIs. The most interesting, and fairly easy to use, parts are often not available on the PC at all.

As a small example, the shared memory is very easy to exploit, and the internal organization of textures is fixed; the two together remove a large amount of processing. The PC abstracts this organization, and has to, because from game start to game start you might change the card or the driver. Of course it would be possible to make the organization accessible/visible, with the trade-off that you have to manage it yourself and rebake the textures when something changes. But this is not available on the PC APIs, and it never will be, judging by the stance and arguments from driver developers and MS personnel.

There are many more cases involving very in-depth programming which can't be shared. Be assured that from a performance-optimization perspective the two realms (console vs. PC APIs) are somewhat different universes for a programmer. And the consoles are not hard to program like they were a long time ago; it's fairly easy and straightforward if you know what you are doing.
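To make the fixed-texture-layout point concrete, here is a generic sketch (the Morton/Z-order swizzle is purely an example layout, not the actual console one): when the layout is known and fixed, a texel's address can be computed directly and data baked or written in place, whereas an opaque driver-chosen layout forces everything through a linear staging copy.

```python
# Generic illustration: direct addressing into a known swizzled texture layout.
# Morton/Z-order is used here only as a stand-in for "some fixed, documented layout".
def morton_encode_2d(x, y, bits=16):
    """Interleave the bits of x and y into a single Z-order index."""
    idx = 0
    for i in range(bits):
        idx |= ((x >> i) & 1) << (2 * i)        # x bit -> even position
        idx |= ((y >> i) & 1) << (2 * i + 1)    # y bit -> odd position
    return idx

def texel_offset(x, y, bytes_per_texel=4):
    # With a fixed layout this is deterministic, so tools can bake assets straight
    # into it and the runtime can read/write texels without a reswizzling copy.
    return morton_encode_2d(x, y) * bytes_per_texel

print(texel_offset(3, 5))
# On PC the equivalent layout is hardware/driver defined and hidden behind the API,
# so uploads generally go: linear staging buffer -> driver-managed swizzling copy.
```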
 
Thx for the info. I had thought low-level programming on consoles was still quite complex. Would you say it's easier than a high-level PC API like DX11? Would these launch cross-gen games already be exploiting enough console optimization to have such a gaping performance gap as alluded to by Frenetic Pony?
 
Err, extra post, but here's a question: what on earth is with the following? CoD Black Ops runs on the consoles, with raytraced shadows enabled, at a near-locked 60, above 1440p. Meanwhile a 6800 XT can't even hit a 50 fps average at 1440p locked.

Are the settings so massively different, are the drivers that terrible, or does the BVH overrun the cache so the whole thing stalls while main memory is accessed? Like... wtf?

Maybe Ultra settings go further than console settings.
 
@DavidGraham So they're just using the RT hardware to accelerate what are typically screen-space shadows? It's kind of curious. I'm guessing that keeps the ray tracing very cache-friendly and minimizes the BVH, because you'd only really have to build a BVH for what's in the player's view frustum. I'll check the video out.
 