AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

The real question is whether the 6900 XT can beat the 3080 in ray tracing. I have a feeling it can't, which means that for professional applications that deal with RT, the $1,000 price tag may not be a bargain.

For those same professional applications, machine learning, etc., the extra 8 GB of VRAM the 3090 gives, along with the approaching-2x increase in actual memory bandwidth for every data set, might well be worth the extra $500.

Yes, that makes a lot more sense. I figured it would be a case of different settings, but I'm not familiar with the benchmark.

That's also exactly in line with the previous leak, so 2080 Ti-level performance certainly isn't too shabby. As has been mentioned before, the synergy with the consoles may close a lot of that gap once games start to optimize RT for this architecture.

This is probably the absolute best-case scenario: a small, fairly static scene, little geometry, and a low resolution with very little VRAM usage. If there were ever a case where the Infinity Cache could get its hit rate into the 90s, this is probably it.
I'm waiting for benchmarks in Control with everything RT cranked, tested at 4K/5K/8K resolutions.
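As a rough illustration of why the hit rate matters so much, here is a back-of-the-envelope effective-bandwidth model. The 512 GB/s figure is Navi 21's 256-bit/16 Gbps GDDR6; the cache bandwidth number and the miss-limited model itself are simplifying assumptions for illustration, not AMD's figures:

```python
# Toy model: hits are served by the on-die cache, misses fall through to GDDR6.
# If misses are the bottleneck, the DRAM only has to feed the miss fraction,
# so effective bandwidth ~= vram_bw / miss_rate, capped by the cache itself.
VRAM_BW_GBS = 512       # 256-bit bus * 16 Gbps / 8
CACHE_BW_GBS = 1600     # assumed Infinity Cache peak, illustrative only

def effective_bw(hit_rate):
    miss_rate = 1.0 - hit_rate
    if miss_rate == 0:
        return CACHE_BW_GBS
    return min(CACHE_BW_GBS, VRAM_BW_GBS / miss_rate)

for hr in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(f"hit rate {hr:.0%}: ~{effective_bw(hr):.0f} GB/s effective")
# 50% -> ~1024, 60% -> ~1280, 70%+ -> ~1600 (cache-limited)
```

The exact numbers aren't the point; the takeaway is that once the hit rate gets high, the 256-bit bus stops being the limiter.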
 
All of the odds.
Infinity Cache, as originally mentioned by RGT when they leaked it for the first time, was developed as a way to significantly increase performance on mobile GPUs by reducing the number of VRAM channels (hence easier to scale down), and above all as a way to increase gaming performance on the PC APUs that are stuck on standard DDRx or LPDDRx.

It will be interesting to see how this can/will be implemented in APUs and how much performance it will bring, whether something similar can be done on CPUs to increase effective DDR4/DDR5 bandwidth as well, and of course how this evolves in RDNA3 with its (rumored) chiplet design.
 
Why would it? You aren't storing raw textures in there; you're storing data that might be needed again from the other caches/units and, supposedly, RT stuff.
If the 60-70% utilization out of the box is a realistic number and one of the consoles has something similar, we may see more devs than just the AMD-partnered ones optimize for that other 20-30%.

Considering the bus-width-to-compute ratio on the consoles, I doubt either has "Infinity Cache"; in fact I'm 99.9% certain of it given their APU die sizes. And once a next-gen game runs at 1620p on a Series X, that's probably well past full occupancy for the cache at 4K.

I just don't understand the logic of it at all. The "energy savings" of it versus a 384-bit bus are oversold, as all GPU vendors oversell memory subsystem power savings. Do the math and it's a few watts, oh no. They could've hit their relative bandwidth-to-compute targets for the 6900 by using a 384-bit bus and working with a DRAM maker to hit 19 Gbps; 18 Gbps was demoed as stable last year, so it seems doable. I have my doubts about it increasing instruction throughput that much either, beyond ray tracing. Sure, ray tracing needs a ton of unpredictable memory access and can be pretty latency sensitive, but normal graphics shouldn't be affected that much. They could've had a much smaller die, and thus better throughput and yields, if they'd just stuck with a normal memory subsystem. Not to mention there'd be no worries about thrashing the cache versus buffer size/resolution. If it's really for mobile, which is where a few watts matter, then only bloody do it on mobile.
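For reference, the raw-bandwidth side of that argument is just the standard bus-width x data-rate arithmetic (the data rates below are the ones mentioned in the post, not confirmed product specs):

```python
def gddr6_bw_gbs(bus_width_bits, data_rate_gbps):
    """Peak GDDR6 bandwidth in GB/s: pins * per-pin rate / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8

print(gddr6_bw_gbs(256, 16))  # Navi 21 as shipped: 512 GB/s
print(gddr6_bw_gbs(384, 19))  # hypothetical 384-bit @ 19 Gbps: 912 GB/s
```

So a 384-bit/19 Gbps configuration would land at roughly 1.8x the shipping 256-bit setup before any cache amplification, which is the comparison the post is making.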

I'm also utterly surprised that there's not both a 6900 and a 6900 XT. Bin the complete dies into ones that can hit 2.4 GHz stable at 350 watts or whatever, sell that at $1k so it can reliably compete with a 3090, while shipping the normal 6900s at $850. As it is, the performance difference between a 6800 XT and 6900 is embarrassingly small for the price difference: 11% max for $350 more, sheesh.
 
And once a next-gen game runs at 1620p on a Series X, that's probably well past full occupancy for the cache at 4K.

Why would it pass full occupancy at 1620p on Series X? Are you talking about the frame buffer? What if it's not using the cache for the frame buffer?
 
Never have I been so happy to have been wrong about the number of ROPs. In the other thread I was in total disbelief about there being 128 ROPs on Navi 21.

I was in the same boat and am very glad to be proven wrong. I think Xbox Series X was starting to convince me there would be at least 96 ROPs, but I was still half expecting the same old 64.
 
The RX 6800 with 60 CUs instead of the rumored 64 CUs is slightly disappointing, but I suppose it's easier to just disable one entire SE, which gets us 60 CUs. Given that it has lower clocks as well, one would have thought they could go with 192-bit/12 GB and price it a bit lower. The 6800 XT being only $70 more than the 6800 makes it the better buy. AMD really should have matched the 3080's price, especially as they're giving 16 GB of VRAM; then the 6800 at $579 or even $599 would be better priced. The price difference makes it seem like they simply want to upsell the 6800 buyer to the 6800 XT.

Overall pretty much as expected based on all the rumors, and a great showing by AMD. The 6900 XT staying at 300 W was a pleasant surprise; certainly we should see an XTX, or AIB OC versions, at 350 W+ in the future.
Very good. If we see desktop 6700 and 6600 in Q1/Q2 '21, there will likely be refreshed laptop designs later in 2021 with Zen 3-based mobile processors and RDNA2 mobile GPUs.
Back to school, aka early Q3, best case; late Q3/early Q4 worst case.

N22 is supposedly just a month behind N21, and I'm guessing there was a bit of a delay, as AMD would originally have planned to hit the holiday season with both N22 and N21. We should definitely see N22 in Q1 and probably N23 by late Q1. With the increased perf/W, they're going to be fantastic mobile GPUs.

Cezanne + N22/N23 laptops should hit by back-to-school, hopefully. N23 with its rumored 32 CU/128-bit config (and 32 MB of Infinity Cache?) should be particularly popular. With SmartShift and Smart Access Memory, AMD really offers a formidable mobile gaming platform that I don't think Intel can touch.
Infinity Cache was developed as a way to significantly increase performance on mobile GPUs by reducing the number of VRAM channels, and above all as a way to increase gaming performance on the PC APUs that are stuck on standard DDRx or LPDDRx.

The Infinity Cache is also a forward-looking feature and should scale very well as we go down to 5 nm and then 3 nm, since SRAM scales a lot better than analog. It was possibly also made with future chiplet designs in mind. For APUs in particular, as you say, it should allow a good increase in performance while sticking with regular (and cheaper) DRAM. Do you think Cezanne has some amount of Infinity Cache already? The iGPU is rumored to be 30% faster than Renoir despite having the exact same Vega configuration, and while clocks are a bit higher, they're not that much higher.
 

Thanks, I hadn't got round to reading that yet. I'm definitely encouraged by it, but this is a big divergence from how DS has been described in the past by all other sources, so I do think we need more detail around this. It's possible (albeit erring on the unlikely) that AnandTech are conflating RTX IO functionality with DirectStorage functionality. Believe me, though, when I say that no one will be more pleased than me if they are correct. As far as I'm concerned, AnandTech's description there is the best-case scenario.
 
[image: z1k7j.jpg]


https://twitter.com/Locuza_
Given the 2 xGMI / infinity fabric links and 256-bit bus, putting two of those GPUs on a single board looks feasible, in case 8K becomes a thing.
 
I doubt it'd be viable: inter-chip traffic would be too high for two IF links to cover reliably, and power consumption would be prohibitively high (for example, Zen 2's SoC consumes around 17 W at all times, and its IF bandwidth is ten times less even if we compare it to a 256-bit VRAM bus). Sure, it's faster than a PCIe link, but it still wouldn't be enough for proper "seamless" communication.
 
I'm definitely encouraged by it, but this is a big divergence from how DS has been described in the past by all other sources, so I do think we need more detail around this.

Do you really think AMD, who designed the SoCs for the XSX and PS5, don't have a full-fledged implementation of DS? No one likes proprietary standards, not even Nvidia; the scant adoption of RTX before it became part of DirectX shows as much.

Either way, DS is not really relevant for the PC market at the moment and won't be for some time due to the fragmented nature of PC hardware and software. It will be a while before an SSD is a hard requirement for a PC game.
Given the 2 xGMI / infinity fabric links and 256-bit bus, putting two of those GPUs on a single board looks feasible, in case 8K becomes a thing.

But dual GPU is all but dead at the moment, and power consumption would be prohibitively high anyway. Dissipating more than 500 W with an air cooler is close to impossible.
 
I doubt it'd be viable: inter-chip traffic would be too high for two IF links to cover reliably.

2x IF links are ~200 GB/s, if I'm not mistaken. These links must serve some kind of practical purpose.
 
I didn't believe in it either; I kinda still don't. It just doesn't scale well in terms of resolution. What happens when dedicated next-gen games come out, does the cache get thrashed to hell at 4K?

128 MB is >16 bytes per pixel at 4K. That's enough; spending more on render targets would probably just hurt performance on all other GPUs too. It will not scale to 8K, of course. If someone does go crazy on render targets, they can avoid thrashing by putting only certain parts of the render targets in the cache, instead of doing simple LRU replacement, leading to more gradual performance degradation.
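A quick sanity check of that bytes-per-pixel figure (just the resolution arithmetic, nothing Navi-specific):

```python
CACHE_BYTES = 128 * 1024 * 1024  # 128 MiB Infinity Cache

for name, (w, h) in {"1620p": (2880, 1620),
                     "4K": (3840, 2160),
                     "8K": (7680, 4320)}.items():
    pixels = w * h
    print(f"{name}: {CACHE_BYTES / pixels:.1f} bytes of cache per pixel")
# 1620p: ~28.8, 4K: ~16.2, 8K: ~4.0
```

Which lines up with the post: enough room for the main render targets at 4K, but nowhere near enough at 8K.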

It’s questionable if Ampere’s increased RT performance brings any value to actual games. The bottlenecks seem to be elsewhere.

When a game is doing mixed raster/rt rendering, the devs have to balance the two different types of workloads, and they are mostly going to balance them based on the cards they are working on/that are on the market right now, because that's what the customers are running them on.

I fully expect that every GPU generation going forward is going to be more RT-heavy (that is, improving RT perf more than raster perf), and that this extra RT performance will mostly be wasted in mixed-renderer games on release day. By the end of the generation, the balance in most released games will have shifted, though. This process will continue until rasterization is no longer used in new games.

The RX 6800 with 60 CUs instead of the rumored 64 CUs is slightly disappointing, but I suppose it's easier to just disable one entire SE, which gets us 60 CUs.

The 72 CU model already allows salvaging any GPU with two independent faults that each slag two WGPs. I doubt they will have enough GPUs with more independent faults for there to be much sense in such a model.

The RX 6800 also cuts ROPs, and probably everything else in the SE too, allowing recovery of any GPU with a single fault anywhere in the SE area.
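For context, here is the CU arithmetic behind that binning argument as a small sketch. The 4 SE x 10 WGP x 2 CU layout matches the commonly reported Navi 21 configuration, but treat it as an assumption:

```python
# Assumed Navi 21 layout: 4 shader engines, 10 WGPs per SE, 2 CUs per WGP.
SHADER_ENGINES = 4
WGPS_PER_SE = 10
CUS_PER_WGP = 2

full_cus = SHADER_ENGINES * WGPS_PER_SE * CUS_PER_WGP   # 80 CUs -> 6900 XT
xt_cus = full_cus - 4 * CUS_PER_WGP                     # 4 faulty WGPs fused off -> 72 CUs (6800 XT)
se_cut = full_cus - WGPS_PER_SE * CUS_PER_WGP           # one whole SE disabled -> 60 CUs (6800)
print(full_cus, xt_cus, se_cut)                         # 80 72 60
```

Disabling a whole SE also takes its share of the ROPs with it, which is the ROP cut mentioned above.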


Given that it has lower clocks as well, one would have thought they could go with 192-bit/12 GB and price it a bit lower. The 6800 XT being only $70 more than the 6800 makes it the better buy. AMD really should have matched the 3080's price, especially as they're giving 16 GB of VRAM; then the 6800 at $579 or even $599 would be better priced. The price difference makes it seem like they simply want to upsell the 6800 buyer to the 6800 XT.

Agreed, it doesn't seem like they want to sell many of them.
 
The price difference makes it seem like they simply want to upsell the 6800 buyer to the 6800 XT.
Maybe they want to upsell the 3070 buyer to the 6800. For $80 more you get double the VRAM; the base VRAM bandwidth is not only faster, but the 6800 also has Infinity Cache. No matter how you look at it, the 6800 is a much better buy than the 3070, especially if you mainly game and don't care about tensor and CUDA stuff. Of course, after that you might think "add $70 more and you get the 6800 XT"....
It would be really nice if they priced the 6800 at $550.

Looking at the rumored 6700/Navi 22 spec, I'd prefer AMD to have another SKU using a similar chip or spec to Navi 21 but without the Infinity Cache, to slot below the 6800: a 6800-series card without Infinity Cache but still with 256-bit RAM.
 
I think the 6800's appeal will be for overclockers: if the card uses the same PCB (and cooling) as the 6800 XT, then it would be quite easy to bring it up to 2.25 GHz and maybe more with limited thermal issues, with good scaling due to having the same memory subsystem.
 
I never believed in the Infinity Cache; I have to give it to AMD and that British YouTuber who was so adamant that it exists.
I'm still in shock, and awed by the idea of this godlike Rambo cache that can be seen from space :p

I wonder how this stuff affects potential GPGPU applications, but I guess there's CDNA for that, so AMD won't have to do both at the same time in the same chip anymore. Hopefully it's more bandwidth-constrained in pure compute tasks so the miners won't buy out all the cards (or at least there'll be some lag before mining algos run well on RDNA2, like with Vega, which was more or less available at MSRP until late October 2018).
ETH mining, the real risk as far as I can tell, uses a memory-latency-intensive (but not bandwidth-intensive) algorithm. I simply can't tell, but I expect RDNA 2 to be good at it.

I do expect there to be some edge cases where the Infinity Cache falls apart as well. I suspect it'll work great most of the time, but there will be some situations where you can't fake your way out of having the real memory bandwidth. Heavy simultaneous RT + rasterization might be one of those cases. It'll be interesting to see how much manual tuning, if any, AMD will need to do on the driver side to avoid thrashing the cache in heavy RT games with large working sets.
Yeah, I'm expecting it to take a year or longer to be optimized in the driver. Old games, with their DX9-style renderers, will be "fine". TechPowerUp seems to like reviewing cards with old games.
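To make the thrashing worry concrete, here is a tiny sketch of why a plain LRU cache falls off a cliff once a streaming working set is even slightly bigger than the cache, and why the partial-residency idea mentioned earlier degrades more gracefully. It illustrates the general behaviour only; it is not a model of AMD's actual replacement policy:

```python
from collections import OrderedDict

def lru_hit_rate(capacity, working_set, passes=10):
    """Stream over `working_set` lines `passes` times through an LRU cache
    that holds `capacity` lines; return the overall hit rate."""
    cache = OrderedDict()
    hits = accesses = 0
    for _ in range(passes):
        for line in range(working_set):
            accesses += 1
            if line in cache:
                hits += 1
                cache.move_to_end(line)
            else:
                if len(cache) >= capacity:
                    cache.popitem(last=False)  # evict least recently used
                cache[line] = True
    return hits / accesses

# Arbitrary line counts, for illustration only:
print(lru_hit_rate(capacity=1000, working_set=900))   # fits -> ~0.90 (only cold misses)
print(lru_hit_rate(capacity=1000, working_set=1100))  # 10% too big -> 0.0 with pure LRU
# Pinning only ~90% of the oversized working set and bypassing the cache for
# the rest would instead give roughly a 90% steady-state hit rate.
```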

Never have I been so happy to have been wrong about the number of ROPs. In the other thread I was in total disbelief about there being 128 ROPs on Navi 21.
I still believe the 5700 XT has far too much fillrate for its performance. It seems RDNA 2 is more of the same.

I wonder if these 6000-series cards are going to age badly. FLOPS per pixel will only increase with new games. The counter-argument is that existing FLOPS are already being thoroughly wasted by RDNA, so there's no point in adding more FLOPS per pixel; use the existing FLOPS more efficiently instead...
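To put rough numbers on the fillrate and FLOPS-per-pixel framing, here's a back-of-the-envelope calculation. The clock, TFLOPS, and frame-rate figures are illustrative assumptions for a Navi 21 class card, not measured values:

```python
# Assumed figures, illustrative only.
ROPS = 128
CLOCK_GHZ = 2.25          # assumed boost clock
TFLOPS = 20.7             # assumed FP32 throughput

W, H, FPS = 3840, 2160, 60           # 4K at 60 fps
output_pixels_per_s = W * H * FPS    # ~0.5 Gpixels/s actually displayed

peak_fill_gpix = ROPS * CLOCK_GHZ                       # ~288 Gpixels/s theoretical fill
flops_per_pixel = TFLOPS * 1e12 / output_pixels_per_s   # ~41,600 FLOPs per displayed pixel per frame

print(f"theoretical fill: {peak_fill_gpix:.0f} Gpix/s vs ~0.5 Gpix/s displayed")
print(f"~{flops_per_pixel:,.0f} FLOPs available per displayed pixel per frame")
```

Real workloads of course touch far more than one sample per displayed pixel (overdraw, shadow maps, post passes), which is exactly the gap the "wasted fillrate/FLOPS" argument is about.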
 