AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

Some points:
* most of the graphs showing AMD results were produced with the OC (Rage Mode) enabled, and some even with the "Ryzen 5k + chipset" magic - this is not good
* not a single slide showing Ray Tracing - this is telling
* the cut-down version still got the whole 128MB cache

Are you sure about that Rage mode point? They showed numbers before introducing rage mode.
 
Yea, my bad. The RX 6800 XT was benchmarked "vanilla" for some reason... consistency anyone?

Yes, they marked one benchmark as using the memory thing, another as not using it, and a third as using the memory thing and being OCed. What marketing.

It's also a bit interesting that they used "best API" as opposed to being specific about which API. I'm also wondering whether this means their numbers are possibly derived from different APIs for each GPU? This may lead to some interesting divergence when review testing gets done.

Overall I'd say at the moment this is tracking slightly higher than my expectations. It's good enough and priced just well enough to give people pause, but unless something more dramatic gets revealed...

But I have to go back to something I keep bringing up with every AMD launch: people should not expect a Terascale vs. Tesla situation.

That wasn't my point. Nvidia's cards are capable of supporting full RT, which their demos have shown. It's still up to the developers to actually do so.

I would expect RDNA2 cards to support "full RT" (well, at least DXR "full RT"), as ultimately it's a driver support question; DXR doesn't define what type of hardware needs to be present.

Of course the practical issue will be to what extent RDNA2 is capable performance-wise once you "ramp up" the RT effects.

This could be another repeat of the tessellation situation, except this time the higher-fidelity settings will be considerably more impactful.

It would also lead to some interesting benchmark messaging/interpretation, as reviewers tend to want to benchmark apples to apples via "max settings."

I'm disappointed they didn't show anything comparable to RTX IO and GPU based decompression. That suggests it will be an Nvidia exclusive feature and thus not widely adopted.

I thought DirectStorage support was mentioned?

https://www.tomshardware.com/news/a...-with-ryzen-5000-cpus-via-smart-memory-access

The other new feature is "Smart Memory Access", which may be what they're pushing more and which is overshadowing the above.

I'm guessing Smart Memory Access won't be an "open" solution. Interesting how leverage changes viewpoints on "open" vs. "walled garden" approaches, isn't it?
 
[image: 88143984.jpg]


Is that an HBM PHY (x2) at the "south west" side?
 
Doesn't AMD laser cut inactive CUs on their dies?
I think these days it's generally on-die fuses or BIOS inactivation. Some of the recent code changes for shader array inactivation specifically reference one or both being able to inactivate a resource.
The number of places that might need to be lasered and how finely structured everything is likely made laser-cutting fall out of favor some time ago.

So I was going to throw this in the speculation thread when I thought of it a couple days ago but now it is locked.

It seems like the Infinity Cache usage is going to be a very efficient way to scale down on new nodes. Keeping power and performance in a compact size for mobile?
I don't expect them to do a simple shrink for RDNA 3 next year, but it seems like scaling down to 5nm will allow additional focus on other power/performance benefits.
It should reduce active power consumption due to memory traffic. Mobile might still be concerned about static leakage, since even with FinFETs a cache of this size may leak enough power to be a concern for a mobile product, where the power budget might be 1-2 orders of magnitude lower. Smaller caches or aggressive power gating might take care of some of that.

hm... I assume the first is picturing the 6800 with 4 disabled DCUs, while the latter is the 6900.
That makes sense. Seemed odd to me that it would be depicted like AMD physically blanked out those areas of the die.
 
Zen's L3 was the closest example of AMD's large SRAM implementations I could think of prior to the announcement, so this makes sense.
It would probably be structured differently, since each Zen L3 could supply 32 bytes per cycle to 4 cores at ~4GHz, per 16 MB cache.
The delivered bandwidth numbers would be massively higher if that level of bandwidth were maintained, and so would the power and area costs.
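
To put rough numbers on "massively higher" (a back-of-the-envelope sketch; the 32 bytes/cycle per core and ~4 GHz figures are from the description above, and scaling it to 128 MB is purely my illustrative assumption):

```python
# Sketch: what Zen-L3-style bandwidth would look like scaled to a 128 MB cache.
# All inputs are the rough figures quoted above; this is an illustration, not a
# measured number.
BYTES_PER_CYCLE_PER_CORE = 32   # L3 bandwidth per core, per cycle
CORES_PER_CCX = 4
CLOCK_GHZ = 4.0                 # ~4 GHz
L3_PER_CCX_MB = 16

per_slice_gbps = BYTES_PER_CYCLE_PER_CORE * CORES_PER_CCX * CLOCK_GHZ  # per 16 MB slice
total_gbps = per_slice_gbps * (128 // L3_PER_CCX_MB)

print(f"~{per_slice_gbps:.0f} GB/s per 16 MB slice")        # ~512 GB/s
print(f"~{total_gbps:.0f} GB/s if 128 MB held that rate")   # ~4096 GB/s
```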

I think we can actually calculate how much bandwidth there is in the Infinity Cache, from this slide:


[image: D8PnZMA.png]

If we assume they're talking about 16Gbps, then the "1.0X 384bit G6" means 768GB/s and the "256bit G6" is 512GB/s.
If the Infinity Cache is 2.17x the 384bit G6, then its output is 1666.56GB/s. Take away the 512GB/s from the 256bit G6 and we get 1154.56GB/s for the Infinity Cache alone.
I'm guessing this is an odd number because this LLC is working at the same clocks as the rest of the GPU.. maybe they're using the 2015MHz game clock.
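
A quick sanity check of that arithmetic (assuming the slide's GDDR6 baselines are 16 Gbps; the 2.17x multiplier is read off the slide):

```python
# Back out the Infinity Cache's effective bandwidth from the "2.17x" slide,
# assuming 16 Gbps GDDR6 for both baselines.
GBPS_PER_PIN = 16

bw_384bit = 384 * GBPS_PER_PIN / 8   # 768 GB/s, the "1.0x" reference
bw_256bit = 256 * GBPS_PER_PIN / 8   # 512 GB/s, Navi 21's actual GDDR6 bus

combined = 2.17 * bw_384bit          # cache + 256-bit GDDR6 together
cache_only = combined - bw_256bit

print(f"combined:   {combined:.2f} GB/s")   # ~1666.56 GB/s
print(f"cache only: {cache_only:.2f} GB/s") # ~1154.56 GB/s
```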
 
Infinity Cache is super interesting. It will be great to see it get dissected. Which things will it make super fast? Are there edge cases where performance might fall off a cliff, and how will game developers handle that?
 
So, no real details on RT yet, unfortunately. AMD working on some form of DLSS style upsampling, but no details yet.

On the plus side, performance when not using RT rivals NV's equivalently placed cards and on the top end only 2x8 pin power connectors are used. So, power draw when the card is really pushed should be lower than the competition.

I do wonder if AIBs will add a 3rd 8-pin power connector? Not that this is relevant to me as I stopped overclocking cards back with the Radeon x1800xt (whew that's still a mouthful).

I'm leaning towards giving the red team a shot again as I've been less than impressed with the driver quality for the 1070 in my machine, but I want to see what RT perf. is like and see anything at all about AMD's DLSS style upsampling. I expect the RT to be slower than NV, but if it's good enough that would be fine.

Much like I didn't universally allow shadows to be enabled in games until the 1070 due to a combination of performance and shadow quality (wonkiness), I feel it'll be a few hardware generations before I commit to allowing RT to be enabled universally in games. For example, RT in Control is relatively good, but RT in Metro: Exodus was horrible.

Regards,
SB
 
I asked a question that required a yes or no answer. Thanks.


What current or announced game is fully rendered using RT?

Your question contained a false premise. I challenged it by offering you a chance to think through the premise further. Your dismissal is unwarranted, should you wish to engage in honest discussion of the subject matter.
 
I'm disappointed they didn't show anything comparable to RTX IO and GPU based decompression. That suggests it will be an Nvidia exclusive feature and thus not widely adopted.
Huh? AMD had all the same capabilities (and more) with HBCC long before NVIDIA. And yes, they specifically mentioned that the RX 6000s support DirectStorage, too.
 
The 6900XT is going to have a hard time matching the 3090 if the 6800XT matches the 3080; there's just not that much 8 more CUs can do. Their own comparison had to have both performance-enhancing technologies on to match it. Not even sure what the 6900XT is for, because the 6800XT is pretty much the same.

Also a bigger gap between 6800XT and 6800 than I expected but price seems firmly in favor of the 6800XT. Seems weird. AMD might just be pushing everyone to buy the 6800XT.

Also no 6700s which is kind of disappointing. Really looking to see how these perform since they are probably going to be priced better than these high end cards.
 
Pardon my ignorance, but I fail to see how the AMD 6800 XT beats the Nvidia 3080, considering that the 6800 XT has a peak performance of 20.74 TFLOPS vs. the 3080's 29.77 TFLOPS. The difference is almost 10 teraflops!
 
So, no real details on RT yet, unfortunately. AMD working on some form of DLSS style upsampling, but no details yet.

On the plus side, performance when not using RT rivals NV's equivalently placed cards and on the top end only 2x8 pin power connectors are used. So, power draw when the card is really pushed should be lower than the competition.

I do wonder if AIBs will add a 3rd 8-pin power connector? Not that this is relevant to me as I stopped overclocking cards back with the Radeon x1800xt (whew that's still a mouthful).

I'm leaning towards giving the red team a shot again as I've been less than impressed with the driver quality for the 1070 in my machine, but I want to see what RT perf. is like and see anything at all about AMD's DLSS style upsampling. I expect the RT to be slower than NV, but if it's good enough that would be fine.

Much like I didn't universally allow shadows to be enabled in games until the 1070 due to a combination of performance and shadow quality (wonkiness), I feel it'll be a few hardware generations before I commit to allowing RT to be enabled universally in games. For example, RT in Control is relatively good, but RT in Metro: Exodus was horrible.

Regards,
SB

Strategically it's a risky play that could pan out: if RT acceleration is worse, they could simply not enable DXR support at all until well past launch.

Reviewers tend to want to bench apples to apples. This means that if DXR were enabled, they'd use RT settings (with a tendency toward max settings); with no support, it'll be done sans RT. This means launch reviews will show a more favourable performance comparison.

Meanwhile, post-launch benchmarks with RT support can be massaged to promote that it can be done, framed from a user-experience standpoint instead of as a direct performance comparison.
 
The 6900XT is going to have a hard time matching the 3090 if the 6800XT matches the 3080; there's just not that much 8 more CUs can do. Their own comparison had to have both performance-enhancing technologies on to match it. Not even sure what the 6900XT is for, because the 6800XT is pretty much the same.

Also a bigger gap between 6800XT and 6800 than I expected but price seems firmly in favor of the 6800XT. Seems weird. AMD might just be pushing everyone to buy the 6800XT.

Also no 6700s which is kind of disappointing. Really looking to see how these perform since they are probably going to be priced better than these high end cards.

The performance difference between the 3080 and 3090 is minuscule. The 3090 is mainly for creatives who use Blender-type apps or machine learning workloads requiring massive amounts of memory. There is no good use case for the 3090 on the gaming side when considering the tiny perf uplift and huge uplift in price.
 
Pardon my ignorance, but I fail to see how the AMD 6800 XT beats the Nvidia 3080, considering that the 6800 XT has a peak performance of 20.74 TFLOPS vs. the 3080's 29.77 TFLOPS. The difference is almost 10 teraflops!

FLOPS is a very poor measure of gaming performance. It might apply to pure compute loads, but even there the Infinity Cache could be a game changer.
 
Pardon my ignorance, but I fail to see how the AMD 6800 XT beats the Nvidia 3080, considering that the 6800 XT has a peak performance of 20.74 TFLOPS vs. the 3080's 29.77 TFLOPS. The difference is almost 10 teraflops!

TFLOPS by itself is just a measure of how many FPUs you have times clock speed. If TFLOPS were the end differentiator with respect to real performance, all designs would just be as many FPUs as you can cram in at as high a clock speed as possible. It's a useful technical marketing number, as people can grasp it more easily, but it has severe limitations when comparing across architectures.
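
For example, using the published shader counts and boost clocks (which clocks AMD and Nvidia assume for their marketing TFLOPS figures is my assumption here):

```python
# Peak FP32 TFLOPS = FP32 lanes * 2 ops/clk (FMA) * clock. Both results are
# theoretical peaks and say nothing about delivered game performance.
def peak_tflops(fp32_lanes: int, clock_ghz: float) -> float:
    return fp32_lanes * 2 * clock_ghz / 1000

rx_6800_xt = peak_tflops(72 * 64, 2.25)   # 72 CUs x 64 ALUs at 2.25 GHz boost
rtx_3080   = peak_tflops(68 * 128, 1.71)  # 68 SMs x 128 FP32 lanes at 1.71 GHz boost

print(f"RX 6800 XT: {rx_6800_xt:.1f} TFLOPS")  # ~20.7
print(f"RTX 3080:   {rtx_3080:.1f} TFLOPS")    # ~29.8
```

Same formula for both, yet the delivered game performance doesn't track the gap.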
 
Pardon my ignorance, but I fail to see how the AMD 6800 XT beats the Nvidia 3080, considering that the 6800 XT has a peak performance of 20.74 TFLOPS vs. the 3080's 29.77 TFLOPS. The difference is almost 10 teraflops!
Because FLOPS is just the theoretical maximum output of the ALUs at FP32. And in the case of Ampere, the raw TFLOPS number is misleading for game performance compared to Turing.
 
I think we can actually calculate how much bandwidth there is in the Infinity Cache, from this slide:

[image: D8PnZMA.png]

If we assume they're talking about 16Gbps, then the "1.0X 384bit G6" means 768GB/s and the "256bit G6" is 512GB/s.
If the Infinity Cache is 2.17x the 384bit G6, then its output is 1666.56GB/s. Take away the 512GB/s from the 256bit G6 and we get 1154.56GB/s for the Infinity Cache alone.
I'm guessing this is an odd number because this LLC is working at the same clocks as the rest of the GPU.. maybe they're using the 2015MHz game clock.

Nothing odd about this number: it's a 2.25 GHz clock and a 4096-bit total bus width to the cache.

2.25 GHz * 512 bytes = 1152 GB/s.

1152 GB/s + 512 GB/s = 1664 GB/s

1664 GB/s / 768 GB/s = 2.16666 ~ 2.17
 
I think we can actually calculate how much bandwidth there is in the Infinity Cache, from this slide:

If we assume they're talking about 16Gbps, then the "1.0X 384bit G6" means 768GB/s and the "256bit G6" is 512GB/s.
If the Infinity Cache is 2.17x the 384bit G6, then its output is 1666.56GB/s. Take away the 512GB/s from the 256bit G6 and we get 1154.56GB/s for the Infinity Cache alone.
I'm guessing this is an odd number because this LLC is working at the same clocks as the rest of the GPU.. maybe they're using the 2015MHz game clock.
If you assume the cache is memory-side, the upper bound would be 32 byte/clk * 2 (bidirectional) * 16 channels, based on Navi 10 data points on L2-to-MC bandwidth.
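
Putting numbers on that bound (the 32 byte/clk per channel figure is the Navi 10 data point mentioned above; which clock to multiply by is an assumption):

```python
# Upper bound for a memory-side cache: 32 B/clk per channel, both directions,
# 16 channels. Converting to GB/s depends on which clock you assume it runs at.
BYTES_PER_CLK = 32 * 2 * 16   # 1024 bytes per clock total

for label, clk_ghz in [("2.015 GHz game clock", 2.015), ("2.25 GHz boost clock", 2.25)]:
    print(f"{label}: ~{BYTES_PER_CLK * clk_ghz:.0f} GB/s")  # ~2063 and ~2304 GB/s
```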
 