AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

So... will there be any stock available or is this gonna enrage gamers everywhere :LOL:

My attempts to get a 3070 at MSRP failed. Maybe destiny wants me to go team red

I bet they will run out of stock as well. It's the nature of the beast. I still don't have a PS5 preorder :/
 
Start watching at 18:25 to get the gist of what is being discussed (DLSS)
However, take note of what he says at 18:38. AMD laughs at him when he explains what he thinks the process is for DLSS and asks if AMD would do something similar.
 
I bet they will run out of stock as well. It's the nature of the beast. I still don't have a PS5 preorder :/

Yar, that just seems to be the nature of the beast this year. AMD is booked solid for laptop chips and console APUs, no reason they won't sell out of GPUs either.
 
However, take note of what he says at 18:38. AMD laughs at him when he explains what he thinks the process is for DLSS and asks if AMD would do something similar.
His explanation of DLSS is rather old as well, isn't it? AFAIK there's no per-game training anymore with DLSS 2.
 
His explanation of DLSS is rather old as well, isn't it? AFAIK there's no per-game training anymore with DLSS 2.

Correct. The original DLSS hallucinated details using a per-game trained neural network. DLSS 2.0 seems to be a generic TAA solution for picking the right samples from multiple frames, plus then some. What I suspect is that new games could still expose new issues requiring retraining, but those improvements would then carry over to all titles. Or perhaps retraining/fixing is needed when glitches are found in existing games. People go to great lengths to find that one spot in a game that breaks DLSS 2.0.
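For context on the "generic TAA solution" part, here's a minimal sketch of an ordinary temporal accumulation step. This is not DLSS, just the plain exponential history blend that conventional TAA resolves use, with the history weight/rejection heuristics assumed as an input (which is presumably where a trained network would come in instead):

```cpp
// Plain exponential history blend, as used by ordinary TAA resolves.
// Only an illustration of "picking samples from multiple frames";
// DLSS 2.0 presumably replaces the fixed heuristics with a network.
struct Color { float r, g, b; };

Color ResolveTemporal(Color current, Color reprojectedHistory, float historyWeight)
{
    // historyWeight in [0, ~0.95]: high when reprojection is valid,
    // driven towards 0 on disocclusion or large color divergence
    // (those heuristics are assumed inputs here).
    Color out;
    out.r = current.r + (reprojectedHistory.r - current.r) * historyWeight;
    out.g = current.g + (reprojectedHistory.g - current.g) * historyWeight;
    out.b = current.b + (reprojectedHistory.b - current.b) * historyWeight;
    return out;
}
```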

 
I'm guessing Smart Access Memory won't be an "open" solution. Interesting how leverage changes viewpoints on "open" vs "walled garden" approaches, isn't it?
I'd guess Smart Access Memory could be based on open-standard CXL/GenZ cache-coherent memory access protocols, which work over PCIe. Preliminary PCIe CXL support has recently been added to the Insider Preview WDK for Windows Iron.
Infinity Fabric 3 with full CPU/GPU cache coherency has been announced by AMD as a key feature for HP/Cray El Capitan supercomputer, but that's EPYC/CDNA territory.


AMD GPUs up to now have shipped with a default 256MB "aperture" VRAM BAR meaning the CPU can only directly access 256MB of VRAM. And that gets exposed in Vulkan/DX12 as a separate memory heap marked as "Device Local + Host Visible", even though technically it's a subset of the main device local VRAM heap.
But AMD dGPUs going back all the way to the OG GCN (GFX6: Tahiti, Verde, etc.) actually support PCIe BAR resizing up to the full size of VRAM (assuming it's a power-of-two size).
AFAIK the 256 MByte aperture is a legacy driver/runtime convention that's only needed on 32-bit platforms.
GPUMMU virtual memory model in WDDM 2.x supports resizing the system memory PCIe aperture segment to match the entire local video memory. It also supports GPU hardware-specific 64 Kbyte pages.
WDDM 2.9 in Insider WDK for Windows Iron adds support for 2 MByte memory pages (large-page) and contiguous memory allocation.
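For anyone who wants to check this on their own card, here's a minimal sketch against the plain Vulkan C API (error handling omitted, the function name is just made up for illustration) that enumerates the memory types and prints the size of the Device Local + Host Visible heap described above; without a resized BAR, AMD drivers report roughly 256 MiB here:

```cpp
#include <vulkan/vulkan.h>
#include <cstdio>

// Prints every memory type that is both DEVICE_LOCAL and HOST_VISIBLE,
// i.e. the CPU-visible VRAM window (the BAR aperture).
void PrintBarAperture(VkPhysicalDevice physicalDevice)
{
    VkPhysicalDeviceMemoryProperties memProps{};
    vkGetPhysicalDeviceMemoryProperties(physicalDevice, &memProps);

    const VkMemoryPropertyFlags wanted =
        VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT | VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT;

    for (uint32_t i = 0; i < memProps.memoryTypeCount; ++i)
    {
        const VkMemoryType& type = memProps.memoryTypes[i];
        if ((type.propertyFlags & wanted) != wanted)
            continue;

        const VkMemoryHeap& heap = memProps.memoryHeaps[type.heapIndex];
        std::printf("type %u: DEVICE_LOCAL | HOST_VISIBLE, heap %u, %llu MiB\n",
                    i, type.heapIndex,
                    static_cast<unsigned long long>(heap.size / (1024ull * 1024ull)));
    }
}
```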

According to Tom's, "Smart Access Memory" requires specific developer support. So it's curious why AMD are including it here for older games which presumably don't have it.
Tom's Hardware said they need to "optimize", but it rather sounds like a new development. Vulkan / Direct3D 12 define specific memory heap types to be explicitly managed by the programmer, and AFAIK device-visible device-coherent system memory is a yet-undefined type, while device-local host-coherent memory has only been available on integrated GPUs and thus isn't widely used by high-end games.

https://gpuopen.com/events/gdc-2018-presentations/
Memory management in Vulkan and DX12
Adam Sawicki (AMD)
(Powerpoint slides)
 
Tom's Hardware said they need to "optimize", but it rather sounds like a new development. Vulkan / Direct3D 12 define specific memory heap types to be explicitly managed by the programmer, and AFAIK device-visible device-coherent system memory is a yet-undefined type, while device-local host-coherent memory has only been available on integrated GPUs and thus isn't widely used by high-end games.

The 256 MiB BAR has been exposed for a while already in Vulkan: http://vulkan.gpuinfo.org/displayreport.php?id=9781#memorytypes
 

I'm always dubious of these recommendations. Back in the day (10-15 years ago) they were always higher because there were some really crappy PSUs out there, but the market has seemingly changed.

I have a 660W Seasonic Platinum PSU. In my eyes it's about as capable as a 750W average PSU, and potentially an 850W crappy one if those still exist to the same extent of badness.

I'm thinking my 660W Platinum should be fine in place of a 750W for the 6800 XT, but I'm not keen to wire up a whole new PC to find out that's not the case. I wish they'd just list the amps needed on the 12V rail or something...
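For what it's worth, a rough back-of-the-envelope figure, assuming the card pulls essentially all of a ~300 W board-power rating from the 12 V rail:

\[ I_{12\,\mathrm{V}} \approx \frac{300\ \mathrm{W}}{12\ \mathrm{V}} = 25\ \mathrm{A} \]

plus whatever the CPU and the rest of the system need on top of that.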
 
A little history behind AMD's "Smart Access Memory" technology if the audience will ...

AMD's statement that CPUs were only able to access 256MB of video memory is rooted in their pinned memory functionality. Before pinned memory existed, drivers did automatic memory management for the GPUs in many of the older gfx APIs. At some point while AMD were designing the GCN architecture, it became apparent to them that the API-side overhead involved in letting the driver do automatic memory management was increasing over time. Creating resources on older APIs is a synchronous operation which caused stalls in the driver, so doing memory allocations on the fly started incurring measurable performance penalties ...

The introduction of pinned memory gives the device (GPU) the capability to bypass the API and use resources created on the host (CPU) side. Pinned memory is an ideal feature on AMD HW to implement many of the 3D application's data streaming solutions since there's no driver overhead involved and it is an asynchronous operation. Pinned memory serves as one of AMD's many features to designing a model for AZDO (Approaching Zero Driver Overhead) ...

On Vulkan, pinned memory is exposed via memory heap 2 with memory type 2 so it is a special 256MB memory region that is device local/host visible ...

The Ryzen 5000/RX 6000 series (Zen 3/RDNA 2) enables the host to seamlessly stream data into any memory region on the device by removing this 256MB limit which opens up potential to reduce even more driver overhead on AMD HW. Smart Access Memory is likely an extension to AMD's prior technology known as pinned memory ...
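As a rough illustration of that streaming path from the application side, here's a hedged Vulkan sketch (buffer creation, error handling and the memory-type search omitted; a HOST_COHERENT type is assumed, otherwise a flush would be needed) that allocates out of the small Device Local + Host Visible heap and lets the CPU write straight into VRAM. With resizable BAR / Smart Access Memory, this window is no longer capped at 256MB:

```cpp
#include <vulkan/vulkan.h>
#include <cstring>

// Upload through the CPU-visible VRAM window: allocate from a
// DEVICE_LOCAL | HOST_VISIBLE (HOST_COHERENT assumed) memory type,
// bind, map and memcpy. 'barMemoryTypeIndex' would come from enumerating
// the memory types as shown earlier in the thread.
void UploadThroughBar(VkDevice device, VkBuffer buffer,
                      uint32_t barMemoryTypeIndex,
                      const void* srcData, VkDeviceSize size)
{
    VkMemoryRequirements req{};
    vkGetBufferMemoryRequirements(device, buffer, &req);

    VkMemoryAllocateInfo allocInfo{};
    allocInfo.sType           = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
    allocInfo.allocationSize  = req.size;
    allocInfo.memoryTypeIndex = barMemoryTypeIndex;

    VkDeviceMemory memory = VK_NULL_HANDLE;
    vkAllocateMemory(device, &allocInfo, nullptr, &memory);
    vkBindBufferMemory(device, buffer, memory, 0);

    // CPU writes land directly in device-local memory, no staging copy needed.
    void* mapped = nullptr;
    vkMapMemory(device, memory, 0, size, 0, &mapped);
    std::memcpy(mapped, srcData, size);
    vkUnmapMemory(device, memory);
}
```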
 
A little history behind AMD's "Smart Access Memory" technology if the audience will ...

AMD's statement that CPUs were only able to access 256MB of video memory is rooted in their pinned memory functionality. Before pinned memory existed, drivers did automatic memory management for the GPUs in many of the older gfx APIs. At some point while AMD were designing the GCN architecture, it became apparent to them that the API-side overhead involved in letting the driver do automatic memory management was increasing over time. Creating resources on older APIs is a synchronous operation which caused stalls in the driver, so doing memory allocations on the fly started incurring measurable performance penalties ...

...

On Vulkan, pinned memory is exposed via memory heap 2 with memory type 2 so it is a special 256MB memory region that is device local/host visible ...

Pinned memory is actually totally unrelated to Smart Access Memory. Pinned memory is system memory that is accessible from the GPU but not allocated by the driver. That is useful if there is something else allocating the memory that is not easily changed to let the driver allocate the memory, and hence mostly a convenience feature. CUDA/ROCm support a generalization where any allocated memory can be accessed on the GPU.

The Vulkan functionality is not the 256 MiB heap but what's exposed by the VK_EXT_external_memory_host extension. Note that this typically results in slightly worse performance, as all non-driver-allocated memory is CPU-cacheable and hence you need to enable cache snooping for that memory on the GPU, which makes GPU accesses to this memory slower. When you allocate using the driver you can choose cacheable or non-cacheable (see the 2 memory types for heap 1).
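For reference, a minimal sketch of that VK_EXT_external_memory_host path (the extension must be enabled on the device, the pointer has to be aligned to minImportedHostPointerAlignment, and the memory type index is assumed to come from vkGetMemoryHostPointerPropertiesEXT; error handling omitted):

```cpp
#include <vulkan/vulkan.h>

// Import an externally allocated ("pinned") host allocation into Vulkan
// instead of allocating through the driver.
VkDeviceMemory ImportHostAllocation(VkDevice device, void* hostPtr, VkDeviceSize size,
                                    uint32_t memoryTypeIndex)
{
    VkImportMemoryHostPointerInfoEXT importInfo{};
    importInfo.sType        = VK_STRUCTURE_TYPE_IMPORT_MEMORY_HOST_POINTER_INFO_EXT;
    importInfo.handleType   = VK_EXTERNAL_MEMORY_HANDLE_TYPE_HOST_ALLOCATION_BIT_EXT;
    importInfo.pHostPointer = hostPtr; // must meet minImportedHostPointerAlignment

    VkMemoryAllocateInfo allocInfo{};
    allocInfo.sType           = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
    allocInfo.pNext           = &importInfo;
    allocInfo.allocationSize  = size;
    allocInfo.memoryTypeIndex = memoryTypeIndex;

    // This memory is ordinary CPU-cacheable system memory, so the GPU has to
    // snoop the CPU caches when accessing it, as noted above.
    VkDeviceMemory memory = VK_NULL_HANDLE;
    vkAllocateMemory(device, &allocInfo, nullptr, &memory);
    return memory;
}
```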
 
Can somebody explain to me why everyone is guessing that Navi 21 has only 4 rasterizers? I can't see why it shouldn't be 8 rasterizers:

[Attached image: Navi21_Navi10_2de1730741a364ff9.png]

 
Did anyone talk about that USB-C connector? They didn't even mention HDMI 2.1, did they?
There's quite a bit of info AMD is still keeping to themselves.

Don't take Twitter for granted; use AMD-official info instead. From one of Jawed's posts with the AMD tech specs compared. Almost at the bottom:
https://www.amd.com/en/products/specifications/compare/graphics/10516,10521,10526
I took a screenshot when it was first posted. There it only said HDMI - Yes. It has been expanded since. :)

edit: Curiously though, here the 300 watts are labelled as GPU power, not Total Board/Graphics Power, and they recommend 100 watts more PSU oomph for the 6900 XT, which has the same 300 watts as the 6800 XT.
 

Attachments: Big Navi HMDI 2.1.PNG
Don't take Twitter for granted; use AMD-official info instead. From one of Jawed's posts with the AMD tech specs compared. Almost at the bottom:
https://www.amd.com/en/products/specifications/compare/graphics/10516,10521,10526
I took a screenshot when it was first posted. There it only said HDMI - Yes. It has been expanded since. :)

edit: Curiously though, here the 300 watts are labelled as GPU power, not Total Board/Graphics Power, and they recommend 100 watts more PSU oomph for the 6900 XT, which has the same 300 watts as the 6800 XT.
At least AMD clearly said they managed the same power for the 80-CU 6900 XT, 65% more performance per watt than RDNA1 in that case, and it's their most efficient RX 6000 to date, supposedly just down to binning.
 
300W TBP does not necessarily mean that the GPU is always PL-bound - if you undervolt a Vega enough (like 1.05V set, 1V with vdroop), it stays mostly in the 200W range in most scenarios (not including old or broken games on UE4) even if the PL is maxed out (so, for my card it's 260W + 50% = 390W). Also, it's the usual SKU segregation tactic by AMD - clocks are set low for the cheaper SKUs so the performance gap is more noticeable.

So I'd hope they won't do it like they did with the 5600 XT and lock the top frequency and power limit at ridiculously low values for the 6800 non-XT (which was impossible to circumvent until recently, and that's thanks to the damned miners, I guess, not graphics enthusiasts).
 