AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

A bit of a random question: if a card has 2x 8-pin connectors but a TGP of 220-250 W, do we really need to plug in both 8-pins? Could it work with 1x 6-pin + 1x 8-pin?
 
A bit of a random question: if a card has 2x 8-pin connectors but a TGP of 220-250 W, do we really need to plug in both 8-pins? Could it work with 1x 6-pin + 1x 8-pin?
Are people still actually using PSUs where the PCIe leads have something other than 8 or 6+2 plugs?
Most likely the card will check that both are plugged in properly even if 8+6 would be enough for the power draw
 
A bit of a random question: if a card has 2x 8-pin connectors but a TGP of 220-250 W, do we really need to plug in both 8-pins? Could it work with 1x 6-pin + 1x 8-pin?

Assuming the PSU can handle it, yes. The PCIe slot itself provides 75 W, each 6-pin connector provides another 75 W, and each 8-pin connector provides 150 W. Slot power + 6-pin + 8-pin = 300 W.

(The two extra pins in an 8-pin connector really only exist to signal that the cable is thick enough to supply 150 W; they are not really used.)
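To put numbers on it, here's that budget as a trivial C sketch (nominal connector ratings only; the 250 W TGP is just the figure from the question, and real cards can spike above TGP):

```c
/* Nominal PCIe power budget arithmetic - a sketch, not a measurement. */
#include <stdio.h>

int main(void) {
    const int slot_w      = 75;  /* PCIe x16 slot                  */
    const int six_pin_w   = 75;  /* one 6-pin PEG connector        */
    const int eight_pin_w = 150; /* one 8-pin PEG connector        */
    const int tgp_w       = 250; /* example TGP from the question  */

    int budget_w = slot_w + six_pin_w + eight_pin_w;
    printf("slot + 6-pin + 8-pin = %d W, TGP = %d W, headroom = %d W\n",
           budget_w, tgp_w, budget_w - tgp_w);
    return 0;
}
```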

Most likely the card will check that both are plugged in properly

Yeah, you'd probably have to use a 6-to-8 adapter.
 
Are people still actually using PSUs where the PCIe leads have something other than 8 or 6+2 plugs?
Most likely the card will check that both are plugged in properly even if 8+6 would be enough for the power draw

Yeah, I've just checked and it's 2 x 6+2. My bad. *facepalm*
 
In general, all power connectors have to be plugged into the GPU even if it does not really need them (e.g. my V56 Nitro+ LE has 3x 8-pin; it never goes over 260 W at stock settings, but it won't work without the third 8-pin plugged in).
 
Paperclips can easily make your 6-pin connectors 8-pin compatible
Well, you can also use those 2x Molex => 8-pin cables, but they are generally very shoddy (and PSUs that lack the required connectors will probably die, like my 9-year-old 600 W FSP PSU did after a month with a heavily OC'd Vega), so personally I wouldn't go with that :)
 
The 256 MiB BAR has been exposed for a while already in Vulkan
The Ryzen 5000/RX 6000 series (Zen 3/RDNA 2) enables the host to seamlessly stream data into any memory region on the device by removing this 256MB limit
Microsoft WDK documentation on GPUMMU states that the 256 MB limit is simply a default value set by device firmware to fit into a 32-bit virtual address space, and Vulkan is implemented as a user-mode driver which has to work through the WDDM 2.0 kernel-mode driver (DXGK).
The PCIe standard supports BAR sizes from 1 MB to 512 GB.

I don't think PCIe Resizable BAR is exclusive to the RDNA architecture either; I'd need to look into the Linux driver code for a list of supported hardware, though.
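Since BAR sizes are always powers of two, the 1 MB to 512 GB range quoted above corresponds to exponents 20 through 39; a quick sketch of that arithmetic (just illustrating the range, not reading any actual capability register):

```c
/* List the power-of-two BAR sizes between 1 MB (2^20) and 512 GB (2^39). */
#include <stdio.h>
#include <stdint.h>

int main(void) {
    for (int exp = 20; exp <= 39; ++exp) {
        uint64_t bytes = 1ULL << exp;
        printf("2^%-2d bytes = %6llu MiB\n",
               exp, (unsigned long long)(bytes >> 20));
    }
    return 0;
}
```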

knowing that the host is PCI-E 4.0 and the GPU is PCI-E 4.0 ... might enable slightly different optimisations in the driver stack etc...
all non-driver-allocated memory is CPU-cacheable, and hence you need to enable cache snooping for that memory on the GPU, which makes GPU accesses to this memory slower. When you allocate using the driver you can choose between cacheable and non-cacheable (see the 2 memory types for heap 1)

Yes, it's not about the bandwidth, it's about cache coherence.

If the GPU can use system memory just like its own local video memory (and vice versa, the CPU can use local video memory as if it were system memory), you have to either synchronise the GPU and CPU caches using some cache coherence protocol over the PCIe bus - preferably something more complex than bus snooping - or completely disable caching for this physical memory pool, with a detrimental effect on performance.
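For reference, the heap/type split mentioned above is visible through the standard vkGetPhysicalDeviceMemoryProperties query; here is a minimal Vulkan sketch (nothing AMD-specific assumed) that prints any DEVICE_LOCAL + HOST_VISIBLE type, i.e. the CPU-visible BAR window into VRAM, which is 256 MiB without Resizable BAR:

```c
/* Sketch: enumerate Vulkan memory types and print the host-visible VRAM heaps. */
#include <stdio.h>
#include <vulkan/vulkan.h>

int main(void) {
    VkInstance instance;
    VkInstanceCreateInfo ici = { .sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO };
    if (vkCreateInstance(&ici, NULL, &instance) != VK_SUCCESS) return 1;

    uint32_t count = 0;
    vkEnumeratePhysicalDevices(instance, &count, NULL);
    VkPhysicalDevice gpus[16];
    if (count > 16) count = 16;
    vkEnumeratePhysicalDevices(instance, &count, gpus);

    for (uint32_t d = 0; d < count; ++d) {
        VkPhysicalDeviceMemoryProperties mp;
        vkGetPhysicalDeviceMemoryProperties(gpus[d], &mp);
        for (uint32_t i = 0; i < mp.memoryTypeCount; ++i) {
            VkMemoryPropertyFlags f = mp.memoryTypes[i].propertyFlags;
            uint32_t heap = mp.memoryTypes[i].heapIndex;
            /* DEVICE_LOCAL + HOST_VISIBLE = VRAM the CPU can map via the BAR. */
            if ((f & VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT) &&
                (f & VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT))
                printf("GPU %u: type %u, heap %u, %llu MiB%s\n",
                       d, i, heap,
                       (unsigned long long)(mp.memoryHeaps[heap].size >> 20),
                       (f & VK_MEMORY_PROPERTY_HOST_CACHED_BIT) ? " (host-cached)" : "");
        }
    }
    vkDestroyInstance(instance, NULL);
    return 0;
}
```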
 
Yeah, it's not anything proprietary, just a slight performance increase similar to SmartShift in laptops. And AMD is obviously trying to incentivize people to buy its own CPU+GPU combinations; nothing wrong with that. The competition is free to do something of its own.
From a development standpoint, would studios need to target specific CPU+GPU combinations? Or is it a "modify and forget" type of code change not requiring any testing or validation?
 
Well, you can also use those 2x Molex => 8-pin cables, but they are generally very shoddy (and PSUs that lack the required connectors will probably die, like my 9-year-old 600 W FSP PSU did after a month with a heavily OC'd Vega), so personally I wouldn't go with that :)

I would be hesitant to go with Molex for 8-pin, but I'd probably still try that rather than buying a new PSU.

That's the nice part of having old PCs that aren't worth a dime anymore: you don't mind taking risks and going against all the recommendations. Back when I started putting together my own PCs 15 years ago, I too always overpaid for the PSU and didn't use adapters because everyone said it was so risky, but since then I've used 10+ year old PSUs with 2x Molex-to-6-pin adapters (the most power-hungry setup being a Core 2 Quad and an HD 4870) without issues, so today I don't pay much mind to my PSUs anymore.
And the few times my PSUs have died (the oldest being a no-name unit from a Pentium 4 prebuilt), they didn't damage the rest of the hardware either.
 
From a development standpoint, would studios need to target specific CPU+GPU combinations? Or is it a "modify and forget" type of code change not requiring any testing or validation?
As per AMD, it should provide some gains without any optimization from the developer end, but can provide even bigger gains if specifically optimized for. From what we've seen so far, it doesn't seem to be something that can't be enabled on Intel & Nvidia, so it might be possible on those platforms in the future. We should find out more during the architecture deep dive around the launch.
 
Yes, it's not about the bandwidth, it's about cache coherence.

If the GPU can use system memory just like its own local video memory (and vice versa, the CPU can use local video memory as if it were system memory), you have to either synchronise the GPU and CPU caches using some cache coherence protocol over the PCIe bus - preferably something more complex than bus snooping - or completely disable caching for this physical memory pool, with a detrimental effect on performance.

So is that what SAM is actually doing then? Allowing each device to see the other's memory pool as if it were its own and keeping the caches between the CPU and GPU coherent? So this becomes similar to a UMA?
 
So is that what SAM is actually doing then? Allowing each device to see the other's memory pool as if it were its own and keeping the caches between the CPU and GPU coherent? So this becomes similar to a UMA?

No, the caching stuff has been available since forever (pre-GCN at least). The SAM change is that the CPU can access 100% of the GPU memory directly by resizing the BAR.


As an example, it can already be enabled on many X399 (Threadripper) boards, though not under the Smart Access Memory name, and maybe AMD did some driver optimizations to make better use of it in Direct3D 9/10/11.
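A minimal sketch of what that direct access looks like from the application side, assuming a Vulkan-capable GPU and simply taking queue family 0 for brevity: allocate from a DEVICE_LOCAL + HOST_VISIBLE (+ HOST_COHERENT, to skip manual flushes) type, map it, and write into VRAM straight from the CPU. With the larger BAR this type's heap covers the whole VRAM instead of 256 MiB; without it the same code works, the window is just tiny.

```c
/* Sketch: CPU-side write into device-local (video) memory through the BAR. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <vulkan/vulkan.h>

int main(void) {
    VkInstance inst;
    VkInstanceCreateInfo ici = { .sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO };
    if (vkCreateInstance(&ici, NULL, &inst) != VK_SUCCESS) return 1;

    uint32_t n = 1;
    VkPhysicalDevice phys;
    vkEnumeratePhysicalDevices(inst, &n, &phys);  /* just take the first GPU */
    if (n == 0) { puts("no Vulkan device"); return 1; }

    /* Find a memory type that is VRAM (DEVICE_LOCAL) yet CPU-mappable. */
    VkPhysicalDeviceMemoryProperties mp;
    vkGetPhysicalDeviceMemoryProperties(phys, &mp);
    VkMemoryPropertyFlags want = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
                                 VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
                                 VK_MEMORY_PROPERTY_HOST_COHERENT_BIT;
    uint32_t type = UINT32_MAX;
    for (uint32_t i = 0; i < mp.memoryTypeCount; ++i)
        if ((mp.memoryTypes[i].propertyFlags & want) == want) { type = i; break; }
    if (type == UINT32_MAX) { puts("no host-visible VRAM type"); return 1; }

    /* Create a logical device with one queue (family 0 for brevity). */
    float prio = 1.0f;
    VkDeviceQueueCreateInfo qci = {
        .sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO,
        .queueFamilyIndex = 0, .queueCount = 1, .pQueuePriorities = &prio };
    VkDeviceCreateInfo dci = { .sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO,
        .queueCreateInfoCount = 1, .pQueueCreateInfos = &qci };
    VkDevice dev;
    if (vkCreateDevice(phys, &dci, NULL, &dev) != VK_SUCCESS) return 1;

    /* Allocate 1 MiB of that VRAM and write to it directly from the CPU. */
    VkMemoryAllocateInfo mai = { .sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
        .allocationSize = 1 << 20, .memoryTypeIndex = type };
    VkDeviceMemory mem;
    if (vkAllocateMemory(dev, &mai, NULL, &mem) != VK_SUCCESS) return 1;

    void *ptr = NULL;
    if (vkMapMemory(dev, mem, 0, VK_WHOLE_SIZE, 0, &ptr) != VK_SUCCESS) return 1;
    memset(ptr, 0xAB, 1 << 20);   /* CPU stores land in video memory via the BAR */
    vkUnmapMemory(dev, mem);
    puts("wrote 1 MiB into device-local memory from the CPU");

    vkFreeMemory(dev, mem, NULL);
    vkDestroyDevice(dev, NULL);
    vkDestroyInstance(inst, NULL);
    return 0;
}
```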
 
Microsoft WDK documentation on GPUMMU states that the 256 MB limit is simply a default value set by device firmware to fit into a 32-bit virtual address space, and Vulkan is implemented as a user-mode driver which has to work through the WDDM 2.0 kernel-mode driver (DXGK).
The PCIe standard supports BAR sizes from 1 MB to 512 GB.

I don't think PCIe Resizable BAR is exclusive to the RDNA architecture either; I'd need to look into the Linux driver code for a list of supported hardware, though.

That's interesting... So what is stopping you, in the near future, from using a 512 GB NVMe drive, "connecting it to the GPU" through DirectStorage, and having SAM reconfigure the BAR to include some of the NVMe drive?
I know DirectStorage won't work like that, but would the above scenario even be beneficial?
 
The Above 4G Decoding option is available on all AM4 motherboards, AFAIR, but I actually doubt that it's the only requirement for SAM. For some reason, xGMI links are present, and it'd be strange if they were completely unused, considering that HSA was the 'idée fixe' for AMD not that long ago.
 
Allowing each device to see the other's memory pool as if it were its own and keeping the caches between the CPU and GPU coherent? So this becomes similar to a UMA?

This is the impression I got from the Tom's Hardware and Dark Side of Gaming articles, which basically say that "CPU and GPU gain full access to each other’s memory".

They also say it's similar to Raven Ridge APUs and Infinity Architecture 3, though the Financial Analyst Day 2020 presentations only mention enterprise-grade EPYC and CDNA (Radeon Instinct) chips as having Infinity Architecture 3.


AMD's definition of Smart Access Memory in PR materials is different (and more ambiguous):
In conventional Windows-based PC systems, processors can only access a fraction of graphics memory (VRAM) at once, limiting system performance. With AMD Smart Access Memory, the data channel gets expanded to harness the full potential of GPU memory - removing the bottleneck to increase performance.
People interpreted it as support for Resizable BAR and PCIe 4.0 bandwidth - but AFAIK these two technologies are not really exclusive to RDNA2 (RX 6000) and Zen3 (Ryzen 5000).


No, the caching stuff has been available since forever (pre-GCN at least)
Only for system memory pools, but not for local video memory pools.

https://gpuopen.com/learn/vulkan-device-memory/
https://gpuopen.com/events/gdc-2018-presentations/
https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkMemoryPropertyFlagBits.html
https://computergraphics.stackexchange.com/questions/7504/vulkan-how-does-host-coherence-work
 
So what is stopping you from ... using a 512 GB NVMe drive and ... having SAM reconfigure the BAR to include some of the NVMe drive
NVMe is a block I/O protocol for disk devices - it uses LBA sector numbers to access disk data, which are remapped to actual flash memory addresses by the NVMe controller. It can use PCIe memory mapping for the optional Host Memory Buffer (HMB) feature in entry-level DRAM-less controllers, but flash memory is not visible to the host.
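To illustrate the difference in addressing models, here's a small Linux-only C sketch that reads one sector by LBA through the block layer (the /dev/nvme0n1 path and the LBA are just assumed examples, and it needs root); the host only ever names sector numbers, and the controller's FTL decides which flash pages those map to:

```c
/* Sketch: block-addressed read from an NVMe namespace (Linux, needs root).
 * The device path and LBA are arbitrary examples. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    const uint64_t lba    = 2048; /* example sector number                  */
    const uint64_t sector = 512;  /* logical block size (often 512 or 4096) */
    uint8_t buf[512];

    int fd = open("/dev/nvme0n1", O_RDONLY); /* assumed device path */
    if (fd < 0) { perror("open"); return 1; }

    /* The host addresses "LBA 2048"; it never sees flash memory addresses. */
    if (pread(fd, buf, sizeof buf, (off_t)(lba * sector)) != (ssize_t)sizeof buf) {
        perror("pread"); close(fd); return 1;
    }
    printf("read LBA %llu (byte offset %llu), first byte 0x%02x\n",
           (unsigned long long)lba, (unsigned long long)(lba * sector), buf[0]);
    close(fd);
    return 0;
}
```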
 