AMD: Navi Speculation, Rumours and Discussion [2019-2020]

Status
Not open for further replies.
Well, one of my theories for how the PC could catch up with next-gen consoles in I/O speed is that future graphics cards may get a direct connection to a fast SSD, without having to send data through main system RAM.
AMD has had that capability at least since Fiji. It's been implemented so far in Fiji and Vega based Radeon Pro SSG models.
 
AMD has had that capability at least since Fiji. It's been implemented so far in Fiji and Vega based Radeon Pro SSG models.
Sorry, should have written future consumer graphics cards.


Here's a recent tweet from Tim Sweeney on how the current consoles could push the PC market to create PCIe cards that hold NVMe SSDs and connect directly to GPUs (and GPUs capable of hosting such a bus):

 
Well, one of my theories for how the PC could catch up with next-gen consoles in I/O speed is that future graphics cards may get a direct connection to a fast SSD, without having to send data through main system RAM.
On Linux, amdgpu supports P2PDMA, and PTDMA is in the queue:
https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.2-AMD-Zen-P2P-DMA
https://www.phoronix.com/scan.php?page=news_item&px=AMD-PTDMA-Driver-No-Queue-V3
On Zen hosts, the GPU can do passthrough and peer-to-peer DMA with other devices over the PCIe bus for memory and I/O operations. P2PDMA was merged some time ago, and PTDMA is in the queue for 5.8; it will probably miss 5.7.

XGMI can only connect peer devices; it is the off-chip Infinity Fabric. I saw some link-rate values for XGMI, with a max rate of up to 25 Gbps:

Code:
typedef enum {
    XGMI_LINK_RATE_2  = 2,   // 2 Gbps
    XGMI_LINK_RATE_4  = 4,   // 4 Gbps
    XGMI_LINK_RATE_8  = 8,   // 8 Gbps
    XGMI_LINK_RATE_12 = 12,  // 12 Gbps
    XGMI_LINK_RATE_16 = 16,  // 16 Gbps
    XGMI_LINK_RATE_17 = 17,  // 17 Gbps
    XGMI_LINK_RATE_18 = 18,  // 18 Gbps
    XGMI_LINK_RATE_19 = 19,  // 19 Gbps
    XGMI_LINK_RATE_20 = 20,  // 20 Gbps
    XGMI_LINK_RATE_21 = 21,  // 21 Gbps
    XGMI_LINK_RATE_22 = 22,  // 22 Gbps
    XGMI_LINK_RATE_23 = 23,  // 23 Gbps
    XGMI_LINK_RATE_24 = 24,  // 24 Gbps
    XGMI_LINK_RATE_25 = 25,  // 25 Gbps
    XGMI_LINK_RATE_COUNT
} XGMI_LINK_RATE_e;

There is support for I2C communication with really high-power VRMs; not sure if this was always there. But this new Infineon chip, the XPDE132G5 (publicly available since March last year), can provide 500-1000 A.

Code:
  I2C_CONTROLLER_PROTOCOL_VR_XPDE132G5,  // Infineon multi-phase VR controller
  I2C_CONTROLLER_PROTOCOL_VR_IR35217,    // Infineon (IR) IR35217 VR controller
  I2C_CONTROLLER_PROTOCOL_TMP_TMP102A,   // TI TMP102-class temperature sensor
  I2C_CONTROLLER_PROTOCOL_INA3221,       // TI INA3221 triple current/voltage monitor

 

Add to that Nvidia's work on GPUDirectStorage, and it seems very possible that Microsoft's DirectStorage will implement a standardised API for DMA transfers between non-volatile memory and VRAM. It seems that current-gen architectures already support this at the hardware level, so this looks like an API problem to resolve.
 
Windows drivers for NVidia hardware don't appear to allow any form of external DMA to write directly to VRAM yet, though. The driver does have the capability to map external (device) memory into the GPU-visible address space for access from the GPU, but it lacks the option to expose VRAM for DMA by a remote DMA engine. Driver capabilities tend to differ a lot between platforms for NVidia, especially for features driven by the CUDA team, as they don't have portability in mind, so this may not become a standard feature any time soon.

For AMD, that part is already in place, only DirectStorage API itself is missing.

Even with the DirectStorage API, this isn't going to look like GPUDirectStorage though, is it? It's not like you could treat a whole file as memory-mapped and count on page faults fetching it into the page cache. It's just an explicit load into target memory, explicitly initiated from the CPU.
 
Windows drivers for NVidia hardware don't appear to allow any form of external DMA to write directly to VRAM yet, though.

Yes, this would undoubtedly require new drivers from Nvidia. GPUDirectStorage hasn't been launched publicly yet on any platform, but it does demonstrate the hardware capability and the fact that Nvidia hasn't been idle in that space. It seems that if they did want to bring this to their gaming lineup (which may become more necessity than preference next gen), then they're well positioned to do so, either via their own API/CUDA extensions or, preferably, via something common like Microsoft's DirectStorage.

For AMD, that part is already in place, only DirectStorage API itself is missing.

Even with DirectStorage API this isn't going to look like GPUDirectStorage though, is it? Not like you could treat a whole file as memory mapped, and count on page faults fetching it into page cache? Just explicit load into target memory, but explicitly initiated from CPU.

Outside of HBCC (and actually including it), I've not been able to find much information on how AMD would handle DMA between SSD and VRAM. I assumed this would be handled in a similar way to GPUDirectStorage, but with the GPU's own DMA engine (the HBCC itself) executing the transfers rather than the storage DMA engines doing the work, as in NV's solution. Is that not the case? AMD seems to hint at the CPU being bypassed in relation to HBCC.
 
with the GPU's own DMA engine (the HBCC itself) executing the transfers rather than the storage DMA engines doing the work like NV's solution. Is that not the case? AMD seems to hint at the CPU being bypassed in relation to HBCC.
I'm not sure if I've misunderstood the NVMe protocol, but it shouldn't have any addressable buffers on the storage side, so it's not an applicable target for DMA by a different peer. You issue requests, and the NVMe device then performs the DMA, pulling or pushing as required.

That's still bypassing the page cache on the CPU entirely, but something has to issue the requests. It would be possible for the GPU to issue them, but if you need to be aware of file-system implementation details (hello, fragmentation; it would be much easier treating storage as a raw block device), then you can at most handle a fault on the CPU, which maps the linear address space via the filesystem to a block and then issues the DMA request to storage. Maybe it would at least be possible to export the address mapping to the GPU (keeping it locked in the file system), so the GPU wouldn't need to be aware of implementation details? That would still require the GPU's firmware to speak NVMe, which sounds quite complicated.
 
I'm not sure if I've misunderstood the NVMe protocol, but it shouldn't have any addressable buffers on the storage side, so it's not an applicable target for DMA by a different peer. You issue requests, and the NVMe device then performs the DMA, pulling or pushing as required.

That's still bypassing page cache in the CPU entirely, but something has to issue the requests.

Yes, this aligns with what NVidia are saying in the GPUDirectStorage blog. It seems I've just misunderstood how HBCC would work with NV storage. If it uses DMA at all, then it seems it would have to rely on remote DMA engines.
 
https://wccftech.com/amd-big-navi-rdna-2-gpu-radeon-rx-graphics-cards-pc-first-consoles-later/


“There’s a lot of excitement for Navi 2, or what our fans have dubbed as the Big Navi“

“Big Navi is a halo product”

“Enthusiasts love to buy the best, and we are certainly working on giving them the best”.

“RDNA 2 architecture goes through the entire stack“

"it will go from mainstream GPUs all the way up to the enthusiasts and then the architecture also goes into the game console products... as well as our integrated APU products."

"This allows us to leverage the larger ecosystem, accelerate the development of exciting features like ray tracing and more."

via AMD's CFO, David
 
Well, it seems like this guy is waiting just a little longer to procure a GPU, then. However, I do wonder how quickly AMD will saturate the lineup with RDNA2 SKUs. I'm guessing they're launching their higher end first, giving time for current inventory to dwindle if they're set for an all-out replacement. Would AMD have to do an nVidia-esque SUPER series come Ampere if RDNA2 launches well in advance, though? That's my question. It's exciting times in the GPU business right now, that's for sure.
 
Well, it seems like this guy is waiting just a little longer to procure a GPU, then. However, I do wonder how quickly AMD will saturate the lineup with RDNA2 SKUs. I'm guessing they're launching their higher end first, giving time for current inventory to dwindle if they're set for an all-out replacement.
That would seem reasonable, but parallel availability of generations in the market is also quite common. If nVidia's Ampere cards come out across a wide span of segments, it makes sense to meet them with updated product across the board. I'm sure the bean counters at AMD and their partner manufacturers are busy trying to come up with an optimal timing strategy. As always, there will be deals to be had on the no-longer-so-new-and-shiny when the new lines come out, particularly on the second-hand market as the compulsive upgraders buy the newest to tinker with.
 
I'd wager AMD will have their high end and then use bad dies for a model right under it, like a Vega 56/64 type deal, then have something for the gap between the 5700 XT and the halo product. If they're putting out a $1k card and maybe an $800 card, they need something with ray tracing under that. Nvidia has the 2000 series for that.
 
via AMD's CFO, David
Who is David Kumar :rolleyes:?
AMD's CFO is Devinder Kumar.

Actual transcript is here.
https://seekingalpha.com/article/43...rica-securities-global-technology?part=single
 
“RDNA 2 architecture goes through the entire stack“
I wonder if this means Navi 10 and Navi 14 will also be EOL'd in 2020/2021.
It would make these rather short-lived chips, if true, especially compared to recent times, when AMD has been dragging the same chips through rebrands on 3-4 year cycles.
The RX 5600 SKUs should be the fastest to go away, since they were only introduced in January of this year.
 
I wonder if this means Navi 10 and Navi 14 will also be EOL'd in 2020/2021.
It would make these rather short-lived chips, if true, especially compared to recent times, when AMD has been dragging the same chips through rebrands on 3-4 year cycles.
The RX 5600 SKUs should be the fastest to go away, since they were only introduced in January of this year.
I wonder what the demand for non-ray-traced video cards will be going into 2021, with two consoles supporting it. Also, if they were able to increase performance and power efficiency, it may make for better low-end cards that use less power, require less expensive cooling, and hit lower price points.
 
I've been pondering whether AMD will actually introduce fully fledged RDNA2 SKUs at the lower end. Below a certain performance level, RT simply becomes detrimental to the overall experience, as resolution and frame rate take a hit when it's enabled. nVidia's 1660, while cheaper thanks to high ROI, is in line with that logic. Besides which, fusing off the RT hardware could be a potential energy saver, which matters more in low-end segments meant for set-top boxes and NUC-like devices etc.

Of course, selling hardware with features it will never be able to use to tick marketing boxes has always been a thing.
 