AMD: Navi Speculation, Rumours and Discussion [2019-2020]

Look at the latest RGT video, where he "doubles down" on the Infinity Cache + 256-bit bus thing.
You mean the PS5 presentation where they compare RDNA2 CUs (=PS5) to PS4 CUs, RDNA2 (=PS5) FLOPS to PS4 FLOPS, etc.?
Why are you assuming 12MB of cache for the CPU? 8MB L3 + 512KB L2 per core? Doesn't the L1 count toward the SRAM total?

Desktop Zen 2 has 32MB L3 + 8× 512KB L2 = 36MB "GameCache", plus 8× 64KB L1, so roughly 36.5MB in total.
That would leave around 40MB of total cache for the 52CU iGPU, which compared to Navi 10's ~4.5MB of total SRAM would already make it a beast.
IMO there's little reason to believe the CPU in the Series X doesn't have the full 32MB L3 of the desktop/server version, especially since Microsoft will be using the SoC for Azure and xCloud.
MS confirmed at Hot Chips that they're using the "mobile variant" of Zen 2, i.e. 4MB L3 per CCX (= 8MB total) and 512KB L2 per core (= 4MB total).
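For reference, a quick tally of the CPU-side SRAM under both assumptions (a rough sketch in Python; L1 here counts only the 32KB+32KB instruction/data caches per core):

Code:
KB, MB = 1, 1024

cores = 8
l1_per_core = 64 * KB    # 32KB I$ + 32KB D$ per Zen 2 core
l2_per_core = 512 * KB

desktop_l3 = 32 * MB     # two CCXs with 16MB L3 each
mobile_l3 = 2 * 4 * MB   # "mobile variant": 4MB L3 per CCX

print((desktop_l3 + cores * (l2_per_core + l1_per_core)) / MB)  # 36.5 MB
print((mobile_l3 + cores * (l2_per_core + l1_per_core)) / MB)   # 12.5 MB
# the ~12MB figure above is L3 + L2 only, before counting the L1s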
 
You mean the PS5 presentation where they compare RDNA2 CUs (=PS5) to PS4 CUs, RDNA2 (=PS5) FLOPS to PS4 FLOPS, etc.?

You're right, my bad. I need to stop watching videos late at night. I genuinely read it as "PS5", and since they have said several times that they asked for quite a lot of customization in their APU, I thought they had asked for some extreme things there. Sorry for my misunderstanding.
 
Look at the latest RGT video, where he "doubles down" on the Infinity Cache + 256-bit bus thing.

As of four months ago, the SMU 11.7 firmware has added USB support and I2C control for new power modules to power the USB device, and there are new Azalia devices for it. DCN also added support for output via this device.

With this commit, USB support is added specifically for Sienna.
https://lists.freedesktop.org/archives/amd-gfx/2020-September/053671.html
The teaser render has a USB connector, which is a good hint that the chip is Sienna. Navy has no USB support in the code (for now, at least). The Navi 10 chip also has USB support (Radeon Pro W5700).

If it is Sienna it will be HBM. It remains to be seen whether totally-not-RedGamingTech or totally-not-Paul is right, or the Linux commits are.
 
It's occurred to me that PS5 and Navi 2x might both have the "Infinity Cache" architecture (both seemingly being 256-bit) while XSX doesn't.

Microsoft designed a GPU/CPU split in the memory architecture, with 10GB dedicated to the GPU while receiving the full 560GB/s, but Sony decided to use a "flat" design. So Infinity Cache could be seen as salve for the "relatively low" 448GB/s PS5 has, and presumably Navi 2x (though Navi 21 with a 256-bit bus could have as much as 512GB/s).
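For the bandwidth math (a quick sketch; the 16Gbps figure for Navi 21 is an assumption, not a confirmed spec):

Code:
def gddr6_bw_gbs(bus_width_bits, data_rate_gbps):
    # GB/s = bus width in bytes x per-pin data rate
    return bus_width_bits / 8 * data_rate_gbps

print(gddr6_bw_gbs(256, 14))   # 448.0 -> PS5
print(gddr6_bw_gbs(320, 14))   # 560.0 -> XSX GPU-optimal region
print(gddr6_bw_gbs(256, 16))   # 512.0 -> hypothetical 256-bit Navi 21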
Microsoft described the memory allocation as being GPU-optimized, but I don't think either region in memory is exclusive to either client. It's more a question of how many channels a given part of the address space is striped across, and possibly a difference in the granularity of the striping. The CPU side's interconnect bandwidth seems to be sized so that it cannot exceed the bandwidth of either mode, whereas the GPU can.
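A toy illustration of the striping idea (the stride and layout are pure assumptions on my part; the real interleave isn't documented):

Code:
PER_CHANNEL_BW = 56    # GB/s: one 14Gbps x 32-bit GDDR6 channel
STRIDE = 256           # assumed interleave granularity in bytes

def channel_for(addr, channels):
    # simple modulo interleave: consecutive lines hit consecutive channels
    return (addr // STRIDE) % channels

print(channel_for(0x1000, 10))  # 6: which channel this address lands on
print(PER_CHANNEL_BW * 10)      # 560 GB/s: region striped across all 10 chips
print(PER_CHANNEL_BW * 6)       # 336 GB/s: region striped across only the six 2GB chips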

About limited bandwidth.
Could AMD use some kind of texture compression?
Compression is already done in various forms. There's DCC and compressed texture formats. There's a higher priority on random access and speed versus disk compression.
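To put numbers on the fixed-rate texture formats (these are the standard BCn ratios, nothing console-specific):

Code:
def bcn_bytes(w, h, bytes_per_block):
    # BCn formats encode fixed-size 4x4 texel blocks
    return (w // 4) * (h // 4) * bytes_per_block

w, h = 4096, 4096
print(w * h * 4 / 2**20)              # 64.0 MB uncompressed RGBA8
print(bcn_bytes(w, h, 8) / 2**20)     #  8.0 MB BC1 (8 bytes per block)
print(bcn_bytes(w, h, 16) / 2**20)    # 16.0 MB BC7 (16 bytes per block)

The fixed block size is what keeps random access cheap, which is exactly what disk-oriented codecs like Kraken trade away for ratio.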

Sony bragged about losslessly compressing textures to half size.
They use it to transfer data from NVMe to VRAM at double the effective speed.

I wonder if the same technique could be used to transfer data from VRAM to the GPU?
Some kind of hardware circuit to decompress with very low latency, and maybe a memory pool for the decompressed data.
The console compression methods suitable for disk IO work with data rates that are on average under 10GB/s, versus memory controllers running at 200+ GB/s, or TB/s for on-die caches. Latency-wise, it's potentially milliseconds versus hundreds of nanoseconds. Scaling disk compression to that level is prohibitive in terms of the amount of hardware and overhead. A disk access, from the perspective of the primary execution loop, is still an intermittent and low-bandwidth operation, and the compression methods are designed to take advantage of the relaxed latency and more linear access patterns to achieve more sustained and greater compression.
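Back-of-envelope on that gap (the IO figures are the publicly stated console targets; the rest is ballpark):

Code:
kraken_out = 8.0    # GB/s: PS5's stated typical decompressed output from 5.5 GB/s raw
vram_bw = 448.0     # GB/s: PS5 GDDR6

print(vram_bw / kraken_out)  # ~56x the throughput a VRAM-side decompressor would need
# ...while the latency budget shrinks from ~milliseconds (disk) to ~100s of ns (DRAM)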

I am quite skeptical about both the "Infinity Cache" and the "clock instability" claims around PS5 clocks, also because the same RGT video shows slides from the PS5 presentation explicitly stating that PS5's CUs are different from desktop RDNA2 CUs, and that the latter are quite a bit "beefier" than the PS5's. Since they are different, they are unlikely to share the same issues unless those are process-related.
Every design has a clock ceiling. If RDNA2 is a common foundation for all these products, it wouldn't be unreasonable to see similar design maximums.

XSX has 76MB of cache; if you take out the CPU caches, that's around 64MB for the 2 Shader Engines.
For a 4-SE, 80-CU Navi 21, 128MB is nothing extreme compared to the 64MB for 2 SEs (32MB/SE) that XSX has.
At Hot Chips, though, MS never said where those 64MB of cache are located or what they are used for.
Wasn't the number for SRAM, rather than cache?
 
Wasn't the number for SRAM, rather than cache?
To be honest, I was relying on this Eurogamer statement to mean cache, because of the phrase "across the entire SoC". Some CPU vendors do refer to their eDRAM or SRAM as cache.

There are customisations to the CPU core - specifically for security, power and performance, and with 76MB of SRAM across the entire SoC, it's reasonable to assume that the gigantic L3 cache found in desktop Zen 2 chips has been somewhat reduced.
 
SRAM is a circuit type that stores state. For that reason, cache is predominantly SRAM, but cache is a particular use case of storage that is automatically mapped to memory. However, there are many other forms of stored state. If it's a register, buffer, queue, scratchpad, data share, table, node, backing store, or other object that is read/written, it's most often SRAM as well (if the state doesn't need to persist past power down).
 
As of four months ago, the SMU 11.7 firmware has added USB support and I2C control for new power modules to power the USB device, and there are new Azalia devices for it. DCN also added support for output via this device.

With this commit, USB support is added specifically for Sienna.
https://lists.freedesktop.org/archives/amd-gfx/2020-September/053671.html
The teaser render has a USB connector, which is a good hint that the chip is Sienna. Navy has no USB support in the code (for now, at least). The Navi 10 chip also has USB support (Radeon Pro W5700).

If it is Sienna it will be HBM. It remains to be seen whether totally-not-RedGamingTech or totally-not-Paul is right, or the Linux commits are.
I'd believe Linux commits over "sources" every day of the week
 
SMU 11.7 firmware has added USB support and I2C control for new power modules to power the USB device, and there are new Azalia devices for it.
Azalia as in Intel High Definition Audio?
Could this be USB-C Audio, similar to what we see in smartphones without jack?
Maybe there's a Tempest in RDNA2 PC GPUs? I mean TrueAudio already failed twice, but third time's a charm and this time they have raytracing.
Though honestly I don't know why we'd need sound coming from the graphics card. So far the motherboard's USB buses have been plenty enough.


SRAM is a circuit type that stores state. For that reason, cache is predominantly SRAM, but cache is a particular use case of storage that is automatically mapped to memory. However, there are many other forms of stored state. If it's a register, buffer, queue, scratchpad, data share, table, node, backing store, or other object that is read/written, it's most often SRAM as well (if the state doesn't need to persist past power down).
But is non-cache SRAM ever a substantial amount compared to cache SRAM in an APU/CPU/GPU?
 
Azalia as in Intel High Definition Audio?
Could this be USB-C Audio, similar to what we see in smartphones without jack?
Maybe there's a Tempest in RDNA2 PC GPUs? I mean TrueAudio already failed twice, but third time's a charm and this time they have raytracing.
Though honestly I don't know why we'd need sound coming from the graphics card. So far the motherboard's USB buses have been plenty enough.

But is non-cache SRAM ever a substantial amount compared to cache SRAM in an APU/CPU/GPU?

Given the death of VirtualLink, it would be hilarious if that was the use of the USB Type-C port.
 
But is non-cache SRAM ever a substantial amount compared to cache SRAM in an APU/CPU/GPU?
An RDNA CU has 256KB of SRAM in its register file alone. Navi 10's 40 CUs have roughly 10 MB of register file versus a 4MB L2.
GPUs generally have an inverted relationship between register file and cache compared to CPUs, with the recent exception being Nvidia's A100.
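Checking that against the RDNA whitepaper figures (two SIMD32s per CU, 128KB of vector registers each):

Code:
cus = 40
regfile_per_cu = 2 * 128 * 1024   # bytes: two SIMD32s x 128KB VGPR file

total = cus * regfile_per_cu
print(total / 2**20)              # 10.0 MB of register file
print(total / (4 * 2**20))        # 2.5x Navi 10's 4MB L2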
 
Aren't modern CPUs more than capable enough for decoding?

Depends on what you’re doing. For real time streaming fast H.264/5 encode and decode are critical. Will be interesting to see how Navi2x improves things there. If I’m not mistaken AMD has been lagging behind NVENC in both quality and performance for a few generations now.
 
Which is better for latency with streaming services like PS Now or xCloud: CPU (software) decode, IGP (hardware-accelerated) decode, or GPU decode?
 
If it is Sienna it will be HBM.
Supposedly the initial Linux commit with HBM was just a placeholder or something, and there have since been either further Linux commits or other information of a similar nature saying it's GDDR6. We'll see soon enough.
 
Microsoft described the memory allocation as being GPU-optimized, but I don't think either region in memory is exclusive to either client. It's more a question of how many channels a given part of the address space is striped across, and possibly a difference in the granularity of the striping. The CPU side's interconnect bandwidth seems to be sized so that it cannot exceed the bandwidth of either mode, whereas the GPU can.
My point was that Microsoft chose this architecture while also going for "relatively high" 560GB/s. My theory is that PS5/Navi target lower bandwidth in combination with this rumoured, humongous, Infinity Cache, but it's not present in XSX since Microsoft chose to solve the problem in a different way.

If true that would explain why the XSX die shot doesn't show an obviously humongous lump of cache (whether 128MB or smaller). Of course the cache might be "stealthed", distributed in a fine-grained way making it very hard to discern...

I'm simply trying to describe scenarios where the XSX die shot appears to confute the "128MB Infinity Cache" rumour.
 
My point was that Microsoft chose this architecture while also going for "relatively high" 560GB/s. My theory is that PS5/Navi target lower bandwidth in combination with this rumoured, humongous, Infinity Cache, but it's not present in XSX since Microsoft chose to solve the problem in a different way.

If true that would explain why the XSX die shot doesn't show an obviously humongous lump of cache (whether 128MB or smaller). Of course the cache might be "stealthed", distributed in a fine-grained way making it very hard to discern...

I'm simply trying to describe scenarios where the XSX die shot appears to confute the "128MB Infinity Cache" rumour.
It seems like a sizable investment to overcome a 20% bandwidth deficit, for a GPU that is in many metrics ~20% lower in performance.
Area-wise, I think the extra GDDR6 channels would be more compact.
Hopefully, such a cache is more generally useful than just enhancing the ROPs, or perhaps a PS5 with such a cache could opt for a smaller capacity since it may be operating at sub-4K in general versus the high resolutions enthusiast PC cards can be set to.
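A toy model of what a big LLC would buy (the hit rates and the bottleneck assumption are illustrative, not leaks): if only misses touch DRAM and DRAM is the limiter, deliverable bandwidth scales as 1/(1 - hit rate).

Code:
def amplified_bw_gbs(hit_rate, dram_bw_gbs):
    # assumes cache hits are effectively free and DRAM is the bottleneck
    return dram_bw_gbs / (1.0 - hit_rate)

for hr in (0.3, 0.5, 0.7):
    print(hr, round(amplified_bw_gbs(hr, 512.0)))
# 0.3 -> 731, 0.5 -> 1024, 0.7 -> 1707 GB/s apparent bandwidth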
 
"Actually hardcore OC" made an interesting analysis, he said that the location of the bracket and and screws may suggest HBM because of how far apart they are. would be interesting to see a real one and make some measurements.

About the cache, I am also skeptical... but Sony said they asked for specific features the GPU had to have, and that if we see a similar design in the retail GPUs, it's because AMD saw potential for client use and decided to implement it in the retail version of the architecture. So if this "Infinity Cache" is real, it may be a Sony idea, which would explain why it is not in the Xbox and why Sony is being so secretive about its die.
 
Supposedly the initial Linux commit with HBM was just a placeholder or something, and there have since been either further Linux commits or other information of a similar nature saying it's GDDR6. We'll see soon enough.
There are two chips, Navy Flounder and Sienna Cichlid; one is HBM, the other is G6.
They are real code paths, and you can trigger them with an ioctl.
 