AMD: Navi Speculation, Rumours and Discussion [2019-2020]

Status
Not open for further replies.
Within the patch, there are declarations with the term LLC, which is usually an abbreviation for last-level cache. Combined with terms like “no alloc” and “GCMC”, these patches sound like they are adding support for memory-side LLC bypass* on a per-page basis, while some blocks (e.g. the SDMA copy engine) can override the page-level settings.

* probably like SLC=1 policy for L2: write no-allocate, read miss-evict
The L2 was at least implicitly the LLC in previous generations, so having a separate reference now may mean a distinct layer is present.
There may be other considerations. There are various no_alloc values, but the L2-related ones are no_alloc without LLC in the name, while others like SDMA have no-allocation values that do name the LLC.
Would that mean those flags are for bypassing the L2 in favor of a separate layer, or is it something like the L2 has no need for a redundant LLC designation since that is what it is?

Guess it's also worth noting that the earlier link contains a condition with a magic number: surface_size < 128 * 1024 * 1024.
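A minimal sketch of what that condition amounts to (the function name and the constant's name are hypothetical, not the actual driver code; whether the limit is really a capacity in bytes is exactly what gets debated below):

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the magic 128 * 1024 * 1024 constant from the patch. */
#define MALL_LIMIT_BYTES (128ull * 1024 * 1024)

/* Hypothetical sketch: a surface is only eligible for the MALL/LLC
 * path if it fits under the magic limit. */
static bool surface_fits_in_mall(uint64_t surface_size)
{
        return surface_size < MALL_LIMIT_BYTES;
}
```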

So, ehm, maybe an interpretation is:
  • It has 128 MB last level cache
  • The hardware feature (& hence the flag) is called Memory Access at Last Level (MALL)
  • It can be turned on & off. (for lower idle power?)
  • The driver probably allows only render targets to allocate in the LLC during some phases, in which case the display controller can be assured that any <128 MB RT to be presented always hits the LLC, and can use way tighter timing. (Eh, or maybe all the time? It is an IMR GPU after all.)
  • Edit: ^^ is nonsense if you consider basics like double buffering... So maybe it is like what andermans said: MALL allows the 128 MB LLC to be used as a scratchpad (hence "Memory Access"), while the GDDR6 pool is powered off?
?_?
Some kind of display controller self-refresh from a local memory might work.
It's a possible interpretation, although 128MB has shown up as a limit for buffers in compute or graphics in other instances.
Another possibility is that 128 * 1024 * 1024 isn't a size in bytes. Some references have values like maxTexelBufferElements = 128 * 1024 * 1024, which may explain the curious way of subdividing 2^27.
https://phabricator.pmoreau.org/file/data/5btjflw6ul4wk3qrodo2/PHID-FILE-s3kiruzwymgchgag3eid/file
That wouldn't point to a cache that's literally 128 MB in size, just a possible addressing limit for some of the hardware, one that might require additional units or driver intervention. That interpretation might be self-defeating, though, going by code that has microsecond time constants and appears to be for low-power operation.
 
> The L2 was at least implicitly the LLC in previous generations, so having a separate reference now may mean a distinct layer is present.
> There may be other considerations. There are various no_alloc values, but the L2-related ones are no_alloc without LLC in the name, while others like SDMA have no-allocation values that do name the LLC.
Yeah, that is what I meant by "memory-side LLC".

> Would that mean those flags are for bypassing the L2 in favor of a separate layer, or is it something like the L2 has no need for a redundant LLC designation since that is what it is?
I am leaning towards it independently controlling the access behaviour when an access hits the memory-side LLC controller, whereas the L2, as a GPU-internal cache, continues to be controlled by the instruction-level SLC bit (likewise GLC for L0 and DLC for L1). Otherwise, if one assumes LLC = L2, it would mean a slight departure from GCN/RDNA 1's approach of per-instruction cache-policy selection for all levels of the GPU-internal caches.
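For contrast with a per-page MALL control, the per-instruction policy mentioned above works roughly like this (a minimal C sketch; the GLC/DLC/SLC names follow the ISA's terminology, but the bit positions and the helper are made up here for illustration, not the real instruction encoding):

```c
#include <stdint.h>

/* Hypothetical illustration of RDNA-style per-instruction cache policy
 * bits (the real ISA encodes these as modifier bits in the memory
 * instruction word; the bit positions here are invented):
 *   GLC - globally coherent, controls L0 behaviour
 *   DLC - device level coherent, controls L1 behaviour
 *   SLC - system level coherent, controls L2 behaviour
 * The open question above is whether the memory-side LLC gets its own
 * per-page control instead of a fourth per-instruction bit. */
#define POLICY_GLC (1u << 0) /* bypass/miss-evict at L0 */
#define POLICY_DLC (1u << 1) /* bypass/miss-evict at L1 */
#define POLICY_SLC (1u << 2) /* write no-allocate, read miss-evict at L2 */

/* A streaming store that should not pollute any GPU-internal cache
 * would set all three policy bits. */
static inline uint32_t streaming_store_policy(void)
{
        return POLICY_GLC | POLICY_DLC | POLICY_SLC;
}
```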
 
> Some kind of display controller self-refresh from a local memory might work.
> It's a possible interpretation, although 128 MB has shown up as a limit for buffers in compute or graphics in other instances.
> Another possibility is that 128 * 1024 * 1024 isn't a size in bytes. Some references have values like maxTexelBufferElements = 128 * 1024 * 1024, which may explain the curious way of subdividing 2^27.
> https://phabricator.pmoreau.org/file/data/5btjflw6ul4wk3qrodo2/PHID-FILE-s3kiruzwymgchgag3eid/file
> That wouldn't point to a cache that's literally 128 MB in size, just a possible addressing limit for some of the hardware, one that might require additional units or driver intervention. That interpretation might be self-defeating, though, going by code that has microsecond time constants and appears to be for low-power operation.

I think you're linking to Intel, though, which indeed has weird limits, but I don't think I've seen that particular limit before for GCN/RDNA. The next patch, which actually initializes surface_size, I think clearly shows it is bytes: https://lists.freedesktop.org/archives/amd-gfx/2020-October/055215.html
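Assuming it is bytes, the arithmetic would reduce to something like this sketch (function name hypothetical, not the patch's code); e.g. a single 4K RGBA8 render target lands well under the threshold:

```c
#include <stdint.h>

/* Sketch: size in bytes of a linear surface (pitch in pixels,
 * 4 bytes per pixel for an RGBA8 format). */
static uint64_t surface_size_bytes(uint64_t pitch, uint64_t height,
                                   uint64_t bytes_per_pixel)
{
        return pitch * height * bytes_per_pixel;
}

/* A 3840x2160 RGBA8 target is 33,177,600 bytes, comfortably below
 * 128 * 1024 * 1024 = 134,217,728. */
```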

Though I agree that the 128 MiB might only point to a cache limit, we do not know for sure. Furthermore, from just an enable/disable bit it is hard to know what MALL is anyway.
 
> I think you're linking to Intel, though, which indeed has weird limits, but I don't think I've seen that particular limit before for GCN/RDNA. The next patch, which actually initializes surface_size, I think clearly shows it is bytes: https://lists.freedesktop.org/archives/amd-gfx/2020-October/055215.html
>
> Though I agree that the 128 MiB might only point to a cache limit, we do not know for sure. Furthermore, from just an enable/disable bit it is hard to know what MALL is anyway.
I was giving the Intel link as an example of where that convention of breaking a size value down into such an expression comes up for a non-capacity reason.

edit: never mind, I missed a parenthetical.
That code seems to clarify what the source is.
 
So I guess I answered my own question, more than half a year later: RDNA2 will come with AV1 decode. I wonder how Vegas will run on RDNA2? GPU acceleration would be awesome for the stuff I do.

It was more or less a given that both Ampere and RDNA2 would have AV1 decode acceleration, especially for the consoles. I expect their Cezanne APU to also get the updated decode block even though it's rumored to be based on Vega.

~9 hours to go. The wait is almost over!
 