The L2 was at least implicitly the LLC in previous generations, so having a separate reference now may mean a distinct layer is present.

Within the patch, there are declarations with the term LLC, which is usually an abbreviation for last-level cache. Combined with terms like “no alloc” and “GCMC”, these patches sound like they are adding support for memory-side LLC bypass* on a per-page basis, while some blocks (e.g. the SDMA copy engine) can override the page-level settings.
* probably like SLC=1 policy for L2: write no-allocate, read miss-evict
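To make the per-page bypass idea concrete, here is a minimal toy model, assuming a single "no alloc" bit per page and an optional per-engine override that wins over it. Every name here (`LlcPolicy`, `Page.llc_no_alloc`, `effective_policy`, `ToyLlc`) is hypothetical and not taken from the patches; the cache is simplified to write-through and follows the footnote's guess of write no-allocate, read miss-evict.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class LlcPolicy(Enum):
    ALLOCATE = "allocate"  # normal caching in the memory-side LLC
    NO_ALLOC = "no_alloc"  # bypass: write no-allocate, read miss-evict


@dataclass
class Page:
    llc_no_alloc: bool  # hypothetical per-page bit (e.g. carried in a PTE)


def effective_policy(page: Page, engine_override: Optional[LlcPolicy] = None) -> LlcPolicy:
    """An engine-level override (e.g. a copy engine) wins over the page bit."""
    if engine_override is not None:
        return engine_override
    return LlcPolicy.NO_ALLOC if page.llc_no_alloc else LlcPolicy.ALLOCATE


class ToyLlc:
    """Toy write-through memory-side cache; lines are keyed by address."""

    def __init__(self) -> None:
        self.lines: Dict[int, int] = {}

    def write(self, addr: int, data: int, policy: LlcPolicy, dram: Dict[int, int]) -> None:
        dram[addr] = data  # write-through keeps DRAM current
        if addr in self.lines or policy is LlcPolicy.ALLOCATE:
            self.lines[addr] = data  # a write hit updates; only ALLOCATE fills on a miss

    def read(self, addr: int, policy: LlcPolicy, dram: Dict[int, int]) -> int:
        if addr in self.lines:
            return self.lines[addr]  # hits are served regardless of policy
        data = dram[addr]
        if policy is LlcPolicy.ALLOCATE:
            self.lines[addr] = data  # normal fill on a read miss
        # NO_ALLOC: the data is returned but the line is not retained ("miss-evict")
        return data
```

With a bypass page, a read returns the data without installing it; passing `LlcPolicy.ALLOCATE` as an engine override makes the same access fill the cache, which is roughly what an engine overriding the page-level setting would look like.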
There may be other considerations. There are various no_alloc values, but the L2-related ones are plain no_alloc, without LLC in the name, while others, like SDMA's, have no-allocation values that do name the LLC.
Would that mean those flags are for bypassing the L2 in favor of a separate layer, or is it simply that the L2 needs no redundant LLC designation because that is what it already is?
Some kind of display controller self-refresh from a local memory might work.

It's probably also worth noting that the earlier link contains a condition with a magic number: surface_size < 128 * 1024 * 1024.
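As a quick sanity check on that magic number, the sketch below evaluates the quoted condition, assuming surface_size is in bytes (one reading only). `MALL_SURFACE_LIMIT`, `fits_under_limit` and `surface_bytes` are hypothetical names for illustration, not driver identifiers.

```python
MALL_SURFACE_LIMIT = 128 * 1024 * 1024  # the magic number from the quoted condition


def fits_under_limit(surface_size: int) -> bool:
    # Mirrors the quoted check: surface_size < 128 * 1024 * 1024
    return surface_size < MALL_SURFACE_LIMIT


def surface_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    # Uncompressed, unpadded size of a single surface
    return width * height * bytes_per_pixel


print(MALL_SURFACE_LIMIT)                                # 134217728, i.e. 128 MiB
print(fits_under_limit(surface_bytes(3840, 2160, 4)))   # 4K at 32 bpp: True
print(fits_under_limit(surface_bytes(7680, 4320, 8)))   # 8K at 64 bpp (FP16): False
```

Under the byte reading, a 4K 32 bpp scanout surface (33,177,600 bytes) sits comfortably under the limit, while an 8K FP16 surface does not.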
So, ehm, maybe an interpretation is:
?_?
- It has 128 MB last level cache
- The hardware feature (& hence the flag) is called Memory Access at Last Level (MALL)
- It can be turned on & off (for lower idle power?).
The driver probably allows only render targets to allocate in the LLC during some phases, in which the display controller can be assured that any <128MB RT to be presented always hits the LLC, and can therefore use much tighter timing. (Eh, or maybe all the time? It is an IMR GPU after all.)

Edit: ^^ is nonsense if you consider basics like double buffering... So maybe it is like what andermans said: MALL allows the 128MB LLC to be used as a scratchpad (hence "Memory Access") while the GDDR6 pool is powered off?
It's a possible interpretation, although 128MB has shown up as a limit for buffers in compute or graphics in other instances.
Another possibility is that 128 * 1024 * 1024 isn't a size in bytes. Some references have values like maxTexelBufferElements = 128 * 1024 * 1024, which may explain the curious way of writing out 2^27.
https://phabricator.pmoreau.org/file/data/5btjflw6ul4wk3qrodo2/PHID-FILE-s3kiruzwymgchgag3eid/file
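If the constant is an element count rather than a byte count, the byte footprint it implies depends on the element size. The sketch below shows that arithmetic; the format names are just example texel sizes, and `limit_in_bytes` is a hypothetical helper.

```python
LIMIT = 128 * 1024 * 1024
assert LIMIT == 1 << 27  # 128 * 1024 * 1024 is just 2^27


def limit_in_bytes(element_size_bytes: int, max_elements: int = LIMIT) -> int:
    # Byte footprint of max_elements elements of the given size
    return max_elements * element_size_bytes


for name, size in [("R8", 1), ("R32", 4), ("RGBA32F", 16)]:
    print(name, limit_in_bytes(size) // (1024 * 1024), "MiB")
# R8 128 MiB, R32 512 MiB, RGBA32F 2048 MiB
```

Only for 1-byte elements does the element limit coincide with 128 MiB; for wider texels the same constant covers far more memory than any plausible on-die cache.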
That wouldn't point to a cache that's literally 128MB in size, just to a possible addressing limit for some of the hardware, beyond which additional units or the driver might need to intervene. That intervention might be self-defeating, going by code that has microsecond time constants and might be for low-power operation.