aaronspink
Veteran
How is the BoB cache supposed to work? As a per-memory-channel victim cache? Or as a massive write-coalescing buffer? (Or both?)
Cheers
Depends on what they want to use it for and if they want it to have multiple functionalities. Also depends somewhat on the level/type of RAS features the memory has.
Options include:
Segment/Prefetch caching - Depending on the size of the basic memory block-level protection, they may need to load more data from DRAM than they actually need to deliver to the CPU. In that case, the cache or a portion of it can be used to hold the additional line(s) that aren't directly needed by the CPU, for instance if the CPU line size is 64/128B and the DRAM protection block is 128/256B. When the CPU cache-line size and the DRAM protection block size don't match, it also acts as the read-modify-write (RMW) buffer.
Hotline/bouncing line/frequent updates - Some access patterns cause lots of updates to memory and/or repeated reads of the same block. In these cases the cache can act as a holding place for those lines, saving power and reducing effective latency.
General read caching - What it sounds like: caching what is read, so it can hold data that, because of the access pattern, ends up overflowing the CPU caches.
Stream cache - The controller notices a streaming sequence of reads and pre-caches the upcoming lines for reduced latency. This allows better bank/rank management of the DRAMs, especially if there are open DRAM cycles, and results in better latency and increased bandwidth. The same goes for writes: delay the writes and push them down during dead cycles for a given DRAM.
Victim Cache - cache CPU victim lines that will likely be re-read in the future as a capacity optimization.
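To make the segment/prefetch option concrete, here is a toy sketch of a cache that keeps the sibling line(s) pulled in when DRAM has to deliver a whole protection block. Everything in it is an assumption for illustration: the `SegmentCache` name, the 64B/128B sizes (one of the combinations mentioned above), and the simple LRU eviction. It is not how the actual buffer chip is known to work.

```python
from collections import OrderedDict

CPU_LINE = 64      # bytes; assumed CPU cache-line size
PROT_BLOCK = 128   # bytes; assumed DRAM protection block size


class SegmentCache:
    """Toy model: holds sibling lines fetched alongside a requested line."""

    def __init__(self, capacity_lines):
        self.capacity_lines = capacity_lines
        self.lines = OrderedDict()  # line address -> True, oldest first
        self.dram_reads = 0         # protection blocks read from DRAM

    def read(self, addr):
        line = addr - addr % CPU_LINE
        if line in self.lines:
            # Sibling kept from an earlier block fetch: no DRAM access needed.
            self.lines.pop(line)
            return "hit"
        # Miss: DRAM delivers the whole protection block.
        self.dram_reads += 1
        block = addr - addr % PROT_BLOCK
        for sibling in range(block, block + PROT_BLOCK, CPU_LINE):
            if sibling != line:
                self.lines[sibling] = True
                self.lines.move_to_end(sibling)
                if len(self.lines) > self.capacity_lines:
                    self.lines.popitem(last=False)  # evict the oldest entry
        return "miss"


cache = SegmentCache(capacity_lines=4)
results = [cache.read(a) for a in (0, 64, 128, 192)]
print(results, cache.dram_reads)
```

In this model a sequential walk over four 64B lines costs two DRAM block reads instead of four, which is the saving the segment function is after.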
It is doubtful that they will run them as simple dumb caches (especially since IBM has historically used 3-4 times the number of cache states as others to squeeze out performance), since the capacity ratio is poor with respect to the CPU cache (96 MB in CPU cache and 128 MB in memory cache). I would assume that they have multiple modes that the memory cache can operate in and that they aren't mutually exclusive. I wouldn't be surprised if it operated in 3-5 separate modes at once to maximize its effective capacity.
For example the segment/prefetch cache functionality would likely only take 1MB, the stream cache another MB, Hotline another MB, victim 8MB or so, etc. So there is certainly enough capacity to have multiple functions using it.
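As a quick check on that budget, the arithmetic can be written out. The per-function sizes are the rough guesses from this post, and treating the 128 MB as a single pool is an assumption:

```python
TOTAL_MB = 128  # memory cache capacity quoted above (assumed to be one pool)

# Rough per-function guesses from the post, in MB.
budget_mb = {
    "segment/prefetch": 1,
    "stream": 1,
    "hotline": 1,
    "victim": 8,
}

used_mb = sum(budget_mb.values())
remaining_mb = TOTAL_MB - used_mb
print(f"used {used_mb} MB, {remaining_mb} MB left for other modes")
# prints "used 11 MB, 117 MB left for other modes"
```

So even with all four of those functions active, under these guesses the bulk of the capacity is still free for general read caching or a larger victim share.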