Yes... but Jaguar still had an awfully slow shared L2 cache. Zen introduced a fast dedicated L2 cache per core, similar to what Intel has had for a long time. But the shared L3 cache still seems to be much slower than Intel's, and it doesn't cover all cores. Cross-cluster data movement is still a problem. Zen is a huge improvement over AMD's previous designs, but still a bit behind Intel in some uncore features.
There is also something else unusual about the L3 cache, beyond the fact that it is 8MB per CCX and should have been presented that way by AMD rather than as a 16MB shared L3.
Hardware.fr has shown that the L3 does not behave as expected once usage goes beyond certain thresholds on the same CCX, and this applies to all CPUs tested apart from the 1600X.
As an example, the 4-core part with 8MB of L3 (2 CCX) shows a latency jump once L3 usage goes over 2MB rather than the expected 4MB, and the others with 16MB (2 CCX) show a latency jump after 4MB rather than after 8MB.
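To make the arithmetic explicit, here is a rough sketch in Python. It assumes the total L3 is split evenly between the CCXs, so the "expected" jump is simply total divided by CCX count; the function name and the labels are my own, and the observed figures are just the ones quoted above, not new measurements.

```python
def expected_jump_mb(total_l3_mb: float, ccx_count: int) -> float:
    """L3 available to one CCX, assuming the total is split evenly (assumption)."""
    return total_l3_mb / ccx_count


# (total L3 in MB, CCX count, observed latency jump in MB) as quoted in this post
cases = {
    "4-core, 8MB L3": (8, 2, 2),
    "16MB L3 SKUs": (16, 2, 4),
}

for name, (total, ccx, observed) in cases.items():
    expected = expected_jump_mb(total, ccx)
    # Compare where the jump should appear with where it was actually seen
    print(f"{name}: expected {expected:g}MB, observed {observed}MB, "
          f"ratio {observed / expected:g}")
```

In both cases the jump shows up at half of the per-CCX capacity, which is why a further split inside the CCX is one of the candidate explanations.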
So something unusual is happening, but on the plus side the 1600X does not follow the same quirk, so maybe this is fixable, or the circumstances just happened to be perfect for the 1600X.
This suggests one of the following:
- Further partitioning or segmentation (mesh) of the L3 cache within the same CCX.
- An unusual quirk of the 2x1MB per core.
- An unusual quirk or bug in how applications can use the L3, which needs clarification from AMD; hardware.fr is still waiting for a response on the L3 behaviour.
- The 'mostly reserved' L3 cache means that part of it is reserved, either permanently or dynamically in some way (which may also apply to point 1).
Separately, the large latency jump after 4MB (green) and 8MB (other SKUs) comes down to going out to system memory. The L3 is a victim cache for the cores, so with thread affinity fixed the other CCX would not have the data in this situation, and it is unknown whether the 'shared' data can be pulled across CCXs with a penalty or must always revert to system memory.
The 1600X (red) behaves as one would expect from 2x8MB L3 caches in the context mentioned, but this behaves differently on Intel with its decoupled ring bus L3 cache.
Here is the simplified 1800X.
But the 1600X suggests a solution may be possible.
Cheers