Typically, such minor changes yield an average performance improvement of around 0% when tested across a range of benchmarks. The updated counters are most likely there to address corner cases in a few titles, where they might improve the average framerate by a percent or two.
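As a rough illustration of why a corner-case gain disappears in the average, here is a minimal sketch. It assumes a hypothetical 20-title suite scored by the geometric mean of per-title framerate ratios (new fps / old fps), a common but not universal way reviewers average results. The suite and the 2% figures are invented for illustration.

```python
import math

def geomean(ratios):
    """Geometric mean of per-title performance ratios (new fps / old fps)."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical suite: 20 titles, 2 of which gain 2%; the other 18 are unchanged.
suite = [1.02, 1.02] + [1.00] * 18
uplift = geomean(suite) - 1.0
print(f"average uplift: {uplift:.2%}")  # → average uplift: 0.20%
```

Two titles gaining 2% each moves the suite average by only about 0.2%, which is within run-to-run noise for most benchmark setups.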
Better caches are those with lower latency, higher bandwidth, and larger capacity. Of course, improved prefetching can also help reduce latency in sensitive workloads where there isn't enough work to hide it by other means, but, again, it wouldn't have a noticeable effect in practice, since current implementations are already quite effective at this. In some corner cases, there might be an additional improvement of a few percent per frame, which would effectively translate to 0% in a typical benchmark suite. And how on earth are improvements to the counters supposed to reduce memory bandwidth requirements? Either larger caches or better delta compression are needed for meaningful improvements in that department.