Understanding XB1's internal memory bandwidth *spawn

Gipsel · Oct 8, 2013

I thought everyone understood it with the car analogy?

The average speed of a car will always be (a lot) lower than its theoretical maximum written down somewhere in the specifications. But the maximum achieved in real-world usage tends to fluctuate a lot. You can go relatively slowly in a city (hopefully not in a traffic jam) and a bit faster on average on a highway. If I decide to drive from Hamburg to Munich, my average speed would likely be 80mph for the whole distance (including a break for lunch). But if the Autobahn is not too crowded (it doesn't have to be empty), no one stops me (at least on some stretches) from going 120mph or even faster, which could be awfully close to the theoretical maximum. That speed is indeed attainable in the real world (and not only on some closed testing grounds) for a small fraction of the total time.

The same with real games. No real game will use 150+GB/s from the eSRAM on average. It's not realistic to expect such a behaviour. The consoles are not driving multiple 4k screens at 60Hz. But could they reach this bandwidth for a fraction of a frame? Of course!
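To put rough numbers on the analogy, here is a minimal sketch using an entirely made-up frame. The phase names, durations and bandwidth figures are hypothetical and not measurements from any game; the only point is that a short burst above 150GB/s is compatible with a much lower per-frame average.

```python
# Hypothetical breakdown of one frame: (phase, duration in ms, eSRAM GB/s).
# Every number here is invented for illustration only.
FRAME_PHASES = [
    ("depth pre-pass", 2.0, 90.0),
    ("g-buffer fill", 5.0, 60.0),
    ("alpha blending", 1.5, 155.0),
    ("post-processing", 4.0, 70.0),
    ("idle / CPU-bound", 4.0, 0.0),
]


def average_bandwidth(phases):
    """Time-weighted average bandwidth over all phases, in GB/s."""
    total_ms = sum(ms for _, ms, _ in phases)
    moved = sum(ms * gbps for _, ms, gbps in phases)
    return moved / total_ms


def peak_phase(phases):
    """Return the (name, duration, bandwidth) tuple with the highest bandwidth."""
    return max(phases, key=lambda phase: phase[2])


avg = average_bandwidth(FRAME_PHASES)
name, ms, gbps = peak_phase(FRAME_PHASES)
print(f"average: {avg:.1f} GB/s, peak: {gbps:.0f} GB/s during {name} ({ms} ms)")
```

With these invented figures the burst phase lasts under a tenth of the frame, which is the "small fraction of the total time" from the Autobahn example.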

Exophase · Oct 8, 2013

Gipsel said:
A fillrate test is also representative of something.
Furthermore, each game or even each different phase of rendering a frame of the same game will be representative of something else, a something we don't know.
But here, they looked specifically for high bandwidth usage situations, so we are given a hint, for what it should be representative, the bandwidth.
Of course it does. At least given the circumstances like what they are talking about (high bandwidth usage scenarios and which specific one [blending] they think of!). Given the reasoning in that interview as well as by several people here in the thread, it would be quite hard to explain how else they got 150+GB/s out of the eSRAM.

What he said was 140-150GB/s. That's a minor dispute over "150+", but not that minor.

Now, I posit that a pure fillrate test with depth + blending (or just blending, depth completely off) should be capable of achieving much higher than 140-150GB/s. The theoretical peak is over 100GB/s for read and write independently. That's already accounting for overhead that prevents you from using every cycle for write. Other than that, it shouldn't be that hard to saturate either the read or write interface, probably easier than you would GDDR5. And if you have something that's dealing almost entirely in RMWs you should be able to come close to saturating both of them. Hitting under 75% of that isn't really close.
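As a rough check on that percentage, here is a small sketch using the 109GB/s per-direction and 218GB/s combined figures that appear elsewhere in this thread. They are taken as given here, not verified. The 218GB/s number is the no-stall peak, so the realistically achievable peak is somewhat lower and the true utilization somewhat higher than what this prints.

```python
# Figures quoted in this thread; treated as given, not independently verified.
PER_DIRECTION_PEAK = 109.0  # GB/s, read or write on its own
COMBINED_PEAK = 218.0       # GB/s, read + write with no stalls at all


def utilization(measured_gbps, peak_gbps=COMBINED_PEAK):
    """Fraction of the peak that a measured bandwidth figure represents."""
    return measured_gbps / peak_gbps


for measured in (140.0, 150.0):
    share = utilization(measured)
    needs_both = measured > PER_DIRECTION_PEAK
    print(f"{measured:.0f} GB/s is {share:.0%} of {COMBINED_PEAK:.0f} GB/s; "
          f"needs concurrent read and write: {needs_both}")
```

Both figures are above what a single direction can deliver, so they do require reads and writes in flight at the same time, but they still sit well below the combined peak.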

It could be that this is still their highest burst period during a frame. It could be that this is done while blending is happening, but they're not coming closer to saturating the eSRAM bandwidth because of the other things they're doing. But it could also be happening during other parts of the frame, like during a depth pre-pass (especially if Hi-Z is off). And it could be that the actual burst periods, for some "reasonable" metric of how long a period has to be, exceed the 140-150GB/s quoted. It's not outside the realm of capability for the system.

I agree that as a per-frame average 140-150GB/s sounds too high. But I don't really think there's enough information to assume much else than that. It's possible that 140-150GB/s is not the peak but the average bandwidth during periods where either read or write are saturated (or in some way the GPU "wants" to saturate them, i.e. it's bandwidth limited). This would be a pretty different metric, and would be illustrative of the tradeoff of having separate read/write channels instead of some faster read + write channel.
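The tradeoff with separate read and write channels can be sketched as follows. This is a simplified model that assumes a 109GB/s limit per direction (the figure used in this thread) and ignores bank conflicts, pipelining bubbles and everything else a real controller does. The function name and the traffic mixes are hypothetical.

```python
def max_total_bandwidth(read_fraction, read_peak=109.0, write_peak=109.0):
    """Highest total GB/s for a traffic mix when reads and writes have separate limits.

    read_fraction is the share of all bytes that are reads (0.0 to 1.0).
    Whichever direction hits its own limit first caps the total.
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    limits = []
    if read_fraction > 0.0:
        limits.append(read_peak / read_fraction)
    if read_fraction < 1.0:
        limits.append(write_peak / (1.0 - read_fraction))
    return min(limits)


# Hypothetical traffic mixes, for illustration only.
for label, fraction in (("reads only", 1.0), ("75% reads", 0.75), ("even mix", 0.5)):
    print(f"{label}: up to {max_total_bandwidth(fraction):.1f} GB/s total")
```

In this model only a perfectly even read/write mix can reach the combined figure. Any lopsided mix saturates one direction early, whereas a single shared channel of the same total width would not care about the mix.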

In the past, game companies have given pretty detailed overviews of performance profiling they've done on their games. Maybe we'll see some more in-depth examples of bandwidth usage over a frame. Identifying this could be useful for optimizing what you keep in eSRAM and how you design some of your algorithms.

ERP · Oct 8, 2013

Exophase said:
In the past, game companies have given pretty detailed overviews of performance profiling they've done on their games. Maybe we'll see some more in-depth examples of bandwidth usage over a frame. Identifying this could be useful for optimizing what you keep in eSRAM and how you design some of your algorithms.

Often there aren't sufficient hardware performance counters to get good metrics at that level. You can get some indication of average load, but it's often difficult to isolate specific pieces of a frame without changing the results.

Exophase · Oct 8, 2013

ERP said:
Often there aren't sufficient hardware performance counters to get good metrics at that level. You can get some indication of average load, but it's often difficult to isolate specific pieces of a frame without changing the results.

But there could be sufficient performance counters this time around. Gipsel said GCN has them, and this is pretty small stuff compared to what CPUs do. I wouldn't be surprised if AMD added more at Sony's and/or Microsoft's behest specifically, since console developers tend to get deeper into that.

Gipsel · Oct 8, 2013

Exophase said:
Gipsel said GCN has them

The ones I mentioned have been there since the R700 series.

edit:
And relating to that BW discussion: in a pure fillrate test you are likely able to achieve slightly higher bandwidth numbers. How high exactly depends on the intricacies of the SRAM controller. They obviously gimped it a bit on the pipelining (so it can't achieve the "full" peak rate of 218GB/s even without bank conflicts), so who outside MS really knows how much you can get out for the traffic caused by a fillrate test? I would assume you could get a bandwidth utilization of 90+% like on other AMD GPUs (I stated this before) as long as no peculiarities of the eSRAM come into play. For instance, they did mention there that their peak fillrate without blending but with Z updates saturates the eSRAM at 164GB/s (that number got mentioned but is likely not going to be true outside of synthetic tests because of Z compression, but anyway). This is basically hitting the 109GB/s write limit, so a "partial saturation" if you want. On a DRAM interface, this wouldn't be an issue (but it's a relatively minor one on the XB1, as one runs into the ROP throughput limit at about the same time).
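One way to arrive at the 164GB/s and 109GB/s figures is a back-of-the-envelope calculation like the one below. The 16 ROPs, the 853MHz clock, and the uncompressed 32-bit colour and depth formats are assumptions that are not stated in this thread, so treat this as a plausibility check and not as the method MS used.

```python
# Assumptions (not stated in the thread): ROP count, clock, and buffer formats.
ROPS = 16          # pixels written per clock
CLOCK_GHZ = 0.853  # GPU clock in GHz
BYTES_COLOR = 4    # 32-bit colour target, no blending so it is write-only
BYTES_DEPTH = 4    # 32-bit depth, uncompressed, read for the test and written back

pixel_rate = ROPS * CLOCK_GHZ  # Gpixels/s at the ROP throughput limit

read_bytes = BYTES_DEPTH                 # Z read per pixel
write_bytes = BYTES_COLOR + BYTES_DEPTH  # colour write + Z update per pixel

read_gbps = pixel_rate * read_bytes
write_gbps = pixel_rate * write_bytes
total_gbps = read_gbps + write_gbps

print(f"read {read_gbps:.1f} + write {write_gbps:.1f} = {total_gbps:.1f} GB/s")
```

Under these assumptions the write traffic alone lands at roughly 109GB/s exactly when the ROPs run at full rate, which fits the remark that the write limit and the ROP throughput limit are reached at about the same time.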
Anyway, it makes no sense bickering about whether MS measured this usage of 150GB/s (or maybe 140-150GB/s) over a few tens of microseconds or one or two milliseconds. The whole intention of discussing this bandwidth stuff from MS' point of view was assuring people that concurrent read and write works and that the ROPs (and by extension the whole GPU) are not bandwidth starved. So the simplest assumption (the eSRAM has no strange performance pitfalls, as such pitfalls sound unlikely) leads us to the conclusion that the eSRAM bandwidth use through the ROPs is as efficient as we are used to seeing from other AMD GPUs, if one keeps in mind that separate limits for read and write traffic apply. If one wants to boil it down to the real basic information, one can safely ignore all the other spin surrounding it because it's irrelevant (for this point, and further details can't be learned from the interview).

Exophase · Oct 8, 2013

Gipsel said:
The ones I mentioned have been there since the R700 series.

Okay, I think it's pretty likely that the consoles have them then.
