Understanding XB1's internal memory bandwidth *spawn

Yes. GPU read bandwidth is 170 GB/s; write is 102 GB/s.

http://www.vgleaks.com/durango-memory-system-overview/

Correct, so I'm assuming the GPU is reading from the ESRAM pool at 68 GB/s (because it can't transfer more data than that per second through the DRAM bus to main memory), and the GPU is then also writing to the DRAM pool at 68 GB/s, for a total of 136 GB/s consumed over two buses. Is that not the correct interpretation of the table?
 
That assertion would apply to all max bandwidth figures. Practically nothing pulls max bandwidth 100% of the time, and context is rarely provided.

Sure, but we aren't talking about a known technology, say DDR3 or GDDR5 or "regular" SRAM, with known characteristics against known benchmarks. We of course won't know what the ESRAM will do, because it's proprietary knowledge or under NDA or whatever, but when we are discussing the abilities of an unknown quantity one has to keep an open mind when interpreting the numbers coming at you.

We know enough about how a DDR3 system works to know what happens when you arbitrarily ask for some data. We can make an educated guess at bandwidth utilization based on that knowledge and actual experience, of course. We can't say the same for the ESRAM.
 
Correct, so I'm assuming the GPU is reading from the ESRAM pool at 68 GB/s (because it can't transfer more data than that per second through the DRAM bus to main memory), and the GPU is then also writing to the DRAM pool at 68 GB/s, for a total of 136 GB/s consumed over two buses. Is that not the correct interpretation of the table?

The diagram seems to imply that the GPU can read from both eSRAM and DRAM at the same time for a total of 170 GB/s of bandwidth (102.4 + 68), while the GPU only writes to eSRAM (102.4). I'm not sure if that's totally true, because that implies that the GPU never directly writes to DRAM or to the coherent bus over the northbridge, and maybe the DMA engines dictate accesses to and from other devices using eSRAM as a destination and source.

You can't saturate the DRAM with 68 GB/s worth of read accesses from eSRAM while simultaneously saturating the DRAM with 68 GB/s of writes from eSRAM. That would imply that 136 GB/s is the maximum theoretical bandwidth of the DDR3. It's not.
 
The diagram seems to imply that the GPU can read from both eSRAM and DRAM at the same time for a total of 170 GB/s of bandwidth (102.4 + 68), while the GPU only writes to eSRAM (102.4). I'm not sure if that's totally true, because that implies that the GPU never directly writes to DRAM or to the coherent bus over the northbridge, and maybe the DMA engines dictate accesses to and from other devices using eSRAM as a destination and source.

You can't saturate the DRAM with 68 GB/s worth of read accesses from eSRAM while simultaneously saturating the DRAM with 68 GB/s of writes from eSRAM. That would imply that 136 GB/s is the maximum theoretical bandwidth of the DDR3. It's not.

The GPU can write to DRAM as far as I can tell from any of the block diagrams.
 
You can't saturate the DRAM with 68 GB/s worth of read accesses from eSRAM while simultaneously saturating the DRAM with 68 GB/s of writes from eSRAM.
That sentence doesn't make sense in my opinion. Reads from eSRAM of course don't saturate the DRAM bandwidth, as DRAM isn't touched for this. Why shouldn't it be possible to write 68GB/s to DRAM while reading 68GB/s (or even more) from the eSRAM? The other way around should work too (reading 68GB/s from DRAM while writing 68GB/s or more to eSRAM). It's just a question of whether the connections to the DRAM and eSRAM somehow share some parts limiting the bandwidth. But as indicated by MS, only the connection to the DRAM and the CPU share bandwidth, i.e. the 30GB/s coherent bandwidth is included in the 68GB/s figure. The eSRAM bandwidth is independent of this (as there is no such thing as CPU cache coherent eSRAM access).
There may be an inconsistency between the 102GB/s write and 170GB/s read bandwidth in the older documentation leaked by vgleaks, as this would imply that the write bandwidth is shared between DRAM and eSRAM. This could indeed be true (usually more is read than written, so it shouldn't be too much of an issue), but it could also be an oversight, for example caused by considering only the write bandwidth available to the ROPs while the read bandwidth includes the complete memory hierarchy (traditionally one could only read through the path TMUs/L1/L2/Mem, but it got more symmetric since Xenos and one has a read/write capability outside of [MEM/ROP] exports).

edit:
The write bandwidth is tied to the 4 render back ends (16 × 8 bytes per clock).
As said above, this thinking could be the reason for the asymmetry. But it doesn't apply anymore to GCN GPUs. One can also write through the TMU(AGU) => L1 => L2 => mem hierarchy and not just through the ROPs.
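For what it's worth, the 102.4GB/s figure falls straight out of that ROP arithmetic if one assumes an 800MHz GPU clock (the clock is my assumption, it isn't given in this thread). A minimal sketch in Python:

# Sketch of where 102.4 GB/s of write bandwidth could come from, assuming
# 16 ROPs (4 render back ends) each writing 8 bytes per clock, and an
# assumed 800 MHz GPU clock (not stated in this thread).
rops = 16
bytes_per_rop_per_clock = 8
gpu_clock_hz = 800e6          # assumption

write_bw = rops * bytes_per_rop_per_clock * gpu_clock_hz
print(write_bw / 1e9)         # -> 102.4 (GB/s)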
 
That sentence doesn't make sense in my opinion. Reads from eSRAM of course don't saturate the DRAM bandwidth, as DRAM isn't touched for this. Why shouldn't it be possible to write 68GB/s to DRAM while reading 68GB/s (or even more) from the eSRAM? The other way around should work too (reading 68GB/s from DRAM while writing 68GB/s or more to eSRAM). It's just a question of whether the connections to the DRAM and eSRAM somehow share some parts limiting the bandwidth. But as indicated by MS, only the connection to the DRAM and the CPU share bandwidth, i.e. the 30GB/s coherent bandwidth is included in the 68GB/s figure. The eSRAM bandwidth is independent of this (as there is no such thing as CPU cache coherent eSRAM access).
There may be an inconsistency between the 102GB/s write and 170GB/s read bandwidth in the older documentation leaked by vgleaks, as this would imply that the write bandwidth is shared between DRAM and eSRAM. This could indeed be true (usually more is read than written, so it shouldn't be too much of an issue), but it could also be an oversight, for example caused by considering only the write bandwidth available to the ROPs while the read bandwidth includes the complete memory hierarchy (traditionally one could only read through the path TMUs/L1/L2/Mem, but it got more symmetric since Xenos and one has a read/write capability outside of [MEM/ROP] exports).

edit:
As said above, this thinking could be the reason for the asymmetry. But it doesn't apply anymore to GCN GPUs. One can also write through the TMU(AGU) => L1 => L2 => mem hierarchy and not just through the ROPs.

The table states

ESRAM ---> DRAM Max bandwidth = 68 GB/s
DRAM ---> ESRAM Max bandwidth = 68 GB/s
Total Max bandwidth ---> 136 GB/s

The total max bandwidth listed means absolutely nothing, because the max bandwidth of DRAM is still 68 GB/s.

If you are talking about the GPU's ability to read and write from multiple memory pools at bandwidths of 136 GB/s and above, then that's one thing. But the table is stating max bandwidth for DRAM to DRAM, ESRAM to DRAM, DRAM to ESRAM and ESRAM to ESRAM. None of those scenarios should produce a max bandwidth of 136 GB/s (excluding the new bandwidth of ESRAM, because the article was written before the increase), even when you consider some combination of the four. You have to involve other devices or memory pools to reach bandwidths at that level. So it's a poorly constructed table if it produces bandwidth numbers without stating the scenarios that produce them.
 
The table states

ESRAM ---> DRAM Max bandwidth = 68 GB/s
DRAM ---> ESRAM Max bandwidth = 68 GB/s
Total Max bandwidth ---> 136 GB/s
It is a poorly constructed table in the sense that the listed maximum only applies to the copy scenario detailed there as an example. It is of course not the total max bandwidth for other scenarios.
But if you copy something from eSRAM to DRAM or the other way around, the attainable total bandwidth peaks at 136GB/s: 68GB/s read and 68GB/s write (this is additive). I actually don't grasp what you are trying to discuss.
 
It is a poorly constructed table in the sense that the listed maximum only applies to the copy scenario detailed there as an example. It is of course not the total max bandwidth for other scenarios.
But if you copy something from eSRAM to DRAM or the other way around, the attainable total bandwidth peaks at 136GB/s: 68GB/s read and 68GB/s write (this is additive). I actually don't grasp what you are trying to discuss.

Maybe I'm mistaken, but if you have a 256-bit interface with 2133 MHz DDR3, how does that setup allow you to simultaneously accommodate both reads and writes to DRAM, each at 68 GB/s?

When discussing DRAM, does max bandwidth only account for data travelling in one direction, so that the bidirectional bandwidth allowed at any one instant is 2x the stated max bandwidth of the DRAM in question?
 
DDR bandwidth is in transfers/sec, so read or write, not both at once. Otherwise the bandwidth would be published as 136 GB/s.
 
Maybe I'm mistaken, but if you have a 256-bit interface with 2133 MHz DDR3, how does that setup allow you to simultaneously accommodate both reads and writes to DRAM, each at 68 GB/s?

When discussing DRAM, does max bandwidth only account for data travelling in one direction, so that the bidirectional bandwidth allowed at any one instant is 2x the stated max bandwidth of the DRAM in question?

DRAM doesn't do simultaneous read AND write.
256 bits = 32 bytes per transfer; at 2133 MT/s that's 68,256 MB/s ≈ 68 GB/s.
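As a rough sketch of that arithmetic (assuming a straight 256-bit, 2133 MT/s interface and nothing else):

# Peak DDR3 bandwidth for a 256-bit interface at 2133 MT/s.
# The bus moves data in one direction at a time, so this is read OR write.
bus_width_bits = 256
transfers_per_sec = 2133e6

bytes_per_transfer = bus_width_bits // 8            # 32 bytes
peak_bw = bytes_per_transfer * transfers_per_sec    # bytes per second
print(peak_bw / 1e6)                                # -> 68256.0 MB/s, i.e. ~68 GB/s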
 
The table states

ESRAM ---> DRAM Max bandwidth = 68 GB/s
DRAM ---> ESRAM Max bandwidth = 68 GB/s
Total Max bandwidth ---> 136 GB/s

The total max bandwidth listed means absolutely nothing, because the max bandwidth of DRAM is still 68 GB/s.

If you are talking about the GPU's ability to read and write from multiple memory pools at bandwidths of 136 GB/s and above, then that's one thing. But the table is stating max bandwidth for DRAM to DRAM, ESRAM to DRAM, DRAM to ESRAM and ESRAM to ESRAM. None of those scenarios should produce a max bandwidth of 136 GB/s (excluding the new bandwidth of ESRAM, because the article was written before the increase), even when you consider some combination of the four. You have to involve other devices or memory pools to reach bandwidths at that level. So it's a poorly constructed table if it produces bandwidth numbers without stating the scenarios that produce them.

The DRAM and the ESRAM are different memory pools; they have their own pipes and their own BW. It is also stated that the table is for a copy operation, to demonstrate max BW.

DRAM to DRAM, half read, half write, gives you 34 GB/s R and 34 GB/s W, 68 GB/s total.

ESRAM to ESRAM (assuming the early leak doesn't include simultaneous R/W): 51 GB/s each for R and W, 102 GB/s total.

ESRAM to/from DRAM, limited by the DRAM max BW of 68 GB/s: 68 GB/s on the DRAM plus 68 GB/s on the ESRAM gives you 136 GB/s.
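If it helps, here's a rough sketch of that accounting in Python, assuming the older leaked peaks (68 GB/s DRAM, 102.4 GB/s ESRAM), that neither pool reads and writes in the same cycle, and that a copy needs equal read and write rates:

# Copy-scenario bandwidth accounting (all figures in GB/s); the peaks and the
# no-simultaneous-R/W behaviour are assumptions taken from the posts above.
DRAM_PEAK = 68.0
ESRAM_PEAK = 102.4

def copy_rates(src_peak, dst_peak, same_pool=False):
    if same_pool:
        rate = src_peak / 2                # reads and writes split one pool's peak
    else:
        rate = min(src_peak, dst_peak)     # limited by the slower pool
    return rate, rate, 2 * rate            # read, write, "total" as in the table

print(copy_rates(DRAM_PEAK, DRAM_PEAK, same_pool=True))    # (34.0, 34.0, 68.0)
print(copy_rates(ESRAM_PEAK, ESRAM_PEAK, same_pool=True))  # (51.2, 51.2, 102.4)
print(copy_rates(ESRAM_PEAK, DRAM_PEAK))                   # (68.0, 68.0, 136.0)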
 
ESRAM to/from DRAM, limited by the DRAM max BW of 68 GB/s: 68 GB/s on the DRAM plus 68 GB/s on the ESRAM gives you 136 GB/s.

You are counting the same bandwidth twice; maybe I don't understand what you are saying.

If you do a full dump from ESRAM to DDR3, you will saturate the full 68 GB/s, leaving 0 GB/s for anything else (in theory).
 
You are counting the same bandwidth twice; maybe I don't understand what you are saying.

If you do a full dump from ESRAM to DDR3, you will saturate the full 68 GB/s, leaving 0 GB/s for anything else (in theory).

So when reading from ESRAM, you don't count the BW being used for that?
 
I see!

So an ESRAM -> DDR3 transfer at 68 GB/s would look like 68 GB/s + 68 GB/s (DDR3 + ESRAM).

I read it as a copy from ESRAM to DDR3 and a copy from DDR3 to ESRAM happening at the same time; my fault, sorry.
 
So what you are saying is that writing to the DRAM doesn't consume BW?

No, I'm saying that reading from and writing to DRAM comes out of the same 68 GB/s, whether that data is going to or coming from ESRAM or the CPU or the HDD or "da cloud". You don't get to stand in the middle of the stream and add the flow rate in both directions.
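Put another way, a toy sketch (the per-client numbers are made up for illustration): whatever the other end of the transfer is, traffic that touches the DRAM draws from the same 68 GB/s budget, counted once.

# Toy illustration: every client's DRAM traffic draws from one 68 GB/s budget.
DRAM_PEAK = 68.0

# hypothetical DRAM traffic during a full ESRAM -> DRAM dump (GB/s)
dram_traffic = {"esram_dump_writes": 68.0, "cpu": 0.0, "everything_else": 0.0}

used = sum(dram_traffic.values())
print(DRAM_PEAK - used)   # -> 0.0 GB/s left for anything else (in theory)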
 