I don't know if you are trying too hard to disprove it with a bad analogy or you just didn't understand.
(Miles per hour is a velocity, so the right analogy would actually be latency. Bandwidth would be how much food I ordered from takeout, i.e. lardwidth.)
The 136 GB/s is not DRAM->DRAM in simultaneous R/W.
It's DRAM->ESRAM or ESRAM->DRAM, and because you can't write more data than you can read in a copy operation, the transfer rate is bound by the DRAM max bandwidth of 68 GB/s; hence 68 GB/s × 2 = 136 GB/s.
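The accounting behind that 136 figure can be sketched like this (the ESRAM number here is only an illustrative assumption; the exact value doesn't change the conclusion):

```python
# Sketch of the copy-bandwidth accounting described above.
dram_bw = 68.0     # GB/s, DRAM max bandwidth (from the post)
esram_bw = 109.0   # GB/s, ESRAM bandwidth -- assumed, illustrative only

# A DRAM->ESRAM copy reads from DRAM and writes to ESRAM, so the
# payload rate is capped by the slower side:
copy_rate = min(dram_bw, esram_bw)   # 68.0 GB/s of actual data moved

# Counting the read and the write as two separate streams of traffic
# gives the headline figure:
combined_traffic = copy_rate * 2     # 136.0 GB/s total bus traffic
```

The point being: 136 GB/s counts both halves of the same copy, not 136 GB/s of unique data.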
Forget the bad analogy. The reality is that DRAM has a max bandwidth of 68 GB/s, regardless of the bandwidth available to the ESRAM or the GPU. Both devices have to accommodate the bottleneck presented by DRAM.
Which means when you are reading from DRAM into the GPU, it happens at 68 GB/s, and the GPU streams that data to ESRAM at 68 GB/s unless you are buffering the data to make more efficient use of the ESRAM bandwidth. But that doesn't accelerate data being sourced from the DRAM.
As long as you are using the max bandwidth when reading from DRAM, the DRAM can't accommodate any writes. So while the GPU is utilizing whatever ESRAM bandwidth isn't being consumed by the DRAM->ESRAM copy, any data destined for DRAM has to sit somewhere and wait for the copy to complete, or you must reduce the bandwidth used by the DRAM reads to accommodate the writes.
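That trade-off is just a shared-budget calculation; a minimal sketch (the helper name is made up for illustration):

```python
# DRAM bandwidth is one shared budget for reads and writes combined.
DRAM_BW = 68.0  # GB/s (from the post)

def dram_write_budget(read_rate):
    """Whatever the reads don't use is all that's left for writes."""
    assert 0.0 <= read_rate <= DRAM_BW
    return DRAM_BW - read_rate

print(dram_write_budget(68.0))  # reads saturate DRAM -> writes must wait
print(dram_write_budget(48.0))  # throttle reads -> 20 GB/s freed for writes
```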