Questions about Xbox One's ESRAM & Compute

It's from the Edge article; they never said which dev it was.

As most devs/publishers are currently under NDAs with MS and Sony, most figured the devs in the Edge article were most likely indies, who would be more likely to have no prior experience with ESRAM, or with the EDRAM on the 360. If you drop PC experience onto the PS4 and the X1, you will get the most immediate and highest initial rate of return on the PS4. Throw in that MS might be behind on their drivers (which has been alluded to by numerous sources) and the timing of when the Edge interviews took place, and there probably wasn't much API-level support for utilizing the ESRAM either.

I'm not so sure ESRAM will be an issue with teams at UBI, EA, etc..
 
of course it will be an issue...larger dev houses just have more resources to throw at obfuscated memory design than indies
 
of course it will be an issue...larger dev houses just have more resources to throw at obfuscated memory design than indies

Yes, meant to have "not as much of an issue", but do not have edit capabilities yet. ;)

At the end of the day, though, the extra work with ESRAM can provide some huge benefits, so it is worth taking the effort for; but MS also needs to keep updating their support around it so as to make it achievable and desirable for indies and smaller dev teams as well.
 
I'm more interested in questions of bus contention, as data has to be moved. Everyone keeps adding bandwidth numbers without any consideration for the reads and writes to and from DDR3.

Compute is already a difficult problem for developers, but with ESRAM they also have to consider which compute work is worth placing in the embedded RAM. Which algorithms benefit, latency- or bandwidth-wise, yet still need enough operations (3+?) to make ESRAM worthwhile?
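To make that "3+ operations" intuition concrete, here's a back-of-the-envelope sketch. The bandwidth figures are illustrative assumptions (68 GB/s is the commonly quoted DDR3 number; the ESRAM figure is just a placeholder), and the model ignores latency, contention, and overlap:

```python
# Rough break-even model: is it worth copying a buffer from DDR3 into
# ESRAM before running several read/write passes over it?
# All figures are illustrative assumptions, not official specs.

DDR3_BW = 68e9    # bytes/s, commonly quoted Xbox One main memory bandwidth
ESRAM_BW = 102e9  # bytes/s, assumed ESRAM bandwidth (placeholder figure)

def time_in_ddr3(buffer_bytes, passes):
    """Run every pass straight out of DDR3."""
    return passes * buffer_bytes / DDR3_BW

def time_via_esram(buffer_bytes, passes):
    """Pay one DDR3 read up front to copy in, then run the passes in ESRAM."""
    copy_in = buffer_bytes / DDR3_BW   # the copy is limited by the DDR3 read
    work = passes * buffer_bytes / ESRAM_BW
    return copy_in + work

buf = 16 * 1024 * 1024  # a 16 MB render target
for passes in (1, 2, 3, 4):
    print(passes,
          round(1e3 * time_in_ddr3(buf, passes), 3), "ms in DDR3 vs",
          round(1e3 * time_via_esram(buf, passes), 3), "ms via ESRAM")
```

With these (assumed) numbers the copy only pays for itself around the third pass, which lines up with the "3+" guess above.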
 
As most devs/publishers are currently under NDAs with MS and Sony, most figured the devs in the Edge article were most likely indies, who would be more likely to have no prior experience with ESRAM, or with the EDRAM on the 360. If you drop PC experience onto the PS4 and the X1, you will get the most immediate and highest initial rate of return on the PS4. Throw in that MS might be behind on their drivers (which has been alluded to by numerous sources) and the timing of when the Edge interviews took place, and there probably wasn't much API-level support for utilizing the ESRAM either.

I'm not so sure ESRAM will be an issue with teams at UBI, EA, etc..

I don't think this is the same situation as the Xbox 360 EDRAM.

I think the ESRAM is needed more this time around & devs are gonna have to try to get the most out of it to make up for the DDR3's 68 GB/s bandwidth.

Finding the best ways to use the ESRAM might be a pain right now for devs who haven't figured out what works best when offloaded to ESRAM.
 
I'm more interested in questions of bus contention, as data has to be moved. Everyone keeps adding bandwidth numbers without any consideration for the reads and writes to and from DDR3.

Compute is already a difficult problem for developers, but with ESRAM they also have to consider which compute work is worth placing in the embedded RAM. Which algorithms benefit, latency- or bandwidth-wise, yet still need enough operations (3+?) to make ESRAM worthwhile?

All of that was pretty well covered in another thread... the Durango thread possibly?

Look around the last month there. Or this might be a good enough place for it as it was nestled into another more general thread I think. If I find the original location, I'll post it.
 
I don't think this is the same situation as the Xbox 360 EDRAM.

I think the ESRAM is needed more this time around & devs are gonna have to try to get the most out of it to make up for the DDR3's 68 GB/s bandwidth.

Finding the best ways to use the ESRAM might be a pain right now for devs who haven't figured out what works best when offloaded to ESRAM.

You are correct: there are significant differences in terms of capabilities and utilization. EDRAM was a simpler space, as it was generally the final output buffer, wasn't it? ESRAM is much more active across the entire process.
 
I'm more interested in questions of bus contention, as data has to be moved. Everyone keeps adding bandwidth numbers without any consideration for the reads and writes to and from DDR3.

Compute is already a difficult problem for developers, but with ESRAM they also have to consider which compute work is worth placing in the embedded RAM. Which algorithms benefit, latency- or bandwidth-wise, yet still need enough operations (3+?) to make ESRAM worthwhile?

Haven't people beaten this one to death? The two pools can be treated as one logical address space, so I don't see why the same ops would require the data to be copied from the DRAM into the ESRAM before processing begins. Isn't the data processed the same way if it's sitting in DRAM? Besides, I don't think the ESRAM is exposed to the CPU, so it shouldn't suffer from contention (if there's any to begin with).
 
If the data is in the DDR, which most of it will be, and the operation is very bandwidth-sensitive, it will have to be copied to eSRAM. Is there some magic whereby data in DDR can be processed with 100 GB/s read and write bandwidth? And when that data is complete and needs to be replaced, it might be copied back to DDR? I don't quite understand the data flow for the XB1; the 360 was simpler since it had fewer degrees of freedom.
 
If the data is in the DDR, which most of it will be, and the operation is very bandwidth-sensitive, it will have to be copied to eSRAM. Is there some magic whereby data in DDR can be processed with 100 GB/s read and write bandwidth? And when that data is complete and needs to be replaced, it might be copied back to DDR? I don't quite understand the data flow for the XB1; the 360 was simpler since it had fewer degrees of freedom.

You use the eSRAM exactly as you would use any DRAM.

You read from DRAM and write to eSRAM (as opposed to writing back to DRAM again).
You read from DRAM and eSRAM, and write to eSRAM (or DRAM, if you like).
You can output the final frame buffer directly from the eSRAM (you don't have to resolve it like on the 360).

If your algorithm needs to make a copy, then you'd need to make that copy in a DRAM-only design regardless, but on the X1 there are the Move Engines to save you GPU cycles.
You can even have a frame that extends across both memory pools; logically they are one.
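The patterns above can be sketched as simple bandwidth accounting. This is a toy model (the pool objects and pass sizes are made up for illustration), just tracking which pool each pass touches:

```python
# Toy accounting of bytes moved in each pool for the access patterns above.
class Pool:
    def __init__(self, name):
        self.name = name
        self.bytes_read = 0
        self.bytes_written = 0

def render_pass(sources, dest, nbytes):
    """Read nbytes from each source pool and write nbytes to the destination."""
    for pool in sources:
        pool.bytes_read += nbytes
    dest.bytes_written += nbytes

MB = 1024 * 1024
dram, esram = Pool("DDR3"), Pool("eSRAM")

render_pass([dram], esram, 16 * MB)         # read from DRAM, write to eSRAM
render_pass([dram, esram], esram, 16 * MB)  # read from both, write to eSRAM
# The final frame buffer scans out of eSRAM directly: no resolve copy needed.

for pool in (dram, esram):
    print(pool.name, pool.bytes_read // MB, "MB read,",
          pool.bytes_written // MB, "MB written")
```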
 
If your algorithm needs to make a copy, then you'd need to make that copy in a DRAM-only design regardless, but on the X1 there are the Move Engines to save you GPU cycles.
You can even have a frame that extends across both memory pools; logically they are one.

I think he's talking about how you have to use DDR3 bandwidth to move data from the DDR3 to the eSRAM; there's no way of getting around this. Even if the algorithm doesn't need to copy the data, if it needs high bandwidth it has to be copied into the eSRAM from the DDR3.

The Move Engines save you GPU cycles, but I don't really think many GPUs these days copy their own memory and waste cycles doing it that way. :)
 
I think he's talking about how you have to use DDR3 bandwidth to move data from the DDR3 to the eSRAM; there's no way of getting around this. Even if the algorithm doesn't need to copy the data, if it needs high bandwidth it has to be copied into the eSRAM from the DDR3.
Given the capacity of the eSRAM, if whatever you're copying needs high bandwidth, it's getting read more than once.

In terms of power, it breaks even if you read the same location twice.
If dealing with reads and writes, it pays off in power and bandwidth after a read and a write.


There's also the option of GPU-generated intermediate buffers that may be created, modified, consumed, and discarded without ever leaving the eSRAM.
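That last point is where the accounting gets interesting: a chain of passes whose intermediates live entirely in eSRAM only ever costs DDR3 one copy in (and perhaps one copy out). A hypothetical sketch of the DDR3 traffic either way:

```python
# DDR3-bus traffic for a chain of read+write passes over one buffer.
# Illustrative model only: ignores partial residency and overlap.
MB = 1024 * 1024

def ddr3_bytes(passes, buf_bytes, staged_in_esram, copy_result_out=True):
    if staged_in_esram:
        traffic = buf_bytes                # one copy in from DDR3
        if copy_result_out:
            traffic += buf_bytes           # final result back out to DDR3
        return traffic                     # intermediates never touch DDR3
    # Unstaged: every pass reads and writes the buffer in DDR3.
    return passes * 2 * buf_bytes

buf = 16 * MB
for passes in (1, 2, 3):
    print(passes, "passes:",
          ddr3_bytes(passes, buf, False) // MB, "MB unstaged vs",
          ddr3_bytes(passes, buf, True) // MB, "MB staged")
```

In this model it breaks even after a single read-and-write pass and saves DDR3 bandwidth for everything after that, matching the claim above.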
 
I think he's talking about how you have to use DDR3 bandwidth to move data from the DDR3 to the eSRAM; there's no way of getting around this. Even if the algorithm doesn't need to copy the data, if it needs high bandwidth it has to be copied into the eSRAM from the DDR3.

The Move Engines save you GPU cycles, but I don't really think many GPUs these days copy their own memory and waste cycles doing it that way. :)

Tiling and untiling is probably more significant there.

Splitting tiles across ESRAM busses and banks helps gain the simultaneous reads/writes necessary to approach anything like peak bandwidth.

GCN seems to like things in a "growing squares" order which seems a lot like Morton order.
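For reference, Morton (Z-order) indexing is just bit interleaving, and the "growing squares" traversal falls out of it. A quick sketch:

```python
# Morton ("Z-order") index of a 2D coordinate: interleave the bits of x and y.
def morton2d(x, y, bits=16):
    index = 0
    for i in range(bits):
        index |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions
        index |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return index

# Sorting a 4x4 block by Morton index walks it in "growing squares":
# each 2x2 sub-square is finished before moving on to the next.
order = sorted(((x, y) for y in range(4) for x in range(4)),
               key=lambda p: morton2d(*p))
print(order)  # starts (0, 0), (1, 0), (0, 1), (1, 1), ...
```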
 
I think he's talking about how you have to use DDR3 bandwidth to move data from the DDR3 to the eSRAM; there's no way of getting around this. Even if the algorithm doesn't need to copy the data, if it needs high bandwidth it has to be copied into the eSRAM from the DDR3.

The Move Engines save you GPU cycles, but I don't really think many GPUs these days copy their own memory and waste cycles doing it that way. :)

As if reading data from DRAM and "moving stuff" back to the DRAM wouldn't consume DRAM bandwidth either?
 