Which devs would that be? Honest question.
It's from the Edge article; they never said which dev it was.
Of course it will be an issue... larger dev houses just have more resources to throw at an obfuscated memory design than indies do.
As most devs/publishers are currently under NDAs with MS and Sony, most figured the devs in the Edge article were most likely indies, who would have a higher likelihood of having no prior experience with eSRAM or the 360's eDRAM. If you drop PC experience onto the PS4 and X1, you will get the most immediate and highest rate of initial return on the PS4. Throw in that MS might be behind on their drivers (which has been alluded to by numerous sources) and the timing of when the Edge interviews took place, and there probably wasn't much API-level support for utilizing the eSRAM either.
I'm not so sure ESRAM will be an issue with teams at UBI, EA, etc..
I'm more interested in questions of bus contention, as data has to be moved. Everyone keeps adding bandwidth numbers without any consideration for the reads and writes to and from DDR3.
Compute is already a difficult problem for developers, but with eSRAM they also have to consider which compute is worth moving to the embedded RAM. Which algorithms benefit latency- or bandwidth-wise, yet still need enough operations (3+?) to make eSRAM beneficial?
I don't think this is the same situation as the Xbox 360 EDRAM.
I think the eSRAM is needed more this time around, and devs are gonna have to try to get the most out of it to make up for the 68 GB/s DDR3 bandwidth.
Finding the best ways to use the eSRAM might be a pain right now for devs who haven't figured out what works best when offloaded to eSRAM.
If the data is in the DDR, which most of it will be, and the operation is very bandwidth sensitive, it will have to be copied to eSRAM. Is there some magic where data in DDR can be processed with 100 GB/s read and write bandwidth? And when that data is complete and needs to be replaced, it might be copied back to DDR? I don't quite understand the data flow for XB1; the 360 was simpler since it had fewer degrees of freedom.
If your algorithm needs to make a copy, then you'd need to make one in a DRAM-only design regardless, but on the X1 there's the Move Engine that saves you GPU cycles.
You can even have a frame that extends across both memories; logically they are one.
Given the capacity of the eSRAM, if whatever you're copying needs high bandwidth, it's getting read more than once.
I think he's talking about how you have to use DDR3 bandwidth to move stuff from the DDR3 to the eSRAM; there's no way of getting around this. Even if the algorithm doesn't need to copy the data, if it needs high bandwidth it has to be copied into the eSRAM from the DDR3.
The Move Engines save you GPU cycles, but I don't really think many GPUs these days copy their own memory and waste cycles doing it that way.
As if, when you read data from DRAM and "move stuff" back to the DRAM, you won't consume DRAM bandwidth?
You will, but what's the point of that other than making a copy?
Exactly.
And how do you propose to feed the DRAM at that rate to begin with?