Understanding XB1's internal memory bandwidth *spawn

It's also important to understand what percentage of that bandwidth can be achieved in the real world. And since the XB1 must rely on a perfect balance of read/write ops to the esram to achieve its peak bandwidth, while the PS4 does not have the same limitation, the PS4 will likely be able to achieve a higher percentage utilization of its theoretical bandwidth than the XB1.

So PS4 doesn't need a perfect balance to reach peak BW? Any system does; peak is not sustainable in practical situations.

And I'm telling you that utilization percentage does not matter, what matters is the data rate.

If you have a 100 Mbps wire running at 100% versus a 1000 Mbps wire at 15%, which one is "better"?
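For what it's worth, here's the arithmetic behind that analogy as a quick sketch; effective data rate is just peak times utilization:

```python
# Plugging in the numbers from the wire analogy above:
slow_wire = 100 * 1.00    # 100 Mbps at 100% utilization -> 100 Mbps
fast_wire = 1000 * 0.15   # 1000 Mbps at 15% utilization -> 150 Mbps
print(slow_wire, fast_wire)  # 100.0 150.0 -- the "15%" wire moves more data
```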
 
So PS4 doesn't need a perfect balance to reach peak BW? Any system does; peak is not sustainable in practical situations.

And I'm telling you that utilization percentage does not matter, what matters is the data rate.
When you're talking about 180 GB/s versus 220 GB/s, say, then utilisation percentage can matter. But this is a fairly pointless OT discussion. They are different ways of looking at the memory systems; like any comparison, you can talk about 'difference' and 'ratio' and both are factually correct and relevant in the context of what they mean.

As the peak BW of XB1's system has no bearing on how the eSRAM works, and we have a thread about the memory systems already spawned from this one, can we please just abandon that subject here now?
 
When you say "constantly streaming data" do you mean copying from DDR3 to ESRAM and from ESRAM to DDR3? What exactly do you mean when you say "constantly"?

Streaming is probably a bad term. If the ESRAM is idle, it can't help offload the DDR3, and hence memory bandwidth will become a huge bottleneck for the 12 CUs. I assume the two pools will have to juggle data and the GPU will operate on data in both pools in parallel. Of course, having the "right" data in ESRAM is the tricky part, since it is only 32MB.
 
If the ESRAM is not constantly streaming data in and out then it is not being effectively used. If it is not being used, then the system bandwidth is going to approach 68GB/s. To alleviate the bandwidth of the DDR3, the two pools have to be used in parallel as much as possible.
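To put rough numbers on that, here's a sketch using the commonly quoted peaks (68 GB/s DDR3, 204 GB/s eSRAM); the busy fraction is made up for illustration:

```python
# Sketch: if the eSRAM sits idle the system falls back toward DDR3
# alone; keeping both pools busy in parallel adds their rates.
DDR3_PEAK = 68.0     # GB/s, main memory
ESRAM_PEAK = 204.0   # GB/s, the commonly quoted eSRAM peak

def system_bandwidth(esram_busy_fraction):
    """Effective aggregate rate if the eSRAM is busy only part of the time."""
    return DDR3_PEAK + ESRAM_PEAK * esram_busy_fraction

print(system_bandwidth(0.0))  # 68.0  -- eSRAM idle, DDR3 is the ceiling
print(system_bandwidth(0.5))  # 170.0
print(system_bandwidth(1.0))  # 272.0 -- both pools fully in parallel
```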

The Move engines all share the same bandwidth (~25GB/s over all four), so why would you split up the job of moving data?

Not sure, I just got this thought in my head regarding the size, and then realized that the 4 move engines matches up perfectly with the 4 blocks of ESRAM and figured I'd toss it out there to see if it possibly means anything.

Yea, I already figured that data will constantly have to be streaming in and out, as you aren't getting the benefit of it otherwise. But what I mean to say is that the amount of data moved per individual copy between ESRAM and DDR3 may be so small (there's only so much that can go in and out of 32MB worth of memory at any one time) that much less time than we may think would actually be spent consuming the system's bandwidth to facilitate a copy between the two pools of memory.

Or maybe that's not the best way to put it. Maybe the time spent handling this kind of copy operation between the two pools will be a lot less taxing on the system's overall bandwidth than initially thought, given how quickly such small amounts of data can be moved, especially if two or even all four move engines are assisting. I know they share the same 25.6GB/s worth of bandwidth, but what I mean is they don't necessarily all have to be doing the same type of copy, do they?

Two could be moving from ESRAM to DRAM at the same time that two are moving from DRAM to ESRAM, and I really didn't think of them like this until just now. I was treating them as if they all had to be doing the same exact thing in the same direction (ESRAM to DRAM or DRAM to ESRAM; I didn't consider they could be doing both simultaneously). Can't believe I didn't consider this sooner, as it doesn't make sense for the rest of the move engines to sit idle if you're already using some to move data out of ESRAM to DDR3. It makes sense that at the same time this is happening, you'd want the other move engines bringing new data into ESRAM from DDR3 during the same cycle.
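A rough sketch of that idea: the cap is on total bytes moved across all four engines, regardless of direction. The copy sizes below are made up:

```python
# The four move engines share one ~25.6 GB/s budget, but nothing says
# they must all copy in the same direction at the same time.
DME_TOTAL_BW = 25.6  # GB/s, shared across all four engines

def copy_time_ms(bytes_out, bytes_in):
    """Time to shuffle data both ways; only total bytes moved matters
    for the shared cap, not the direction of each copy."""
    total_gb = (bytes_out + bytes_in) / 1e9
    return total_gb / DME_TOTAL_BW * 1000.0

# e.g. evicting 8 MB from eSRAM while prefetching 8 MB into it:
print(copy_time_ms(8e6, 8e6))  # 0.625 ms of DME time for 16 MB total
```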

Then I'm maybe also not accounting for the possibility that devs may choose to leave specific pieces of data inside the ESRAM at all times, which should mean less data that has to be shuffled in and out.
 
Maybe off-topic, but is there a huge difference between 512MB of GDDR3 + 10MB of eDRAM (Xbox 360) vs 8GB of DDR3 + 32MB of ESRAM (Xbox One)? Because *developers* are saying it's a pain to use the ESRAM.
 
Maybe off-topic, but is there a huge difference between 512MB of GDDR3 + 10MB of eDRAM (Xbox 360) vs 8GB of DDR3 + 32MB of ESRAM (Xbox One)? Because *developers* are saying it's a pain to use the ESRAM.

The eDRAM on the 360 is used for the frame buffer and the frame buffer only. The API wraps it up pretty well, so the tiling/predication/resolve is kinda transparent to the programmers.

The eSRAM is a separate pool of memory, which is general purpose but software managed, meaning you have to specifically program for it.
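To make "software managed" concrete, here's a minimal sketch of the kind of bookkeeping that lands on the programmer; the allocator is hypothetical, only the 32MB figure is real:

```python
# With a software-managed pool, placement is an explicit decision:
# nothing spills over automatically when the 32 MB runs out.
ESRAM_SIZE = 32 * 1024 * 1024

class EsramArena:
    """Toy bump allocator over the 32 MB pool."""
    def __init__(self):
        self.offset = 0

    def alloc(self, size, name):
        if self.offset + size > ESRAM_SIZE:
            raise MemoryError(f"{name}: doesn't fit, place it in DDR3 instead")
        start = self.offset
        self.offset += size
        return start

arena = EsramArena()
arena.alloc(8 * 1024 * 1024, "color buffer")      # fits
arena.alloc(8 * 1024 * 1024, "depth buffer")      # fits
try:
    arena.alloc(32 * 1024 * 1024, "big texture")  # too large
except MemoryError as e:
    print(e)  # big texture: doesn't fit, place it in DDR3 instead
```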
 
Maybe off-topic, but is there a huge difference between 512MB of GDDR3 + 10MB of eDRAM (Xbox 360) vs 8GB of DDR3 + 32MB of ESRAM (Xbox One)? Because *developers* are saying it's a pain to use the ESRAM.

Hey, their pain is our gain. A better question might be 5GB of DDR3 + 32MB of ESRAM, and yea, I think there's a massive difference. Probably about 4.5GB (rounded up) worth of difference if my math is right, and if you account for the 360's OS reservation, and whatever other reserves there may be, this number could be higher. Basically, it's a ton more data for developers to try and micromanage between two pools of memory. Sorry if off topic.

Most of this thread has been so far above my head that I just observed from a distance. Once I thought the conversation turned a little bit more to something I can possibly understand, I saw an opening, so I jumped in lol. But I've probably already spent all the inspiration I may have received from reading the thread in my last few posts, so I may go back to just lurking. :)

edit: I'm sure the devs will figure something out.
 
The Bone will require developers to work out what to put in esram and when to put it there, and that in turn could (should?) impact when they schedule certain tasks. It's more work, and probably not something that developers of launch games will have much time to concern themselves with.

I can imagine a lot of launch titles using the esram like the 360's edram and just putting render targets in there and leaving a lot of esram related performance gains for later titles.
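Back-of-envelope numbers suggest that approach just about works. A sketch; the target list is hypothetical, the sizes are standard formats:

```python
ESRAM_BYTES = 32 * 1024 * 1024

def rt_bytes(width, height, bytes_per_pixel):
    """Size of a render target in bytes."""
    return width * height * bytes_per_pixel

# A thin, hypothetical 1080p G-buffer:
targets = {
    "color   (RGBA8)": rt_bytes(1920, 1080, 4),  # ~7.9 MiB
    "depth   (D32)":   rt_bytes(1920, 1080, 4),  # ~7.9 MiB
    "normals (RGBA8)": rt_bytes(1920, 1080, 4),  # ~7.9 MiB
}
used = sum(targets.values())
print(f"{used / 2**20:.1f} MiB used, {(ESRAM_BYTES - used) / 2**20:.1f} MiB free")
# -> 23.7 MiB used, 8.3 MiB free: the render targets alone fit with room to spare
```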
 
I think the baseline for multi-platform titles is: what do we get with a game that was ported with minimal effort. There will always be titles ported over that have this minimal effort taken. Going from there, another important factor is what will be the lead console platform.
 
So PS4 doesn't need a perfect balance to reach peak BW? Any system does; peak is not sustainable in practical situations.

If I'm understanding it correctly, it doesn't matter how the ratio of reads and writes is split when using GDDR5: you can use 100% of the bandwidth on reads, 100% on writes, or anything in between. The esram doesn't have that flexibility, so if, for example, your workload is 80% write and 20% read (a 4:1 ratio), with the PS4's 176 GB/s you will still achieve 176 GB/s, but with the esram you'll achieve 109 + (109/4) = 136.25 GB/s. Hence a lower utilization rate, although once you add in the 68 GB/s from the main memory the overall bandwidth would still be greater, at around 204 GB/s.
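Here's that model as a small sketch, using the same ~109 GB/s per-direction figure:

```python
GDDR5_PEAK = 176.0      # GB/s, PS4 -- any read/write mix
ESRAM_DIR_PEAK = 109.0  # GB/s per direction on the eSRAM
DDR3_PEAK = 68.0        # GB/s, XB1 main memory

def esram_effective(write_fraction):
    """Effective eSRAM rate for a given write share: the dominant
    direction saturates its port, the other direction scales with it."""
    dominant = max(write_fraction, 1.0 - write_fraction)
    other = 1.0 - dominant
    return ESRAM_DIR_PEAK * (1.0 + other / dominant)

print(esram_effective(0.8))               # 136.25 -- the 109 + 109/4 above
print(esram_effective(0.8) + DDR3_PEAK)   # 204.25 -- still above 176.0
```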

And I'm telling you that utilization percentage does not matter, what matters is the data rate.

If you have a 100 Mbps wire running at 100% versus a 1000 Mbps wire at 15%, which one is "better"?

As I said earlier, I'm not trying to measure utilization rate in isolation. You say data rate is what matters. Well, how do you arrive at data rate? It's the theoretical peak (which we already know for the esram) multiplied by the utilization rate (which is what I'm trying to determine). Clearly it is relevant.

With regards to your example, I have a better one. If you have 176 GB/s running at 90% versus 272 GB/s at 60%, which one is "better"?

I'm not saying those would be the real utilization rates, I'm just giving an example which highlights the importance of that metric.
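For what it's worth, the arithmetic on those hypothetical figures:

```python
# data rate = theoretical peak * utilization rate
print(176.0 * 0.90)  # 158.4 GB/s
print(272.0 * 0.60)  # 163.2 GB/s -- you can't rank them without knowing utilization
```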
 
If I'm understanding it correctly, it doesn't matter how the ratio of reads and writes is split when using GDDR5: you can use 100% of the bandwidth on reads, 100% on writes, or anything in between. The esram doesn't have that flexibility, so if, for example, your workload is 80% write and 20% read (a 4:1 ratio), with the PS4's 176 GB/s you will still achieve 176 GB/s, but with the esram you'll achieve 109 + (109/4) = 136.25 GB/s. Hence a lower utilization rate, although once you add in the 68 GB/s from the main memory the overall bandwidth would still be greater, at around 204 GB/s.



As I said earlier, I'm not trying to measure utilization rate in isolation. You say data rate is what matters. Well, how do you arrive at data rate? It's the theoretical peak (which we already know for the esram) multiplied by the utilization rate (which is what I'm trying to determine). Clearly it is relevant.

With regards to your example, I have a better one. If you have 176 GB/s running at 90% versus 272 GB/s at 60%, which one is "better"?

I'm not saying those would be the real utilization rates, I'm just giving an example which highlights the importance of that metric.

Those are very random callouts. If each rate was at 90% utilization, then your argument really falls apart...
 
Those are very random callouts. If each rate was at 90% utilization, then your argument really falls apart...

The point is that there are more reasons for the XB1 utilization to be lower than the PS4 utilization. 90% and 60% may be nowhere near the actual utilization rates. They were simply selected as an example to prove the point.

This isn't a vs thing, I'm not interested in proving which console has more useful bandwidth available to it, I'm just trying to properly understand the strengths and weaknesses of the esram.

Simply accepting Microsoft's claim of 272 GB/s blindly, without assessing what caveats may be associated with that number, is a little foolish IMO.
 
Hey, their pain is our gain. A better question might be 5GB of DDR3 + 32MB of ESRAM, and yea, I think there's a massive difference. Probably about 4.5GB (rounded up) worth of difference if my math is right, and if you account for the 360's OS reservation, and whatever other reserves there may be, this number could be higher. Basically, it's a ton more data for developers to try and micromanage between two pools of memory. Sorry if off topic.

Most of this thread has been so far above my head that I just observed from a distance. Once I thought the conversation turned a little bit more to something I can possibly understand, I saw an opening, so I jumped in lol. But I've probably already spent all the inspiration I may have received from reading the thread in my last few posts, so I may go back to just lurking. :)

edit: I'm sure the devs will figure something out.

One thing about the 5GB talk: someone at Microsoft has already commented in the press that games can use more than 5GB. I'm pretty sure MS already stated that they could go up to a maximum of 7.5GB, which would mean some resident apps would have to be suspended if a game required that much memory.

It irks me to see people keep saying there's 3GB reserved for the OS when MS has already stated otherwise. The 3GB was a rumor posted before the Xbox One reveal which isn't totally correct.
 
Maybe off-topic, but is there a huge difference between 512MB of GDDR3 + 10MB of eDRAM (Xbox 360) vs 8GB of DDR3 + 32MB of ESRAM (Xbox One)? Because *developers* are saying it's a pain to use the ESRAM.

Most likely because of what they are comparing it to.
The odds are that effectively utilizing the ESRAM is more difficult than the single pool Sony provides, whereas compared to the PS3, the difficulties of the 360's eDRAM were at best a minor inconvenience.
 
So the XB1 has a minimum bandwidth about equal to the PS4's, and a maximum about 100 GB/s higher, falling somewhere in between depending on how effectively simultaneous reads and writes can be maintained. Is that about where we are right now?
 
So the XB1 has a minimum bandwidth about equal to the PS4's, and a maximum about 100 GB/s higher, falling somewhere in between depending on how effectively simultaneous reads and writes can be maintained. Is that about where we are right now?

No, reality is much more complicated than adding theoretical numbers together.
 
One thing about the 5GB talk: someone at Microsoft has already commented in the press that games can use more than 5GB. I'm pretty sure MS already stated that they could go up to a maximum of 7.5GB, which would mean some resident apps would have to be suspended if a game required that much memory.

It irks me to see people keep saying there's 3GB reserved for the OS when MS has already stated otherwise. The 3GB was a rumor posted before the Xbox One reveal which isn't totally correct.

Err, you're going to have to source that one, guy.

Edit: seems to all be stemming from this post: http://www.giantbomb.com/forums/xbo...box-one-have-a-hypervisor-and-what-i-1437760/. Nothing official that I can see, but the theory seems intriguing.
 
So the XB1 has a minimum bandwidth about equal to the PS4's, and a maximum about 100 GB/s higher, falling somewhere in between depending on how effectively simultaneous reads and writes can be maintained. Is that about where we are right now?
A multipool system cannot be interpreted correctly from aggregate averages. I'm sorry, but there is literally no way to distil the XB1's (or any eDRAM/ESRAM/multipool RAM console's) bandwidth into a single figure for comparison with other platforms. This goes for PC too, with VRAM and DDR3, and for PS3 with its split pools. Once you have more than one pool, you have lots of different access types and workloads that hit different bottlenecks, so a straightforward comparison won't happen.

A real comparison would look at different usage scenarios across hardware solutions and evaluate which ones achieve better performance when. Then, if one really wanted (although this is mostly just to appease forum warriors), one could develop a weighting system and score each platform against a metric.
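If anyone did want that metric, it might look something like this sketch; every bandwidth figure and weight below is a placeholder, not a measurement:

```python
# Score each memory system per usage scenario, then combine with
# (admittedly arbitrary) weights into a single figure.
scenarios = {
    # name: (weight, {platform: assumed effective GB/s in that scenario})
    "read-only streaming": (0.4, {"A": 176.0, "B": 177.0}),
    "balanced read/write": (0.4, {"A": 176.0, "B": 272.0}),
    "write-heavy (4:1)":   (0.2, {"A": 176.0, "B": 204.0}),
}

def score(platform):
    return sum(weight * bw[platform] for weight, bw in scenarios.values())

print(score("A"), score("B"))  # one number per platform, if you must
```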
 
If I'm understanding it correctly, it doesn't matter how the ratio of reads and writes is split when using GDDR5: you can use 100% of the bandwidth on reads, 100% on writes, or anything in between. The esram doesn't have that flexibility, so if, for example, your workload is 80% write and 20% read (a 4:1 ratio), with the PS4's 176 GB/s you will still achieve 176 GB/s, but with the esram you'll achieve 109 + (109/4) = 136.25 GB/s. Hence a lower utilization rate, although once you add in the 68 GB/s from the main memory the overall bandwidth would still be greater, at around 204 GB/s.
The reality is a bit more complicated than that. DRAM is heavily optimized for localized or linear accesses, with reads and writes not mixed together. Internally, the DRAM is heavily subdivided and slower, and it can't keep everything at the ready at all times. It also incurs a penalty whenever it has to switch between reads and writes.

The memory subsystem tries very hard to schedule accesses so that they hit as few bank and turnaround penalties as possible, but this isn't simple to do with other constraints like latency and balancing service to multiple clients.

Ideally, the eSRAM could dispense with all of this, and gladly take any mix that works within the bounds of its read and write ports.
However, the peak numbers and articles on the subject suggest that for various reasons there are at least some banking and timing considerations that make the ideal unreachable. The physical speed of the SRAM and the lack of an external bus probably mean that the perceived latency hierarchy is "flatter" than it would be if you were spamming a GDDR bus with reads and writes with poor locality.

This is where I assume the hinted advantages the eSRAM has for certain operations come in, where the access pattern starts interspersing reads with writes, or there is poor locality.
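A toy model of that turnaround cost; the cycle counts below are illustrative, not real GDDR5 timing parameters:

```python
def dram_efficiency(bursts, direction_switches, turnaround_cycles,
                    burst_cycles=4):
    """Fraction of peak bandwidth left after read/write turnaround
    penalties: useful burst cycles over total bus cycles."""
    useful = bursts * burst_cycles
    wasted = direction_switches * turnaround_cycles
    return useful / (useful + wasted)

# 1000 bursts, nicely batched vs. reads and writes badly interleaved:
print(dram_efficiency(1000, 10, 8))   # ~0.98 -- scheduler batched well
print(dram_efficiency(1000, 500, 8))  # 0.50  -- constant turnarounds
```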
 