Understanding XB1's internal memory bandwidth *spawn

No, reality is much more complicated than adding theoretical numbers together.

Of course, but I see plenty of gross simplifications and assumptions in the absence of evidence, so I thought I'd boil that down to the bare essentials myself. Why not, when people are willing to entertain wild speculation?
 
A multi-pool system cannot be interpreted correctly from aggregate averages. I'm sorry, but there is literally no way to distil the bandwidth of the XB1 (or any eDRAM/ESRAM/multi-pool RAM console) into a single figure for comparison with other platforms. This goes for PC too, with VRAM and DDR3, and for PS3 with its split pools. Once you have more than one pool, you have lots of different access types and workloads that hit different bottlenecks. A straightforward comparison won't happen. A real comparison would look at different usage scenarios between hardware solutions and evaluate which ones achieve better performance when. Then, if one really wanted (although this is mostly just to appease forum warriors), one could develop a weighting system and give each platform a score.

I don't want to bring in the current gen, but isn't it a bit of the same? You can't really compare the PS3's and 360's memory bandwidth due to their different architectures.

I keep seeing people refer to the X1 and PS4 architectures as being the same, but I don't think that's true at all. Maybe the CPU and GPU architectures are the same, but taken as a whole they are vastly different.

Perhaps the only proof of the pudding will be multi-platform games once again.
 
As a point of clarity, I remember reading that Onion, Garlic and Onion+ are independent lanes of access to memory for the GPU and CPU. I also remember reading that data traversing one is inaccessible until the client writes back to main memory. How is bandwidth discussed with respect to that memory architecture?
 
The point is that there are more reasons for the XB1's utilization to be lower than the PS4's utilization. 90% and 60% may be nowhere near the actual utilization rates; they were simply selected as an example to prove the point.

This isn't a vs. thing; I'm not interested in proving which console has more useful bandwidth available to it. I'm just trying to properly understand the strengths and weaknesses of the ESRAM.

Simply accepting Microsoft's claim of 276 GB/s blindly, without assessing what caveats may be associated with that number, is a little foolish IMO.

There is no way you can calculate or justify using the utilization numbers you provide. There's zero basis to think one or the other device is at 90 or 60 or anything. Zero. You can use whatever numbers you want, but I just see no basis for your utilization decisions. Either you don't think the XB1's architecture is reasonably efficient enough to extract 90% utilization, or you think it's so exotic that developers couldn't or wouldn't expend the energy to do so.

Other than that I get your argument.
 
Err, you're going to have to source that one, guy.

Edit: seems to all be stemming from this post: http://www.giantbomb.com/forums/xbo...box-one-have-a-hypervisor-and-what-i-1437760/. Nothing official that I can see, but the theory seems intriguing.

You found the link for what I read a while back, although I could swear it was on another gaming website. Still, it's a pretty big advantage to be able to use up to 7.5 GB, and it vastly differs from what is being spread around the web (mainly at NeoGAF, to be honest).
 
You found the link for what I read a while back, although I could swear it was on another gaming website. Still, it's a pretty big advantage to be able to use up to 7.5 GB, and it vastly differs from what is being spread around the web (mainly at NeoGAF, to be honest).

But it was just a forum post, not from Microsoft, so I don't know if it's valid. Anyway, I think this is OT.
 
If I'm understanding it correctly, then it doesn't matter how the ratio of reads and writes is split when using GDDR5. You can use 100% of the bandwidth on reads, 100% on writes, or anything in between. The ESRAM doesn't have that flexibility, so for example if your workload is 80% write and 20% read (a 4:1 ratio), with the PS4's 176 GB/s you will still achieve 176 GB/s, whereas with the ESRAM you'll achieve 109 + (109/4) = 136.25 GB/s. Hence a lower utilization rate, although once you add in the 68 GB/s from the main memory as well, the overall bandwidth would still be greater at ~204 GB/s.
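Just to make that arithmetic explicit, here's a minimal sketch of the same comparison. It's a toy model under the same assumptions as above (the GDDR5 pool can spend its full 176 GB/s on any read/write mix, while the ESRAM is treated as 109 GB/s of read plus 109 GB/s of write that can't be traded against each other); the function names are mine, not anything official.

```python
# Toy model of the read/write-mix arithmetic above; not a simulation of
# either memory system, just the same back-of-envelope maths.

def gddr5_effective(total_bw, write_frac):
    # Assumption from the post: any read/write mix can use the whole bus.
    return total_bw

def esram_effective(per_dir_bw, write_frac):
    # The busier direction saturates its port; the other direction only
    # carries traffic in proportion to the workload's read:write ratio.
    read_frac = 1.0 - write_frac
    busy, idle = max(write_frac, read_frac), min(write_frac, read_frac)
    return per_dir_bw + per_dir_bw * (idle / busy)

write_frac = 0.8  # 80% writes, 20% reads (4:1)
print(gddr5_effective(176.0, write_frac))         # 176.0 GB/s
print(esram_effective(109.0, write_frac))         # 136.25 GB/s
print(esram_effective(109.0, write_frac) + 68.0)  # 204.25 GB/s with DDR3 added
```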

Again, you are working with a set of really specific parameters and an overly simplified model, and trying to reach a broad conclusion. It would be just as easy to find random examples of how a particular data flow would utilize more BW on one system over the other.

Simply accepting Microsoft's claim of 276 GB/s blindly, without assessing what caveats may be associated with that number, is a little foolish IMO.

This is really getting into straw man territory. I'm pretty sure those who understand how the simultaneous r/w works understand this, and have not claimed it to be anything otherwise.
Heck, even MSFT themselves spelled this out in the Hot Chips slides.
 
I don't want to bring in the current gen, but isn't it a bit of the same? You can't really compare the PS3's and 360's memory bandwidth due to their different architectures.
Indeed. We had many cyclic arguments back in the day from people trying to compare BW!

I keep seeing people refer to the X1 and PS4 architectures as being the same, but I don't think that's true at all. Maybe the CPU and GPU architectures are the same, but taken as a whole they are vastly different.
Slightly different, and closer than any other consoles in history. Not only are the GPU and CPU comparable, they are the same architectures, capable of running the same code. Obviously the system software changes the portability of code, but the fundamental architectures, in terms of CPU, GPU and workload distribution, are the same on both platforms. The variations are fairly slight - ESRAM on XB1 warranting management, audio on a custom processor on XB1, apparently a little more compute support on PS4, and a load of small ancillary extras like how many display planes each has.

Perhaps the only proof of the pudding will be multi-platform games once again.
There are no clear reference points. A multiplatform title shows what a platform can achieve when targeted as such. A platform exclusive will show what developers with a different remit will achieve with the same hardware. Unless you have access to the game profiling tools, you can never really understand a hardware architecture. For forum wars, after a few years of cross-platform titles, a common superiority by one platform will be taken as consensus of that platform being superior, but that doesn't tell us anything about XB1's internal memory structure. ;)
 
The reality is a bit more complicated than that. DRAM is heavily optimized for localized or linear accesses with writes and reads not being mixed together. Internally, the DRAM is heavily subdivided, slower, and it can't keep everything at the ready at all times. It also incurs a penalty whenever it has to switch from reads to writes.

The memory subsystem tries very hard to schedule accesses so that they hit as few bank and turnaround penalties as possible, but this isn't simple to do with other constraints like latency and balancing service to multiple clients.

Ideally, the eSRAM could dispense with all of this, and gladly take any mix that works within the bounds of its read and write ports.
However, the peak numbers and articles on the subject suggest that for various reasons there are at least some banking and timing considerations that make the ideal unreachable. The physical speed of the SRAM and the lack of an external bus probably mean that the perceived latency hierarchy is "flatter" than it would be if you were spamming a GDDR bus with reads and writes with poor locality.

This is where I assume the hinted advantages the eSRAM has for certain operations come in, where the access pattern starts interspersing reads with writes, or there is poor locality.
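For illustration, here's a minimal sketch of the turnaround effect described above. The burst length and turnaround figures are invented placeholders, not datasheet values for DDR3, GDDR5 or the XB1; the point is only how efficiency scales with how often the bus has to flip direction.

```python
# Toy model: effective DRAM bus efficiency when read/write turnarounds
# idle the bus. All cycle counts below are made-up placeholders.

def dram_efficiency(bursts_per_turnaround, burst_cycles=4, turnaround_cycles=8):
    """Fraction of peak bandwidth left if the bus flips direction every
    `bursts_per_turnaround` bursts and each flip wastes `turnaround_cycles`."""
    useful = bursts_per_turnaround * burst_cycles
    return useful / (useful + turnaround_cycles)

for bursts in (1, 4, 16, 64):
    print(f"{bursts:3d} bursts between flips -> {dram_efficiency(bursts):.0%} of peak")
# Long same-direction streams stay close to peak; fine-grained read/write
# mixing throws a large chunk of the bus away, which is the scheduler's problem.
```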
+1
Very true! There is nothing to add.
 
The reality is a bit more complicated than that. DRAM is heavily optimized for localized or linear accesses with writes and reads not being mixed together. Internally, the DRAM is heavily subdivided, slower, and it can't keep everything at the ready at all times. It also incurs a penalty whenever it has to switch from reads to writes.

Any idea what the penalty for a switch from a read to a write is?

I assume that when you have multiple clients like the GPU, CPU, and maybe other units, that further complicates things for the memory subsystem in trying to schedule reads/writes. I'm also assuming that there may be a priority scheme, e.g. GPU requests over CPU requests. With the SRAM being dedicated to the GPU, all those scenarios should be more straightforward.

It's one thing to run an isolated benchmark on a single component vs. one in a fully loaded system (8 CPU cores, GPU processing, audio, networking, and whatever else there may be). I can really see scenarios where the X1 GPU can sustain good performance when executing out of the SRAM, because it doesn't need to share its bandwidth with anyone else.
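To put a (purely hypothetical) number on that contention point, here's a trivial sketch where a few other clients skim bandwidth off the shared DDR3 bus while the ESRAM stays GPU-only. The per-client figures are invented, not measurements.

```python
# Hypothetical shared-vs-dedicated illustration; the per-client numbers
# are placeholders, not measured XB1 figures.

ddr3_peak = 68.0    # GB/s, shared by every client on the main bus
esram_peak = 109.0  # GB/s per direction, effectively dedicated to the GPU

other_clients = {"CPU": 20.0, "audio": 2.0, "misc I/O": 3.0}  # guesses
gpu_ddr3_leftover = ddr3_peak - sum(other_clients.values())

print(f"GPU's leftover DDR3 bandwidth: {gpu_ddr3_leftover:.0f} GB/s")
print(f"GPU's uncontended ESRAM bandwidth: {esram_peak:.0f} GB/s per direction")
```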
 
Hey guys, I'm pretty new here, so don't give me too much of a hard time for this.
From the Hot Chips slide, the 32 MB of ESRAM is divided into four 8 MB blocks, each with a 256-bit bus. Given that the ESRAM is divided as such, can it not use these 8 MB sections independently, hence giving it the ability to read and write simultaneously?

I'm not making any kind of statement, it's more of a question.
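A quick back-of-envelope on those slide figures (nothing authoritative; the 853 MHz clock is an assumption taken from the widely reported upclock, not from the slide itself):

```python
# Back-of-envelope on the Hot Chips numbers: four 8 MB blocks, 256-bit bus each.
# Assumes the ESRAM is clocked with the GPU at 853 MHz (my assumption).

blocks = 4
bus_bits_per_block = 256
bytes_per_cycle = blocks * bus_bits_per_block // 8   # 128 bytes per cycle
clock_ghz = 0.853

one_way = bytes_per_cycle * clock_ghz
print(one_way)        # ~109.2 GB/s, the per-direction figure usually quoted
print(2 * one_way)    # ~218.4 GB/s, the ceiling if reads and writes fully overlap
```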
 
So what I'm taking from this is that the Xbox One should be really good at rendering particles & effects on screen, & that maybe the Xbox One was designed with MegaTexture in mind.
 
If I'm understanding it correctly then it doesn't matter how the ratio of reads and writes are split when using GDDR5. You can use 100% of the bandwidth on reads, 100% on writes or anything inbetween.

Actually, GDDR5 has twice the I/O pins of DDR3, so GDDR5 is read/write parallel. This could mean that a 352 GB/s read/write workload is easily achieved. Of course there is some pipelining within its 8n prefetch architecture.

EDIT:

Samsung states that GDDR5 uses a double-pumped (DDR) address rate, an SDR command rate, and a QDR data rate.

DDR4 and GDDR5 Synopsys explanation: http://www.synopsys.com/Company/Publications/SynopsysInsight/Pages/Art5-ddr4-IssQ2-13.aspx
 
Hey guys, I'm pretty new here, so don't give me too much of a hard time for this.
From the Hot Chips slide, the 32 MB of ESRAM is divided into four 8 MB blocks, each with a 256-bit bus. Given that the ESRAM is divided as such, can it not use these 8 MB sections independently, hence giving it the ability to read and write simultaneously?

I'm not making any kind of statement, it's more of a question.

That's exactly how they came up with the 274 number; it's 68.5x4... j/k

On a serious note, I'm curious if the XB1 might have an advantage with particle effects?
 
So all GDDR5 has twice the bandwidth that has been advertised/marketed by Samsung, AMD, Nvidia, everyone!?

Erm, yeah, I think you've missed a key variable in your calculation there.
 
Actually, GDDR5 has twice the I/O pins of DDR3, so GDDR5 is read/write parallel. This could mean that a 352 GB/s read/write workload is easily achieved.

No. You misunderstood.
 
I saw it mentioned previously that Panello had stated the 204 number was incorrect and that it was actually 218, which makes more sense for a dual-ported architecture, being 2x the 109. While it "makes sense", I'm not sure how it fits with the pre-upclock numbers of 102 and 192.

Any thoughts as to which is more likely? Do we have the more rational 2x rate as he suggested recently, and MS made some 7/8 math mistake TWICE, or are we still going to be left with some mystery astrophysics? Any bets? What's the line on this one...
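For what it's worth, here's the arithmetic for the competing stories in that post, using only the figures already mentioned (128 bytes per cycle from the four 256-bit buses, the 800 -> 853 MHz upclock, and the 7/8 factor). It doesn't settle which is right; it just shows which published number each story reproduces.

```python
# Checking which published ESRAM figure each explanation reproduces.
bytes_per_cycle = 128   # four 256-bit buses

for clock_ghz, label in ((0.800, "pre-upclock "), (0.853, "post-upclock")):
    one_way = bytes_per_cycle * clock_ghz
    print(label,
          round(one_way, 1),                  # 102.4 / 109.2 (one direction)
          round(2 * one_way, 1),              # 204.8 / 218.4 (plain 2x)
          round(one_way * (1 + 7 / 8), 1))    # 192.0 / 204.7 (the "7/8" story)
```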
 
I wonder how much further this conversation can go without a simple use case, with actual reasonable figures etc.
It may not be correct, but it would be a reasonable basis/frame to discuss around.
x amount of textures,
x amount of shadow maps,
x amount of geometry,
x amount of overdraw, and using esram,
for 1080p, 60fps etc

I know I would find it interesting; a rough skeleton of such a budget is sketched below.

Otherwise it's just random utilization figures, although I understand that they are being used to prove a point.
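In that spirit, here's a minimal skeleton of such a budget. Every figure in it is a placeholder made up purely to frame the discussion, not a claim about any real engine, and it ignores how bursty bandwidth demand is within a frame.

```python
# Skeleton 1080p/60 bandwidth budget; all numbers are invented placeholders.
gb = 1e9
px = 1920 * 1080
fps = 60

traffic_bytes = {
    "G-buffer writes (4 RTs, 4 B/px, 2x overdraw)":     4 * 4 * 2 * px,
    "depth traffic (4 B/px, 3 touches)":                 4 * 3 * px,
    "lighting/post passes (8 full-screen RT touches)":   8 * 4 * px,
    "shadow maps (2 x 2048^2, 4 B, write + sample)":     2 * 2 * 2048 * 2048 * 4,
    "texture fetches (guess)":                           int(0.2 * gb),
    "geometry, particles, misc (guess)":                 int(0.05 * gb),
}

per_frame = sum(traffic_bytes.values()) / gb
print(f"~{per_frame:.2f} GB/frame -> ~{per_frame * fps:.0f} GB/s sustained at {fps} fps")
# The interesting question is then which of these rows fit in 32 MB of ESRAM
# and which have to stream from DDR3.
```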
 
That's exactly how they came up with the 274 number; it's 68.5x4... j/k

On a serious note, I'm curious if the XB1 might have an advantage with particle effects?

What? How are you getting 68.5 x 4 out of my question about the ESRAM being split into 4 blocks with four 256-bit buses?

I'm just trying to find out if its being divided into 4 sections gives it the ability to read and write at the same time.
 