Xbox One (Durango) Technical hardware investigation

He doesn't mean half; he meant points 1-2 in the slide. This one:

[attached slide image]


One thing I wonder about the "put the sky in DRAM" idea: what if, in the middle of a big chaotic battle, you look down just far enough that the sky is no longer onscreen? Will framerates plummet?

I suppose you'd then likely have a big patch of inactive "ground" at the bottom, which you could theoretically put in ESRAM instead of the sky. But would this be possible dynamically? And what about when the whole screen is filled with action and effects blotting out the sky?
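As a bit of napkin math on why splitting buffers between ESRAM and DRAM comes up at all (my own illustrative numbers, not anything from the presentation), a 1080p deferred setup overflows 32 MB almost immediately:

```python
# Rough render-target footprint arithmetic. The 4-target G-buffer
# layout below is a hypothetical example, not a documented engine.

def rt_bytes(width, height, bytes_per_pixel):
    """Size of one render target in bytes."""
    return width * height * bytes_per_pixel

MB = 1024 * 1024
ESRAM_CAPACITY = 32 * MB

# Hypothetical deferred setup: 4 colour targets (RGBA8) + 32-bit depth.
color = 4 * rt_bytes(1920, 1080, 4)
depth = rt_bytes(1920, 1080, 4)
total = color + depth

print(f"G-buffer: {total / MB:.1f} MB of {ESRAM_CAPACITY / MB:.0f} MB ESRAM")
# ~39.6 MB, over budget before any other buffers are counted,
# so something (e.g. the sky region) has to live in DRAM.
```

Which is exactly why the question of *which* part of the buffer gets evicted to DRAM matters.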
 
That presentation is interesting.

Interestingly, they list all the specs as we know them (853 MHz GPU, etc.), but list the ESRAM as 102 GB/s (not 109 or 204), with the caveat "sometimes faster in practice".

Also interesting to me: they called out the DDR3 as "low latency", which we haven't heard much before, although they did not say compared to what.

And if they write it that way, then going forward we shouldn't deviate from that number until they specify again otherwise.

102 + 67 it is.

The 2nd wave of titles tackling points 3-4 is interesting. Point 4 in particular sounds like it's what pushes the bandwidth above 102 GB/s: async DMA-ing resources in and out of ESRAM while the GPU is rendering to ESRAM.
 
Well it should be at least 109 I think after the upclock. Probably just using old specs there.

"Sometimes faster in practice" probably means the extra read/write bandwidth: 140-150 GB/s in practice, up to 204 GB/s in theory (and we use the theoretical max in every other bandwidth spec).
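For what it's worth, the headline figures fall straight out of the bus width and clocks (my arithmetic; the "both operations on ~7 of 8 cycles" detail comes from press interviews and is an assumption here):

```python
# Sanity-checking the ESRAM numbers from the stated 1024-bit port.

def bw_gbs(bus_bits, clock_hz):
    """One-way bandwidth in GB/s for a bus of bus_bits at clock_hz."""
    return bus_bits / 8 * clock_hz / 1e9

pre_upclock  = bw_gbs(1024, 800e6)   # 102.4 GB/s (the slide's "102")
post_upclock = bw_gbs(1024, 853e6)   # 109.2 GB/s after the upclock

# Full-duplex read+write every cycle would be 2x = ~218 GB/s; the
# publicised 204 GB/s matches both operations landing on roughly
# 7 of every 8 cycles (assumption, per interview explanations).
combined_peak = post_upclock * (1 + 7 / 8)

print(round(pre_upclock, 1), round(post_upclock, 1), round(combined_peak, 1))
```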
 
He doesn't mean half; he meant points 1-2 in the slide. This one:
Exactly, sorry for not being clear XD

One thing I wonder about the "put the sky in DRAM" idea: what if, in the middle of a big chaotic battle, you look down just far enough that the sky is no longer onscreen? Will framerates plummet?

I suppose you'd then likely have a big patch of inactive "ground" at the bottom, which you could theoretically put in ESRAM instead of the sky. But would this be possible dynamically? And what about when the whole screen is filled with action and effects blotting out the sky?

Yeah, I would like more details on that too. But I can't see it being a feasible suggestion unless you can re-split the buffer very fast, like every frame, so you never see prolonged frame drops.

I did find it interesting that pretty much all their performance tips boiled down to: use the ESRAM for everything you can, and coordinate the DMEs so they can help use DDR3 as a cache... It does seem the framerate/resolution issues come down to the ESRAM.

The complete lack of any mention of tiling also got me thinking. I always wondered why so many games, even ones whose engines were adapted to tiling on the 360, were having issues hitting 1080p on the Xbox One... Then it hit me: the way tiling was done on the 360 most likely needed hardware support. Since the Xbox One's GPU has virtualized memory access, allowing even a single buffer to be split across DDR3 and ESRAM, there was no need to support tiling in hardware. But that also means the engines adapted for tiling on the 360 probably won't carry over to the Xbox One... That would explain the talk about some developers finding that 32 MB just wasn't enough...
 
It has always been communicated that 102/109 is "guaranteed" BW to the ESRAM, with higher potential from the dual porting when you code specifically for it. The fact of the matter is that they've measured typical rates in the 140s with real games. It's a freaking dead horse at this point, and frankly I just don't understand the insistence on doing this over and over again every single month, as if this were some sort of conspiracy and some of you "discovered the truth".
 
It has always been communicated that 102/109 is "guaranteed" BW to the ESRAM, with higher potential from the dual porting when you code specifically for it. The fact of the matter is that they've measured typical rates in the 140s with real games. It's a freaking dead horse at this point, and frankly I just don't understand the insistence on doing this over and over again every single month, as if this were some sort of conspiracy and some of you "discovered the truth".

Well, if you know how to do it then please contact professional developers because it looks as if some games could really use that extra bandwidth.

No bandwidth is ever guaranteed. Even the PS4 hovers around 110-130 GB/s during most operations.
 
The 2nd wave of titles tackling points 3-4 is interesting. Point 4 in particular sounds like it's what pushes the bandwidth above 102 GB/s: async DMA-ing resources in and out of ESRAM while the GPU is rendering to ESRAM.
Async DMA means they'll start using the DMEs to balance bandwidth usage between ESRAM and DDR3, moving stuff around. This would actually cost some bandwidth from each pool rather than create more, but it would help push both pools closer to their max instead of leaving a bottleneck.
 
Well, if you know how to do it then please contact professional developers because it looks as if some games could really use that extra bandwidth.

No bandwidth is ever guaranteed. Even the PS4 hovers around 110-130 GB/s during most operations.

I'm reluctant to go against the words of the system's designers, multiple other disclosures, and measured bandwidth numbers. The usage cases for the numbers seemed pretty typical and reasonable for graphics loads, and the slide's number seems to be out of date or not completely explained.

As far as guaranteed bandwidth goes, the ESRAM is on-die and not subject to the same level of variability that external DRAM is. On-die storage with a bidirectional interface is typically considered close enough to take the given specifications as a good baseline, if only because it would require additional reasons why it should have problems meeting its minimum.
 
Async DMA means they'll start using the DMEs to balance bandwidth usage between ESRAM and DDR3, moving stuff around. This would actually cost some bandwidth from each pool rather than create more, but it would help push both pools closer to their max instead of leaving a bottleneck.

Oh snap, I guess I was completely out to lunch with that comment.

For some reason I thought the idea was to leverage the DDR3 as a cache, using the DMEs to move information over to ESRAM for processing so the GPU wouldn't waste cycles moving data around.
That the GPU's path to ESRAM was only 1024 bits wide, which gave the 102 GB/s, and that the 4 DMEs could move bits in and out on their own at 256 bits per cycle (27 GB/s), since there should be 31 MB of the cache remaining.

Guess not though, lol; I suppose the bus would have to be 2048 bits wide if that were the case.
 
DME bandwidth is less than 30 GB/s in each direction, and that is a common path for all 4 DMEs and sundry units like the display engine and video decode blocks.
 
DME bandwidth is less than 30 GB/s in each direction, and that is a common path for all 4 DMEs and sundry units like the display engine and video decode blocks.

Right, thanks for the clarification! Finished with the bathroom, where I do this type of thinking. Wow, it must be incredibly hard to hammer the bandwidth to that level.

I'm guessing that thanks to Zero Bus Turnaround no cycle is wasted switching between read and write, so on one clock edge you'd read data into GPU registers, and on the other you'd write completed work back to ESRAM. That is how the theoretical figure above 102 GB/s is achieved.

The maximum amount of work bandwidth would be reading 1024 bits and writing back 1024 bits in a single clock; how long could this process reasonably be sustained?
It sounds incredibly hard to line data up like that continually; eventually you need to work on items not contained within that 32 MB. So the DMEs are basically moving data in and out while the GPU isn't queued up with reads and writes, which helps bring overall GPU saturation higher, to a degree.
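A toy model of that (entirely my own back-of-envelope, not a documented formula): treat every cycle as carrying a 1024-bit read, with some fraction of cycles also fitting a write.

```python
# Toy sustained-throughput model (my assumption, not official):
# one-way peak, plus a write paired with the read on a fraction
# p_both of cycles.

ONE_WAY_GBS = 109.2  # 1024 bits * 853 MHz

def effective_bw(p_both):
    """Effective GB/s when a fraction p_both of cycles pair read+write."""
    return ONE_WAY_GBS * (1 + p_both)

# The "140-150 GB/s measured in real games" range corresponds to
# pairing read and write on only about a third of cycles:
print(round(effective_bw(1 / 3), 1))  # ~145.6
```

So even well short of the ideal interleave, you land in the measured range.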


Each move engine can read and write 256 bits of data per GPU clock cycle, which equates to a peak throughput of 25.6 GB/s both ways. Raw copy operations, as well as most forms of tiling and untiling, can occur at the peak rate. The four move engines share a single memory path, yielding a total maximum throughput for all the move engines that is the same as for a single move engine. The move engines share their bandwidth with other components of the GPU, for instance, video encode and decode, the command processor, and the display output. These other clients are generally only capable of consuming a small fraction of the shared bandwidth.

Read more at: http://www.vgleaks.com/world-exclusive-durangos-move-engines/
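Checking the quoted move-engine figure against the clocks (simple arithmetic on the stated 256-bit width, nothing more):

```python
def bw_gbs(bus_bits, clock_hz):
    """One-way bandwidth in GB/s for a bus_bits-wide path at clock_hz."""
    return bus_bits / 8 * clock_hz / 1e9

at_800 = bw_gbs(256, 800e6)  # the article's 25.6 GB/s (pre-upclock spec)
at_853 = bw_gbs(256, 853e6)  # ~27.3 GB/s after the upclock

print(round(at_800, 1), round(at_853, 1))
```

Which lines up with both the article's 25.6 GB/s and the "less than 30 GB/s each way" figure mentioned above.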
 
Well, if you know how to do it then please contact professional developers because it looks as if some games could really use that extra bandwidth.

No bandwidth is ever guaranteed. Even the PS4 hovers around 110-130 GB/s during most operations.

...so essentially you are suggesting that we should say this system only has 130 GB/s of BW and that 176 GB/s is a lie then? :oops:
Frankly, I don't think you have the technical knowledge in the things you are talking about.
 
...so essentially you are suggesting that we should say this system only has 130 GB/s of BW and that 176 GB/s is a lie then? :oops:
Frankly, I don't think you have the technical knowledge in the things you are talking about.

Just like 204 GB/s is the max for the ESRAM, 176 GB/s is the max for the PS4 memory. In practice, you can use something like ~130 GB/s on the PS4 and 150 GB/s for the ESRAM (+ DDR3). It is just the difference between the theoretical maximum and the practical.
According to Sony's own presentation, the CPU also steals bandwidth (around double what it needs), so in practice there should be 100-130 GB/s for the GPU, depending on what the CPU does.
PC GPUs have the same issue with theoretical bandwidth.
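The 176 GB/s figure is the same kind of theoretical peak, straight from the published bus width and per-pin data rate:

```python
def gddr5_peak_gbs(bus_bits, gbit_per_pin):
    """Peak GB/s for a GDDR5 bus: width (bits) * per-pin rate / 8."""
    return bus_bits * gbit_per_pin / 8

# PS4: 256-bit bus at 5.5 Gbit/s per pin (published specs)
print(gddr5_peak_gbs(256, 5.5))  # 176.0
```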
 
Just like 204 GB/s is the max for the ESRAM, 176 GB/s is the max for the PS4 memory. In practice, you can use something like ~130 GB/s on the PS4 and 150 GB/s for the ESRAM (+ DDR3). It is just the difference between the theoretical maximum and the practical.
According to Sony's own presentation, the CPU also steals bandwidth (around double what it needs), so in practice there should be 100-130 GB/s for the GPU, depending on what the CPU does.
PC GPUs have the same issue with theoretical bandwidth.

Preaching to the choir, my friend.
It's not so much "stealing", it's just contention. PC GPUs mostly have dedicated RAM and don't really have this issue; the ESRAM on the X1 doesn't have it either.
 
Just like 204 GB/s is the max for the ESRAM, 176 GB/s is the max for the PS4 memory. In practice, you can use something like ~130 GB/s on the PS4 and 150 GB/s for the ESRAM (+ DDR3). It is just the difference between the theoretical maximum and the practical.
According to Sony's own presentation, the CPU also steals bandwidth (around double what it needs), so in practice there should be 100-130 GB/s for the GPU, depending on what the CPU does.
PC GPUs have the same issue with theoretical bandwidth.

There are worse cases for external DRAM. Though the hardware tries mightily to arrange traffic for non-ideal access patterns and read/write mixes, it has a finite capability to massage things for the DRAM bus and devices.

Barring additional disclosures on the ESRAM indicating additional weaknesses, it doesn't have a reason to drop an order of magnitude below peak.
 
Well, if you know how to do it then please contact professional developers because it looks as if some games could really use that extra bandwidth.

No bandwidth is ever guaranteed. Even the PS4 hovers around 110-130 GB/s during most operations.

Right, we should just be sure to compare like for like. Normally the numbers thrown around are theoretical max, regardless of practicality.
 
There was an official Sony developer document (presentation?) chart on one of the pages showing that if the CPU was eating 20 GB/s, total bandwidth could drop to 110 GB/s (for the whole system, that is).

It was a slide demonstrating that the GPU bandwidth was affected disproportionately the more CPU bandwidth was in use. Saturating the CPU bus is not presented as typical so you should often be well above that 110GB/s low water mark for the GPU. That's hardly the figure for "most situations" as you initially put it.
 
It was a slide demonstrating that the GPU bandwidth was affected disproportionately the more CPU bandwidth was in use. Saturating the CPU bus is not presented as typical so you should often be well above that 110GB/s low water mark for the GPU. That's hardly the figure for "most situations" as you initially put it.

Here it is, on the slide titled "CPU and GPU Bandwidth Interaction":

http://develop.scee.net/files/presentations/gceurope2013/ParisGC2013Final.pdf

It shows the maximum bandwidth with no CPU utilisation at even under 140 GB/s (I'd say 135 GB/s). That should resemble "most situations" then, no?
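Reading the upthread "steals around double what it needs" claim into a toy linear model (my own construction; the slide's actual measured curve is in the PDF):

```python
# Toy linear contention model (my assumption: CPU traffic costs the
# system roughly twice its own bandwidth in lost total throughput).

NO_CPU_TOTAL = 135.0  # ~GB/s read off the slide with the CPU idle
PENALTY = 2.0         # "steals around double what it needs"

def total_bw(cpu_gbs):
    """Approximate total system GB/s given cpu_gbs of CPU traffic."""
    gpu = NO_CPU_TOTAL - PENALTY * cpu_gbs
    return gpu + cpu_gbs

print(total_bw(20))  # 115.0, in the ballpark of the slide's ~110
```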

Make no mistake, I am truly a Sony fan, I just brought it up :)

Btw I initially stated:

No bandwidth is ever guaranteed. Even the PS4 hovers around 110-130 GB/s during most operations.
 