Xbox One (Durango) Technical hardware investigation

At the end of the day, if Mark Cerny did the math and worked out that 1TB/s of bandwidth from EDRAM wouldn't offset the advantages of a pure GDDR5 solution, then what is the extra 88% worth, really? Other than trying desperately to conjure some magic goodwill juju.
Cerny's decision was based, in his own words, on the ease of programming for developers, not on theoretical performance.
 
What can I say... that's fascinating stuff.

Perhaps your theory is the correct answer, because the model of data management the ESRAM provides programmers means their memory turnaround is on the order of a few tens of cycles (or just ten), which opens up more algorithms and operations.

I've made some possibly overbroad assumptions to gloss over a bunch of unknowns, such as the port count and banking.
The basic idea is that the queue fills up and at some point a write has to commit to the eSRAM and blocks the extra read or reads.

Cycle counting may have some onerous requirements, since other requestors can interrupt the stream and throw off the count, and the graphics functions may not come at the times expected, since there are caches and the GPU's own scheduling that can flip things around.
The pipeline itself is longer, so this is also guessing about the state of the system many cycles in the future.

That may explain why the real-world numbers don't hit the theoretical max, since there could be restrictions and other request events that knock the count off. A tight blending operation that dominates the mix of requests can tilt the odds in its favor.
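
To make the queue-fill idea concrete, here's a toy model; the queue depth, arbitration policy, and write rate are all invented, since the real port count and banking are unknown. All it shows is how the request mix tilts the effective read rate below the peak:

Code:
import random

QUEUE_DEPTH = 4      # assumed write-queue depth (unknown in reality)
CYCLES = 100_000
WRITE_RATE = 0.5     # fraction of cycles that also issue a write

random.seed(1)
write_queue = 0
reads_served = 0
blocked_reads = 0

for _ in range(CYCLES):
    if random.random() < WRITE_RATE:
        write_queue += 1          # writes buffer up in the queue
    if write_queue >= QUEUE_DEPTH:
        write_queue -= 1          # queue full: a write must commit...
        blocked_reads += 1        # ...blocking the read this cycle
    else:
        reads_served += 1

print(f"reads served : {reads_served / CYCLES:.1%} of cycles")
print(f"reads blocked: {blocked_reads / CYCLES:.1%} of cycles")

Raising WRITE_RATE toward a blend-heavy mix drags the read rate down further, which is the tilt described above.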
 
Does the fact that MS engineers are putting this info out there mean that Donald Duck's question:

Another interesting question is what the ESRAM could be used for in the XBO architecture (including the OS).

Scratchpad, high-level cache, etc. Who owns it, how is its space accessible, is it the programmer who decides what to put there, or is it obscured, etc.?

... could partially be answered? It will most likely be a "scratchpad" as opposed to being "simply" a large cache (as on Haswell)? Oh, and I always assumed the ESRAM was for the game VM, since it would do you more good there than partitioning some of it off to the app VM. I assume apps don't need such a cache; they just need a couple of CPU cores, a few GPU resources, and a chunk of memory.
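
To illustrate the scratchpad-versus-cache distinction with a toy (nothing here is a real XDK API): with a cache like Haswell's eDRAM there is no allocation call at all, because the hardware transparently decides what lives in the fast pool and silently evicts, whereas a scratchpad makes capacity the programmer's problem:

Code:
MiB = 1 << 20

class Scratchpad:
    """Programmer-managed fast memory: you decide what lives here.
    A cache would have no alloc() at all; placement would be obscured."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
    def alloc(self, name, size):
        if self.used + size > self.capacity:
            raise MemoryError(f"{name}: no room; evict something yourself")
        self.used += size
        print(f"{name} placed ({self.used // MiB} of {self.capacity // MiB} MiB used)")

esram = Scratchpad(32 * MiB)
esram.alloc("render_target", 16 * MiB)
esram.alloc("g_buffer", 10 * MiB)
try:
    esram.alloc("shadow_map", 8 * MiB)   # 34 MiB > 32 MiB: your problem now
except MemoryError as e:
    print(e)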
 
Has the "5 billion transistors" figure been explained in any way yet?
1.6 billion for the eSRAM, sure, but that still leaves a good 400 million more than the PS4's total.
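
The eSRAM part of that figure at least checks out arithmetically: 32 MB of 6T SRAM is about 1.6 billion transistors, and taking the commonly cited ~3 billion PS4 estimate at face value, the leftover really is about 400 million:

Code:
esram_bits = 32 * 1024 * 1024 * 8       # 32 MB of eSRAM
esram_transistors = esram_bits * 6      # 6 transistors per SRAM cell (6T)
print(esram_transistors / 1e9)          # ~1.61 billion

xbo_total = 5.0e9
ps4_total = 3.0e9                       # commonly cited estimate, not official
print((xbo_total - esram_transistors) - ps4_total)   # ~0.39 billion left over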
 
I've made some possibly overbroad assumptions to gloss over a bunch of unknowns, such as the port count and banking.
The basic idea is that the queue fills up and at some point a write has to commit to the eSRAM and blocks the extra read or reads.

Cycle counting may have some onerous requirements, since other requestors can interrupt the stream and throw off the count, and the graphics functions may not come at the times expected, since there are caches and the GPU's own scheduling that can flip things around.
The pipeline itself is longer, so this is also guessing about the state of the system many cycles in the future.

That may explain why the real-world numbers don't hit the theoretical max, since there could be restrictions and other request events that knock the count off. A tight blending operation that dominates the mix of requests can tilt the odds in its favor.

Yeah, I get the feeling that if they only discovered the behavior this late in the cycle, it's probably something developers need to set up specifically and carefully for it to see the light (instead of a general solution).
 
Yeah, I get the feeling that if they only discovered the behavior this late in the cycle, it's probably something developers need to set up specifically and carefully for it to see the light (instead of a general solution).

Or only just implemented it? The DF article did mention MS teams are still busy implementing features, with optimization to come later on.
 
Or only just implemented it? The DF article did mention MS teams are still busy implementing features, with optimization to come later on.

If it had been planned all along, they would have mentioned it in the specs. It sounds like something they stumbled upon during optimization.

After all, they talk about cloud computing power when the servers are not up yet and the APIs are not fully there yet.
 
Or only just implemented it? The DF article did mention MS teams are still busy implementing features, with optimization to come later on.

The behavior is in the memory subsystem, and it might be revealing some of the peculiarities of the implementation most of those involved in higher-level development and marketing had little interest in exploring--until they needed bigger numbers.

On the other hand, the engineers of the system itself that didn't write game code might have looked at this and gone "well, duh".
There are memory controllers out there that can satisfy requests from pending writes, possibly many of them in this performance bracket.

If there is a variant of a memory controller's queueing functionality in place, the memory interface engineers would have seen this as being an assumed part of the system's behavior and would not have made particular note of it any more than they would have for any other memory controller they've made.

The eSRAM stands out and breaks through the mid-level abstractions and makes its presence known to programmers, and there is an asymmetry in the write/read paths that allows it to be visible.
If the paths were symmetrical, there either would have been a bigger number to start with or this "discovery" wouldn't have happened.
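
A minimal sketch of the kind of write-queue forwarding being speculated about here; the structure is invented, but it shows why a read that hits a pending write never touches the array's read port, which is exactly the sort of asymmetry that could surface as "extra" bandwidth:

Code:
write_queue = {}              # address -> pending data, youngest write wins

def write(addr, data):
    write_queue[addr] = data  # buffered; commits to the array later

def read(addr, array):
    if addr in write_queue:
        # Forwarded from the queue: the array's read port is untouched,
        # so this read comes on top of the array's nominal peak.
        return write_queue[addr]
    return array.get(addr)    # normal path: consumes the read port

array = {0x100: "old"}
write(0x100, "new")
print(read(0x100, array))    # "new", served from the queue
print(read(0x200, array))    # None, served from the array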


I am curious if this leaks out a bit of the underlying details of the system, such as where in the hierarchy you can find the eSRAM and what tech AMD co-opted to interface with it.
The other is possibly just a numerological coincidence, but the theoretical peak bandwidth number is what Sony would have had before it downgraded the GDDR5 speeds. It might not be that big a coincidence, since the consoles use the same tech for the GPU's interface with its memory subsystem.
 
One last random bit of unsubstantiated speculation on the bandwidth improvement:

The change from preproduction to production may have coincided with a change in the values loaded into the control registers for the memory controllers.
Until fully validated, extra optimizations and bypass paths may have been kept off as functionality was analyzed and tweaked. Dev kits could still run on the known good settings.

Once validated, the control register loadouts were changed to enable optimizations for the memory controllers.
If the eSRAM's control logic has familial ties to a memory controller, what's good for one could have been considered good for the other in the design process, and the switch could have been flipped for similar reasons.
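
Illustratively (the register name and bit assignments are entirely invented), the "switch flip" would amount to nothing more than a different value loaded at boot:

Code:
MC_CTRL_BYPASS_EN = 1 << 3   # hypothetical read/write bypass path enable
MC_CTRL_FWD_EN    = 1 << 7   # hypothetical write-queue forwarding enable

devkit_ctrl = 0x11                                   # known-good, extras off
production_ctrl = devkit_ctrl | MC_CTRL_BYPASS_EN | MC_CTRL_FWD_EN

print(hex(devkit_ctrl), hex(production_ctrl))        # 0x11 0x99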
 
Any thoughts on the silicon for the new Kinect? How many chips, where they are (in the Kinect module or the Xbox One) and how many transistors they might take up?
No idea. The original plan was to have silicon in the One taking care of the Kinect, but I have no clue if that was in the final design.
 
Any confirmation from Leadbetter yet about the clocks?

I can feel the speculators getting rejuvenated by this news!
This part:
While none of our sources are privy to any production woes Microsoft may or may not be experiencing with its processor, they are making actual Xbox One titles and have not been informed of any hit to performance brought on by production challenges. To the best of their knowledge, 800MHz remains the clock speed of the graphics component of the processor, and the main CPU is operating at the target 1.6GHz.
 
The article makes a point of indicating that the baseline bandwidth of the interface is still 102.4 GB/s, so it's not hinting at anything different than the 800 MHz clock already assumed.

My interpretation of the following:
Well, according to sources who have been briefed by Microsoft, the original bandwidth claim derives from a pretty basic calculation - 128 bytes per block multiplied by the GPU speed of 800MHz offers up the previous max throughput of 102.4GB/s. It's believed that this calculation remains true for separate read/write operations from and to the ESRAM. However, with near-final production silicon, Microsoft techs have found that the hardware is capable of reading and writing simultaneously.

The distinction of separate operations sounds like the already given 102.4 GB/s doesn't change if the reads and writes go to different places.

I'm interpreting a read and write not being separate as them having the same target addresses, which allows for forwarding.
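
For what it's worth, the arithmetic lines up neatly. A quick check; the 7-of-8-cycles write pattern at the end is my own guess at how 192 could be derived, not something the article states:

Code:
bytes_per_cycle = 128
clock_hz = 800e6

base = bytes_per_cycle * clock_hz / 1e9   # 102.4 GB/s, as quoted
dual = 2 * base                           # 204.8 GB/s if every cycle did both
print(base, dual)

# The reported 192 GB/s peak equals a read every cycle plus a write on
# 7 of every 8 cycles (speculative):
print(base + base * 7 / 8)                # 192.0
print(192 / 102.4 - 1)                    # 0.875, i.e. the "extra 88%"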
 
At the end of the day, if Mark Cerny did the math and worked out that 1TB/s of bandwidth from EDRAM wouldn't offset the advantages of a pure GDDR5 solution, then what is the extra 88% worth, really? Other than trying desperately to conjure some magic goodwill juju.

Well, I've been told that the ESRAM solution was just something they thought of and discussed, but there wasn't a design or prototype for an ESRAM-based PS4 architecture - so it wasn't like they had two competing designs for PS4, one with GDDR5 and the other with ESRAM.

Mark was just saying it was something they thought of and discarded, because one of the primary tenets of the design has always been ease of development.
 
No idea. The original plan was to have silicon in the One taking care of the Kinect, but I have no clue if that was in the final design.

Bkilian, are you being sarcastic about SHAPE and the audio cores not taking up that many transistors?

...

I will ask Richard to clarify what he means by this article; if anyone (shifty, 3dilettante) wants to give me a set of technical questions I can get him to respond to, that'd be good.

And there's no downclock for either GPU or CPU (I have that from a source other than Richard).
Don't know about the ESRAM clock though, and whether that has to be tied to the GPU clock or if a 50 MHz downclock wouldn't matter.
 
No, it's clear enough. No one who should be in the know seems to know anything about a downclock.

An Xbox exec would play down spec talk, since they are behind on paper.

Clear would be a spec-sheet PDF with TF and/or clock figures. Until then, I believe it's in their best interests to keep speculation going as long as possible. Just like how we'll be forced to speculate over the next few months about how a 133 GB/s average with alpha-blending ops, or a 192 GB/s peak, can be achieved. :)
 
I will ask Richard to clarify what he means by this article; if anyone (shifty, 3dilettante) wants to give me a set of technical questions I can get him to respond to, that'd be good.

Is the base bandwidth still the same? The article seems contradictory in listing a lower number as the "peak" when the "real" value is higher.

If he can clarify what the article meant by "separate" reads and writes versus non-separate, it would help clear up the cases where higher bandwidth is possible.

Any details about how the higher number can be derived, or details like how many reads and writes can be sent at a time would bring clarity. I'm not certain that amount of detail would leak out.
 