Xbox One (Durango) Technical hardware investigation

No, I would explain it to the fullest extent possible if I felt people were unfairly underestimating the performance of my product. But because they aren't, and because MS is also releasing PR to downplay specs and talk up the cloud, I will naturally think the opposite, like any technically minded person would! Especially when they haven't been shy about talking up and comparing specs in the past, as they did with the 360.

I don't give the benefit of the doubt to any marketing or PR speak. Concrete numbers will do, thanks. :)


In terms of what this all means with regard to multi-platform titles launching on both next-gen consoles, our information suggests that developers may be playing things rather conservatively for launch titles while dev tools are still being worked on. This is apparently more of an issue with Xbox One, where Microsoft developers are still in the process of bringing home very significant increases in performance from one release of the XDK development environment to the next. Our principal source suggests that performance targets are being set by game-makers and that the drivers should catch up with those targets sooner rather than later.
Bearing in mind the stuttering performance we saw from some Xbox One titles at E3 such as Crytek's Ryse (amongst others), this is clearly good news.

As the performance levels of both next-gen consoles are something of a moving target at the moment, differences in multi-platform games may not become evident until developers are working with more mature tools and libraries.
At that point it's possible that we may see ambitious titles operating at a lower resolution on Xbox One compared to the PlayStation 4.


Seeing as it is still premature to extrapolate true performance, pure numbers will not do for a fair real-world comparison, given what they are learning about their unique design. It will take time to see the results, much like how the 360 was called "Xbox 1.5" until about two years in, when comparison with PS3 titles threw all the numbers out the window.
 
Is the base bandwidth still the same? The article seems contradictory by listing a lower number as the "peak" when the "real" value is higher.

If he can clarify what the article meant by "separate" reads and writes versus non-separate, it would help clear up the cases where higher bandwidth is possible.

Any details about how the higher number can be derived, or details like how many reads and writes can be sent at a time would bring clarity. I'm not certain that amount of detail would leak out.

At a high level, I think we only need to know 2 things.

1. Has there been any spec change? If so, what is it now?

2. Under what conditions can a dev exceed the paper bandwidth (as in doing reads and writes in parallel or overlapped)? How common/general are these conditions? (A toy sketch of the arithmetic follows below.)
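
For what it's worth, here is a toy sketch of the arithmetic behind question 2. It assumes the leaked 1024-bit (128-byte) eSRAM interface at 800MHz, which is exactly what the 102.4GB/s "paper" figure implies; the overlap fractions are purely hypothetical, not anything a source has confirmed.

```python
# Toy arithmetic only; nothing beyond the 102.4 GB/s baseline is a confirmed spec.
BUS_BYTES = 128          # assumed bytes moved per access (1024-bit interface)
CLOCK_GHZ = 0.8          # assumed 800 MHz clock

paper_bw = BUS_BYTES * CLOCK_GHZ            # one direction per cycle
print(paper_bw)                             # 102.4 GB/s

def effective_bw(overlap_fraction):
    """Bandwidth if a read and a write can share a cycle for some fraction
    of cycles, with only one direction active otherwise."""
    per_cycle = 2 * overlap_fraction + (1 - overlap_fraction)
    return per_cycle * BUS_BYTES * CLOCK_GHZ

print(effective_bw(1.0))                    # 204.8 GB/s ceiling if every cycle overlapped
print(effective_bw(0.3))                    # ~133 GB/s if only ~30% of cycles overlap
```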

EDIT:
Look at their cloud stuff. They have gone into some amount of detail about how it works, what it actually means, how it differs from the colloquial terminology of 'cloud gaming', and offered scenarios to look forward to. Yet the overwhelming majority on forums assume literally everything they say on that topic (due to MS struggling to convey nuance) is a lie or PR fluff and whatnot. Panello's comments were pre-E3 when X1 as a platform was still mired in confusion. Based on the reaction to their cloud ambitions I don't blame them for wanting ppl to settle down a bit first.

Do you have a link to their best explanation ?
 
If you had a particular vision for designing a highly efficient graphics pipeline for a new console that had its charm expressed only in nuance you wouldn't want to engage in a complicated specs/architecture debate either. Not because your machine would be weaker, but because you'd then be forced to explain nuance to an internet culture that recoils at the thought of complexity. This is no different than modern politics. There's a reason bumper stickers exist. ;)

Post E3 and post BUILD it seems MS's design is much more nuanced and thoughtful than the internet wants to give them credit for.

The only crowd worth talking to about the nuance of either system is at GDC, and they are all under NDA. The mindless speculation is pointless when the ONLY thing that matters is the end result, and we are 5 months away from that.

If anything, the only thing people should care about is a DF face-off. Everything else is moot.
 
I am afraid the end result may not be the only thing that matters. As Cerny put it, gamers have extended their gaming experiences beyond the game box. They want to talk about and share stuff about games. It's been happening for some time now because of the Internet.

Even for the PS3's Cell, people want to know how things are done, whether there's room for improvement, why a given game looks so bad, etc.
 
I don't suppose it would be possible to clarify how the bandwidth is being measured?
Were any corrections made for any cache effects in the outlined scenario?
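
As a toy illustration of why that matters (the numbers here are made up, not from the article): if a test counts every byte the shaders request but some of those requests hit in a GPU cache, the bandwidth credited to the eSRAM overstates what its ports actually delivered.

```python
# Toy numbers, purely illustrative; only the 102.4 GB/s figure comes from the leak.
requested_gb   = 10.0    # bytes the benchmark believes it moved
cache_hit_rate = 0.25    # hypothetical fraction served by a cache instead
esram_bw_gbps  = 102.4   # leaked per-direction eSRAM figure

esram_gb = requested_gb * (1 - cache_hit_rate)   # traffic the eSRAM really saw
seconds  = esram_gb / esram_bw_gbps              # time the transfer actually takes
apparent = requested_gb / seconds
print(round(apparent, 1))   # 136.5 "GB/s" reported, though the ports never exceeded 102.4
```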
 
Why is something like that suddenly being said, and then random maths from random forum users suggesting downclocks appears, when the article doesn't say anything about that?
 
This is the technical thread. Leave the non-technical and emotional comments out of it.
 
It seems we are trying to make the numbers work. That is the simplest solution. The article itself is not very clear.


Occam's razor...

Primitive arithmetic that indulges in getting the numbers to a certain kind of "right" is definitely not Occam's razor, FYI.
 
Primitive arithmetic that indulges in getting the numbers to a certain kind of "right" is definitely not Occam's razor, FYI.

Bingo. It's obnoxious seeing ppl routinely misrepresent Occam's Razor too. It does NOT suggest that 'the simplest solution is the most likely solution'...it asserts that 'the solution with the fewest assumptions is the most likely solution'. When your 'solution' to explaining the figures results in assuming that DF's reporting is wrong, that their dev sources are wrong, that MS's engineers are wrong, and/or assuming that a clock of 750MHz is suddenly a magical number worth paying attention to...well, those assumptions start piling up.

The attempts to 'make the math fit' lean heavily on a factor of 2 that comes from a misunderstanding of what the article is describing. I also find it strange that just because that fuzzy math nets you a nice round number like 750MHz, it is suddenly supposed to be something meaningful.

If anything, it seems to me to make more sense for the math to work out at 800MHz, seeing as 7/8 of the cycles see that read + write bandwidth coming through simultaneously, based on the article. 7/8 seems like it could make sense for an 800MHz setup in the context of what the article is describing. I wonder what role the eSRAM's low latency plays in this (if any).

The article's claims about what devs were told make sense. Final hardware is being optimized thoroughly before final dev kits ship this summer; in the process MS discovers there is enough room left over in these holes for useful operations to fit, possibly due to the eSRAM's low latency, so you can pack those spare cycles with ops on 7 cycles out of 8. Result: you net yourself 88% more cycles to work with, but they aren't packed together consistently enough temporally to be useful for modern ops beyond ~30% of the time, hence 133GB/s on standard ops like alpha blending. No additional assumptions necessary.
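
Putting that same reasoning into numbers (the 128-byte interface width is the leaked figure; both readings below are speculation on my part, not confirmed specs):

```python
# Sanity check of the two readings of the 192 GB/s figure; nothing here is confirmed.
BUS_BYTES = 128   # assumed 1024-bit eSRAM interface from the leaked spec

# Reading 1: 800 MHz, with read + write sharing 7 of every 8 cycles.
bw_800 = (7/8 * 2 + 1/8 * 1) * BUS_BYTES * 0.8
print(bw_800)                     # 192.0 GB/s, i.e. 15/8 = 1.875x the 102.4 baseline (~88% more)

# Reading 2: read + write sharing every cycle, which only lands on 192 GB/s
# if the clock had actually dropped.
clock_for_192_ghz = 192 / (2 * BUS_BYTES)
print(clock_for_192_ghz)          # 0.75, i.e. the 750 MHz some are inferring
```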



FWIW, I was told by a person I trust just after the reveal in May that the eSRAM bandwidth was much, much higher than the leaked 102.4GB/s figure.
 
Seems you have an emotional investment in this console. Do you work on the project/games?

I agree.

I hope we can get past this and find out what exactly is in the system. Thank goodness for the leaks or we wouldn't have a clue.
Anyway, if you don't like what I said, pour some sugar into it. Come on... isn't that term a bit odd for me?

I was momentarily very outspoken about what I had seen, that's basically it. 'Twas momentary and it faded away.

On a different note, here is the full video featuring the talk about Tiled Resources; there is some awesome, very interesting stuff in it.

 
Bingo. It's obnoxious seeing ppl routinely misrepresent Occam's Razor too. It does NOT suggest that 'the simplest solution is the most likely solution'...it asserts that 'the solution with the fewest assumptions is the most likely solution'. When your 'solution' to explaining the figures results in assuming that DF's reporting is wrong, that their dev sources are wrong, that MS's engineers are wrong, and/or assuming that a clock of 750MHz is suddenly a magical number worth paying attention to...well, those assumptions start piling up.

The attempts to 'make the math fit' lean heavily on a factor of 2 that comes from a misunderstanding of what the article is describing. I also find it strange that just because that fuzzy math nets you a nice round number like 750MHz, it is suddenly supposed to be something meaningful.

If anything, it seems to me to make more sense for the math to work out at 800MHz, seeing as 7/8 of the cycles see that read + write bandwidth coming through simultaneously, based on the article. 7/8 seems like it could make sense for an 800MHz setup in the context of what the article is describing. I wonder what role the eSRAM's low latency plays in this (if any).

The article's claims about what devs were told make sense. Final hardware is being optimized thoroughly before final dev kits ship this summer; in the process MS discovers there is enough room left over in these holes for useful operations to fit, possibly due to the eSRAM's low latency, so you can pack those spare cycles with ops on 7 cycles out of 8. Result: you net yourself 88% more cycles to work with, but they aren't packed together consistently enough temporally to be useful for modern ops beyond ~30% of the time, hence 133GB/s on standard ops like alpha blending. No additional assumptions necessary.



FWIW, I was told by a person I trust just after the reveal in May that the eSRAM bandwidth was much, much higher than the leaked 102.4GB/s figure.

Is what you are describing a common way for dual-rate I/O ESRAM to function (i.e., bidirectional 7 cycles out of 8 and unidirectional the other cycle)?
 
Is what you are describing a common way for dual-rate I/O ESRAM to function (i.e., bidirectional 7 cycles out of 8 and unidirectional the other cycle)?

I've no idea. I'm only reiterating what my interpretation of the article is. I can tell you that my interpretation makes sense to me, and that the math some around the net are using to assert downclocks is totally misguided and contradicted by the very sources that math derives its (incorrect) assumptions from in the first place.

I'd recommend 3dilettante's posts on the subject though. He seems to know the most of anyone who has posted on the topic as of yet, imho. I just wanted to note that his fraction of 7/8 not only is correct in terms of the real math, but it seems like it could possibly reflect a clockspeed of 800MHz itself. Just my interpretations and their accompanying speculation though. :smile:
 
I wonder what role the eSRAM's low latency plays in this (if any).
The article is covering a primarily throughput-related issue, so if a particular sequence of operations yields a desired result, you can for the sake of theory pad out the sequence so that it compensates for latency. If you need three loads and a write in every 101st through 103rd cycle to get what you want, the numbers will average out over billions of cycles.
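
A rough sketch of that averaging point (the 128-byte access size and 800MHz clock are the leaked figures; the access pattern itself is made up):

```python
# Rough sketch only; the access pattern is invented for illustration.
CLOCK_HZ = 800e6          # assumed 800 MHz eSRAM clock
BYTES_PER_ACCESS = 128    # assumed 1024-bit interface

def sustained_gb_s(pattern):
    """pattern: accesses per cycle (0 = idle, 1 = read or write, 2 = read + write)."""
    return sum(pattern) / len(pattern) * BYTES_PER_ACCESS * CLOCK_HZ / 1e9

# Three loads and a write bunched into the last cycles of a 103-cycle window
# average out the same as if they were spread evenly across it.
window = [0] * 100 + [1, 1, 2]
print(round(sustained_gb_s(window), 2))   # ~3.98 GB/s sustained, wherever the gaps fall
```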

The article's claims about what devs were told make sense. Final hardware is being optimized thoroughly before final dev kits ship this summer; in the process MS discovers there is enough room left over in these holes for useful operations to fit, possibly due to the eSRAM's low latency, so you can pack those spare cycles with ops on 7 cycles out of 8. Result: you net yourself 88% more cycles to work with, but they aren't packed together consistently enough temporally to be useful for modern ops beyond ~30% of the time, hence 133GB/s on standard ops like alpha blending. No additional assumptions necessary.
A memory pipeline isn't that loosely thrown together. The 102.4 GB/s was described as being the width of the eSRAM's interface times its clock speed. There aren't spare cycles unless it has more than 800 million cycles to play with, and the article doesn't explain how it can state in one sentence that the clock and width are correct while at the same time getting read and write traffic in excess of what the eSRAM's ports can physically deliver.

If the interface is wider than described, or faster than described, it should have been noticed. The eSRAM has to serve multiple clients without constant hand-holding, so it's not as if software could avoid utilizing a doubly fast interface; the controller would schedule things so that bandwidth exceeded 102.4 GB/s.

There's either a very misleading description of the kinds of transactions the eSRAM supports and what its interface is, or there is something besides the eSRAM's own ports that can provide data, such as cache effects or bypassing. Those could be things that are either not being controlled for in testing, or that can be turned on and off.
 
The article is covering a primarily throughput-related issue, so if a particular sequence of operations yields a desired result, you can for the sake of theory pad out the sequence so that it compensates for latency. If you need three loads and a write in every 101st through 103rd cycle to get what you want, the numbers will average out over billions of cycles.


A memory pipeline isn't that loosely thrown together. The 102.4 GB/s was described as being the width of the eSRAM's interface times its clock speed. There aren't spare cycles unless it has more than 800 million cycles to play with, and the article doesn't explain how it can state in one sentence that the clock and width are correct while at the same time getting read and write traffic in excess of what the eSRAM's ports can physically deliver.

If the interface is wider than described, or faster than described, it should have been noticed. The eSRAM has to serve multiple clients without constant hand-holding, so it's not as if software could avoid utilizing a doubly fast interface; the controller would schedule things so that bandwidth exceeded 102.4 GB/s.

There's either a very misleading description of the kinds of transactions the eSRAM supports and what its interface is, or there is something besides the eSRAM's own ports that can provide data, such as cache effects or bypassing. Those could be things that are either not being controlled for in testing, or that can be turned on and off.

But how in the world would MS's hardware engineers be unaware of these attributes this late in the game? This seems highly unlikely. I'm wondering if the journalist here is misinterpreting a technical explanation he heard through the grapevine. Seems more plausible to me.
 
The article is covering a primarily throughput-related issue, so if a particular sequence of operations yields a desired result, you can for the sake of theory pad out the sequence so that it compensates for latency. If you need three loads and a write in every 101st through 103rd cycle to get what you want, the numbers will average out over billions of cycles.


A memory pipeline isn't that loosely thrown together. The 102.4 GB/s was described as being the width of the eSRAM's interface times its clock speed. There aren't spare cycles unless it has more than 800 million cycles to play with, and the article doesn't explain how it can state in one sentence that the clock and width are correct while at the same time getting read and write traffic in excess of what the eSRAM's ports can physically deliver.

If the interface is wider than described, or faster than described, it should have been noticed. The eSRAM has to serve multiple clients without constant hand-holding, so it's not as if software could avoid utilizing a doubly fast interface; the controller would schedule things so that bandwidth exceeded 102.4 GB/s.

There's either a very misleading description of the kinds of transactions the eSRAM supports and what its interface is, or there is something besides the eSRAM's own ports that can provide data, such as cache effects or bypassing. Those could be things that are either not being controlled for in testing, or that can be turned on and off.

I think part of the problem is we have multiple sources whose information could in fact be in conflict, but Leadbetter is presenting a theory that attempts to reconcile them. A source inside MS could be leaking that the ESRAM interface can actually read and write simultaneously presenting a new peak figure (alongside a real-world figure) while omitting (or purposefully obfuscating) the detail of how those figures were calculated. Meanwhile external devs could be telling Richard they haven't heard anything about a downclock. Both could be telling the truth, but that doesn't mean the downclock is real. For one so minor (only 50 Mhz to reconcile the math) MS may just not have disseminated the change yet to third parties. Which is to say we don't have to assume everyone is wrong or lying to arrive at the conclusion that 750 Mhz is indeed the new clock and the ESRAM has simultaneous read/write capabilities previously not well publicized.

EDIT: I'd also guess that the problem with actually reaching the peak read/write combined bandwidth for the ESRAM is the sources for data just aren't fast enough. ROPs can only write so fast and data can be copied in or written out from other memory busses only so fast. You probably just run out of things to write to or read from before you hit the theoretical limits.
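
As a back-of-envelope example of that point (the 16-ROP count and pixel sizes are my assumptions from the leaked Durango specs, not anything the article states):

```python
# Back-of-envelope only: ROP count and formats are assumptions, not confirmed figures.
ROPS = 16
CLOCK_GHZ = 0.8

def rop_traffic_gb_s(bytes_per_pixel, blended):
    """Traffic the ROPs alone can generate: one write per pixel per clock,
    plus a destination read when blending."""
    accesses = 2 if blended else 1
    return ROPS * bytes_per_pixel * accesses * CLOCK_GHZ

print(rop_traffic_gb_s(4, blended=False))   # 51.2 GB/s  - plain 32-bit colour writes
print(rop_traffic_gb_s(4, blended=True))    # 102.4 GB/s - 32-bit alpha blending
print(rop_traffic_gb_s(8, blended=True))    # 204.8 GB/s - FP16 blending asks for more than the ports give
```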
 
But how in the world would MS's hardware engineers be unaware of these attributes this late in the game? This seems highly unlikely. I'm wondering if the journalist here is misinterpreting a technical explanation he heard through the grapevine. Seems more plausible to me.

Some possibilities that came to mind were that there are secondary functions in the memory controllers that weren't working or weren't validated until a later stepping.

Turning them on later would have led to improvements that earlier test silicon wouldn't have shown. Memory and uncore performance is one of the things we can see marked improvement in between engineering samples and released products, because a lot of tweaking and validation is done until late in the process. The hardware has a lot of internal control registers and firmware settings that can be toggled.


Another is that there's a gap between what was described as the eSRAM's bandwidth earlier and what is being measured today. It could be that the theoretical peak and the new test numbers are not measuring the same thing, but are being treated as if they are.

If they are measuring bandwidth that is augmented by secondary bandwidth sources, those may have been left out of the original description for reasons like simplicity or because they are considered routine features that provide an occasional bonus without radically changing the picture.
 
I think part of the problem is we have multiple sources whose information could in fact be in conflict, but Leadbetter is presenting a theory that attempts to reconcile them. A source inside MS could be leaking that the ESRAM interface can actually read and write simultaneously presenting a new peak figure (alongside a real-world figure) while omitting (or purposefully obfuscating) the detail of how those figures were calculated. Meanwhile external devs could be telling Richard they haven't heard anything about a downclock. Both could be telling the truth, but that doesn't mean the downclock is real. For one so minor (only 50 Mhz to reconcile the math) MS may just not have disseminated the change yet to third parties. Which is to say we don't have to assume everyone is wrong or lying to arrive at the conclusion that 750 Mhz is indeed the new clock and the ESRAM has simultaneous read/write capabilities previously not well publicized.

EDIT: I'd also guess that the problem with actually reaching the peak read/write combined bandwidth for the ESRAM is the sources for data just aren't fast enough. ROPs can only write so fast and data can be copied in or written out from other memory busses only so fast. You probably just run out of things to write to or read from before you hit the theoretical limits.


102.4 GB/s is specified as the baseline ESRAM BW, so 800MHz it is. 192 GB/s is a much more shadowy figure of unclear derivation. In addition, Richard clarifies in the article that sources say there's been no downclock from 800MHz, so it can't get much clearer there.

You're taking one shadowy number as truth against multiple stronger pieces of evidence for 800MHz in the same article, as well as other sources like interference and Senjetsu. So far we have zero credible sources citing a downclock. Even CBOAT (who is borderline as a "credible source" realistically, but at least outranks "thuway" and "Proelite") refuses to state there is a downclock, no matter how many times he was point-blank asked in the past to confirm one.

The whole DF article is weird for the reasons 3D is pointing out, though. Someone should ask him to clarify the 192 GB/s thing people are hoping means a downclock (although, in the end, all he can do is ask his sources again, and presumably they'll say no downclock again, as they already did in the article).

Let's not forget the rumors of an 800 gflops downclock were already terribly wrong, so we should be cynical here.

Finally, I'm not sure this new "discovered" ESRAM BW wouldn't outweigh a 50MHz GPU core downclock anyway and lead to a more powerful system all things considered (750MHz + 133 GB/s > 800MHz + 102 GB/s?). But the whole thing smells funny, as asking us to believe there was "undiscovered" BW is just... weird.
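
Back-of-envelope for that trade-off (12 CUs is the leaked figure; whether a given game cares more about ALU throughput or bandwidth is another matter entirely):

```python
# Back-of-envelope only; 12 CUs is from the leaked spec, not a confirmed final figure.
CUS, LANES_PER_CU, FLOPS_PER_LANE = 12, 64, 2   # FMA counts as 2 FLOPs per clock

def gpu_tflops(clock_ghz):
    return CUS * LANES_PER_CU * FLOPS_PER_LANE * clock_ghz / 1000

print(gpu_tflops(0.800))   # ~1.23 TFLOPS at 800 MHz
print(gpu_tflops(0.750))   # ~1.15 TFLOPS at 750 MHz, about a 6% ALU loss
print(133 / 102.4)         # ~1.3x on the quoted "real-world" eSRAM bandwidth
```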

Anyway, it's nearly July, so the clock specs should have to be locked soon, I assume, if that point hasn't already passed; Digitimes already said PS4 components were shipping weeks ago, so call this the downclock rumor's last gasp. We really only have July, Aug, Sep, and Oct left to manufacture a bunch of these things, get them to warehouses in the USA, etc.
 
102.4 GB/s is specified as the baseline ESRAM BW, so 800MHz it is. 192 GB/s is a much more shadowy figure of unclear derivation. In addition, Richard clarifies in the article that sources say there's been no downclock from 800MHz, so it can't get much clearer there.

No. That's just an assumption he makes. He doesn't know firsthand. Maybe most of his sources don't know the final clock yet either. You can't keep parroting his assumption that 800MHz remains the target as fact. The story he's breaking pretty specifically throws that figure into doubt, whether or not he recognized the incongruity.
 