Xbox One (Durango) Technical hardware investigation

Rangers · Feb 25, 2014

multi-core scaling and SIMD design (up to 70% of the instructions executed on a frame are SIMD). It is really fast - it can play 4K video frames (3840x2160) in 4 ms PCs and 11 ms PS4/Xbox One using the CPU only (or 1.4 ms PC and 2.3 ms PS4/Xbox using GPU acceleration)!

I dont even want to try to decipher this massive syntax debate that's been going on, but that appears to simply state it's 1.4ms on PC GPU and 2.3 ms on both Xone/PS4 GPU.

Which isn't in line with the big GAF brouhaha at all.

mosen · Feb 25, 2014

pjbliverpool said:
Okay, so where on the original site does it state 1.3ms for the CPU component of the PC CPU+GPU time?

And where does it state 1.6ms for the GPU component of the PS4 CPU+GPU time.

This is new information.

I've already read and fully understood the sections you quoted. You haven't read and/or fully understood my previous post so please go and take another look.

1.3 ms was a typo that is already corrected, which you saw that yesterday (see here).

There is another time of 1.3ms further down the page for the PC CPU+GPU but I assume that's just a typo.

But that 1.6 ms number is a typo from gamingbolt alongside other WRONG stuff they wrote on their Update (Consoles/PC CPU times are WRONG, too). Actually after second update (today) on Bink 2 official page there is no more reason for us to speak about gamingbolt's Original Story/Update.

If you still have any doubt about what I said it's better for you to ask Bink 2 developers directly.

pjbliverpool · Feb 25, 2014

mosen said:
1.3 ms was a typo that is already corrected, which you saw that yesterday (see here).

They have indeed corrected it since yesterday, maybe they're reading this thread

Nevertheless, you are assuming Gamingbolts 1.3ms CPU timing is coming from that typo and I see no reason to assume that. Even before the correction the correct PC CPU+GPU time of 1.4ms was stated further up the page and that was reflected in Gamingbolts broken down figure of 1.4ms for CPU and 1.3ms for GPU - i.e. 1.4ms total.

But more importantly, Gamingbolt specifically say they have been in touch with the developer who has provided them with the additional information.

But that 1.6 ms number is a typo from gamingbolt alongside other WRONG stuff they wrote on their Update (Consoles/PC CPU times are WRONG, too). Actually after second update (today) on Bink 2 official page there is no more reason for us to speak about gamingbolt's Original Story/Update.

If you still have any doubt about what I said it's better for you to ask Bink 2 developers directly.

I've no idea why you would assume this. 1.6ms isn't mentioned anywhere on the original site. So your assumption is that Gamingbolt have provided 2 new numbers but that both of them are typo's despite the fact that they specifically say those numbers came from the developers and explain what they mean. It's not impossible but it seems highly unlikely and far less plausible than the more obvious and much simpler explanation.

Let's examine the evidence.

1. Gamingbolt make the following statement with regards to the new numbers: "The developer also told us that Bink on GPU uses both the CPU and the GPU. The developer shared the following numbers with us"
2. They also tell us this, again in relation to the new numbers: "time is still limited by the CPU and GPU on both the Xbox One and PS4, resulting into an effective speed of 2.3 ms on both consoles"
3. So they have confirmed the 2.3ms total CPU+GPU time of both consoles just like the original site says, but they also confirm that that number is broken up into two components, a - CPU time and b - GPU time, with the longest of these equating to the total CPU+GPU time.
4. They then proceed to quite clearly break down the CPU+GPU times for each system providing us with information that was never supplied on the original site which only told us the total CPU+GPU time, not the separate component times.
5. The GPU time difference between the PS4 and XB1 that Gamingbolt provides, as provided to them by the developers is very close to the overall shader throughput difference between the two consoles - adding further validity to the accuracy of those numbers rather than them being, as you say, coincidental typo's.

What more do you need?

LightHeaven · Feb 25, 2014

DrJay24 said:
I guess you don't recall the 14+4 debates? To some more CUs does not mean much more performance, even MS implied as much. But yes, more CUs means a faster GPU, I'm glad everyone can finally agree on something so basic.

But since they are both limited to the same performance (2.3ms) for whatever reason, doesn't this cement MS' implication that more CUs != more framerate? In this case all the extra CUs on Ps4 gpu didn't offset the other bottleneck and performance is still the same...

3dilettante · Feb 25, 2014

LightHeaven said:
But since they are both limited to the same performance (2.3ms) for whatever reason, doesn't this cement MS' implication that more CUs != more framerate?

For Bink, at any rate.
Not every workload is going to have a convenient timing floor from the CPU that just happens to meet the exact time of the GPU.
The analysis is more complicated and workload-specific, and while it is true more parallel units runs into Amdahl's Law, the counterpoint is that the workloads the CUs target tend to be the ones that suffer less from it.

JPT · Feb 25, 2014

LightHeaven said:
But since they are both limited to the same performance (2.3ms) for whatever reason, doesn't this cement MS' implication that more CUs != more framerate? In this case all the extra CUs on Ps4 gpu didn't offset the other bottleneck and performance is still the same...

Or that the algorithm does not scale with available resources?

LightHeaven · Feb 25, 2014

3dilettante said:
For Bink, at any rate.
Not every workload is going to have a convenient timing floor from the CPU that just happens to meet the exact time of the GPU.
The analysis is more complicated and workload-specific, and while it is true more parallel units runs into Amdahl's Law, the counterpoint is that the workloads the CUs target tend to be the ones that suffer less from it.

Of course. But that is exactly what Ms claimed it did. That for their tests the extra CUs wouldn't yield the same performance increase as if they spent those transistors elsewhere...

Obviously, any one who looks at the current state of the game right now would concur to them being wrong, but that's besides my point, I only meant that imo, this results corroborate more Ms claims than that, all else being equal, performance would scale with CUs.

JPT said:
Or that the algorithm does not scale with available resources?

Which again, was exactly their point. They said not having the same amount of CUs wouldn't mean a big gap in performance because not all software would scale well to them, and you have to balance the system so one part isn't much ahead of the others.

taisui · Feb 25, 2014

LightHeaven said:
Of course. But that is exactly what Ms claimed it did. That for their tests the extra CUs wouldn't yield the same performance increase as if they spent those transistors elsewhere...

Obviously, any one who looks at the current state of the game right now would concur to them being wrong, but that's besides my point, I only meant that imo, this results corroborate more Ms claims than that, all else being equal, performance would scale with CUs.

Which again, was exactly their point. They said not having the same amount of CUs wouldn't mean a big gap in performance because not all software would scale well to them, and you have to balance the system so one part isn't much ahead of the others.

Except that's not what the architects had said, they specifically refers to the CUs in the context of graphics. Decoding is gpgpu ops which scale with the CUs, just because the numbers look aligned doesn't imply that they are related.

the bottleneck of the system is always the slowest part, hence the name, "bottleneck," in this particular case, the Juguar CPU is the weakest link, so it doesn't matter if you have more GPU power, the whole system can only compute as fast (or slow) as the CPU can.

3dilettante · Feb 25, 2014

LightHeaven said:
Of course. But that is exactly what Ms claimed it did. That for their tests the extra CUs wouldn't yield the same performance increase as if they spent those transistors elsewhere...

I don't think anyone seriously made the argument that adding a few more CUs would be better than anything else for every workload, and in this case, Microsoft used hundreds of millions of transistors elsewhere and got the exact same performance as if it hadn't.
This puts Bink as a counterpoint to a swath of games.
A more appropriate argument is to determine what's the best design decision that gives the most benefit to as much as possible.
A pre-rendered video has certain limits to how much gain it can get or will ever need.

Which again, was exactly their point. They said not having the same amount of CUs wouldn't mean a big gap in performance because not all software would scale well to them, and you have to balance the system so one part isn't much ahead of the others.

This has been shown to be the case in Bink.
Exactly how fast do you want your somewhat blocky cut scenes to get?

DrJay24 · Feb 25, 2014

LightHeaven said:
But since they are both limited to the same performance (2.3ms) for whatever reason, doesn't this cement MS' implication that more CUs != more framerate? In this case all the extra CUs on Ps4 gpu didn't offset the other bottleneck and performance is still the same...

Must be why AMD makes 30CU cards. Of course it scales, just like CUDA cores scale. How they scale is the only real question. 12 in no magic number, unless you consider die space, heat and cost.

Rangers · Feb 25, 2014

That for their tests the extra CUs wouldn't yield the same performance increase as if they spent those transistors elsewhere...

They didn't say anything like that. What they did was build a console that was meant to be cheaper and less powerful.

They said they tested the benefit of an upclock vs enabling two redundant CU's and found in favor of the upclock. Different issue. They still would have been fundamentally limited to 16 ROPs etc.

and in this case, Microsoft used hundreds of millions of transistors elsewhere and got the exact same performance as if it hadn't.

Right. Those extra transistors mainly enabled the use of much cheaper DDR3. A cost issue, not performance. They then put the extra money and then some in Kinect (an expensive piece of kit)

(((interference))) · Feb 26, 2014

So the Bink benchmark has both CPUs set the same time, despite the XB1's being clocked 9.3% higher.

It would seem then, that there are virtualisation overheads eating into the higher clock advantage.
I don't think the PS4 is clocked higher as we haven't heard anything like that and Rangers tells me was some presentation from November last year with it pegged at 1.6 Ghz.

-tkf- · Feb 26, 2014

LightHeaven said:
But since they are both limited to the same performance (2.3ms) for whatever reason, doesn't this cement MS' implication that more CUs != more framerate? In this case all the extra CUs on Ps4 gpu didn't offset the other bottleneck and performance is still the same...

The only thing it cements is that in one specific implementation of a video codec / player that most likely isn't even taking advantage of the extra cu it doesn't matter.

It couldn't be farther from cementing anything.

Starx · Feb 26, 2014

There were definitely some things [Microsoft] said that I'm not allowed to talk about that I was like, 'Wow! That's interesting. Why don't you tell people that?
But I guess they want to make sure everything all goes together and that

http://www.eurogamer.net/articles/2...l-release-console-discussions-horses-and-rust

liolio · Feb 26, 2014

Starx said:
http://www.eurogamer.net/articles/2...l-release-console-discussions-horses-and-rust

I noticed that a couple of days ago, I've my idea about what it could be and the reasons why MSFT doesn't speak about it.

MrFox · Feb 26, 2014

What's the reasoning for using a software based codec like Bink, considering both consoles have a hardware decoder available? The hardware is not good enough or flexible enough? Cross-platform consistency?

taisui · Feb 26, 2014

iirc Bink's codec is pretty good (bit rate, memory foot print, decoding), but i suppose the major advantage is that it targets all platforms and if you are shipping multiplat it makes the asset management easier.

Lalaland · Feb 26, 2014

taisui said:
iirc Bink's codec is pretty good (bit rate, memory foot print, decoding), but i suppose the major advantage is that it targets all platforms and if you are shipping multiplat it makes the asset management easier.

I would beg to differ, Bink is good solely on the basis of compressed size and even at that I thought I read that h264 beats it. THe IQ off bink is awful with macro blocking a go go on anything daring to move at more than a snails pace across the screen.

Starx · Feb 26, 2014

liolio said:
I noticed that a couple of days ago, I've my idea about what it could be and the reasons why MSFT doesn't speak about it.

Whats that idea?

Lalaland · Feb 26, 2014

Starx said:
Whats that idea?

I'll bet $10 it's not a dGPU or stacked SoC.

Xbox One (Durango) Technical hardware investigation

Rangers

mosen

pjbliverpool

B3D Scallywag

LightHeaven

3dilettante

JPT

LightHeaven

taisui

3dilettante

DrJay24

Rangers

(((interference)))

-tkf-

Starx

liolio

Aquoiboniste

MrFox

Deludedly Fantastic

taisui

Lalaland

Starx

Lalaland

Similar threads