Xbox One (Durango) Technical hardware investigation

At least some of that answer is dependent on exactly how the game environment is virtualized.
Functions that need permissions reserved for the hypervisor or host OS are going to take longer to execute. There are ways to grant the guest some awareness of the virtualized system, or to have the main OS present a virtualized environment to the game processes.

It's also a question of the hardware enhancements for supporting fast transitions from virtual to host address spaces. Jaguar's improvements here haven't been benchmarked.
 

Worst and best scenario?
 
I don't have enough familiarity with this sort of environment to say.
CPU benchmarks have put the performance loss of less efficient virtualization schemes, without x86 virtualization extensions, at 20-30%.
With hardware enhancements and various optimizations, it has been shown to be in the single digits.

I don't know what virtualizing the GPU would cost. GPU hardware hasn't been fully shareable until recently.
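To put those percentages into frame-time terms (the frame-budget framing is mine, purely illustrative):

```cpp
// Rough arithmetic for what those overheads mean at 30 fps
// (the frame-budget framing is mine, not a benchmark result):
constexpr double frame_ms = 1000.0 / 30.0;    // ~33.3 ms per frame
constexpr double naive_ms = frame_ms * 0.30;  // ~10.0 ms lost at a 30% hit
constexpr double tuned_ms = frame_ms * 0.05;  // ~1.7 ms lost at a 5% hit
```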
 
I tried to research figures on the topic; it is pretty tough to find anything relevant.
Anandtech has a series of articles on the matter that back your claim.
I found an article that shows how AMD improved memcached performance by 300% by using virtualization (a pretty extreme case).
Then there is the whole matter of what MSFT presents to the devs, and whether devs can play with the number of virtual cores, their characteristics, etc.

It seems that the scope of what you could try is pretty enormous, though the question "with what success?" is a lot trickier.

If I understand the various things I read properly, MSFT may not have to present the devs with 6 virtual cores that are "virtual" Jaguars. They may present more cores, or a different type of core with different characteristics than the underlying Jaguar; I guess they could even let the devs decide (though that sounds unlikely).

Off the top of my head (if I understand what the tech is about): for example, if there are a lot of tasks in games that have low IPC, don't tap into the SIMD units much, and run often, you could create 4 virtual cores and map them onto a single physical core. If I get it right, you might lose performance on a per-task basis while achieving a higher overall utilization of that core / higher throughput. You can also play with the virtual cache sizes, I guess, depending on how your tasks use them. For example, if there will be a lot of contention, you might create cores with 8 KB of I$ and D$ and possibly a different L2 size; you can alter everything, even the amount of RAM one core sees. If there is less contention (/cache usage is not that high), you could go 16 KB for both, pretty much mapping 16x2x4 KB of cache onto 64 KB of physical resources.

Take those 4 virtual CPUs, for example, and say I have long-running subsystems that are SIMD intensive; especially if they run a lot of long-latency AVX instructions on 8-wide vectors (double-pumping the ALUs), the scalar part of the core won't do much.
I'm under the impression that you could sort of emulate round-robin execution, for example; again, "with what success?" is a mystery to me. It is likely that both tasks would run slower, but the overall throughput?? Mystery.
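To sketch that round-robin idea in toy form (my own made-up C++, nothing to do with whatever MSFT actually does): four "virtual cores" time-sliced onto one physical core, each getting one quantum per pass.

```cpp
#include <array>
#include <cstdio>
#include <functional>

// Toy illustration: 4 "virtual cores" time-sliced round-robin onto one
// physical core. Per-task latency goes up, but the core stays busy.
struct VirtualCore {
    const char* name;
    std::function<bool()> run_quantum;  // returns false once the task is done
    bool done = false;
};

int main() {
    int work[4] = {3, 5, 2, 4};  // remaining quanta per task (made-up numbers)
    std::array<VirtualCore, 4> vcores{{
        {"vcore0", [&] { return --work[0] > 0; }},
        {"vcore1", [&] { return --work[1] > 0; }},
        {"vcore2", [&] { return --work[2] > 0; }},
        {"vcore3", [&] { return --work[3] > 0; }},
    }};
    bool any = true;
    while (any) {                        // one round-robin pass per iteration
        any = false;
        for (auto& vc : vcores) {
            if (vc.done) continue;
            std::printf("%s runs a quantum\n", vc.name);
            if (vc.run_quantum()) any = true;
            else vc.done = true;
        }
    }
}
```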

It seems to me that this tech is extremely convenient in servers, where you have subsystems that run pretty much all the time but are not that computationally intensive, with a lot of idle time (/it improves hardware utilisation).
How relevant is that to games? I would think not that much.

That could be why MSFT ultimately may just present 6 cores with the same characteristics as the Jaguar cores. To quote Anandtech:
If your app spends a lot of time in the kernel and has high amounts of I/O going on, the performance hit may be high (15-30%). But that does not mean your application will have to suffer this performance hit. If you spend more time on optimizing (database buffering, jumbo frames) and if you use paravirtualized drivers (VMXnet, PVSCSI) the performance hit will get a lot smaller (5-10%). In short, the performance hit can be high if you just throw your native application in a VM, but modern hypervisors are able to keep the performance hit very small if you make the right choices and you take some time to tune the app and the VM.
If your application is not I/O intensive, but mostly CPU intensive, the performance hit can be unnoticeable (1-2%).
Though I wonder if MSFT could present a given number of cores as virtual Jaguar alter egos, and maybe a few tweaked cores onto which would be pinned whatever tasks are unlikely to keep a single CPU core busy.

Edit: it gives me a headache, especially once you consider that the performance of a given subsystem is a different matter from the overall throughput of your multi-core CPU/system.
 
First the question:
We know that Jaguar cores have no FMA units, though Jaguar somehow supports AVX instructions.
So the question is: "does the Jaguar core support the FMA instructions?"
No, it does not support FMA.

I'm not sure that a VM is the best way to do hardware abstraction, if that is the only end goal. There has to be a lot more of a reason to do it than just that. I can understand if there were a multitude of other reasons.

Look, I don't understand entirely why they're using VMs and what all the specifics are.

Besides the resource sharing and security benefits (which are the reasons I've been given), it could also facilitate easy BC for the next Xbox (if there is one).
Or allow you to play Durango games on your PC or Surface, etc.
Or allow a transition to a cheaper-to-manufacture hardware design later on, as liolio was speculating.

How would this work exactly? Can VMs talk to each other? Wouldn't the game VM require some light-weight OS to handle the DirectX API calls and system services? In which VM do services like voice chat, parties, matchmaking reside? If my game is in one VM and my services for chat and parties are in another, how does the game know which chat channel to use (party, in game) etc? Does each VM run a version of the same service? How are they synchronized?
The VMs can talk to each other through the hypervisor (Host OS), which handles inter-OS communication and all hardware access, and hosts the other two OS instances (Title and System) in VMs.

The Host and Title OS (which the game runs in) are both stripped down (similar to the 360 kernel) and as mentioned before, they have Dave Cutler working on low overhead virtual drivers and stuff to optimise VM performance for games.

The other System OS runs the Win8 kernel modified in a similar way to Windows Phone 8, allowing only RT stuff, with no desktop capabilities.

So I'm guessing parties, voice chat, etc. reside in the System OS, with data passed to/from the Title OS as required.
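As a rough picture of how that kind of hypervisor-mediated channel usually looks (a toy sketch of my own, with invented names, not the actual Durango mechanism): a shared-memory ring with one producer and one consumer.

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

// Toy stand-in for hypervisor-mediated messaging: a shared-memory ring
// buffer with one producer ("Title OS") and one consumer ("System OS").
struct Ring {
    static constexpr int N = 8;
    int slots[N];
    std::atomic<int> head{0}, tail{0};  // head: next write, tail: next read

    bool push(int msg) {
        int h = head.load(std::memory_order_relaxed);
        if ((h + 1) % N == tail.load(std::memory_order_acquire)) return false;  // full
        slots[h] = msg;
        head.store((h + 1) % N, std::memory_order_release);
        return true;
    }
    bool pop(int& msg) {
        int t = tail.load(std::memory_order_relaxed);
        if (t == head.load(std::memory_order_acquire)) return false;  // empty
        msg = slots[t];
        tail.store((t + 1) % N, std::memory_order_release);
        return true;
    }
};

int main() {
    Ring ring;
    std::thread title([&] {   // "Title OS" partition: sends chat events
        for (int i = 0; i < 4; ++i)
            while (!ring.push(i)) { /* spin until a slot frees up */ }
    });
    std::thread system([&] {  // "System OS" partition: services them
        for (int got = 0, msg; got < 4; )
            if (ring.pop(msg)) { std::printf("System OS got msg %d\n", msg); ++got; }
    });
    title.join(); system.join();
}
```

A real channel would signal instead of spinning; presumably that's the sort of overhead those low-overhead virtual drivers are meant to keep down.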
 
Thanks Interference, I won't ask how you got that information, but it is welcome. Putting aside the "always on" drama and the comparative merits of competing hardware, it seems that MSFT came up with a marvel of software engineering (again :LOL:).

EDIT: Assuming your source is correct, obviously, though it makes sense to me.
 
I was only mentioning the near 100% negativity that's reported in articles, speculation and shared rumors. Every bit of information, from hardware specs to the slightest nuance, is spun with doom and gloom.

It's all relative; the Durango news/specs would be received largely positively if it were the only next-gen console we had rumours/leaks on and we were comparing it to, say, the Wii U.

However, we also have a lot of info on the PS4, so everyone automatically compares what it can and can't do to that - which is where all the negativity comes from.
 
New Rumor

Ok, moving on. Have you read the VGLeaks article about the Durango specs? Yes? Good, because everything you read in that article was 100% correct. Except for one tiny little detail that MS kept guarded from most devs until very recently. That detail being that every Durango ships with an Xbox 360 SOC.

There was a reason why MS hired so many former IBM and AMD employees. I'll admit I'm not an electrical engineer (I'm in software) so I won't pretend to know the ins and outs of how the 360 SOC integrates into the Durango motherboard. All I know, and all I need to know about this new change is that I (or a game dev) can use the 360 SOC in parallel with the original Durango hardware.

What does this mean in basic terms? Well, apart from Durango having 100% BC with the 360, it also increases Durango's processing power a fair amount.

http://www.neogaf.com/forum/showthread.php?t=541176

Sorry, I'm not a tech guy; I come here hoping to understand things, but what that guy said comes close to an old rumor. Now, on the slim chance that the new rumor is true (and I know it is slim), what I am wondering is: what would be the benefit if it worked the way this old rumor described (http://www.vg247.com/2012/04/02/xbox-720-detailed-blu-ray-inside-always-on-netcon-required/), if you just take one of the two GPUs there to mean the 360 SOC...
 
It's all relative; the Durango news/specs would be received largely positively if it were the only next-gen console we had rumours/leaks on and we were comparing it to, say, the Wii U.

However, we also have a lot of info on the PS4, so everyone automatically compares what it can and can't do to that - which is where all the negativity comes from.


what?

please, the Xbox should compare very favorably to PS4 indeed.

To suggest that the negativity is warranted due to what we know is ridiculous... we all know where it is coming from... people fueling the fire based on their own biases
 

I'm not saying the negativity is warranted but that the negativity stems from comparisons to the PS4 rather than evaluating the system on its own merits
 
So the question is: "does the Jaguar core support the FMA instructions?"
The pipeline diagrams published by AMD show that Jaguar has two vector pipes: one is capable of floating point add and one is capable of floating point mul (http://www.3dcenter.org/dateien/abbildungen/AMD-Jaguar-Presentation-Slide09.jpg). There is no FMA pipe (Haswell and BD/PD have FMA pipelines). You cannot easily do a proper FMA with separate mul and add pipes, because the intermediate result must be of "infinite" precision (rounding happens only after both mul and add have been calculated). Sandy/Ivy Bridge do not support the FMA instruction set either (no native single cycle support, nor microcoded "emulated" support with add + mul + magic rounding trickery).
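A small example makes the single rounding visible (my own numbers, chosen so the cancellation exposes it; build with FP contraction off, e.g. -ffp-contract=off, so the compiler doesn't fuse the expression itself):

```cpp
#include <cmath>
#include <cstdio>

// fma(a, b, c) rounds once, after computing a*b+c exactly; separate mul
// and add pipes round twice. Build with FP contraction disabled
// (e.g. g++ -ffp-contract=off) so a*b+c really is two rounded ops.
int main() {
    double e = std::ldexp(1.0, -27);      // 2^-27
    double a = 1.0 + e, b = 1.0 - e, c = -1.0;
    double two_step = a * b + c;          // a*b rounds to 1.0 -> result 0
    double fused    = std::fma(a, b, c);  // exact a*b = 1 - 2^-54 -> -2^-54
    std::printf("mul+add: %g\nfma:     %g\n", two_step, fused);
}
```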

For current PC software (incl. games), FMA is not a big deal, as Intel currently doesn't have a single chip on the market with FMA support. It will still take several years until FMA-optimized software and games are commonplace. Even without FMA, Jaguar matches BD/PD vector flops per cycle per core (and that assumes the BD/PD is running code that contains 100% FMA instructions). Not bad for a small power-optimized core.
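Back of the envelope for that parity claim (lane widths assumed from the public block diagrams, my own tally):

```cpp
// Single-precision flops per cycle per core, back of the envelope
// (pipe widths assumed from public block diagrams):
constexpr int jaguar_flops = 4 /*FADD lanes*/ + 4 /*FMUL lanes*/;       // = 8
constexpr int bdpd_flops   = 4 /*FMAC lanes per core; the module's two
                                 128-bit FMACs average one per core*/
                             * 2 /*an FMA counts as mul + add*/;        // = 8
static_assert(jaguar_flops == bdpd_flops,
              "parity holds only if the BD/PD code is 100% FMA");
```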
 
Interesting. If what Interference describes is true, they could, in theory, update the System OS or the hardware and the game VMs would still run, as long as the same API functionality was exposed. They could upgrade the System VM to Windows 9 with massive changes, and it would not affect the games at all, if the functionality was designed correctly. Or they could release a new console 2 years from now (please don't) and the games, in theory, could still work.

This model gets a little confusing to me when you start to talk about the DirectX runtime and drivers, and where they would reside. VMs run with "virtual" drivers that talk to the Host OS where the real drivers reside, correct?

Edit: So if this is modeled like Microsoft's Hyper-V, the title VM will have a light-weight OS with synthetic drivers that talk to the real device drivers through the hypervisor in the Host OS (root partition). The System VM would have pretty much the full-blown Windows 8 OS, with synthetic drivers as well.

If this VM stuff is true, it really looks like a fast upgrade cycle with backwards compatibility would be a big reason to do this.
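In code terms, that stability argument is just interface versus implementation. A generic sketch (hypothetical names, not any actual Durango API):

```cpp
#include <cstdio>

// Generic illustration: the title links against a stable interface;
// the host can swap the implementation underneath without touching games.
struct IGraphicsDevice {                 // hypothetical stable API surface
    virtual void draw(int mesh_id) = 0;
    virtual ~IGraphicsDevice() = default;
};

struct Driver2013 : IGraphicsDevice {    // launch-era implementation
    void draw(int mesh_id) override { std::printf("2013 path: mesh %d\n", mesh_id); }
};
struct Driver2015 : IGraphicsDevice {    // later hardware, same interface
    void draw(int mesh_id) override { std::printf("2015 path: mesh %d\n", mesh_id); }
};

void game(IGraphicsDevice& gpu) { gpu.draw(42); }  // unchanged title code

int main() {
    Driver2013 a; Driver2015 b;
    game(a);  // the same title logic works against either backend
    game(b);
}
```

As long as the interface the title was built against stays stable, what sits behind it can change completely.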
 
Thanks for those extra details, they are always welcome.
I thought that FMA was part of the AVX instruction set; I was wrong.
Jaguar indeed looks really good, both on paper and from the few early benchmarks floating across the web. It seems both Sony and MSFT made the right decision on that one.
 
Or possibly licensing the xbox platform to 3rd party hardware manufacturers??

Well, the upgrade cycle is pure speculation on my part, based on a rumour that might not even be true, so what I think about 3rd parties is really another level of speculation that is also just as likely to be meaningless.

I do wonder about those Move Engines, and how transparent they are to the developers. If they're accessed explicitly, I'm not sure how you release an upgrade a couple years later unless it follows the same model.
 
I was thinking of lowering cost (/using a more cost-efficient solution as technology evolves); I'm a bit wary of an upgrade cycle, I see no reason to do it, and the business model for such a thing looks super risky to me.
I don't think either that MSFT could have third parties make their own boxes. Though digging into that line of thinking, I think that MSFT could deploy/sell the next box as software on PCs via their app store. The market is more and more about laptops, and you can only pack so much power into your average laptop; besides, if MSFT were to do that, I could see them making it available to APUs only.
It would be pretty disruptive, with possibly quite some consequences for the PC market and for actors like Steam on the software end and Nvidia on the hardware side.

We don't know the definitive clock speed for either Sony's or MSFT's system (that type of factor has changed prior to release in the past), but something like Kaveri is already not that far from what Durango offers: it is bandwidth starved and might not be as power efficient, but with the jump to a "~14nm" process, a chip meeting the requirements to run something like a Durango virtual machine is neither fantasy nor something that would end up limited to pretty high-end laptops.

This makes me even iffier about a short upgrade cycle.

For the Move Engines, if MSFT wants freedom I would think that, as with the rest, the devs get no access to the metal, so they would deal with them through the API.
If there are a lot fewer reasons to move stuff around, I would think that with a proper mapping of the virtual memory to a new high-bandwidth memory pool (Haswell style), the move from 0xxxxx to 0xxxx1 could be intercepted and discarded.
But that (if you still need the DMEs) could be an issue for deploying the virtual machine on a "bog standard" PC/laptop.
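As a toy sketch of that interception idea (my own illustration, using the Linux-specific memfd_create, nothing Durango-specific): if two virtual ranges are backed by the same physical pages, the "move" between them needs no copy at all.

```cpp
#include <cstdio>
#include <cstring>
#include <sys/mman.h>
#include <unistd.h>

// Illustration only (Linux; error checks omitted for brevity): back two
// virtual ranges with the same physical pages, so a "move" from one
// range to the other can be intercepted and become a no-op.
int main() {
    const size_t len = 4096;
    int fd = memfd_create("pool", 0);  // one physical backing region
    ftruncate(fd, len);
    char* lo = (char*)mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    char* hi = (char*)mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    strcpy(lo, "data the game thinks it moved");
    // No memcpy(hi, lo, len) needed: both ranges alias the same pages.
    std::printf("hi sees: %s\n", hi);
    munmap(lo, len); munmap(hi, len); close(fd);
}
```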
 
I'm pretty sure there is already a thread about a short upgrade lifecycle (2-4 years) somewhere, so I'm not going to get too far into that, but it does allow them to keep their price high if they're selling the units at a profit (instead of at a loss, as is normal). Just saying this model, if true, may allow them to do that.
 
You are right, there is indeed a thread for that topic; let's not derail this one :)

As a side note, with the next-gen hardware being pretty standard on both sides (though I don't state this as "it sucks"), I find the discussion about the software side of things pretty interesting/refreshing. I have to say that, whereas the decisions MSFT and Sony made are as sound as it gets from both a business and a hardware POV, the geek in me still feels orphaned of those UFO-like systems we so often had in the console realm. Not to mention that, for some reason (to be honest, given my low level of knowledge, I can't find another way to qualify it than to say it is a strong bias...), I feel betrayed by the "non paradigm shift, non many-cores" direction taken by the industry (games and, more globally, at least for now).
Damn it, I guess presentations like "the end of the GPU road map" and "the Zen of many-cores rendering" got me in a "locked-in" type of stance (no disrespect to those who suffer from that terrible disease).
 
If Durango's platform includes the possibility of running applications from an app store, video feeds, and extended net services, some form of memory and IO isolation is a start at securing it.

Various forms of hypervisor or low-level monitoring of resources obviously have been done before for consoles, and this could be an evolved form of it that might allow for a wider range of apps without every one of them being an immediate threat vector.

The implementation of the scheme would determine just how insulated game code is from bare metal. Since the system is no longer obligated to tell software the truth about what it is running on, or what else is running, the host system has a more complicated job.

Not to make this a comparison post, but I'd be curious what kind of isolation strategy the PS4 will have. It may not be at as high a level, but here again we have common goals, constraints, and hardware features.
 