Xbox One November SDK Leaked

... It's true when you have no VM inside, but when you start one, all requests and interrupts need to be handled by the hypervisor

That's not true. You can selectively 'pass' on interrupt and instruction families so that the hypervisor won't be triggered. Actually, as far as we know, the XB1 has the WinOS, the GameOS, a HostOS which manages hardware access for both topmost OSes (except GPU ring execution, otherwise the GameOS would lie dead), and a hypervisor on top of them.

The hypervisor would be the '4th' OS, likely, if you count them that (wrong) way.
However, since it is unlikely that you could do the GPU init inside the VM (hard to believe it can be virtualized, honestly), plus the fact that both the WinOS _and_ the GameOS need to access the GPU at the same time (and Kinect, likely), I think GPU memory (re)assignment is handled by the hypervisor as well, on the fly.
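
To make the "selectively 'pass' on interrupt and instruction families" point concrete, here is a minimal, purely illustrative sketch in the style of an AMD-V hypervisor; the struct and bit names are hypothetical stand-ins, not real AMD or XB1 definitions:

[code]
#include <cstdint>

// Hypothetical sketch of selective interception on an AMD-V style hypervisor.
// 'Vmcb' and the INTERCEPT_* names are illustrative stand-ins only: the point
// is just that a hypervisor can leave chosen instruction and interrupt families
// un-intercepted, so the guest handles them natively with no VM exit.
struct Vmcb {
    uint64_t interceptInstr; // one bit per intercepted instruction family
    uint64_t interceptIrq;   // one bit per intercepted interrupt source
};

constexpr uint64_t INTERCEPT_CPUID  = 1ull << 0;
constexpr uint64_t INTERCEPT_IO     = 1ull << 1;
constexpr uint64_t INTERCEPT_TIMER  = 1ull << 2;
constexpr uint64_t INTERCEPT_GPUIRQ = 1ull << 3;

void ConfigureGameOsGuest(Vmcb& vmcb)
{
    // Trap only what the host must mediate...
    vmcb.interceptInstr = INTERCEPT_CPUID | INTERCEPT_IO;
    vmcb.interceptIrq   = INTERCEPT_TIMER;

    // ...and deliberately do NOT trap, say, GPU interrupts, so the guest
    // services them directly without a round trip through the hypervisor.
    vmcb.interceptIrq &= ~INTERCEPT_GPUIRQ;
}
[/code]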

As a 5th OS you probably have the trusted OS (since there was reportedly an AMD guy on LinkedIn with an ARM core + XB1 reference), which would be the case if they used this AMD technology available "on selected APUs":
http://www.uefi.org/sites/default/f...rity_and_Server_innovation_AMD_March_2013.pdf
 
If the second CP isn't described in the SDK, doesn't that pretty much confirm it's for OS use only? If it was being exposed for devs to use, it'd need to be documented. With XB1's complex OS structure, one pipeline for preemptive GPU control makes sense to me.

It's said in the XDK that the XB1 GPU has eight graphics contexts, seven of which are available for games (after July 2014):

GPU synchronization

The Xbox One GPU has eight graphics contexts, seven of which are available for games. Loosely speaking, a sequence of draw calls that share the same render state are said to share the same context. Dispatches don’t require graphics contexts, and they can run in parallel with graphics work.
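
As a loose illustration of what "sharing a context" means in practice — a sketch in plain desktop D3D11 terms, not actual XDK code — grouping draws by render state keeps them on one context, while interleaving state changes between draws forces context rolls:

[code]
#include <d3d11.h>

// Illustrative only: per the quote above, draws that share render state loosely
// share one hardware context, so grouping draws by state rolls contexts twice
// here instead of once per draw.
void DrawGroupedByState(ID3D11DeviceContext* ctx,
                        ID3D11BlendState* opaqueBlend,
                        ID3D11BlendState* alphaBlend,
                        UINT opaqueDraws, UINT alphaDraws)
{
    const FLOAT blendFactor[4] = { 1.0f, 1.0f, 1.0f, 1.0f };

    // One set of render state -> these draws share a context.
    ctx->OMSetBlendState(opaqueBlend, blendFactor, 0xffffffff);
    for (UINT i = 0; i < opaqueDraws; ++i)
        ctx->DrawIndexed(36, 0, 0);

    // Changing state starts a new context; alternating states between every
    // draw would instead leave the GPU context-roll bound.
    ctx->OMSetBlendState(alphaBlend, blendFactor, 0xffffffff);
    for (UINT i = 0; i < alphaDraws; ++i)
        ctx->DrawIndexed(36, 0, 0);
}
[/code]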

from "What's New in the July 2014 Xbox One XDK":

Graphics
GPU Performance in general is improved by 3.5%
Overall GPU performance for typical game workloads has improved from June to July by an average of 3.5% across titles. The amount of performance improvement depends on the game workload, so your measured performance may be more or less. This improvement was made through the following:

  • An additional GPU hardware context has been made available to title rendering, bringing the available number of contexts up from 6 to 7. Doing so helps performance in portions of scenes where the GPU is context-roll bound.
  • Shaders are now automatically prefetched by the GPU. Whenever a shader is set into the pipeline, as with PSSetShader, the GPU’s Command Processor block is instructed to automatically prefetch the shader instructions into the GPU L2 before the next Draw is executed. This helps with draw performance by eliminating shader stalls due to instruction fetches.
  • Various internal GPU configuration parameters were tuned to optimize game workloads.
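
The PSSetShader prefetch note in the second bullet above boils down to an ordering guideline. A minimal D3D11-style sketch (illustrative only — 'DrawWithPrefetch' and its parameters are mine, not XDK API):

[code]
#include <d3d11.h>

// Sketch only: the July 2014 note says the command processor begins prefetching
// the shader into GPU L2 as soon as the shader is bound, so binding early and
// drawing afterwards hides the instruction fetch.
void DrawWithPrefetch(ID3D11DeviceContext* ctx,
                      ID3D11PixelShader* ps,
                      ID3D11ShaderResourceView* srv,
                      UINT indexCount)
{
    ctx->PSSetShader(ps, nullptr, 0);       // prefetch of shader instructions can start here
    ctx->PSSetShaderResources(0, 1, &srv);  // remaining state setup overlaps the prefetch
    ctx->DrawIndexed(indexCount, 0, 0);     // by draw time the code is (ideally) already in L2
}
[/code]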

Also, they described the two graphics command processors (in their articles) as if both of them are the same:

The graphics core contains two graphics command and two compute command processors. Each command processor supports 16 work streams.

For comparison, Cerny didn't mention a second, system-reserved GCP on the PS4 (which is different from GCN's original GCP and has fewer features):

  • Thirdly, said Cerny, "The original AMD GCN architecture allowed for one source of graphics commands, and two sources of compute commands. For PS4, we’ve worked with AMD to increase the limit to 64 sources of compute commands -- the idea is if you have some asynchronous compute you want to perform, you put commands in one of these 64 queues, and then there are multiple levels of arbitration in the hardware to determine what runs, how it runs, and when it runs, alongside the graphics that's in the system."
http://www.gamasutra.com/view/feature/191007/inside_the_playstation_4_with_mark_.php?print=1
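
For what Cerny describes — many independent compute submission queues sitting alongside the graphics queue — a rough desktop analogue (using the later, public D3D12 API, not the PS4 libraries) looks like this:

[code]
#include <d3d12.h>
#include <wrl/client.h>
#include <vector>

using Microsoft::WRL::ComPtr;

// Rough analogue only: several independent compute submission queues living
// beside the graphics queue, each fed with its own command lists.
std::vector<ComPtr<ID3D12CommandQueue>> CreateComputeQueues(ID3D12Device* device, UINT count)
{
    std::vector<ComPtr<ID3D12CommandQueue>> queues(count);

    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE; // compute-only queue, runs beside graphics

    for (UINT i = 0; i < count; ++i)
        device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queues[i]));

    return queues; // async compute work goes to whichever queue the title picks
}
[/code]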

Meanwhile, the XB1 GPU has eight graphics contexts (in hardware), seven of which are available to the title. Also, the fact that many parts of the XB1 GPU (two geometry primitive engines, 12 compute units, and four render backend depth and color engines) support two independent graphics contexts should be new to GCN. From the Eurogamer interview:

Andrew Goossen: The number of asynchronous compute queues provided by the ACEs doesn't affect the amount of bandwidth or number of effective FLOPs or any other performance metrics of the GPU. Rather, it dictates the number of simultaneous hardware "contexts" that the GPU's hardware scheduler can operate on any one time. You can think of these as analogous to CPU software threads - they are logical threads of execution that share the GPU hardware. Having more of them doesn't necessarily improve the actual throughput of the system - indeed, just like a program running on the CPU, too many concurrent threads can make aggregate effective performance worse due to thrashing. We believe that the 16 queues afforded by our two ACEs are quite sufficient.

So, my hypothesis is that each GCP in the XB1 GPU supports four graphics queues/hardware contexts, and only one of them is reserved for the system, while the XB1 GPU hardware can operate on two independent graphics contexts at any given time. If that's not the case, then why would they describe the XB1 GPU like this?


Also, there is far more information in the XDK than what we have access to. For example, I found an "Xbox One GPU Memory Access: ONION, GARLIC, and More" link under "Xbox One Platform Technologies > Graphics > Graphics Overview > DirectX > CPU Access to DEFAULT Usage Resources", which I have no access to.
 
Mosen: I may be reading this incorrectly, but what you are thinking is that having 2 GCPs running independently is like stuffing (making use of idle time in) the graphics pipeline, much like how the ACEs are meant to take advantage of idle time on the CUs?

How many graphics contexts are normally used in a game btw? Does anyone know?
 
@iroboto

Not only the CUs, but also the geometry primitive engines and the render backend depth and color engines. Here is an example from the XB1 architects:

To facilitate this, in addition to asynchronous compute queues, the Xbox One hardware supports two concurrent render pipes. The two render pipes can allow the hardware to render title content at high priority while concurrently rendering system content at low priority. The GPU hardware scheduler is designed to maximise throughput and automatically fills "holes" in the high-priority processing. This can allow the system rendering to make use of the ROPs for fill, for example, while the title is simultaneously doing synchronous compute operations on the Compute Units.
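
A hedged sketch of the "two render pipes at different priorities" idea, expressed with the desktop D3D12 queue-priority analogue rather than the actual XB1 system interface:

[code]
#include <d3d12.h>

// Sketch only: title rendering on a high-priority graphics queue, system/shell
// rendering on a normal-priority one, with the hardware scheduler left to fill
// holes in the former (as the quote above describes).
void CreateTitleAndSystemQueues(ID3D12Device* device,
                                ID3D12CommandQueue** titleQueue,
                                ID3D12CommandQueue** systemQueue)
{
    D3D12_COMMAND_QUEUE_DESC titleDesc = {};
    titleDesc.Type     = D3D12_COMMAND_LIST_TYPE_DIRECT;    // graphics pipe for the game
    titleDesc.Priority = D3D12_COMMAND_QUEUE_PRIORITY_HIGH;
    device->CreateCommandQueue(&titleDesc, IID_PPV_ARGS(titleQueue));

    D3D12_COMMAND_QUEUE_DESC systemDesc = {};
    systemDesc.Type     = D3D12_COMMAND_LIST_TYPE_DIRECT;   // second graphics pipe, e.g. shell rendering
    systemDesc.Priority = D3D12_COMMAND_QUEUE_PRIORITY_NORMAL;
    device->CreateCommandQueue(&systemDesc, IID_PPV_ARGS(systemQueue));
}
[/code]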
 
That example talks specifically about two render pipes to facilitate (low priority, rather than high) OS rendering. I suppose that's for performance non-critical docked apps. It's more important to draw your game than Skype, for example.
 
So if the 'Bone has 8 Graphics contexts, how many does the PS4 have? Eight also?

If going from 6 -> 7 gives a ~3.5% increase in performance, then it seems reasonable to assume that 5 -> 6 would give a greater than ~3.5% increase, and that 7 -> 8 would give a smaller than ~3.5% increase. Diminishing returns and all that.

Anyway, if the Xbox One originally had 6 (or fewer) graphics contexts available for games while the PS4 had 7, then that would be another reason that helps explain why launch Xbox games performed poorly compared to their PS4 counterparts.
 
Mosen's earlier post suggests it takes two GCPs to enable 8 graphics contexts. TBH I don't really get what's going on!
 
That example talks specifically about two render pipes to facilitate (low priority, rather than high) OS rendering. I suppose that's for performance non-critical docked apps. It's more important to draw your game than Skype, for example.

That example is about filling the holes in the high-priority rendering. You can do that with low-priority OS rendering or title rendering, as well as async compute. They didn't say that it's only for low-priority OS rendering.

Mosen's earlier post suggests it takes two GCPs to enable 8 graphics contexts. TBH I don't really get what's going on!

I'm thinking of each graphics hardware context in the GCPs as analogous to the asynchronous compute queues in the ACEs (which the XB1 architect described as hardware "contexts").
 
So if the 'Bone has 8 Graphics contexts, how many does the PS4 have? Eight also?

If going from 6 -> 7 gives a ~3.5% increase in performance, then it seems reasonable to assume that 5 -> 6 would give a greater than ~3.5% increase, and that 7 -> 8 would give a smaller than ~3.5% increase. Diminishing returns and all that.

Anyway, if the Xbox One originally had 6 (or fewer) graphics contexts available for games while the PS4 had 7, then that would be another reason that helps explain why launch Xbox games performed poorly compared to their PS4 counterparts.
I think the XB1 launch games' performance deficit compared to the other system has more to do with the fact that the system's to-the-metal mono driver wasn't finished.
In the DF 4A interview it is stated that Metro Redux used the legacy DX11 driver as opposed to the mono driver. 4A stated that if they had had more time they could have hit 1080p on the Xbox One with the mono driver. Sure, there is a performance difference between the hardware, but it isn't as extreme as what we saw at launch with COD: Ghosts. The mono driver was still in preview mode until the middle of 2014.
 
I think the XB1 launch games' performance deficit compared to the other system has more to do with the fact that the system's to-the-metal mono driver wasn't finished.
In the DF 4A interview it is stated that Metro Redux used the legacy DX11 driver as opposed to the mono driver. 4A stated that if they had had more time they could have hit 1080p on the Xbox One with the mono driver. Sure, there is a performance difference between the hardware, but it isn't as extreme as what we saw at launch with COD: Ghosts. The mono driver was still in preview mode until the middle of 2014.

Definitely. I think B3D managed to come to that conclusion some time back, and the leak of this SDK verified it. As @function said, the Xbox One might be ready to finally launch at E3 2015. Though as another put it, it was never really going to launch until it had Win 10 and DX12, so really we're looking at Q4 2015.

Hoping to hear more about this second GCP and what the several contexts are for, but it's still not fully clear what that second pipeline is for. Sebbbi and MJP and others have clearly indicated that GCN-based boards are meant to leverage compute while the graphics pipeline is busy doing something else. At this moment it's very difficult to separate MS speak from what normally comes with GCN. @mosen And if we are going with "1 async thread is a context", then the PS4 has 64. So I don't think that's the right way to look at it, unless I'm not reading carefully.
 
That's Async Compute using the CCPs.
Right, so mosen's theory is that the X1 GPU is doing the same thing as the CCPs but on the rendering side, so the GCPs are stuffing the pipeline when it's available?

Sounds good on paper. I guess the next thing is to learn how graphics contexts work or what should go into a context in order to see if his theory makes sense.
 
leave a comment saying "please credit Beyond 3D as the source, where the picture was published and discussed" or something like that?
 
Well, it's not the pictures that matter really. Anyone can crop them however they want, but looking at his images I know he just read through this entire thread; his blog post just about summarized it. I feel bad because a great deal of my posts are likely factually false.

edit edit: sigh.
 
seeing as we discussed the 3 OSes earlier (1 page back) ...

here are my notes from when Frank talked at the last //build/


[slide: Xbox One runs Windows 8]


Host OS
- owns and manages all resources, memory/cpu/gpu/controllers/traffic direction ..
- it's the security layer
- OS ?!

Shared OS partition
- indistinguishable from Win8
- xbox shell (xaml app)
- system services (eg. networking stack, audio, kinect processing)
- 1.x - 2 cpu cores
- 1.5 - 3 GB ram
- on xb1 boot the HostOS launches the SharedOS

Exclusive OS partition
- Win8 that has gone on a massive diet
- fires up only when a game is launched
- when it needs low-level HW access it goes through exclusive channels directly to the HostOS ...
- l&m OS (lean and mean OS)
- Win32 and WinRT APIs
- communicates thru
  • - shared partition to access "system services"
  • - host OS (exclusive channels)

VMs -> Partitions
- VMs were hand-customized/tuned to make them as fast as humanly possible, and are now called "Partitions"
- CPU utilization is insanely low, communication to cores is super fast
- running off the VMs started out at 5-10 frames/second, with most of a frame spent on the CPU passing commands across the boundary. By launch they got that to 200-300 frames/second
- CPU utilization is so small thru these exclusive channels now

ref : http://channel9.msdn.com/events/Build/2014/2-651
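
The 5-10 fps -> 200-300 fps note above is about cutting boundary crossings. A purely hypothetical sketch of that idea (none of these names are real XB1 interfaces):

[code]
#include <cstdint>
#include <vector>

// Hypothetical sketch: rather than crossing the partition boundary once per
// graphics command, the exclusive partition queues commands locally and hands
// the whole batch to the HostOS in one transition.
struct Command { uint32_t opcode; uint32_t args[4]; };

class ExclusiveChannel {
public:
    void Queue(const Command& cmd) { pending_.push_back(cmd); } // stays inside the partition

    void Flush()
    {
        SubmitToHost(pending_.data(), pending_.size()); // one boundary crossing per batch
        pending_.clear();
    }

private:
    static void SubmitToHost(const Command* cmds, size_t count)
    {
        // Placeholder for the hypercall / shared-memory doorbell. The transition
        // itself is the expensive part, so paying it once per batch instead of
        // once per command is what the fps improvement in the notes is about.
        (void)cmds; (void)count;
    }

    std::vector<Command> pending_;
};
[/code]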
 
- 1.x - 2 cpu cores
- 1.5 - 3 GB ram

Your notes on CPU seem bang on with the recent SDK revelation; I wonder if the second part is coming some time later.

As an aside, I was wondering why processing voice for the OS gets a separate slice of CPU reserve/power from voice for the game. Is the phrase "Xbox record that" much easier to process than "archers release" for Ryse, given they are processed in the shared OS?
 