Xbox One November SDK Leaked

IIRC, some Xbox functions would be impractical if you removed the system voice commands, like recording.
There are some quick shortcuts for recording ... double-tap the Xbox button, then use the directional pad to record a clip or snap other apps.
 
Maybe add a warning at game boot if the game is allowed to use such extreme system resources? Just like the PS Vita says "WIFI is DISABLED while you are playing this game" for games that are allowed to overclock the GPU, the Xbox could say "Voice commands are disabled while you are playing this game".
 
System voice commands cannot be disabled, as far as I can tell. It's only removing the ability to add custom commands to your game.
 
Hope MS sticks with this. Keep pushing voice commands and make them even easier to use, until people miss them when they're unavailable.
 
Caches and Coherency on the Xbox One GPU

With the introduction of fast object semantics contexts and asynchronous compute contexts in DirectX 12, the responsibility for managing resource hazards is in the hands of title developers. Because the Xbox One GPU requires explicit synchronization when data is produced through one type of cache or DMA and then is consumed through another cache or DMA, it is especially important to understand the different mechanisms of synchronization on the GPU.

This topic looks at the GPU as a “pipeline” and describes data coherency and synchronization in terms of moving through the pipeline. It will be especially useful for graphics programmers who want to get correct and performant rendering out of DirectX 11.X fast object semantics contexts and DirectX 12.
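As a rough illustration of the explicit hazard management this puts on the title, here is a minimal DirectX 12-style sketch; the command list and the shadowMap resource are hypothetical placeholders, and real code would batch barriers and cover more states.

```cpp
// Sketch only: explicit hazard management in D3D12, as the excerpt describes.
// 'cmdList' and 'shadowMap' are hypothetical placeholders.
#include <d3d12.h>

void RenderWithExplicitHazards(ID3D12GraphicsCommandList* cmdList,
                               ID3D12Resource* shadowMap)
{
    // Write pass: the shadow map is produced through the depth path.
    // ... render the shadow map ...

    // The driver no longer inserts this for you: the title must issue the
    // barrier so the right caches are flushed/invalidated before the read pass.
    D3D12_RESOURCE_BARRIER toRead = {};
    toRead.Type                   = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
    toRead.Transition.pResource   = shadowMap;
    toRead.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
    toRead.Transition.StateBefore = D3D12_RESOURCE_STATE_DEPTH_WRITE;
    toRead.Transition.StateAfter  = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;
    cmdList->ResourceBarrier(1, &toRead);

    // Read pass: the same data is now consumed through the texture cache path.
    // ... bind shadowMap as an SRV and draw ...
}
```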

GPU synchronization
The Xbox One GPU has eight graphics contexts, seven of which are available for games. Loosely speaking, a sequence of draw calls that share the same render state is said to share the same context. Dispatches don’t require graphics contexts, and they can run in parallel with graphics work.
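For reference, public DirectX 12 exposes the same idea through dedicated compute queues. A minimal sketch follows, assuming an already-created ID3D12Device; the priority hint is only an assumption about how one might let compute jump ahead of pending graphics work, not a statement about the Xbox One scheduler.

```cpp
// Sketch only: a separate compute queue so dispatches can overlap graphics work.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

ComPtr<ID3D12CommandQueue> CreateAsyncComputeQueue(ID3D12Device* device)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type     = D3D12_COMMAND_LIST_TYPE_COMPUTE;    // compute-only command stream
    desc.Priority = D3D12_COMMAND_QUEUE_PRIORITY_HIGH;  // hint: let compute jump ahead
    desc.Flags    = D3D12_COMMAND_QUEUE_FLAG_NONE;

    ComPtr<ID3D12CommandQueue> queue;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
    return queue;
}
// Dispatches submitted to this queue don't consume a graphics context and can
// progress through the GPU alongside the rendering queue, sharing CU time.
```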

Command Lists, Draw Bundles, and Deferred Contexts

When a title is render-thread bound, command lists and draw bundles are two techniques for potential performance improvements. Correctly using command lists and/or draw bundles can significantly improve CPU rendering performance.

Command lists and draw bundles have the following similarities:

  • They both are encapsulations of state setting and draw calls.
  • They are recorded using a deferred context (draw bundles can also be recorded on the immediate context through a future API update).
  • They are typically executed from an immediate context.
If a title is render-thread bound and has parallelizable rendering tasks, you can use a deferred context on another CPU thread to perform some of these rendering tasks and record the graphics commands (state settings and draw calls) into a command list. The command list is then executed on the immediate context at a later time. Furthermore, multiple command lists can be recorded in parallel on different threads, giving even more performance improvements. By doing this, you allow the rendering tasks to be effectively spread out over multiple CPU threads.
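A minimal DirectX 11-style sketch of that flow is below; resource setup and the actual draw calls are elided, and the pass names are made up.

```cpp
// Sketch only: record draws on a deferred context on a worker thread and
// replay them on the immediate context later.
#include <d3d11.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

ComPtr<ID3D11CommandList> RecordShadowPass(ID3D11Device* device)
{
    // One deferred context per recording thread.
    ComPtr<ID3D11DeviceContext> deferred;
    device->CreateDeferredContext(0, &deferred);

    // Record state settings and draw calls exactly as on the immediate context.
    // deferred->OMSetRenderTargets(...);
    // deferred->DrawIndexed(...);

    // Close the recording into a command list (FALSE = don't restore prior state).
    ComPtr<ID3D11CommandList> commandList;
    deferred->FinishCommandList(FALSE, &commandList);
    return commandList;
}

void SubmitShadowPass(ID3D11DeviceContext* immediate, ID3D11CommandList* commandList)
{
    // Executed later, on the render thread, at the point the pass belongs in the frame.
    immediate->ExecuteCommandList(commandList, FALSE);
}
```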

In addition to finding out parallelizable rendering tasks, you can also identify reusable and repetitive rendering tasks, such as rendering a character. These tasks can be recorded into a draw bundle from either a deferred context (supported right now) or an immediate context (supported through a planned future API update). Using draw bundles saves you CPU time because executing a draw bundle costs significantly fewer CPU cycles than issuing the graphics commands that are contained in the draw bundle directly.
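Public DirectX 12 later exposed the same reuse idea as bundles; the sketch below uses that public API rather than the Xbox-specific D3D11.X draw-bundle interface the SDK describes, and the pipeline state and draw calls are placeholders.

```cpp
// Sketch only: a reusable "character" bundle in public D3D12, recorded once and
// replayed each frame with a single call.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

ComPtr<ID3D12GraphicsCommandList> RecordCharacterBundle(
    ID3D12Device* device, ID3D12PipelineState* pso)
{
    ComPtr<ID3D12CommandAllocator> allocator;
    device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_BUNDLE,
                                   IID_PPV_ARGS(&allocator));

    ComPtr<ID3D12GraphicsCommandList> bundle;
    device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_BUNDLE,
                              allocator.Get(), pso, IID_PPV_ARGS(&bundle));

    // Record the reusable draws once.
    // bundle->IASetVertexBuffers(...);
    // bundle->DrawIndexedInstanced(...);
    bundle->Close();
    return bundle;
}

// Each frame, replaying the bundle is one cheap call instead of re-issuing
// every state setting and draw:
//     directCommandList->ExecuteBundle(bundle.Get());
```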

Please note that these technologies are for improving CPU performance. They won’t help if your title is GPU bound, because the amount of work that the GPU has to perform remains the same whether you use these technologies or not.

Xbox One GPU Overview

Command Processor
Xbox One supports multiple GPU command streams interleaved in hardware. Some command streams contain rendering commands, and others are compute-only. Both types of command stream progress through the GPU at the same time, sharing compute and bandwidth resources and making it possible to decouple compute work from rendering work. In particular, compute tasks can leapfrog past pending rendering tasks, enabling low-latency handoffs between CPU and GPU.
 
Xbox One GPU Overview

Command Processor
Xbox One supports multiple GPU command streams interleaved in hardware. Some command streams contain rendering commands, and others are compute-only. Both types of command stream progress through the GPU at the same time, sharing compute and bandwidth resources and making it possible to decouple compute work from rendering work. In particular, compute tasks can leapfrog past pending rendering tasks, enabling low-latency handoffs between CPU and GPU.

So is this something specific to the Xbox One implementation of GCN (they have a custom bus iirc) or is it part of PC and PS4 implementations too?

I recall an audio developer talking about high latency for compute tasks limiting what they could do using compute - I think for PS4 - and at least superficially this would seem to be at odds with that.
 
alias "Xbox One / GCN supports multiple wavefronts in flight per CU"
It's regular stuff, just MS not using tech words.
 
Yes, I understand that all GCN GPUs support multiple wavefronts and can interleave rendering and compute, but this part:

"In particular, compute tasks can leapfrog past pending rendering tasks, enabling low-latency handoffs between CPU and GPU."

... wasn't something I remember reading before. It makes it sound like you can explicitly push chosen compute tasks to the front of the queue, rather than add them and just let the GPU prioritise based on unused resources in the CUs.

But perhaps that's just a standard part of GCN. Being able to choose the order and priority of things in the queue would be useful on all platforms, I guess.
 
Yes, I understand that all GCN GPUs support multiple wavefronts and can interleave rendering and compute, but this part:

"In particular, compute tasks can leapfrog past pending rendering tasks, enabling low-latency handoffs between CPU and GPU."

... wasn't something I remember reading before. It makes it sound like you can explicitly push chosen compute tasks to the front of the queue, rather than add them and just let the GPU prioritise based on unused resources in the CUs.

But perhaps that's just a standard part of GCN. Being able to choose the order and priority of things in the queue would be useful on all platforms, I guess.
I'm not sure if this helps answer the question, because from the sounds of it, the bolded above is ultimately about efficiency when you require it.

From the architects' interview, the customisations to the command processor are the following:

We also took the opportunity to go and highly customise the command processor on the GPU. Again concentrating on CPU performance... The command processor block's interface is a very key component in making the CPU overhead of graphics quite efficient. We know the AMD architecture pretty well - we had AMD graphics on the Xbox 360 and there were a number of features we used there. We had features like pre-compiled command buffers where developers would go and pre-build a lot of their states at the object level where they would (simply) say, "run this". We implemented it on Xbox 360 and had a whole lot of ideas on how to make that more efficient (and with) a cleaner API, so we took that opportunity with Xbox One and with our customised command processor we've created extensions on top of D3D which fit very nicely into the D3D model and this is something that we'd like to integrate back into mainline 3D on the PC too - this small, very low-level, very efficient object-orientated submission of your draw (and state) commands.
 
Maybe add a warning at game boot if the game is allowed to use such extreme system resources? Just like the PS Vita says "WIFI is DISABLED while you are playing this game" for games that are allowed to overclock the GPU, the Xbox could say "Voice commands are disabled while you are playing this game".
That would be a pretty shitty thing to do to people who paid extra for Kinect only to have the advertised universal functions taken away.
 
I use the Kinect every time. Snapping and unsnapping apps through Kinect is pretty useful when double-tapping the guide button doesn't work too well (and there are plenty of times when I double-tap the guide button and it sends me to the dashboard instead of snapping).
 
There is a page on a split render target sample, but unfortunately it doesn't have much info. The picture shows about 70% of the render target in ESRAM and 30% in DRAM.

Seems the new SDK's eSRAM flexibility that the Dying Light devs spoke of lets them put things into DRAM much more easily than before. Hopefully this allows many more games to get to 1080p instead of trying to stuff everything into eSRAM and ending up around 900p.
 
I'm not sure if this helps answer the question, because from the sounds of it, the bolded above is ultimately about efficiency when you require it.

From the architects' interview, the customisations to the command processor are the following:

Thanks. It really does seem that MS have made some non-standard additions to the command processor. Whether these relate to gaming, and to the aforementioned queue-jumping of compute, I dunno. Hopefully DF can get them to spill more beans over time.

The common (mis)conception is that because the X1 is less powerful, MS didn't do as much work as Sony. On the contrary, MS seem to have put far more work into customising the AMD IP. Unfortunately they aimed a little low with their performance target ...
 
Okay, so here's a question for those that unlike me have read the leaked docs:

Would the virtual addressing for GPU buffers allow developers to control precisely which sections of a buffer were in ESRAM? I'm talking about a much finer level than just "the top 30%".

For example, take the sections of a render buffer that would be obscured by the HUD. An energy bar wouldn't be represented by a single range of virtual memory, but (probably?) by several strips/linear ranges of memory. If you knew which addresses these would be and could put those completely obscured ranges in main memory, then you could save on the amount of ESRAM needed to store the sections of the buffer that actually required R/W.

If, say, 10% of your screen was obscured by the HUD, then putting that section in low BW memory and simply not writing to it might be useful ...?
 
If, say, 10% of your screen was obscured by the HUD, then putting that section in low BW memory and simply not writing to it might be useful ...?

I don't see anything in the leaked docs that proves it's possible. Not to mention that linear page addressing will get you more or less to the same scenario: just use top and bottom parts of a render target, the HUD is usually there anyway.
 
Yes, I know that a HUD is normally at the top and bottom, but I was wondering about fine control over which pixels you put where. For example, driving through fire and water at ground level (potentially lots of overdraw) wouldn't be a good fit for putting the bottom part in main RAM. If you could still put most of the non-HUD area in ESRAM it'd work better.
 
I don't see anything in the leaked docs that proves it's possible. Not to mention that linear page addressing will get you more or less to the same scenario: just use top and bottom parts of a render target, the HUD is usually there anyway.


It appears that any part of a pixel within your render target can be selectively placed in ESRAM vs DDR3 ... it's up to the dev and how much optimising they want to do.

A more complex option is to enable 4x multisampling with the first two fragments of every pixel in ESRAM and the last two fragments in DDR3.
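Purely to make that layout concrete, here is an illustrative sketch of the split; none of these types or names are real Xbox One SDK API, they just spell out which fragments live where.

```cpp
// Illustrative sketch only: the per-fragment split described above, expressed as
// a hypothetical placement descriptor. These types/functions are not real SDK API.
enum class MemoryPool { Esram, Dram };

struct FragmentPlacement {
    unsigned   fragmentIndex;  // 0..3 for 4x MSAA
    MemoryPool pool;
};

// 4x multisampled render target: first two fragments of every pixel in ESRAM,
// last two fragments spilled to DDR3.
const FragmentPlacement kSplitTargetLayout[] = {
    { 0, MemoryPool::Esram },
    { 1, MemoryPool::Esram },
    { 2, MemoryPool::Dram  },
    { 3, MemoryPool::Dram  },
};
```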
 