Xbox One November SDK Leaked

Oh OK, cool, I stand corrected; in a previous post it was mentioned that vgleaks was unproven and shouldn't be referred to.

The Xbox diagrams from vgleaks show each CU (or SC, as they refer to it) having its own L1 cache, but the PS4 dev slides show it shared between 3 CUs. What advantages or disadvantages does this have, or could it point to those CUs having come from another range of cards? That might be where some of the R&D money went, rather than the dual threading. Just an idea.

Vanilla GCN has each CU with its own 16 KB L1 vector data cache. What's shared is per CU cluster: each group of (up to) four CUs shares 16 KB of L1 scalar cache and 32 KB of L1 instruction cache.

What you are probably seeing is not a difference in hardware but a difference in how the two hardware overviews were drawn up.
 
Shifty, my thinking with the separate OSes was freeing local resources for other things.
A local OS wouldn't free resources. All OSes are running on the same finite hardware. It'd just move the cloud aspect of a game from the RAM and GPU set aside for the game to the RAM and GPU set aside for the System/cloudOS, and add headaches. You're far better off letting the game handle everything and shrink the OS footprint, giving those OS resources to the game to use on cloudy goodness.
 
Yes, the second GCP is being used by the OS and Snap functions. Microsoft is working towards opening up low-level priority instructions for games. It will not happen until the next Xbox One OS is released.

That's why the whispers are 2016 and limited situations.

PS: Sorry I'm just now replying; I was waiting on my source at Redmond to get back in touch with me.
 
Interesting. Thermphenix, if you don't mind, this type of thing might be better suited to the Xbox rumours thread as opposed to this one, though I understand why you posted it in this one. We can't verify your source unless you want to show Shifty or something. Awkward situation lol, thank goodness I'm not a mod! :)
 

If Shifty remembers, there's a lot I leaked before launch.

GPU upclock Xbox One
CPU upclock Xbox One
I hinted at the flash in Xbox One
I also stated the PS4 had the same RAM reserve, with the flex of 512 MB. (But the post was deleted because it broke NDA.)
 
How will this benefit games? Why would devs want to use it?

From Christophe Riccio's article:

There is a lot of room for tasks parallelism in a GPU but the idea of submitting draws from multiple threads in parallel simply doesn’t make any sense from the GPU architectures at this point. Everything will need to be serialized at some point and if applications don’t do it, the driver will have to do it. This is true until GPU architectures add support for multiple command processors which is not unrealistic in the future. For example, having multiple command processors would allow rendering shadows at the same time as filling G-Buffers or shading the previous frame. Having such drastically different tasks live on the GPU at the same time could make a better usage of the GPU as both tasks will probably have different hardware bottleneck.

http://www.g-truc.net/doc/Candidate features for OpenGL 5.pdf

sebbbi's post from the DX12 thread:

Couldn't agree more with that list. Many very important features listed there.

3.10... "ballotAMD" :). For the people who don't see it right away (it wasn't mentioned in this article either), this allows you to do efficient inside wave prefix sums (without atomics / serialization). Super useful for many things.
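To make that concrete, here is a minimal plain-C++ sketch (my own illustration, not from sebbbi's post) that emulates what a 64-wide GCN wavefront gets from a ballot plus a bit count; the predicate and the "compacted slot" use are arbitrary assumptions.

#include <bitset>
#include <cstdint>
#include <cstdio>

int main() {
    const int kWaveSize = 64;                        // GCN wavefront width
    bool predicate[kWaveSize];
    for (int lane = 0; lane < kWaveSize; ++lane)
        predicate[lane] = (lane % 3 == 0);           // arbitrary per-lane test

    // "ballot": every lane's predicate gathered into a single 64-bit mask.
    std::uint64_t mask = 0;
    for (int lane = 0; lane < kWaveSize; ++lane)
        if (predicate[lane]) mask |= (1ull << lane);

    // Exclusive prefix sum per lane: count the set bits strictly below it.
    // No atomics or serialization; every lane derives its result from the mask.
    for (int lane = 0; lane < kWaveSize; ++lane) {
        int prefix = (int)std::bitset<64>(mask & ((1ull << lane) - 1)).count();
        if (predicate[lane])
            std::printf("lane %2d -> compacted slot %d\n", lane, prefix);
    }
    return 0;
}

This is the stream-compaction pattern: every passing lane knows its output slot immediately, which is what makes the intrinsic so useful.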

Quote from the OpenGL 5 candidate feature list:

Good example of this is shadow map rendering. It is bound by fixed function hardware (ROPs and primitive engines) and uses very small amount of ALUs (simple vertex shader) and very small amount of bandwidth (compressed depth buffer output only, reads size optimized vertices that don't have UVs or tangents). This means that all TMUs and huge majority of the ALUs and bandwidth is just idling around while shadows get rendered. If you for example execute your compute shader based lighting simultaneously to shadow map rendering, you get it practically for free. Funny thing is that if this gets common, we will see games that are throttling more than Furmark, since the current GPU cooling designs just haven't been designed for constant near 100% GPU usage (all units doing productive work all the time).

https://forum.beyond3d.com/threads/...ming-space-specifically-the-xb1.55487/page-21
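As a rough illustration of that shadow-map/lighting overlap, here is a hedged sketch using stock PC D3D12 (not the XB1 APIs): one direct queue and one compute queue, so the two workloads can run concurrently. The device and the two pre-recorded command lists are assumed to exist elsewhere; error handling and resource barriers are omitted.

#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void SubmitOverlapped(ID3D12Device* device,
                      ID3D12GraphicsCommandList* shadowCmdList,    // ROP/primitive-bound work
                      ID3D12GraphicsCommandList* lightingCmdList)  // ALU/TMU-bound compute work
{
    // One queue per front end: the direct queue feeds the graphics command
    // processor, the compute queue feeds the ACEs.
    D3D12_COMMAND_QUEUE_DESC gfxDesc  = {};
    gfxDesc.Type  = D3D12_COMMAND_LIST_TYPE_DIRECT;
    D3D12_COMMAND_QUEUE_DESC compDesc = {};
    compDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;

    ComPtr<ID3D12CommandQueue> gfxQueue, computeQueue;
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&gfxQueue));
    device->CreateCommandQueue(&compDesc, IID_PPV_ARGS(&computeQueue));

    // The submissions are independent, so the GPU is free to run them at the
    // same time: shadows keep the ROPs/primitive engines busy while the
    // lighting pass soaks up the otherwise idle ALUs.
    ID3D12CommandList* gfxLists[]  = { shadowCmdList };
    ID3D12CommandList* compLists[] = { lightingCmdList };
    gfxQueue->ExecuteCommandLists(1, gfxLists);
    computeQueue->ExecuteCommandLists(1, compLists);

    // A fence is only needed where one pass consumes the other's output:
    // here, later graphics submissions wait for the lighting pass to finish.
    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));
    computeQueue->Signal(fence.Get(), 1);
    gfxQueue->Wait(fence.Get(), 1);
}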

On XB1, the 12 CUs, two geometry/primitive engines, and four render backend depth and color engines support two independent graphics contexts. So it may let devs use most or all of the XB1 GPU resources easily for graphics. As sebbbi and others explained, it's possible to use the ACEs for graphics too (in some special circumstances). But it may be easier or more efficient to use the second GCP for graphics, since that makes it possible to use the fixed-function hardware, and synchronous compute instead of async compute.
 
But surely Compute fills the idle ALUs? I just can't see this generating much in the way of tangible gains, especially if dual GCPs are limited to consoles and only one allows dev access. Designing games to use compute makes them portable across all devices.

I find it curious that the 3D Architecture thread on this subject has generated so little interest. It's not as though the GPU-interested elite of B3D are enthusiastically talking about the possibilities enabled by 2x GCPs!
 
Didn't that dink from Stardock say DX12 would allow draw calls from different threads to be submitted in parallel? It'd be interesting to know if any PC parts already have two command processors, or if the next wave of them does.

Edit:
I don't really view this guy as an expert, but what he's saying fits with what mosen linked.
http://www.littletinyfrogs.com/article/460524/DirectX_11_vs_DirectX_12_oversimplified

DirectX 12: Every core can talk to the GPU at the same time and, depending on the driver, I could theoretically start taking control and talking to all those cores.

That’s basically the difference. Oversimplified to be sure but it’s why everyone is so excited about this.

What he's talking about sounds similar. He does not mention submitting commands to multiple command processors, but he is talking about multiple threads ("cores") submitting commands.
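For illustration, a hedged D3D12 sketch of that model (my own example, assuming a device and queue already exist): each CPU thread records its own command list, and the results are handed to the queue in one batch. The actual draw recording is elided.

#include <d3d12.h>
#include <wrl/client.h>
#include <thread>
#include <vector>
using Microsoft::WRL::ComPtr;

void RecordInParallel(ID3D12Device* device, ID3D12CommandQueue* queue, int numThreads)
{
    std::vector<ComPtr<ID3D12CommandAllocator>>    allocators(numThreads);
    std::vector<ComPtr<ID3D12GraphicsCommandList>> lists(numThreads);
    std::vector<std::thread> workers;

    for (int i = 0; i < numThreads; ++i) {
        // Each thread gets its own allocator and command list, so recording
        // needs no cross-thread synchronization.
        device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
                                       IID_PPV_ARGS(&allocators[i]));
        device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                                  allocators[i].Get(), nullptr,
                                  IID_PPV_ARGS(&lists[i]));
        workers.emplace_back([&, i] {
            // ... record this thread's share of the frame on lists[i] here ...
            lists[i]->Close();                       // ready for execution
        });
    }
    for (auto& w : workers) w.join();

    // Submission is still one ordered hand-off to the queue; what changes is
    // that the recording itself is no longer a single-threaded bottleneck.
    std::vector<ID3D12CommandList*> raw;
    for (auto& l : lists) raw.push_back(l.Get());
    queue->ExecuteCommandLists((UINT)raw.size(), raw.data());
}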
 
From Christophe Riccio's article:

Pfff, the solution is simple: use only one draw call. It's possible. Just not through the current PC (or even console) API.
I.e. just stop beating the CPU horse, it was already dead 5 years ago, pity that DX12 architects do not understand that (or do not want to?).
 
I'm not understanding how making it use only one draw call makes sense, or how that's even possible.

If I want to draw 50 squares on the screen at different locations, I can't do it using 1 draw call, or at least I don't know how.
 

One mesh with degenerate triangles. One mesh and transparent texture. Etc.
Multiple meshes, "created" by GPU, and submitted to itself through circular command buffer. Etc.

P.S. http://timothylottes.blogspot.com/2014/06/easy-understanding-of-natural-draw-rate.html
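For what it's worth, the more conventional route to one draw call for the 50-squares case is instancing (not one of the tricks listed above, but it gets to the same place). A minimal D3D11 sketch, assuming a 6-index quad, a 50-entry per-instance position buffer and the shaders are already bound:

#include <d3d11.h>

void DrawFiftySquares(ID3D11DeviceContext* context)
{
    const UINT indicesPerQuad = 6;    // two triangles per square
    const UINT squareCount    = 50;
    // One call: the vertex shader offsets each quad using its instance's
    // position (via a per-instance vertex buffer or SV_InstanceID lookup).
    context->DrawIndexedInstanced(indicesPerQuad, squareCount, 0, 0, 0);
}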
 
Every core in the CPU can now talk to the GPU, filling all its cores, and the GPU can self-feed its cores with work using the new "multi-draw indirect" APIs in the XDK:

New MultiDraw APIs (May 2014)
New APIs for dispatching multiple draws with a single call are now available in the ID3D11DeviceContextX interface. The new methods are ID3D11DeviceContextX::MultiDrawIndexedInstancedIndirect, ID3D11DeviceContextX::MultiDrawInstancedIndirect, ID3D11DeviceContextX::MultiDrawIndexedInstancedIndirectAuto, and ID3D11DeviceContextX::MultiDrawInstancedIndirectAuto. The new APIs' functionality is similar to OpenGL's multi_draw_indirect.

For more information, see Multi-Draw calls.
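The leaked MultiDraw* signatures are not reproduced here, but the stock D3D11 indirect path they extend looks roughly like this; the MultiDraw variants batch many such argument records into one call. The argument values below are illustrative.

#include <d3d11.h>
#include <cstdint>

// Argument layout consumed by DrawIndexedInstancedIndirect.
struct DrawIndexedArgs {
    std::uint32_t IndexCountPerInstance;
    std::uint32_t InstanceCount;
    std::uint32_t StartIndexLocation;
    std::int32_t  BaseVertexLocation;
    std::uint32_t StartInstanceLocation;
};

void IndirectDraw(ID3D11Device* device, ID3D11DeviceContext* context)
{
    DrawIndexedArgs args = { 36, 1, 0, 0, 0 };   // e.g. one cube, one instance

    D3D11_BUFFER_DESC desc = {};
    desc.ByteWidth = sizeof(args);
    desc.Usage     = D3D11_USAGE_DEFAULT;
    desc.MiscFlags = D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS;  // marks it as a draw-args buffer

    D3D11_SUBRESOURCE_DATA init = { &args, 0, 0 };
    ID3D11Buffer* argsBuffer = nullptr;
    device->CreateBuffer(&desc, &init, &argsBuffer);

    // The GPU reads its own draw arguments; with UAV access a compute shader
    // could rewrite them each frame, which is the "self-feeding" idea above.
    context->DrawIndexedInstancedIndirect(argsBuffer, 0);
    argsBuffer->Release();
}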

So yes, the GPU can reach near saturation...

And as mentioned many times before, all of this is truly parallel/async.
 
And in case anyone wanted to know how we get multiple CPU cores doing rendering, as opposed to the current state of DX where only one core is rendering:

If a title is render-thread bound and has parallelizable rendering tasks, you can use a deferred context on another CPU thread to perform some of these rendering tasks and record the graphics commands (state settings and draw calls) into a command list. The command list is then executed on the immediate context at a later time. Furthermore, multiple command lists can be recorded in parallel on different threads, giving even more performance improvements. By doing this, you allow the rendering tasks to be effectively spread out over multiple CPU threads.

And as I've mentioned numerous times before, "Deferred Contexts" are created by the Device, and they can be either an explicit DeferredContext (traditional graphics pipeline with draw/dispatch) or the new ComputeContext (compute-dispatch-only pipeline).
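A minimal sketch of the record-then-replay flow described above, using stock PC D3D11 names (the XDK's ComputeContext variant is not shown); the device and immediate context are assumed to exist already.

#include <d3d11.h>

void RecordAndReplay(ID3D11Device* device, ID3D11DeviceContext* immediateContext)
{
    // Created for (and used on) a worker thread.
    ID3D11DeviceContext* deferred = nullptr;
    device->CreateDeferredContext(0, &deferred);

    // ... set state and issue draw calls on 'deferred' here ...

    // Bake the recorded state settings and draws into a command list.
    ID3D11CommandList* commandList = nullptr;
    deferred->FinishCommandList(FALSE, &commandList);

    // Back on the render thread: replay the recorded work on the immediate context.
    immediateContext->ExecuteCommandList(commandList, FALSE);

    commandList->Release();
    deferred->Release();
}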
 
And when I say true async:

1. APIs are now async with a "fast path"
2. We get separate DeviceContexts for graphics (ImmediateContext/DeferredContext) and compute (ComputeContext), and multiple "CommandProcessors" in HW
3. There is now an async "Presentation Queue" freeing the CPU/GPU from this task
On Xbox One, presentation runs on an asynchronous mechanism. The system maintains a separate back-buffer presentation queue that is managed in parallel with the CPU and GPU.

By default, presentation of back buffers on Xbox 360 is synchronous and is performed by the GPU. The CPU first inserts a command into a GPU command buffer to stall the GPU until vertical synchronization (VSync) occurs, then inserts a GPU command to swap (“flip”) buffers. This means that the next frame can’t start on the GPU until the last frame’s back-buffer flip is finished. However, Xbox 360 also supports asynchronous swaps that work off the vertical blanking interval (VBlank) interrupt on the CPU, helping to ensure that no processor is unduly stalled.

The Xbox One presentation system uses a mechanism that is similar to these asynchronous swaps on Xbox 360. It maintains a separate presentation queue for back buffers, which is managed in parallel with the CPU and GPU. This provides automatic hazard detection for back buffers and front buffers on both the GPU and the CPU.
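As a rough PC-side analogue of that decoupled presentation queue (not the Xbox One mechanism itself), DXGI 1.3's waitable swap chain lets the CPU block only when it has run ahead of presentation. A hedged sketch, assuming the swap chain was created with DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT and that renderFrame is a caller-supplied callback:

#include <dxgi1_3.h>
#include <windows.h>

void RunPresentLoop(IDXGISwapChain2* swapChain, bool (*renderFrame)())
{
    // Obtain the waitable once; it signals when the presentation queue has room.
    HANDLE frameLatencyWaitable = swapChain->GetFrameLatencyWaitableObject();
    swapChain->SetMaximumFrameLatency(2);            // allow up to two queued frames

    bool keepRunning = true;
    while (keepRunning) {
        // The CPU blocks here only if it has run too far ahead of presentation.
        WaitForSingleObjectEx(frameLatencyWaitable, 1000, TRUE);
        keepRunning = renderFrame();                 // record and submit GPU work
        swapChain->Present(1, 0);                    // hand the back buffer to the queue
    }
    CloseHandle(frameLatencyWaitable);
}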
 
One mesh with degenerate triangles. One mesh and transparent texture. Etc.
Multiple meshes, "created" by GPU, and submitted to itself through circular command buffer. Etc.

P.S. http://timothylottes.blogspot.com/2014/06/easy-understanding-of-natural-draw-rate.html

Hmm... with my limited knowledge of it: I wrote a small game in Lua for PSN. I built controllers that would loop through arrays of entities and call their render functions (I know this is poor on the memory-management and draw-call side of things). This method would be faster; basically you've taken data-oriented design and gone a step further, from the little I can interpret. I mean, lol, without building your own engine I don't think you can deploy this method.

Wouldn't this method be hard for prototyping features?
 
On XB1, the 12 CUs, two geometry/primitive engines, and four render backend depth and color engines support two independent graphics contexts. So it may let devs use most or all of the XB1 GPU resources easily for graphics. As sebbbi and others explained, it's possible to use the ACEs for graphics too (in some special circumstances). But it may be easier or more efficient to use the second GCP for graphics, since that makes it possible to use the fixed-function hardware, and synchronous compute instead of async compute.

I'm speculating here, but I would be curious as to why it wouldn't be harder to use the second GCP in rendering towards a unified output.
The ACEs were designed and marketed from the outset as being better-virtualized and capable of coordinating amongst themselves. Due to their simplified contexts, prioritization, context switching, and recently some form of preemption were rolled out for them first.

The graphics front end and significant portions of the fixed-function pipeline have not kept up with this.
It will not be until Carrizo that preemption finally rears its head for the graphics context, and the paranoia over the GCP being DoSed has been a point of contention in Kaveri's kernel development discussion for Linux. If a platform is paranoid about a game DoSing the GPU, or if it needs some level of responsiveness, one way to get in edgewise is to have a secondary front end that can sneak something in.
(fun fact: It's not just graphics. The SDK cautions against having long-running compute kernels active when the system tries to suspend. If it takes too long to respond, it's a reboot. Similar GPU driver freakouts can occur on the PC.)
I may be pessimistic, given AMD's slower progress on this front, but it may be harder to get proper results out of a front end that has never needed the means to coordinate with an equivalent front end before.

The delay until the new OS rollout might be another indicator of the complexity involved. The ability to properly virtualize a GPU without serious performance concerns is recent, and both the VM and hardware need to be up for it. If the older OS system model predates these changes, it may have leveraged a secondary GCP as a shortcut to present a "second" GPU for the sake of a simpler target and improved performance.

The quoted passage on cooling solutions is "meh" to me. Unless it's a ROG or other boutique solution, why would a cooler be specced to dissipate a power level greater than one that would likely blow out the VRMs of a PCIe-compliant device?
No modern GPU of significant power consumption is physically capable of full utilization without the GPU clamping down clocks or voltages almost immediately.

Didn't that dink from Stardock say DX12 would allow draw calls from different threads to be submitted in parallel? It'd be interesting to know if any PC parts already have two command processors, or if the next wave of them does.
Maybe. However, it takes quite a bit to saturate the command processor, particularly if other bottlenecks come into play. If one of the motivations for the two command processors in the consoles was better QoS and system accessibility, Carrizo's introduction of graphics context switching might be a case where upcoming APUs have less of a need for a duplicated GCP. The other reason may be that upcoming APUs will probably bottleneck way before the gains from a second front end could be realized.
 
Halo: The Master Chief Collection is undergoing a beta test for its next multiplayer patch. The announcement cites a large-scale change:
Given the scale of the update, which includes changes to the matchmaking experience and party system, we are expanding testing to include select members of the Xbox One Preview program to ensure the official release is the best possible experience for all players.

I wonder if this is what you were referring to, @Scott_Arm, about the 2015 multiplayer platform change indicated in the SDK documents. We should keep an eye on this to see if everything suddenly changes. I hope this type of thing continues to be adopted; every major game with a massive community on PC releases beta versions of its next patch before rolling it out to the larger population.

https://www.halowaypoint.com/en-us/community/blog-posts/1-23-15-mcc-content-update-beta-test-faq
 