DirectX 12: its future in the console gaming space (specifically the XB1)

All I know is that at one time it was frowned upon to say that the Xbox One was designed with DX12 in mind. Now Phil Spencer has said that they did indeed know what DX12 was doing at the time of Xbox One development. Does that mean its architecture was designed to take advantage of DX12? We will find out at GDC. Thrilling reading all the posts and analysis.
 
This is really the last unknown we have about Microsoft's console. After this [knowing it all] I imagine the discussion will focus on rendering techniques and algorithms and how the hardware is actually used. And that's going to be a great read, one that I unfortunately can't participate in, lol.

It's been a year+ building up to this moment, so I'm actually relieved that all will be revealed within the next 6 months. The back and forth was, and is, necessary because a lot of people read this site; the Xbox One thread has by far the most views of any thread on this forum. It's important not to spread misinformation if it can be avoided, but equally to acknowledge evidence when it's presented.
 
[B]ThatBuzzkiller[/B] said:
Volume tiled resources is just partially resident textures in 3D and GCN does support 3D textures so support for VTR can be easily extended ...
Um... no. The key hardware requirement of tiled resources is swizzle patterns that work out to roughly square/cubic shapes at the page size (i.e. 64KB). My memory is fuzzy, but doesn't GCN lack even a 3D texture swizzle? i.e. it stores volume textures basically just like 2D texture arrays, in slices? I could be wrong and thinking of another architecture, but that certainly would not be compatible with volume tiled resources.

Edit: I've been told there is actually a 3D swizzle in GCN, so I must be remembering a different architecture. That said, that is still not sufficient in and of itself for tiled volumes. GCN may well support them, but we don't know that simply from the fact that it has page tables + 3D textures...
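For reference, on the API side this ends up as a plain capability query. A minimal sketch, assuming you already have an ID3D12Device, of how an application would test for the tiled resources tier that includes volume (3D) tiles:

[CODE]
// Minimal sketch, assuming an existing ID3D12Device* device.
// Tier 3 is the level that adds tiled 3D textures on top of the 2D support in Tiers 1/2.
#include <d3d12.h>

bool SupportsVolumeTiledResources(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                           &options, sizeof(options))))
        return false;

    // Tiled (partially resident) volume textures require TIER_3.
    return options.TiledResourcesTier >= D3D12_TILED_RESOURCES_TIER_3;
}
[/CODE]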

[B]ThatBuzzkiller[/B] said:
Conservative rasterization can be performed in the geometry shader so there's no worries there ...
Yes, very slowly. And to be clear, you also need pixel shader code to basically implement the additional edge/corner tests that the rasterizer would ideally be doing.
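For contrast, when the rasterizer itself supports it, conservative rasterization in D3D12 is just a flag in the PSO's rasterizer state rather than extra GS/PS work. A sketch only, assuming the usual d3dx12.h helpers and with the rest of the pipeline description omitted:

[CODE]
// Sketch: hardware conservative rasterization is a rasterizer-state setting.
// Check ConservativeRasterizationTier in D3D12_FEATURE_DATA_D3D12_OPTIONS first.
#include "d3dx12.h"

D3D12_GRAPHICS_PIPELINE_STATE_DESC psoDesc = {};
psoDesc.RasterizerState = CD3DX12_RASTERIZER_DESC(D3D12_DEFAULT);
psoDesc.RasterizerState.ConservativeRaster =
    D3D12_CONSERVATIVE_RASTERIZATION_MODE_ON;   // vs. _OFF for standard coverage rules
// ... shaders, blend/depth state, root signature, then CreateGraphicsPipelineState().
[/CODE]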

[B]ThatBuzzkiller[/B] said:
Raster ordered views is just a fancy way to rename Intel's pixelsync and the feature's name in OpenGL is Intel_fragment_shader_ordering which by the way GCN already supports so ROVs are already covered with GCN ...
Last time I tried that extension on AMD it was incredibly slow... like it took 300ms to render a single trivial test frame when enabled with no overlap/contention at all. That may have just been some buggy initial code but unless someone has run - for instance - the Intel sample on GCN lately and determined that it's actually fast now, I wouldn't count on this being particularly usable... certainly they'd have to do better to legitimately claim hardware support in my opinion.
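Whether or not it ends up fast, the runtime at least reports ROV support as a single capability bit, so an engine can choose the ROV path or a fallback (per-pixel linked lists, say) at startup. A small sketch, assuming an ID3D12Device* named device already exists:

[CODE]
// Minimal sketch: query raster ordered view support.
D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS, &options, sizeof(options));
const bool hasROVs = (options.ROVsSupported != FALSE);   // BOOL in the options struct
[/CODE]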

I don't think there is one. :)
An ALU is hardware, a memory controller is hardware, but "hardware conservative rasterization" is either a feature of the shader compiler or a feature of the command processor microcode, both of which are "software" as far as I see it.
See above... it's a modification to the rasterization hardware (coverage tests), plain and simple. It has nothing to do with the shader compiler or command processor.
 
The new DirectX 11.3/12 features do not allow anything radically new. You can do software virtual texturing for 3D texture tiles, you can do conservative rasterization with geometry shaders, and you can do rasterizer ordered view techniques with linked lists (and other constructs). The important thing is that all these software solutions are either considerably slower and/or have issues in corner cases. But the same can be said about some important DirectX 10 and 11 features. You could also do tessellation with geometry shaders and multipass stream out (very slowly... just like the conservative rasterization emulation). You could emulate indirect draw calls with stream out and DrawAuto as well (super slowly).

I am personally super excited about these new features. All of them are very useful for real games. You can't say the same thing about geometry shaders or some other buzzword additions. These features feel like they solve real problems requested by real developers.
 
I don't know whether some feature is slow because of the hardware or because of how DirectX is built (its architecture). And I don't see how anybody can argue about that without actual access to driver code (and an NDA).
Therefore I don't think it's useful to debate which features of DX12 are "in hardware" and which are not. Something that is slow within API constraints can be fast without that API (yes, on another console).
I'm having deja vu here from last gen: the "but, but, but PS3 is not DX10" talks. Which were very silly. Sorry. :)
 
Thanks to Andrew and sebbbi for giving clarification about this. For console gamers, all these hardware features will probably be available and useful on next-generation hardware, PS5 and Xbox Two, if there is a next generation of consoles.

And they will probably be useful in only a very few exclusive PC games before the next generation.
 
And I don't see how anybody can argue about that without actual access to driver code (and an NDA).
You got answers from people in the industry who work with graphics hardware, drivers and APIs... is that not enough? If you're talking about the pixel sync extension then yeah, who knows why it is slow, but it remains to be seen whether they can make it usable on current hardware or not. Suffice it to say it's not safe to assume that they can without hardware changes.

It's possible GCN supports this stuff in hardware and it was just a secret all this time. What's not possible is that a software implementation will be anywhere near the performance of proper hardware.
 
Last time I tried that extension on AMD it was incredibly slow... like it took 300ms to render a single trivial test frame when enabled with no overlap/contention at all. That may have just been some buggy initial code but unless someone has run - for instance - the Intel sample on GCN lately and determined that it's actually fast now, I wouldn't count on this being particularly usable... certainly they'd have to do better to legitimately claim hardware support in my opinion.
Could you provide a link to the Intel sample you're referring to?
 
Yep, I want to see more of the 380X and 390X for sure before I upgrade GPUs. In relation to the X1, could we see this provide more stable fps? The architects mentioned this in the past, but it did not make sense, as outside of a few titles we have seen fps dips and plenty of screen tearing.
 
Yep, I want to see more of the 380X and 390X for sure before I upgrade GPUs. In relation to the X1, could we see this provide more stable fps? The architects mentioned this in the past, but it did not make sense, as outside of a few titles we have seen fps dips and plenty of screen tearing.

I suppose if the game was CPU-bound, sure, we'll see a difference. In scenarios where it's GPU-bound I'm wondering if some things can be moved back onto the CPU since it's been freed up a bit. But considering there are only 6-7 low-powered cores available, it's likely going to require a little bit more to max out the GPU than what was benchmarked.

IIRC Forza 5 had great graphics until they added more physics and more cars, eventually resulting in the downgrade of the audience, among other things.

Wrt Forza, I suspect there will be a big improvement over 5.
 
I suppose if the game was CPU-bound, sure, we'll see a difference. In scenarios where it's GPU-bound I'm wondering if some things can be moved back onto the CPU since it's been freed up a bit. But considering there are only 6-7 low-powered cores available, it's likely going to require a little bit more to max out the GPU than what was benchmarked.

IIRC Forza 5 had great graphics until they added more physics and more cars, eventually resulting in the downgrade of the audience, among other things.

Wrt Forza, I suspect there will be a big improvement over 5.
Yeah, I also thought about CPU-bound scenarios, but then I saw the 260X in the test. Even this card improves by more than 150%, and it should not be CPU-bound like this. To me it looks like Star Swarm even uses some DX12/Mantle-specific features to render more efficiently; otherwise such an improvement should not be possible.
But it's nice to see how much better batch submission will be, and how bad AMD's drivers are right now in that respect.

On the other hand, AMD never optimized its DX11 drivers (only Mantle) for this benchmark. Nvidia did.
 
Yeah, I also thought about CPU-bound scenarios, but then I saw the 260X in the test. Even this card improves by more than 150%, and it should not be CPU-bound like this. To me it looks like Star Swarm even uses some DX12/Mantle-specific features to render more efficiently; otherwise such an improvement should not be possible.
But it's nice to see how much better batch submission will be, and how bad AMD's drivers are right now in that respect.

On the other hand, AMD never optimized its DX11 drivers (only Mantle) for this benchmark. Nvidia did.
For PC benchmarks you also need to take into account that the driver overhead is super low compared to DirectX 11, so this gain shouldn't apply to Xbox One, or at least not as much, since DX11 Fast Semantics is already considered a low driver overhead version for Xbox One.

It's going to be quite interesting to see the games of the future. GPUs will be running at full tilt as the CPU crams work into them. Exciting times for all gamers.
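For context, a big part of the CPU-side gain these benchmarks measure comes from recording command lists on several threads and submitting them with one call. A heavily trimmed sketch, assuming the queue, allocators and lists already exist (fences, PSOs and error handling omitted):

[CODE]
// Sketch: each worker records its own command list against its own allocator;
// the main thread then submits everything with a single ExecuteCommandLists call.
#include <d3d12.h>
#include <thread>
#include <vector>

void RecordAndSubmit(ID3D12CommandQueue* queue,
                     std::vector<ID3D12CommandAllocator*>& allocators,
                     std::vector<ID3D12GraphicsCommandList*>& lists)
{
    std::vector<std::thread> workers;
    for (size_t i = 0; i < lists.size(); ++i)
    {
        workers.emplace_back([&, i]
        {
            allocators[i]->Reset();
            lists[i]->Reset(allocators[i], nullptr);
            // ... record this thread's slice of the scene's draw calls ...
            lists[i]->Close();
        });
    }
    for (auto& t : workers) t.join();

    // One submission for all of the work recorded above.
    std::vector<ID3D12CommandList*> submit(lists.begin(), lists.end());
    queue->ExecuteCommandLists(static_cast<UINT>(submit.size()), submit.data());
}
[/CODE]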
 
From article:
Oxide Games has emailed us this evening with a bit more detail about what's going on under the hood, and why Mantle batch submission times are higher. When working with large numbers of very small batches, Star Swarm is capable of throwing enough work at the GPU such that the GPU's command processor becomes the bottleneck. For this reason the Mantle path includes an optimization routine for small batches (OptimizeSmallBatch=1), which trades GPU power for CPU power, doing a second pass on the batches in the CPU to combine some of them before submitting them to the GPU. This bypasses the command processor bottleneck, but it increases the amount of work the CPU needs to do (though note that in AMD's case, it's still several times faster than DX11).

As far as I understand, there is a GPU bottleneck in the command processor. Is it possible that this is the reason for the Xbox One's multiple command processors?
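Purely as an illustration of what Oxide describes (the real OptimizeSmallBatch code isn't public, so every name below is made up), the CPU-side pass is conceptually just merging adjacent small batches that share state so that fewer commands reach the command processor:

[CODE]
// Hypothetical sketch only: merge consecutive small batches that share the same
// pipeline state and are contiguous, trading CPU time for fewer GPU commands.
#include <vector>

struct Batch                      // hypothetical structure for illustration
{
    int      pipelineStateId;     // which PSO / material state this batch uses
    unsigned indexCount;
    unsigned startIndex;
};

std::vector<Batch> CombineSmallBatches(const std::vector<Batch>& in, unsigned smallLimit)
{
    std::vector<Batch> out;
    for (const Batch& b : in)
    {
        Batch* last = out.empty() ? nullptr : &out.back();
        if (last && last->indexCount < smallLimit &&
            last->pipelineStateId == b.pipelineStateId &&
            last->startIndex + last->indexCount == b.startIndex)
        {
            last->indexCount += b.indexCount;   // one larger draw instead of two
        }
        else
        {
            out.push_back(b);
        }
    }
    return out;
}
[/CODE]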
 
Not for certain yet, Chris1515. Our forum insider says differently, that it will change in the future. But the OS is the current mode. Let's say nothing changes.

Mosen did highlight on several occasions that MS customized the GCP and compute processors such that we will see their customizations back in the PC space. So if the second GCP isn't assisting with that, the customizations could be.

edit:
We also took the opportunity to go and highly customise the command processor on the GPU. Again concentrating on CPU performance... The command processor block's interface is a very key component in making the CPU overhead of graphics quite efficient. We know the AMD architecture pretty well - we had AMD graphics on the Xbox 360 and there were a number of features we used there. We had features like pre-compiled command buffers where developers would go and pre-build a lot of their states at the object level where they would [simply] say, "run this". We implemented it on Xbox 360 and had a whole lot of ideas on how to make that more efficient [and with] a cleaner API, so we took that opportunity with Xbox One and with our customised command processor we've created extensions on top of D3D which fit very nicely into the D3D model and this is something that we'd like to integrate back into mainline 3D on the PC too - this small, very low-level, very efficient object-orientated submission of your draw [and state] commands.


I'm satisfied with the customizations to the command processors being related to removing the bottleneck of insanely high draw call counts. I guess it's about reaching the absolute ceiling. Really focusing on ensuring the bottleneck is ALU? The ability to support a lot of small jobs that would help fill holes with async compute?
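The closest public analogue to those pre-built "run this" command buffers is probably D3D12's bundles: record a small command list once, then replay it cheaply from a direct command list each frame. A trimmed sketch, assuming the device, a bundle allocator, a PSO and a direct command list already exist (root signature and descriptor setup omitted):

[CODE]
// Sketch: record a reusable bundle once, then replay it per frame.
ID3D12GraphicsCommandList* bundle = nullptr;
device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_BUNDLE,
                          bundleAllocator, pipelineState, IID_PPV_ARGS(&bundle));
bundle->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
bundle->DrawInstanced(36, 1, 0, 0);   // the object's pre-built draw
bundle->Close();

// Per frame, on the direct command list:
directList->ExecuteBundle(bundle);
[/CODE]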
 
I didn't see this slide before:

[Slide 31 from "Introduction to Direct3D 12" by Ivan Nevraev]

http://www.slideshare.net/DevCentralAMD/introduction-to-dx12-by-ivan-nevraev

Mosen, this might be more related here.
 