DirectX 12: Its future within the console gaming space (specifically the XB1)

He speaks from the experience of working with low-level APIs like DX12 and Vulkan, and he clearly has developers close to him who are writing code for both and for consoles... so I tend to listen to him.

Also, he definitely sounds like an approved spokesperson for AMD hardware; on numerous occasions AMD have called him out in agreement with his observations.

He or his team did write for Mantle (although one demo, Star Swarm, isn't really a lot of experience), but I doubt he ever used Vulkan.
As for the team writing code for consoles... they are using the same API as on PC.
But how can they compare with the performance of older APIs if they never used them?
Besides, wasn't it Wardell who said Microsoft, AMD and Nvidia were hiding the true benefits of DX12? The creators and the parties most interested in DX12 were hiding its benefits... sure, sure!
 
Not just a demo, but an engine and a demo for that engine, and now a game on that engine.
 
Everything about DX12 is fully revealed. There are no hidden secrets anymore, and we are left with a lot of speculation about what the API means for performance progression; we are waiting to see how these developers leverage the new tools.

It is far too early to suggest that the performance characteristics of DX12 are known when every GPU utilization benchmark only looks at the 3D engine of a GPU, which gets treated as if it encompassed every aspect of the GPU. Even in straight 3D-engine utilization metrics we see that DX12 outpaces DX11 by measurable margins. However, these utilization benchmarks do not measure the utilization of the copy engine and the async compute engine, which can run in parallel with the 3D engine. Until new tools are released, it's extremely difficult to gauge how far some of these GPUs can go when they are running software made specifically to leverage all of DX12's features.

It should be clear that running the copy and async engines in parallel (as opposed to just running everything through the 3D engine) will likely move the bottlenecks for certain GPUs, because the memory access patterns can now change. Because of that we have little insight into the future performance of existing hardware; at the very least, I doubt anyone here could properly put a percentage on how much DX12 will improve the Xbox One in every single circumstance. We will just need to wait and see.
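To make the queue/engine separation concrete, here is a minimal sketch (not from the thread; it assumes a standard D3D12 setup with an already-created device, and omits error handling) of submitting work to separate direct, compute and copy queues, with a fence so the 3D queue only waits where it actually consumes the async compute results:

```cpp
// Sketch only: separate D3D12 queues feeding the 3D, compute and copy engines,
// plus a fence so the 3D queue waits on async compute results where needed.
// Assumes `device` is a valid ID3D12Device* created elsewhere; no error handling.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

struct EngineQueues
{
    ComPtr<ID3D12CommandQueue> direct;   // 3D engine (graphics + compute + copy)
    ComPtr<ID3D12CommandQueue> compute;  // async compute engine
    ComPtr<ID3D12CommandQueue> copy;     // copy/DMA engine
    ComPtr<ID3D12Fence>        computeFence;
    UINT64                     computeFenceValue = 0;
};

EngineQueues CreateEngineQueues(ID3D12Device* device)
{
    EngineQueues q;

    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&q.direct));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&q.compute));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COPY;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&q.copy));

    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&q.computeFence));
    return q;
}

// Command lists are recorded elsewhere: submit async compute work, then make
// the graphics queue wait for it before consuming the results.
void SubmitFrame(EngineQueues& q,
                 ID3D12CommandList* computeWork,
                 ID3D12CommandList* graphicsWork)
{
    q.compute->ExecuteCommandLists(1, &computeWork);
    q.compute->Signal(q.computeFence.Get(), ++q.computeFenceValue);

    // GPU-side wait: only the 3D queue stalls at this point, not the whole GPU.
    q.direct->Wait(q.computeFence.Get(), q.computeFenceValue);
    q.direct->ExecuteCommandLists(1, &graphicsWork);
}
```

Whether this actually helps on a given GPU depends on whether the two workloads have different bottlenecks, which is exactly the open question the post above raises.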
 


Stardock and Oxide Games (he's part of both) are using the low-level APIs... it's pretty clear he has access to people who understand the APIs deeply.

And the DX12 APIs are pretty much the same, or will arrive exactly the same, on XB1... and it already has the fast-path ones with DX11.X.
 
Besides, wasn't it Wardell who said Microsoft, AMD and Nvidia were hiding the true benefits of DX12? The creators and the parties most interested in DX12 were hiding its benefits... sure, sure!

That was actually true, though. MS and AMD seemed to specifically downplay the improvements because they didn't think they would be believed without proof. A prime example is the 2000% improvement seen by DF in their recent article using a benchmarking tool. The 2000% gain is real on that metric, but no one knows what the difference will be in real-world game performance.
 

Look at what happened when MS claimed the XB1 could be augmented with cloud offloading. People still don't believe it since MS hasn't shown it yet. I think we'll see with Crackdown, given that MS partnered with Cloudgine for a massive world and destruction. Remember, MS said the cloud can augment work that is not latency-sensitive, and people still laughed. It's no wonder they have been pretty conservative with their messaging regarding DX12. At this point, it's better for MS to surprise people than to disappoint them with hype.
 
If I were to attempt to steer whatever is left of this topic: instead of looking at synthetic benchmarks or marketing speak, useful discussion points (imo) about the API are:

* Under what conditions will copying resources in parallel (via the copy engine) thrive, and under what conditions will it provide no benefit?
* Under what conditions will async compute thrive, and under what conditions will it provide no benefit?
* How will DX12 improve/change how deferred/forward+ rendering engines operate?
* Now that FL12 is a baseline, how could games benefit from the direct usage of PRT/Tiled Resources Tier 2 and bindless resources?
* How do titles benefit from UAV loads, how far can this be pushed, and what are the limitations?
* Under what conditions will ExecuteIndirect thrive, and under which conditions will it not?
* Multi-threaded command buffer generation - how many draws do we expect games to perform per frame 2, 5, or 7 years from today?
* How is async compute optimized across multiple platforms/configurations - it's clear that Nvidia and AMD do not handle async compute the same.
 
That was actually true, though. MS and AMD seemed to specifically downplay the improvements because they didn't think they would be believed without proof. A prime example is the 2000% improvement seen by DF in their recent article using a benchmarking tool. The 2000% gain is real on that metric, but no one knows what the difference will be in real-world game performance.

People... people... let's get our feet on the ground!

If you measure how good DX12 is in situations where the CPU was the bottleneck on DX11, you will see enormous gains. And if you run a benchmark that tests only those situations, you will see tremendous gains.
But those are not real-world games; those are synthetic benchmarks!

Remember Mantle?

Mantle existed for some time. It was even supported by some games; even the API's creators supported it natively in Battlefield 4.
And what were the average gains? 19% on Battlefield, 23% on Thief (these are averages, since some parts profited more than others)!

But at that time Star Swarm had already shown a 300% gain on some tests!

Of course it did! It was designed to work specifically in situations where DX11 performance was not so good.

In real-world scenarios, unless the game revolves around the use of thousands of objects, gains will not be that large. DX12 will give great improvements in CPU usage, but not many in GPU usage (talking about consoles here), and the GPU is the most common bottleneck.

But there will be gains... that's for sure... but never 2000%...

PS: Sorry Iroboto... You are quite right.
 
* Under what conditions will copying resources in parallel (via the copy engine) thrive, and under what conditions will it provide no benefit?
It will basically always provide a performance boost, assuming you need to stream data during gameplay (open-world games, etc.).
* Under what conditions will async compute thrive, and under what conditions will it provide no benefit?
It gives a big boost when the rendering queue is fixed-function bound (triangle setup, ROP). Also, running an ALU-bound shader concurrently with a bandwidth-bound shader (or clears/resolves/decompression) gives big gains. Async compute gives no gains if both shaders running concurrently have identical bottlenecks or have very bad memory access patterns.
* How will DX12 improve/change how deferred/forward+ rendering engines operate?
No change.
* Now that FL12 is a baseline, how could games benefit from the direct usage of PRT/Tiled Resources Tier 2 and bindless resources?
We prefer software virtual texturing instead of hardware PRT. Some games likely benefit from hardware PRT.
* How do titles benefit from UAV loads, how far can this be pushed, and what are the limitations?
Cleaner code. No reinterpret-cast hackery. Some saved ALU cost.
* Under what conditions will ExecuteIndirect thrive, and under which conditions will it not?
In our case it means that we don't need to emulate multidraw. This improves our triangle rate, because it allows us to use indexed geometry. It is a big CPU cost reduction for games that need to push a big amount of draw calls with binding changes between them. We have virtual texturing for this purpose, so a single indirect draw is sufficient to replace ExecuteIndirect (but at a slight performance cost, so we prefer ExecuteIndirect). (See the first sketch after this post.)
* Multi-threaded command buffer generation - how many draws do we expect games to perform per frame 2, 5, or 7 years from today?
We don't gain from this at all (since our CPU cost in DX11 is already less than 1 ms total with multidraw emulation). Games that build their frames on the CPU have big gains. (See the second sketch after this post.)
* How is async compute optimized across multiple platforms/configurations - it's clear that Nvidia and AMD do not handle async compute the same.
This is a big question mark. Hopefully we can manually adjust the CU allocation somehow. Fully automatic allocation will likely cause problems.
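Since ExecuteIndirect and multi-threaded recording come up in the answers above, here are two minimal sketches. Neither is the posters' code: buffer setup, PSOs, root signatures and error handling are omitted, and the helper names are illustrative. First, the command-signature plumbing for replaying many GPU-generated indexed draws with a single call:

```cpp
// Sketch only: a minimal ExecuteIndirect setup for GPU-driven indexed draws.
// Assumes `device`, `cmdList` and `argumentBuffer` are created elsewhere;
// argumentBuffer holds `drawCount` packed D3D12_DRAW_INDEXED_ARGUMENTS records.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

ComPtr<ID3D12CommandSignature> CreateDrawIndexedSignature(ID3D12Device* device)
{
    // One argument per record: a plain indexed draw (no root argument changes),
    // so no root signature needs to be passed to CreateCommandSignature.
    D3D12_INDIRECT_ARGUMENT_DESC arg = {};
    arg.Type = D3D12_INDIRECT_ARGUMENT_TYPE_DRAW_INDEXED;

    D3D12_COMMAND_SIGNATURE_DESC desc = {};
    desc.ByteStride       = sizeof(D3D12_DRAW_INDEXED_ARGUMENTS);
    desc.NumArgumentDescs = 1;
    desc.pArgumentDescs   = &arg;

    ComPtr<ID3D12CommandSignature> signature;
    device->CreateCommandSignature(&desc, nullptr, IID_PPV_ARGS(&signature));
    return signature;
}

void RecordIndirectDraws(ID3D12GraphicsCommandList* cmdList,
                         ID3D12CommandSignature* signature,
                         ID3D12Resource* argumentBuffer,
                         UINT drawCount)
{
    // One API call issues `drawCount` indexed draws whose arguments were written
    // by the GPU (e.g. by a culling compute shader) into argumentBuffer.
    cmdList->ExecuteIndirect(signature, drawCount, argumentBuffer, 0, nullptr, 0);
}
```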
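And second, a sketch of the multi-threaded command buffer generation idea: each worker thread records into its own command list/allocator pair, and the main thread submits everything in one call (fencing for allocator reuse is omitted):

```cpp
// Sketch only: multi-threaded command list recording in D3D12. Each worker
// thread records its slice of the frame into its own command list, and the
// main thread submits all of them at once. Assumes `device` and `queue` exist.
#include <d3d12.h>
#include <wrl/client.h>
#include <thread>
#include <vector>
using Microsoft::WRL::ComPtr;

void RecordFrameInParallel(ID3D12Device* device, ID3D12CommandQueue* queue,
                           unsigned workerCount)
{
    std::vector<ComPtr<ID3D12CommandAllocator>>    allocators(workerCount);
    std::vector<ComPtr<ID3D12GraphicsCommandList>> lists(workerCount);
    std::vector<ID3D12CommandList*>                submit(workerCount);

    for (unsigned i = 0; i < workerCount; ++i)
    {
        device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
                                       IID_PPV_ARGS(&allocators[i]));
        device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                                  allocators[i].Get(), nullptr,
                                  IID_PPV_ARGS(&lists[i]));
    }

    std::vector<std::thread> workers;
    for (unsigned i = 0; i < workerCount; ++i)
    {
        workers.emplace_back([&, i]
        {
            // Each thread records only its own list: set state, issue its
            // subset of draw calls, then close the list.
            // lists[i]->SetGraphicsRootSignature(...); lists[i]->DrawIndexedInstanced(...);
            lists[i]->Close();
            submit[i] = lists[i].Get();
        });
    }
    for (auto& w : workers) w.join();

    // Single submission point: command lists execute in array order.
    queue->ExecuteCommandLists(workerCount, submit.data());
}
```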
 
He speaks from the experience of working with low-level APIs like DX12 and Vulkan, and he clearly has developers close to him who are writing code for both and for consoles... so I tend to listen to him.

Also, he definitely sounds like an approved spokesperson for AMD hardware; on numerous occasions AMD have called him out in agreement with his observations.

Wardell interviews are always fun to read as a developer, because it's a big game of trying to guess what he's talking about. I'm sure everything he says originated with something technical and accurate, but by the time it got to him it had lost most of its meaning. D3D12 will solve the resolution problem on Xbox One? PS4 is not completely native yet? Most engines aren't doing multithreaded rendering? D3D12 gives you more light sources on PC?

In a way it's the same issue as Digital Foundry. It's one big game of Chinese whispers, with an engineer on one end and an article on the other. The more people the information goes through, the more meaning it loses. Digital Foundry is like reading information that's gone through one or two people; Wardell is like reading information that's gone through about 20.

Anyway... take Wardell interviews with a grain of salt. I have nothing against the guy, but his explanations of things are not always close to reality.
 
D3D12 gives you more light sources on PC?
Properly optimized DX11 compute-shader-based lighting is already able to push 10k+ lights at 60 fps on a mid-range GPU. Do you really need more?

Obviously if we are talking about shadow casting light sources, then DX12 is a big boost (cheap draw calls and/or ExecuteIndirect). But even with DX12, shadow casting light sources are going to be expensive.
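Not from the poster above, but a minimal CPU-side sketch of the general idea behind tiled compute-shader lighting (structure names and the 16-pixel tile size are illustrative): lights are binned by their projected screen-space bounds, so each tile only shades the handful of lights that actually overlap it, which is why light counts in the thousands stay affordable.

```cpp
// Sketch only: CPU-side illustration of tiled light binning (real engines do
// this per screen tile in a compute shader). Each tile ends up with a short
// list of overlapping lights, so shading cost scales with lights-per-tile
// rather than with the total light count.
#include <algorithm>
#include <cstdint>
#include <vector>

struct ScreenLight { float x, y, radius; };   // light bounds already projected to pixels

struct TileGrid
{
    int tileSize = 16;
    int tilesX = 0, tilesY = 0;
    std::vector<std::vector<uint32_t>> lightLists;   // per-tile light indices
};

TileGrid BinLights(const std::vector<ScreenLight>& lights, int width, int height)
{
    TileGrid grid;
    grid.tilesX = (width  + grid.tileSize - 1) / grid.tileSize;
    grid.tilesY = (height + grid.tileSize - 1) / grid.tileSize;
    grid.lightLists.resize(size_t(grid.tilesX) * grid.tilesY);

    for (uint32_t i = 0; i < uint32_t(lights.size()); ++i)
    {
        const ScreenLight& l = lights[i];
        if (l.x + l.radius < 0 || l.x - l.radius > float(width) ||
            l.y + l.radius < 0 || l.y - l.radius > float(height))
            continue;                                  // entirely off-screen

        int x0 = std::max(0, int(l.x - l.radius) / grid.tileSize);
        int x1 = std::min(grid.tilesX - 1, int(l.x + l.radius) / grid.tileSize);
        int y0 = std::max(0, int(l.y - l.radius) / grid.tileSize);
        int y1 = std::min(grid.tilesY - 1, int(l.y + l.radius) / grid.tileSize);

        for (int y = y0; y <= y1; ++y)
            for (int x = x0; x <= x1; ++x)
                grid.lightLists[size_t(y) * grid.tilesX + x].push_back(i);
    }
    return grid;   // per-pixel shading then loops only over its tile's light list
}
```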
 
Properly optimized DX11 compute-shader-based lighting is already able to push 10k+ lights at 60 fps on a mid-range GPU. Do you really need more?

Obviously if we are talking about shadow casting light sources, then DX12 is a big boost (cheap draw calls and/or ExecuteIndirect). But even with DX12, shadow casting light sources are going to be expensive.

Likely you actually need fewer light sources, albeit higher-quality ones with dynamic bounce and shadows.

Shadow rendering must be the kernel of truth hidden in his statement. The draw calls add up very quickly, but lighting itself is still a shading problem that DX12 barely touches.
 
I've read something about voxel-based ray tracing in DX12, but I've barely understood it.
Is it still too computationally expensive?
 
My understanding is that voxel cone traced global illumination is greatly improved if the GPU is FL12_1. Conservative rasterization, rasterizer ordered views and volume tiled resources all play a part in speeding it up.

It's clear it can be done with FL12_0, but there are likely some compromises compared to Nvidia's approach.
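Not something from the thread, but a small sketch of how an engine might detect those features at runtime (assuming an already-created ID3D12Device*). FL12_1 requires conservative rasterization and ROVs, while volume (3D) tiled resources correspond to Tiled Resources Tier 3 and are queried separately:

```cpp
// Sketch only: query the FL12_1-associated features mentioned above so a
// voxel GI path can choose between fast and fallback variants.
#include <d3d12.h>

struct VoxelGiCaps
{
    bool conservativeRaster;
    bool rovs;
    bool volumeTiledResources;   // Tier 3 adds 3D (volume) tiled resources
};

VoxelGiCaps QueryVoxelGiCaps(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
    device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                &options, sizeof(options));

    VoxelGiCaps caps;
    caps.conservativeRaster =
        options.ConservativeRasterizationTier != D3D12_CONSERVATIVE_RASTERIZATION_TIER_NOT_SUPPORTED;
    caps.rovs = options.ROVsSupported == TRUE;
    caps.volumeTiledResources =
        options.TiledResourcesTier >= D3D12_TILED_RESOURCES_TIER_3;
    return caps;
}
```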
 
The Tomorrow Children's solution is more cone tracing than ray tracing.
A ray is a special case of a cone: a cone with zero radius.

Pixels on the screen are not points; pixels are rectangles. Tracing pixels with rays is not optimal: it produces aliasing (just like rasterization does). Tracing with pyramids is optimal (but hard), and a cone is a pretty good approximation of a pyramid (a much better approximation than a ray). A cone also approximates a specular lobe better than a ray. You need far fewer of them for a good result.
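Not the posters' code, but a minimal sketch of a standard voxel cone tracing step that makes the cone-versus-ray point concrete: the footprint diameter grows with distance, so each step samples a coarser mip of a pre-filtered voxel volume, and a few cones cover what many rays would be needed for. SampleVoxels is a hypothetical stand-in for a 3D texture lookup.

```cpp
// Sketch only: marching one cone through a pre-filtered voxel mip chain.
// As the cone widens with distance, the sample diameter (and thus mip level
// and step size) grows, which is why a handful of cones can cover a pixel
// footprint or specular lobe that a single zero-radius ray would alias on.
#include <algorithm>
#include <cmath>

struct VoxelSample { float r, g, b, a; };

// Hypothetical stand-in for sampling a pre-filtered 3D voxel texture at a
// given mip level (a Texture3D SampleLevel on the GPU). Stubbed so it compiles.
VoxelSample SampleVoxels(const float pos[3], float mipLevel)
{
    (void)pos; (void)mipLevel;
    return {0.0f, 0.0f, 0.0f, 0.1f};
}

VoxelSample TraceCone(const float origin[3], const float dir[3],
                      float halfAngle, float baseVoxelSize, float maxDistance)
{
    VoxelSample accum = {0, 0, 0, 0};
    float t = baseVoxelSize;                      // start one voxel out to avoid self-sampling
    while (t < maxDistance && accum.a < 1.0f)
    {
        float diameter = std::max(2.0f * t * std::tan(halfAngle), baseVoxelSize);
        float mip = std::log2(diameter / baseVoxelSize);   // wider footprint -> coarser mip
        float pos[3] = { origin[0] + dir[0] * t,
                         origin[1] + dir[1] * t,
                         origin[2] + dir[2] * t };
        VoxelSample s = SampleVoxels(pos, mip);

        float w = 1.0f - accum.a;                 // front-to-back compositing
        accum.r += w * s.r;  accum.g += w * s.g;  accum.b += w * s.b;
        accum.a += w * s.a;

        t += diameter * 0.5f;                     // step size grows with the cone
    }
    return accum;
}
```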
 
My understanding is that voxel cone traced global illumination is greatly improved if the GPU is FL12_1. Conservative rasterization, rasterizer ordered views and volume tiled resources all play a part in speeding it up.

It's clear it can be done with FL12_0, but there are likely some compromises compared to Nvidia's approach.
Conservative rasterization and ROVs allow implementation of faster algorithms to voxelize a triangle mesh. Maxwell also has a special extension to SV_RenderTargetArrayIndex (in OpenGL) that supports a bit field instead of a single RT index. This allows replicating the same triangle to multiple slices at once (without a geometry shader). This further boosts the performance of triangle mesh voxelization.

Volume tiled resources can be used to store voxels (and/or distance fields) efficiently. This gives a small boost compared to software-based (3D) virtual texturing (one less indirection). However, since the GPUs have 64 KB pages, hardware volume tiled resources might be a little too coarse-grained compared to software-based solutions (which can be configured to produce a smaller memory overhead). Only time will tell.
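A back-of-the-envelope illustration of that granularity point, with purely illustrative numbers (only the 64 KB hardware page size comes from the tiled-resources spec; the software page size below is an arbitrary example):

```cpp
// Sketch only: rough granularity math behind the 64 KB page remark. A hardware
// tiled-resource page is fixed at 64 KB, so the number of voxels committed per
// mapping depends only on the bytes-per-voxel of the format, whereas a software
// virtual-texturing scheme can pick its own (smaller) page size.
#include <cstdio>

int main()
{
    const int hardwarePageBytes = 64 * 1024;   // fixed by the tiled-resources spec
    const int softwarePageBytes = 16 * 1024;   // illustrative smaller software page

    for (int bytesPerVoxel : {1, 2, 4})
    {
        std::printf("%d B/voxel: %d voxels per 64 KB hardware page, %d per software page\n",
                    bytesPerVoxel,
                    hardwarePageBytes / bytesPerVoxel,
                    softwarePageBytes / bytesPerVoxel);
    }
    return 0;
}
```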
 