No DX12 Software is Suitable for Benchmarking *spawn*

Tons of reasons which aren't relevant to me. Just expose all the resource pools and the exact storage requirements to the developer at allocation, make it actually low level. If they can't handle it they can use your babby's API, which pulls the rug out from under them at will.

Fragmentation is unavoidable, but you can allow the programmer to mitigate it instead of the driver. Just like they do in their own process space on the CPU (at least until they start using 64 bit addressing).
 
Is the multi-user constraint an API-enforced one? Preemption and QoS demands can be made by the OS even if the API were accommodating developer requests to the fullest. A snapshot of resource allocations and their full behavior would remain accurate only as long as the OS permits.
Perhaps with the "gaming mode" attempts in recent OS versions, coupled with increasing levels of VM encapsulation and hardware support of virtualization in the CPUs and GPUs, allocation within the confines of a VM can be made more free. Perhaps at some level, the OS can generally promise that a certain percentage of the platform can be dedicated to a game partition like with the consoles. Even then the system could reserve the right to take back resources or undermine the actual behavior of the resources, barring a custom OS that relaxes protection and responsiveness.
 
Tons of reasons which aren't relevant to me.
But they are relevant to the common user and to every true computer scientist.
Just expose all the resource pools
They are already exposed in Vulkan, but that style didn't improve anything at all compared to the D3D12 style.
and the exact storage requirements to the developer at allocation, make it actually low level
What can be deterministically exposed is already exposed.
If they can't handle it they can use your babby's API, which pulls the rug out from under them at will.
That's the DX11/OGL style.
Fragmentation is unavoidable, but you can allow the programmer to mitigate it instead of the driver.
D3D12 and Vulkan already allow the programmer to mitigate fragmentation; it's simply that not all hardware has the same memory management capabilities.
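To make "already exposed" concrete, here is a minimal sketch (not from anyone in this thread) of the Vulkan side: the memory heaps and types are enumerated up front, and the application can take one big device-local allocation and place resources at offsets it manages itself. The handle names and pool size are placeholders.

```cpp
// Minimal sketch: query the memory pools Vulkan exposes and grab one large
// device-local allocation to sub-allocate from, so fragmentation is the
// application's problem rather than the driver's.
#include <vulkan/vulkan.h>

VkDeviceMemory AllocateDeviceLocalPool(VkPhysicalDevice physicalDevice,
                                       VkDevice device,
                                       VkDeviceSize poolSize)
{
    VkPhysicalDeviceMemoryProperties memProps{};
    vkGetPhysicalDeviceMemoryProperties(physicalDevice, &memProps);

    // Pick the first DEVICE_LOCAL memory type; a real allocator would also
    // check heap sizes (memProps.memoryHeaps[...].size) against its budget.
    uint32_t typeIndex = 0;
    for (uint32_t i = 0; i < memProps.memoryTypeCount; ++i) {
        if (memProps.memoryTypes[i].propertyFlags &
            VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT) {
            typeIndex = i;
            break;
        }
    }

    VkMemoryAllocateInfo allocInfo{};
    allocInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
    allocInfo.allocationSize = poolSize;
    allocInfo.memoryTypeIndex = typeIndex;

    VkDeviceMemory pool = VK_NULL_HANDLE;
    vkAllocateMemory(device, &allocInfo, nullptr, &pool);
    return pool;
}

// The app then places buffers/images at offsets it manages itself:
//   vkBindBufferMemory(device, buffer, pool, myOwnOffset);
```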
Just like they do in their own process space on the CPU
Which is already done, address space is already shared.
(at least until they start using 64 bit addressing).
64-bit addressing is a myth and a waste of time and page-table complexity.
 
Is the multi-user constraint an API-enforced one?
Multi-user is not an issue at all thanks to virtual addressing
Preemption and QoS demands can be made by the OS even if the API were accommodating developer requests to the fullest.
That's not the definition of preemption: if VRAM were totally controlled by an application, the OS would stall until the application serviced the request, which would require the application to service other processes' requests, which would result in application complexity, performance degradation and tons of security issues.
QoS has nothing to do with it, since it relates to network traffic priority and packet switching.
A snapshot of resource allocations and their full behavior would remain accurate only as long as the OS permits.
Which is already done.
Perhaps with the "gaming mode" attempts in recent OS versions, coupled with increasing levels of VM encapsulation and hardware support of virtualization in the CPUs and GPUs, allocation within the confines of a VM can be made more free.
The Game Mode performance increase is a myth; it's more about asking UWP applications to go to the background. It is only a mitigation for systems with tons of store crapware running, and the OS already reclaims UWP apps' VRAM and memory when needed anyway.
VM encapsulation like on the Xbox is just used as DRM and nothing more. The DX kernel already gives maximum priority to the foreground application.
Perhaps at some level, the OS can generally promise that a certain percentage of the platform can be dedicated to a game partition like with the consoles.
The OS cannot promise anything, since a PC is not a console: every PC has its own unique combination of hardware and software, and on console this is done more for DRM reasons than for performance. Even on console, no one wants PS3-style memory management again.
Even then the system could reserve the right to take back resources or undermine the actual behavior of the resources, barring a custom OS that relaxes protection and responsiveness.
This is already exposed in WDDM 2.0 residency: it is left to the programmer to mitigate page swapping, with the option to give some resources higher priority than others to reduce gameplay issues. In DX11/OGL and previous APIs it was left to the OS, which results in application hangs and stalls until swapping is complete.
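For reference, a rough sketch of what those WDDM 2.0 residency controls look like through D3D12; the resource names here are invented and error handling is omitted.

```cpp
// Minimal sketch of the D3D12 residency controls backed by WDDM 2.0.
#include <d3d12.h>

void ManageResidency(ID3D12Device1* device,
                     ID3D12Resource* criticalTexture,
                     ID3D12Resource* streamingPool)
{
    // Hint the video memory manager that this resource should be among the
    // last things paged out under memory pressure.
    ID3D12Pageable* critical[] = { criticalTexture };
    D3D12_RESIDENCY_PRIORITY high[] = { D3D12_RESIDENCY_PRIORITY_HIGH };
    device->SetResidencyPriority(1, critical, high);

    // Explicitly evict something that won't be needed for a while, and bring
    // it back before use. MakeResident blocks until the pages are resident,
    // so a real engine schedules this well ahead of time.
    ID3D12Pageable* streaming[] = { streamingPool };
    device->Evict(1, streaming);
    // ... later, before recording commands that reference it:
    device->MakeResident(1, streaming);
}
```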
 
With all that, do you think we'll see a DX12/Vulkan-based engine soon? (Not a DX11 engine tweaked for DX12...)

If not, I guess the API will be the main bottleneck for PC gaming?
 
Can anyone remember a GDC(?) slide which commented on how drivers are still a big factor in DX12 too? I can't seem to find it, but I'm pretty sure I saw it in this thread.
 
Can anyone remember a GDC(?) slide which commented on how drivers are still a big factor in DX12 too? I can't seem to find it, but I'm pretty sure I saw it in this thread.

[attached slide image]


https://www.techpowerup.com/231079/is-directx-12-worth-the-trouble
 
With all that, do you think we'll see a DX12/Vulkan-based engine soon? (Not a DX11 engine tweaked for DX12...)

If not, I guess the API will be the main bottleneck for PC gaming?

Pretty sure on PC Wolfenstein 2 is Vulkan-only, but I could be wrong. Doom Eternal is Vulkan-only, for sure.
 
That's not the definition of preemption: if VRAM were totally controlled by an application, the OS would stall until the application serviced the request, which would require the application to service other processes' requests, which would result in application complexity, performance degradation and tons of security issues.
In the context of a more console-like model, the memory requested by the game would be mostly exempt from the demand paging of other processes once the allocation was spun up. Operating systems can exempt or constrain paging for ranges of memory, though outside of specific purposes this is minimized. If there are specific transition points like the game being pushed to the background, closure, or errors, the OS could offer firmer promises if the initial spin-up can be completed.
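As a loose CPU-side illustration of "exempting a range from paging" (my own sketch with arbitrary sizes, not something proposed in the thread), Windows already lets a process pin memory so it stays resident:

```cpp
// Hypothetical sketch: pinning a block of memory so the OS will not page it
// out, roughly the CPU-side analogue of a console-style reservation.
#include <windows.h>

void* PinBlock(SIZE_T bytes)
{
    // Raise the working set limits first, otherwise VirtualLock can fail
    // for large requests. The padding values are arbitrary.
    SetProcessWorkingSetSize(GetCurrentProcess(),
                             bytes + (64u << 20),   // minimum
                             bytes + (256u << 20)); // maximum

    void* block = VirtualAlloc(nullptr, bytes,
                               MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (block && !VirtualLock(block, bytes)) {
        VirtualFree(block, 0, MEM_RELEASE);
        return nullptr;
    }
    return block; // stays resident until VirtualUnlock/VirtualFree
}
```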

QoS has nothing to do with it, since it relates to network traffic priority and packet switching.
I was going with a more informal use of QoS that has been used to describe the level of performance in shared services or hardware outside of networking. For compute hardware, HSA 1.1 included QoS changes unrelated to networking, and discussions about shared or virtualized GPUs have used it in terms of the time slice and resource allocation adjustments made between separate clients.

The Game Mode performance increase is a myth; it's more about asking UWP applications to go to the background. It is only a mitigation for systems with tons of store crapware running, and the OS already reclaims UWP apps' VRAM and memory when needed anyway.
I was not claiming that it was happening now, but speculating as to whether it could be combined with the advancing features for sharing and virtualization in GPUs to allow something like this to have benefits.

VM encapsulation like on the Xbox is just used as DRM and nothing more. The DX kernel already gives maximum priority to the foreground application.
The OS cannot promise anything, since a PC is not a console: every PC has its own unique combination of hardware and software, and on console this is done more for DRM reasons than for performance. Even on console, no one wants PS3-style memory management again.
Security and DRM are major drivers of the Xbox One's virtualized setup, but the console also leverages the setup to provide some enforced resource and time budgets to developers. With the game in the foreground, at least 6 cores and 5 GB of DRAM are allocated to the game partition. The reservation's memory is not subject to paging, and the isolation of the game and system partitions also allows for some rather stringent promises on the percentage of time the game will be given use of the GPU.
Additionally, the partitioned system allows for independent OS versioning for the console's application partition versus the OS version seen by a given game.

There would be specific transition points like when switching between foreground and background, and adjustments to time slice and allocation based on the version of the SDK a game opts to use. Other violations of the resources granted and their accessibility typically revolve around game-ending or crash scenarios, such as long-running compute failing to yield at the end of the game partition's GPU time slice. The latter problem could potentially be handled better by modern hardware, with features like SR-IOV and preemption capabilities that the CI architectures lacked.
It's not the only way to go about doing this, since the PS4 has some similar partitioning of CPUs and memory without a hypervisor and multiple guest operating systems.

Whether it's a significant enough use case on PC hardware to be worth the effort is uncertain, but making stronger-than-best-effort guarantees to the developer is possible, as it is for servers: a hypervisor can allocate cores and memory to an instance and leave that allocation's physical resources off-limits to other VMs.
 
Pretty sure on PC Wolfenstein 2 is Vulkan-only, but I could be wrong. Doom Eternal is Vulkan-only, for sure.
There's just one small catch - consoles. Neither (at least to my knowledge) supports Vulkan even if the hardware is capable.
 
[Slide: "FrameGraph: Extensible Rendering Architecture in Frostbite" (DICE GDC presentation)]


On the premise that more low-level = more efficient, there was this slide from DICE's GDC presentation comparing their experience in the context of some memory optimization. In addition to that, there are multiple graphics programmers stating that console tools expose everything through their APIs. It's safe to say consoles are even lower level.

 
Thx a lot. It's kind of sad, all the TFLOPS "wasted" because of poor tools / APIs (maybe Vulkan is doing better than DX12...) on PC. I know it's not new, but I kind of hoped for a change with DX12 & co...
 
Done some more digging on Twitter, found some interesting opinions:
https://twitter.com/SebAaltonen/status/1050760469798158336
https://twitter.com/BartWronsk/status/1050761265587486720
https://twitter.com/SebAaltonen/status/1050762291090792448
https://twitter.com/SebAaltonen/status/1050763915469164544
https://twitter.com/SebAaltonen/status/1050764920374128641
Also console context (the guy was a senior graphics coder on Witcher 2 & 3, Assassin's Creed and God of War):
https://twitter.com/BartWronsk/status/974101129712754688
https://twitter.com/BartWronsk/status/962356068469755904
There is more in the above threads; enough links for one post, I think. Some of the opinions about the difficulties are opposite to each other. Ultimately all this was kind of obvious from the beginning: attempting low-level coding on a platform without a guaranteed, well-documented baseline will be difficult, and some abstraction is always needed.
 
Thx a lot. It's kind of sad, all the TFLOPS "wasted" because of poor tools / APIs (maybe Vulkan is doing better than DX12...) on PC. I know it's not new, but I kind of hoped for a change with DX12 & co...
It's not about poor tools and APIs, it's about less abstraction over many different architectures, some of them with poor to very poor documentation (cough cough, NVIDIA). Vulkan and DX12 made some minor different choices about what to expose, with very minor differences in the final results. There will always be a trade-off between abstraction + code simplicity vs low-level control and code complexity. What could really improve things would be a proper, well-performing page-faulting implementation, which is still far ahead of the current hardware ecosystem and would require a complete rethink of GPU-CPU-RAM connections, but who today really wants to get past the DMA/QPI and PCI-E shit on the consumer market? (I guess no one.)
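For a sense of what demand-paged GPU memory looks like from the programmer's side today, CUDA's unified memory is probably the nearest existing example; the pages still migrate over PCI-E, which is the very link being complained about. A made-up minimal sketch (kernel name and sizes are mine):

```cpp
// Minimal sketch using the CUDA runtime (compiled with nvcc): on Pascal-class
// and later GPUs, managed allocations are faulted between CPU and GPU on
// first access rather than copied explicitly.
#include <cuda_runtime.h>

__global__ void touch(float* data, size_t n)
{
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;   // first GPU access faults the page in
}

int main()
{
    const size_t n = 1u << 26;    // 64M floats (~256 MB), arbitrary
    float* data = nullptr;
    cudaMallocManaged(&data, n * sizeof(float));

    for (size_t i = 0; i < n; ++i) data[i] = 0.0f;  // pages live on the CPU

    touch<<<(n + 255) / 256, 256>>>(data, n);       // migrated to the GPU on demand
    cudaDeviceSynchronize();

    cudaFree(data);
    return 0;
}
```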
 
DX12 has now been added to Hitman 2, resulting in MASSIVE fps uplifts over DX11. I personally witnessed a 30fps increase on my 3770K in one of the sniping levels with huge draw distance and object count. DSOG experienced 20fps gains on their 4390K in crowded areas, and others gained more on their 8700K!

while Reddit’s member ‘jeesusperkele‘ has shared some comparison in which an NVIDIA GeForce GTX1080Ti achieves 30-40fps performance gains in DirectX 12.

https://www.dsogaming.com/news/hitm...-40fps-increase-in-cpu-ram-limited-scenarios/

This is quite possibly the best implementation of DX12 to date.
 