HAGS: Hardware Accelerated GPU Scheduling *Newt Thread*

The fellow (PhazDelta) who tested Hardware Scheduling in RDR2 stated he had other issues with Win 2004 and rolled back to Win 1909.
That fixed his issues (slowdown in GPU usage), and he decided to rerun RDR2 to see if there were any changes with HAGS. Nvidia's customer support rep noticed his postings and asked him to file a bug report.

Driver 451.48 on Win 1909, Vulkan API, using a GTX 1080:
 
I'm confused here. Isn't 2004 required for HAGS?
 
Here are some tests done with an i5-4570 and a GTX 1060 on W10 2004.
Basically an apples-to-oranges comparison, since he's using two different drivers, but the frametime chart comparison is interesting: it seems less "noisy".

PC specs:
GPU: MSI Gaming X 1060 6GB
CPU: i5-4570
RAM: 8GB
PSU: Antec 500W Platinum
MoBo: Acer
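One way to put a number on "less noisy" instead of eyeballing the chart: a minimal Python sketch, assuming you have a per-frame frametime log in milliseconds (e.g. exported from a capture tool). `frametime_stats` is a hypothetical helper, not part of any tool; it reports average fps, standard deviation, the 99th-percentile frametime and the mean frame-to-frame change, which is roughly what makes a chart look spiky.

```python
import math
import statistics


def frametime_stats(frametimes_ms):
    """Summarize a frametime capture given as milliseconds per frame."""
    if len(frametimes_ms) < 2:
        raise ValueError("need at least two frames")
    ordered = sorted(frametimes_ms)
    # Nearest-rank 99th percentile: at least 99% of frames are at or below this.
    p99_ms = ordered[math.ceil(0.99 * len(ordered)) - 1]
    # Frame-to-frame changes: large values are what make a chart look "noisy".
    deltas = [abs(b - a) for a, b in zip(frametimes_ms, frametimes_ms[1:])]
    return {
        "avg_fps": 1000.0 / statistics.fmean(frametimes_ms),
        "stdev_ms": statistics.stdev(frametimes_ms),
        "p99_ms": p99_ms,
        "mean_jitter_ms": statistics.fmean(deltas),
    }


# Hypothetical captures with the same average frametime (16.7 ms).
smooth = [16.7] * 10
noisy = [10.0, 23.4] * 5
for name, capture in (("smooth", smooth), ("noisy", noisy)):
    print(name, frametime_stats(capture))
```

Both made-up captures average about 60 fps, but only the second one has jitter: that's the kind of difference an average hides and a frametime chart shows.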

 
The cards with limited VRAM look to be the real winners here. I wish I hadn't given away my 1060 3GB or I'd be running some tests of my own, particularly in VR, where I'd expect to see a massive performance increase.


The above video shows up to 3x (!!!) performance increases at certain resolutions in GTAV with the 1060 3GB.
 
From watching the video again, it's not quite as dramatic a change: with HAGS on there are still situations where the framerate drops from ~30-40 fps to ~10 fps, but there seem to be fewer of them, and in a few specific scenes HAGS on eliminates those sustained drops entirely.

Looks like HAGS just does a better job of managing VRAM usage when VRAM is full.
 
Any chance of seeing HAGS with embedded GPUs?
 
New DX Dev Blog update regarding Hardware Accelerated GPU Scheduling

https://devblogs.microsoft.com/directx/hardware-accelerated-gpu-scheduling/

You may have noticed a mysterious new optional feature called Hardware Accelerated GPU Scheduling appear in the advanced graphics settings page with the Windows 10 May 2020 update. The purpose of this blog is to give some background on this new feature and how we are introducing it. It is intended for folks curious about Windows internals. Remaining on the cutting edge of hardware innovation has always been a critical aspect of our graphics platform. Hardware Accelerated GPU Scheduling enables more efficient GPU scheduling between applications. For most users, this transition will be transparent. It is one of those things that if we do our job right, you will never know the transition happened. As the graphics platform continues to evolve, this modernization will enable new scenarios in the future.
 
8th page of this doc:
https://developer.nvidia.com/sites/.../GDC16/GDC16_gthomas_adunn_Practical_DX12.pdf

Command Lists #2
Each ‘ExecuteCommandLists’ has a fixed CPU overhead
Underneath, this call triggers a flush
So batch up command lists
Try to put at least 200 μs of GPU work in each ‘ExecuteCommandLists’, preferably 500 μs
Submit enough work to hide OS scheduling latency
Small calls to ‘ExecuteCommandLists’ complete faster than the OS scheduler can submit new ones
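A minimal sketch of the "batch up command lists" advice, assuming the app has a rough GPU cost estimate per command list (the list names and microsecond figures below are hypothetical). It greedily closes a batch once it reaches the slide's 200 μs floor, so each batch would become one ExecuteCommandLists call.

```python
def batch_command_lists(command_lists, min_batch_us=200.0):
    """Greedily group (name, estimated_gpu_us) pairs into submissions.

    Each returned batch stands for one ExecuteCommandLists call. A batch is
    closed as soon as its estimated GPU work reaches min_batch_us; whatever is
    left at the end goes out as a final, possibly smaller, batch.
    """
    batches, current, current_us = [], [], 0.0
    for name, estimated_gpu_us in command_lists:
        current.append(name)
        current_us += estimated_gpu_us
        if current_us >= min_batch_us:
            batches.append(current)
            current, current_us = [], 0.0
    if current:
        batches.append(current)
    return batches


# Hypothetical per-frame command lists with guessed GPU costs in microseconds.
frame = [("shadows", 120), ("gbuffer", 150), ("lighting", 90), ("post", 60), ("ui", 20)]
print(batch_command_lists(frame))  # two submissions instead of five
```

Passing `min_batch_us=500.0` targets the slide's preferred amount instead; with these made-up costs the whole frame then goes out as a single submission.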


So, in theory, HW scheduling should minimize the OS scheduling overhead for small ECL (ExecuteCommandLists) calls.
I wonder how this feature deals with the recent low-latency features of NVIDIA and AMD drivers.
The new ultra-low-latency modes minimize the CPU-side queue size and have some performance overhead; I wonder whether HW GPU scheduling can fix that issue while still keeping latency low.
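A toy model of the "small calls complete faster than the OS scheduler can submit new ones" point, assuming the OS can hand the GPU at most one new submission every `submit_interval_us` (a made-up parameter, not a measured Windows figure). If a submission holds less work than that interval, the GPU idles for the rest; shrinking the interval is what HW scheduling would hopefully do for small ECL calls.

```python
def gpu_utilization(work_per_submit_us, submit_interval_us):
    """Toy model: fraction of time the GPU is busy.

    Assumes the OS can only hand over one submission every submit_interval_us,
    and each submission carries work_per_submit_us of GPU work. If the work is
    shorter than the interval, the GPU finishes early and sits idle.
    """
    if work_per_submit_us <= 0 or submit_interval_us <= 0:
        raise ValueError("times must be positive")
    return work_per_submit_us / max(work_per_submit_us, submit_interval_us)


# Hypothetical numbers: 50 us command lists vs. two made-up scheduling intervals.
for interval_us in (200, 40):
    print(interval_us, gpu_utilization(50, interval_us))
```

With the slower interval the GPU is busy only a quarter of the time; once the interval drops below the per-submit work, it stays saturated.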
 
Is there any indication that HW scheduling is expected to reduce fixed submission costs at all, or is this effectively "only" allowing semaphore handling and the merging of independent contexts/queues to be shifted from the OS scheduler down to the lower end of the driver/hardware?

In the latter case, it's still expected to get rid of pipeline stalls introduced by bad scheduling choices on semaphores on the OS side. For example, when you have a tight, bidirectional dependency between queues, you end up with a lot of latency on every blocking (unfulfilled) semaphore, because the work hasn't even been *really* submitted to the device queue yet: semaphores on Windows have been a pure software construct managed by the OS, handled entirely before priority scheduling even happens.
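A back-of-the-envelope model of that ping-pong case, assuming two queues that strictly alternate (A works and signals, B waits, works and signals back) so nothing overlaps. `handoff_us` is a hypothetical per-semaphore latency, not a measured number; the point is only that it gets paid on every single handoff.

```python
def pingpong_time_us(round_trips, work_us, handoff_us):
    """Toy model of two queues with a tight, bidirectional dependency.

    Queue A runs work_us of GPU work and signals; queue B waits on that
    semaphore, runs its own work_us and signals back. Nothing overlaps, so
    each handoff's latency (handoff_us) sits directly on the critical path.
    """
    handoffs = 2 * round_trips  # A->B and B->A per round trip
    return handoffs * (work_us + handoff_us)


# Hypothetical: 10 round trips of 100 us work, free vs. expensive handoffs.
ideal = pingpong_time_us(10, 100, 0)
slow = pingpong_time_us(10, 100, 50)
print(ideal, slow, f"{slow / ideal - 1:.0%} extra")  # 2000 3000 50% extra
```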
 
Hardware Accelerated GPU Scheduling
June 30, 2020
You may have noticed a mysterious new optional feature called Hardware Accelerated GPU Scheduling appear in the advanced graphics settings page with the Windows 10 May 2020 update. The purpose of this blog is to give some background on this new feature and how we are introducing it. It is intended for folks curious about Windows internals.
https://devblogs.microsoft.com/directx/hardware-accelerated-gpu-scheduling/
 

From your link.
"The goal of the first phase of hardware accelerated GPU scheduling is to modernize a fundamental pillar of the graphics subsystem and to set the stage for things to come… but that’s going to be a story for another time."

Same score in Time Spy, with and without HAGS for me.
https://www.3dmark.com/compare/spy/12774709/spy/12268374
I think this technology needs some time to mature before we can really enjoy the benefits.
 
To my understanding, it's not so much about the technology itself as about the way current applications are built for the old system, i.e. to hide the latency, as mentioned in MS's dev blog.
 
how this feature deals with recent low-latency features of NVIDIA and AMD drivers
whether HW GPU scheduling can fix the issue, while still keeping latency low.
Anti-Lag reduces latency by reducing the buffering of commands on the CPU side. Hardware scheduling IMHO goes even further by freeing the CPU from micromanaging the buffers, thus reducing driver overhead.
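A crude sketch of why CPU-side buffering costs latency, assuming a GPU-bound game where every already-queued frame has to be rendered before the new one, and ignoring display scanout. The function and numbers are illustrative only, not how Anti-Lag or NVIDIA's low-latency mode actually compute anything.

```python
def gpu_bound_latency_ms(gpu_frame_ms, queued_frames):
    """Crude latency estimate for a GPU-bound game.

    A frame's input is sampled on the CPU; the frame then waits behind
    queued_frames already-buffered frames, each taking gpu_frame_ms on the
    GPU, before its own gpu_frame_ms of rendering. Scanout is ignored.
    """
    if queued_frames < 0:
        raise ValueError("queued_frames must be >= 0")
    return (queued_frames + 1) * gpu_frame_ms


# Hypothetical ~60 fps GPU-bound case: a two-frame CPU-side queue vs. none.
for depth in (2, 0):
    print(depth, gpu_bound_latency_ms(16.7, depth))
```

In this model, cutting the queue from two frames to zero cuts latency to a third, which is the trade-off those low-latency modes make at the cost of some GPU idle time.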

Is there even any indicator HW scheduling is even expected to reduce fixed submission costs
They've said it explicitly in the blog post: "the new scheduler reduces the overhead of GPU scheduling".

The WDDM driver is mostly not reentrant, so driver calls are served on a FIFO (first come, first served) basis. Thus, eliminating the realtime-priority driver thread that processes GPU command buffers would reduce processing overhead.
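A toy FIFO model of a non-reentrant driver, assuming exactly one call is served at a time in arrival order. The call names and timings are hypothetical, and this is not how dxgkrnl is actually structured; it only shows how short calls pile up behind long command-buffer processing work.

```python
def fifo_finish_times(calls):
    """Serve (label, arrival_us, service_us) calls one at a time, first come
    first served, the way a non-reentrant entry point would. Returns each
    call's finish time."""
    clock = 0.0
    finish = {}
    for label, arrival_us, service_us in sorted(calls, key=lambda c: c[1]):
        start = max(clock, float(arrival_us))  # wait for the previous call to finish
        clock = start + service_us
        finish[label] = clock
    return finish


# Hypothetical timeline: two long command-buffer jobs interleaved with short app calls.
calls = [("cmdbuf0", 0, 40), ("app0", 10, 5), ("cmdbuf1", 20, 40), ("app1", 30, 5)]
app_only = [c for c in calls if c[0].startswith("app")]
print(fifo_finish_times(calls)["app1"])     # 90.0
print(fifo_finish_times(app_only)["app1"])  # 35.0
```

The 5 us call that arrives at 30 us finishes at 90 us when it queues behind command-buffer work, versus 35 us when that work is taken off the path.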


BTW, they've also said it's only the first phase: "The goal of the first phase of hardware accelerated GPU scheduling is to modernize a fundamental pillar of the graphics subsystem and to set the stage for things to come"... WDDM 3.0?
 
BTW, WDDM 2.7 reports the state of the hardware-accelerated GPU scheduler with three separate on/off caps bits: HwSchSupported, HwSchEnabled and HwSchEnabledByDefault, which potentially allows some bizarre combinations, like 'not supported' combined with 'enabled' and 'enabled by default'.
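To see how many of those combinations are bizarre, here's a quick enumeration, assuming the only sane rule is "not supported implies neither enabled nor enabled by default". The parameter names mirror the caps bits above, but this is just a model of the bits, not the actual D3DKMT caps struct.

```python
from itertools import product


def consistent_wddm27(hwsch_supported, hwsch_enabled, hwsch_enabled_by_default):
    # Assumed sanity rule: a scheduler that isn't supported should report
    # neither "enabled" nor "enabled by default".
    return hwsch_supported or not (hwsch_enabled or hwsch_enabled_by_default)


# All 8 combinations of the three independent bits, keeping the bizarre ones.
bizarre = [bits for bits in product((False, True), repeat=3) if not consistent_wddm27(*bits)]
for supported, enabled, by_default in bizarre:
    print(f"supported={supported} enabled={enabled} enabled_by_default={by_default}")
```

Three of the eight encodable states are nonsense under that rule, which is the kind of thing a single multi-valued field avoids.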

WDDM 2.9 reports the enabled/disabled state with the HwSchEnabled bit, but it uses a generic HwSchSupportState field which reports the details of the feature implementation in the driver: not supported (always off), experimental, stable, and always on (feature is required for the driver to work).

It seems like these generic DXGK_FEATURE_SUPPORT_* states would replace individual version-specific feature caps bits for future revisions of WDDM interfaces (see d3dkmdt.h in the latest Insider Preview SDK 20161).
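The 2.9-style report is easier to sanity-check. A minimal sketch: the enum names follow the DXGK_FEATURE_SUPPORT_* states mentioned above, but the numeric values (0-3) are an assumption to verify against d3dkmdt.h, and treating only experimental/stable as user-toggleable is my own assumption as well.

```python
from enum import IntEnum


class FeatureSupport(IntEnum):
    # Names follow the DXGK_FEATURE_SUPPORT_* states; the numeric values are an
    # assumption here, check d3dkmdt.h before relying on them.
    ALWAYS_OFF = 0    # not supported
    EXPERIMENTAL = 1
    STABLE = 2
    ALWAYS_ON = 3     # required for the driver to work


def check_hwsch(support_state, hwsch_enabled):
    """Return (consistent, user_toggleable) for a 2.9-style report."""
    state = FeatureSupport(support_state)
    consistent = not (
        (state is FeatureSupport.ALWAYS_OFF and hwsch_enabled)
        or (state is FeatureSupport.ALWAYS_ON and not hwsch_enabled)
    )
    # Assumption: only the two middle states leave room for an on/off switch.
    toggleable = state in (FeatureSupport.EXPERIMENTAL, FeatureSupport.STABLE)
    return consistent, toggleable


for state in FeatureSupport:
    for enabled in (False, True):
        print(state.name, enabled, check_hwsch(state, enabled))
```

Only two of the eight state/bit pairs are contradictory here, and the contradiction is obvious from the state alone.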
 

It's a big focus for Windows 10X and the reason it was delayed. They are trying to modernize Windows so it works better on lower-end processors. I got to try a Neo with 10X and with regular Windows 10. The responsiveness of the OS was much improved. Sadly, 10X seems to be delayed into the new year.
 