DirectStorage GPU Decompression, RTX IO, Smart Access Storage

So Microsoft wants GPUs to handle the burden of decompressing game assets, as an alternative to the dedicated hardware decompressor blocks on the Xbox and PlayStation.

This will require games to ship with a GPU-friendly compression format that the GPU then decompresses, accelerating loading times and data streaming during gameplay. The solution will work on existing GPUs.

GPU vendors will determine which of their GPUs can handle that responsibility. NVIDIA already announced support for RTX 2000 and RTX 3000 cards. AMD hinted that RX 6000 will support the feature as well.

What bothers me is the overhead this will incur on GPUs. The way I see it, GPUs can decompress data during level loading with no performance penalty, since the GPU sits idle during those moments. On-the-fly data streaming, however, will take away some of the GPU resources dedicated to rendering the game. How will developers balance things out in that regard?

https://e1cdn.social27.com/events-files/SysTools11_DIrectStorage for Windowsbfd95322-7e41-4e8d-821e-7adee9f9725c.pdf
 
Still too many unknowns. The near complete lack of actual details makes me think this is still a ways out.
 
how will developers balance things out in that regard?
The question reminds me of my experience figuring out async compute on GCN / Vulkan:
I started doing it on dedicated compute queues. Results were disappointing at first. Then it turned out that only the gfx/compute queue gives max performance, so the dedicated queues are obviously meant for this kind of background task.
Curiously, changing queue priorities had no effect at all and seems not to be implemented.
Problem: this is not specified by the API or even documented by AMD. For NV / Intel I'll have to do similar experiments again, and then hope the discovered behavior persists on future HW and drivers.
So I guess DirectStorage is meant to address those uncertainties, likely with dedicated queues to run the decompression stuff.
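To make the queue distinction concrete, here is a minimal sketch (my own illustration, not anything from the slides) of how an engine tells the combined gfx/compute family apart from a compute-only family in Vulkan; the kind of background decompression work discussed above would go on the latter:

// Sketch: pick the combined graphics+compute family for frame rendering and,
// if present, a compute-only family for background work such as decompression.
#include <vulkan/vulkan.h>
#include <vector>
#include <cstdint>

struct QueueFamilies {
    uint32_t graphicsCompute = UINT32_MAX;  // combined gfx/compute family
    uint32_t dedicatedCompute = UINT32_MAX; // compute-only (async) family, if any
};

QueueFamilies findQueueFamilies(VkPhysicalDevice gpu) {
    uint32_t count = 0;
    vkGetPhysicalDeviceQueueFamilyProperties(gpu, &count, nullptr);
    std::vector<VkQueueFamilyProperties> props(count);
    vkGetPhysicalDeviceQueueFamilyProperties(gpu, &count, props.data());

    QueueFamilies out;
    for (uint32_t i = 0; i < count; ++i) {
        const VkQueueFlags flags = props[i].queueFlags;
        if ((flags & VK_QUEUE_GRAPHICS_BIT) && (flags & VK_QUEUE_COMPUTE_BIT) && out.graphicsCompute == UINT32_MAX)
            out.graphicsCompute = i;
        // COMPUTE without GRAPHICS is the "dedicated" async compute family.
        if ((flags & VK_QUEUE_COMPUTE_BIT) && !(flags & VK_QUEUE_GRAPHICS_BIT) && out.dedicatedCompute == UINT32_MAX)
            out.dedicatedCompute = i;
    }
    return out;
}

Whether work submitted to the dedicated family actually overlaps with rendering, and at what cost, is exactly the undocumented part that varies per vendor.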
 
They specifically mentioned DEFLATE over BCn textures, which is what BCPack on the Xbox seems to do. And in the session chat, they also clarified that DirectStorage is not using peer-to-peer DMA, though they are considering it for a future release.

So I'm not sure how RTX I/O would be related to DirectStorage, besides implementing a similar lossless decompression step using compute shaders. Looks like another proprietary API to me.
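For context, "DEFLATE over BCn" just means the texture is first block-compressed into a GPU-native BCn format (lossy) and the resulting blocks are then run through an additional lossless DEFLATE pass for storage. A rough CPU-side sketch of the packing stage using zlib, purely to illustrate the two-stage idea (buffer names are made up; the interesting part, the inflate step, would run in a compute shader on the GPU):

// Sketch: lossless DEFLATE layered over already block-compressed (BCn) texture data.
#include <zlib.h>
#include <vector>
#include <cstdint>

std::vector<uint8_t> deflateBcnBlocks(const std::vector<uint8_t>& bcnBlocks) {
    uLongf packedSize = compressBound(static_cast<uLong>(bcnBlocks.size()));
    std::vector<uint8_t> packed(packedSize);
    // Lossless pass on top of the lossy BCn encode; this is what would ship on disk.
    compress2(packed.data(), &packedSize,
              bcnBlocks.data(), static_cast<uLong>(bcnBlocks.size()),
              Z_BEST_COMPRESSION);
    packed.resize(packedSize);
    return packed;
}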
 
What bothers me is the overhead this will incur on GPUs. The way I see it, GPUs can decompress data during level loading with no performance penalty, since the GPU sits idle during those moments. On-the-fly data streaming, however, will take away some of the GPU resources dedicated to rendering the game. How will developers balance things out in that regard?
A GPU will be used for decompression as well as rendering. From the GPU's point of view this is just another compute workload it has to perform to be able to render a scene. Moving this from the CPU to the GPU should be beneficial on average, as GPUs should be considerably faster at it.
I'd expect GPUs to include specialized h/w decompression units for this work in the future though, similar to the video dec/enc blocks we have now.
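As a rough sketch of how that "just another compute workload" can be scheduled alongside rendering in D3D12 (illustrative only; the command list is assumed to already contain the decompression dispatch, and all names are placeholders), the decompression goes on a compute queue and the graphics queue only waits where it actually consumes the data:

// Sketch: run a decompression compute dispatch on a separate D3D12 compute queue
// so it overlaps with rendering; the graphics queue stalls only at the fence.
#include <d3d12.h>

void submitDecompression(ID3D12CommandQueue* computeQueue,
                         ID3D12CommandQueue* graphicsQueue,
                         ID3D12GraphicsCommandList* decompressList, // pre-recorded dispatch
                         ID3D12Fence* fence, UINT64 fenceValue)
{
    // Compute queue: inflate the freshly loaded compressed buffer.
    ID3D12CommandList* lists[] = { decompressList };
    computeQueue->ExecuteCommandLists(1, lists);
    computeQueue->Signal(fence, fenceValue);

    // Graphics queue keeps rendering and only waits at the point
    // where the decompressed asset is sampled or copied.
    graphicsQueue->Wait(fence, fenceValue);
}

How much the decompression dispatch steals from rendering in practice depends on how the hardware shares compute units between the two queues, which is the balancing question above.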
 
So is RTX IO doing something extra over DirectStorage hardware decompression?
IMO DirectStorage likely provides the means to move data as effortlessly as possible from NVMe to the GPU. I think MS may leave it up to the vendors themselves to implement their own decode algorithms that would best suit their hardware, maybe?
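For a sense of what "moving data from NVMe to the GPU" could look like at the API level, here is a minimal sketch of enqueuing a compressed read that lands directly in a GPU buffer. The interface and enum names follow Microsoft's public dstorage.h headers, but nothing below is confirmed by the presentation itself; error handling and resource creation are omitted:

// Sketch: queue a compressed file read whose destination is a GPU buffer.
#include <dstorage.h>
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void enqueueAssetLoad(ID3D12Device* device, ID3D12Resource* gpuBuffer,
                      ID3D12Fence* fence, UINT64 fenceValue,
                      UINT32 compressedSize, UINT32 uncompressedSize)
{
    ComPtr<IDStorageFactory> factory;
    DStorageGetFactory(IID_PPV_ARGS(&factory));

    DSTORAGE_QUEUE_DESC queueDesc{};
    queueDesc.SourceType = DSTORAGE_REQUEST_SOURCE_FILE;
    queueDesc.Capacity   = DSTORAGE_MAX_QUEUE_CAPACITY;
    queueDesc.Priority   = DSTORAGE_PRIORITY_NORMAL;
    queueDesc.Device     = device;
    ComPtr<IDStorageQueue> queue;
    factory->CreateQueue(&queueDesc, IID_PPV_ARGS(&queue));

    ComPtr<IDStorageFile> file;
    factory->OpenFile(L"asset.bin", IID_PPV_ARGS(&file)); // placeholder path

    DSTORAGE_REQUEST request{};
    request.Options.SourceType        = DSTORAGE_REQUEST_SOURCE_FILE;
    request.Options.DestinationType   = DSTORAGE_REQUEST_DESTINATION_BUFFER;
    request.Options.CompressionFormat = DSTORAGE_COMPRESSION_FORMAT_GDEFLATE; // GPU-decodable format
    request.Source.File.Source          = file.Get();
    request.Source.File.Offset          = 0;
    request.Source.File.Size            = compressedSize;
    request.UncompressedSize            = uncompressedSize;
    request.Destination.Buffer.Resource = gpuBuffer;
    request.Destination.Buffer.Offset   = 0;
    request.Destination.Buffer.Size     = uncompressedSize;

    queue->EnqueueRequest(&request);
    queue->EnqueueSignal(fence, fenceValue); // signalled once the decompressed data is in place
    queue->Submit();
}

The decompression format is the part a vendor path like RTX I/O could accelerate or substitute, which would fit the "vendors implement their own decode" guess above.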
 
I'd kind of expect there to at least be an open-source general reference implementation for GPUs, and then specific versions from the AMD/Nvidia/Intel GPU vendors.
 
So is RTX IO doing something extra over DirectStorage hardware decompression?
Who knows, the slide with a direct connection between SSD and video RAM didn't make much sense to me.

Nvidia could probably use peer-to-peer DMA on Linux but this would require major revisions to the WDDM driver model on Windows.
The API isn't even in beta; nobody knows yet how it will be released.
Exactly. It's almost 8 months since the original RTX I/O announcement, but no further details have been made available.
 
What bothers me is the overhead this will incur on GPUs. The way I see it, GPUs can decompress data during level loading with no performance penalty, since the GPU sits idle during those moments. On-the-fly data streaming, however, will take away some of the GPU resources dedicated to rendering the game. How will developers balance things out in that regard?


Nvidia have claimed that the impact on GPU performance is negligible. It's worth remembering that for in-game background streaming the transfer rate is going to be very low by NVMe standards, maybe tens to low hundreds of MB/s.

So if this tech can easily saturate the max bandwidth of "gaming NVMe SSDs" as Microsoft claims, then 200 MB/s, for example (which I think was the figure claimed for the UE5 demo), should be no problem: that's only about 3% of the max bandwidth of the fastest SSDs out there (roughly 200 MB/s against the ~7 GB/s of a top-end PCIe 4.0 drive).

I'd also assume that when they say their GPU-based decompression can easily saturate NVMe drives, they're not basing that on the compute capabilities of a 3090 but rather on something a bit more common.
 
A GPU will be used for decompression as well as rendering. From the GPU's point of view this is just another compute workload it has to perform to be able to render a scene. Moving this from the CPU to the GPU should be beneficial on average, as GPUs should be considerably faster at it.
I'd expect GPUs to include specialized h/w decompression units for this work in the future though, similar to the video dec/enc blocks we have now.

Hardware is coming, it will just lag the GPU implementation by years.
 
A year, more like. The problem isn't the absence of h/w, it's the need to make it work on what's available currently.

They talk about it in the video they put up. By the time hardware is released, it will still take years for the majority of people to have purchased and installed it.

So if we get hardware decoder support in our CPUs, or in DDR5, or whatever form hardware support ends up taking, we might not see a sizable number of users for years after that. But with the GPU approach we may get a few generations of hardware supporting this in software from the get-go.


Around 21 mins in
 
They talk about it in the video they put up. By the time hardware is released, it will still take years for the majority of people to have purchased and installed it.

So if we get hardware decoder support in our CPUs, or in DDR5, or whatever form hardware support ends up taking, we might not see a sizable number of users for years after that. But with the GPU approach we may get a few generations of hardware supporting this in software from the get-go.
This is what I'm saying. Lack of h/w support isn't an issue; PC h/w is updated fast. The issue is with the current install base.
 
This is what I'm saying. Lack of h/w support isn't an issue; PC h/w is updated fast. The issue is with the current install base.

Yeah, but depending on the number of cards this works on, the user base may be enough for this first phase. Even if it's just Nvidia RTX and AMD RDNA cards, it will still be a sizable install base. If it works on Nvidia's 10 series and AMD's Vega, that's bigger still.

In the slides it looks like they have what they call sub-components in development:

1) DirectCompute-based decompressor
Initial prototype decompressor saturates gaming NVMe SSD bandwidths
Opens the door to further innovation in possible future silicon implementations

2) CPU decompressor for assets destined for system memory

3) Compressor
 
I wouldn't even take what was said as confirmation that we'll get dedicated decompression hardware on GPUs.

In a console I can see the benefit; in graphics cards that will already have a working shader implementation, it may not be worth the silicon. Depends on the performance.

I could see more reason for dedicated silicon on APUs, I guess.
 
I wouldn't even take what was said as confirmation that we'll get dedicated decompression hardware on GPUs.

In a console I can see the benefit; in graphics cards that will already have a working shader implementation, it may not be worth the silicon. Depends on the performance.

I could see more reason for dedicated silicon on APUs, I guess.

I'm thinking we will see dedicated silicon on CPUs.
 