DirectStorage GPU Decompression, RTX IO, Smart Access Storage

I'm kind of worried... TLOUP2's recommended specs have been announced, and in their tweet they also confirm that TLOUP2 will use DirectStorage... and considering Nixxes is either helping with or doing this port, I'm left worried that the game is going to have the same issues as all of Nixxes' other games which utilize DirectStorage... and by that I mean stuttering going in and out of cinematics and on camera cuts, both of which basically affect Spider-Man 2 to a ridiculous degree... as well as general frametime shittiness. But hey, here's hoping it's just Insomniac's engine.
Only if they use GPU decompression, which they've only used on two ports.
On GeForces, not on Radeons. Don't know about Arcs.
 
I am actually not sure if it is on XSX, but I know 100% it is on XSS at least lol sorry. It should also be there for Doom The Dark Ages when that comes out I imagine.
Interesting, I suspect it probably will be on XSX also though. Probably helping with the fast loading it has.

The install size for Indy is much smaller on XSS, but with it using SFS I would have expected a bit higher texture quality.

With VRS having a lower performance impact the lower the resolution, you'd think they would use it on XSX but not on XSS, where it will also have a bigger visual impact.

Not sure which has the biggest impact on textures in Indy on XSS: VRS or the texture quality.
 
@GLS28R @Kaotik

I'm quite happy to report that there's no stuttering at all going in and out of cinematics or camera cuts... also what's great to see is that there's absolutely no issue with DirectStorage tanking frametimes on Nvidia hardware if you spin the camera around wildly with the mouse. Not sure if this game uses GPU decompression though. I suppose I could check with SpecialK next time.

But whatever they've done or are doing with this game... it's MUCH better than their results with the Insomniac engine.

Overall, I'm fairly content with the quality of this port thus far. It's miles better than TLOUP1... although there are always some issues I hope they work out... as well as a few nitpicky things I have.
 
Today I got my answer to the question above. They're only using DirectStorage with CPU-based decompression. Makes sense...

This comes from Digital Foundry's Tech interview with Naughty Dog and Nixxes posted today

Could you talk about the changes to streaming, vis-à-vis DirectStorage?

Coen Frauenfelder:
Actually, the system lends itself very well to DirectStorage. We're just using CPU decompression, without GPU decompression. The new system gives us a lot of benefits - more room, better scalability on streaming things in faster.

Jurjen Katsman: One thing to maybe add is that we are using different compression algorithms which decompress really fast with low CPU usage, but use a little more disk space. We'd rather not spend too much time on unloading and decompressing in the background, so we make different trade-offs compared to PS5.


I don't think GDeflate GPU based decompression is long for this world... but it also makes me worry about neural texture compression, since the same fundamental issue applies to it as well. The technology is great, but the hit to performance and frametime consistency isn't worth it. These types of decompression need to happen off of the GPU on a dedicated decompression block.
 
These types of decompression need to happen off of the GPU on a dedicated decompression block.
Or how about developers follow what MS has explicitly recommended in the DS SDK and provide users with a UI option to choose which h/w to use for decompression?
The problem is that PCs are all different (and evolve constantly!), which means the bottlenecks are different, which means you can't just assume it's better to decompress assets on the CPU or the GPU - on different systems either can be optimal. If a game is 100% GPU limited then putting additional work on the GPU will always result in a performance loss, and the same will happen with CPU decompression in a CPU-limited scenario. The best option in this case is to let users decide where to perform decompression, because the best choice is 100% system specific.
The "dedicated decompression block" sounds like an awful idea on PC IMO.

Edit: In fact the same logic can be applied to shader compilation. It is also almost 100% system specific which option is best for shader compilation on PC - a single "precompile" step at the game's launch (or in the menus prior to actually launching gameplay) or a background compilation process. The latter can be completely fine on many-core CPUs, while lower-end ones would in fact struggle with it quite a bit.
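To illustrate the decompression option, here's a rough sketch of what wiring such a setting into the runtime could look like, assuming a hypothetical DecompressionDevice option in the game's menu; the DSTORAGE_CONFIGURATION / DStorageSetConfiguration calls come from the DS SDK, everything else is a placeholder:

```cpp
// Minimal sketch: apply a user-selected decompression path before the
// DirectStorage factory is first created. Enum and function names are
// placeholders; the configuration API comes from dstorage.h.
#include <dstorage.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

enum class DecompressionDevice { Gpu, Cpu };   // hypothetical in-game setting

HRESULT ApplyDecompressionSetting(DecompressionDevice device)
{
    DSTORAGE_CONFIGURATION config = {};        // defaults leave GPU decompression enabled
    if (device == DecompressionDevice::Cpu)
    {
        // Route GDeflate through the runtime's built-in CPU codec instead.
        config.DisableGpuDecompression = TRUE;
    }
    // Only takes effect for factories created after this call.
    return DStorageSetConfiguration(&config);
}

ComPtr<IDStorageFactory> CreateFactoryWithSetting(DecompressionDevice device)
{
    ApplyDecompressionSetting(device);
    ComPtr<IDStorageFactory> factory;
    DStorageGetFactory(IID_PPV_ARGS(&factory));
    return factory;
}
```

Note the configuration only applies to factories created after the call, so changing the setting would effectively need the streaming system (or the game) to restart.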
 
I don't think GDeflate GPU based decompression is long for this world... but it also makes me worry about neural texture compression, since the same fundamental issue applies to it as well. The technology is great, but the hit to performance and frametime consistency isn't worth it. These types of decompression need to happen off of the GPU on a dedicated decompression block.
Again, the issues with GPU decompression are only known to apply to NVIDIA. At least AMD is free of any such issues with GPU decompression. Due to a lack of proper testing, we have no clue what the situation is on Intel's side.

The world doesn't need to and shouldn't bend over because NVIDIA can't fix their issue.
 
Or how about developers follow what MS has explicitly recommended in the DS SDK and provide users with a UI option to choose which h/w to use for decompression?
The problem is that PCs are all different (and evolve constantly!), which means the bottlenecks are different, which means you can't just assume it's better to decompress assets on the CPU or the GPU - on different systems either can be optimal. If a game is 100% GPU limited then putting additional work on the GPU will always result in a performance loss, and the same will happen with CPU decompression in a CPU-limited scenario. The best option in this case is to let users decide where to perform decompression, because the best choice is 100% system specific.
The "dedicated decompression block" sounds like an awful idea on PC IMO.

Edit: In fact the same logic can be applied to shader compilation. It is also almost 100% system specific which option is best for shader compilation on PC - a single "precompile" step at the game's launch (or in the menus prior to actually launching gameplay) or a background compilation process. The latter can be completely fine on many-core CPUs, while lower-end ones would in fact struggle with it quite a bit.
How do developers allow users to decide which hardware to use for decompression when they require that the assets be compressed using different algorithms which may only work on either the CPU or GPU?
 
Again, the issues with GPU decompression are only known to apply to NVIDIA. At least AMD is free of any such issues with GPU decompression. Due to a lack of proper testing, we have no clue what the situation is on Intel's side.

The world doesn't need to and shouldn't bend over because NVIDIA can't fix their issue.
No, they don't only apply to Nvidia. Nobody wants this type of shit when they simply turn the camera in their games...

[attached image: frametime graph]
 
Again, the issues with GPU decompression are only known to apply to NVIDIA.
You've said this many times now without providing any sort of proof - care to do so?
From what I've seen "the issues" (if you could call performance degradation "an issue") are present on all GPUs and do not in fact "apply only to Nvidia".
Granted, Radeons generally have a higher async compute ceiling (due to lower h/w utilization in the absence of it), which could help them hide the cost more efficiently.

How do developers allow users to decide which hardware to use for decompression when they require that the assets be compressed using different algorithms which may only work on either the CPU or GPU?
You can decompress GDeflate on the CPU. And if that's the problem then a different algorithm can be introduced - GDeflate is essentially what Nvidia supplied MS with; if people aren't happy with it, nothing stops them from providing a better alternative.
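For illustration, a minimal sketch of decompressing a GDeflate blob on the CPU via the codec interface the DS SDK ships (DStorageCreateCompressionCodec / IDStorageCompressionCodec); the function name and buffers are placeholders, and error handling is omitted:

```cpp
// Minimal sketch: CPU-side GDeflate decompression with the DirectStorage
// codec interface. Not production code - no HRESULT checks.
#include <dstorage.h>
#include <wrl/client.h>
#include <cstdint>
#include <vector>

using Microsoft::WRL::ComPtr;

std::vector<uint8_t> DecompressGDeflateOnCpu(const std::vector<uint8_t>& compressed,
                                             size_t uncompressedSize)
{
    ComPtr<IDStorageCompressionCodec> codec;
    // 0 = let the codec choose how many worker threads to use.
    DStorageCreateCompressionCodec(DSTORAGE_COMPRESSION_FORMAT_GDEFLATE, 0,
                                   IID_PPV_ARGS(&codec));

    std::vector<uint8_t> output(uncompressedSize);
    size_t written = 0;
    codec->DecompressBuffer(compressed.data(), compressed.size(),
                            output.data(), output.size(), &written);
    output.resize(written);
    return output;
}
```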
 
No, they don't only apply to Nvidia.
Care to elaborate?
ComputerBase did do the testing and found that Radeons have no issues with DS GPU Decompression, while NVIDIA does.
[attached image: ComputerBase GPU decompression benchmark]
You've said this many times now without providing any sort of proof - care to do so?
From what I've seen "the issues" (if you could call performance degradation "an issue") are present on all GPUs and do not in fact "apply only to Nvidia".
Granted, Radeons generally have a higher async compute ceiling (due to lower h/w utilization in the absence of it), which could help them hide the cost more efficiently.
I'm pretty sure I've provided the link above several times
 
Somewhat interestingly... Monster Hunter Wilds' recent patch came out supposedly improving VRAM utilization, and a number of players reported issues with stuttering.

Someone released a tool to *decompress the textures and repackage them*, and then suddenly people were able to play with the highest textures on lower-end GPUs without stuttering...

Here's a link to the tool:


[attached image]
 
Care to elaborate?
ComputerBase did do the testing and found that Radeons have no issues with DS GPU Decompression, while NVIDIA does.
[attached image]

I'm pretty sure I've provided the link above several times
Yes, I did already elaborate. I showed a game where AMD definitely does have the same issues.
 
I think fundamentally the issue here is that at the cost of a bit more storage space, you can provide something which we know works better across all GPU vendors... and developers are actively choosing to forgo GPU Decompression and do it on the CPU instead. Currently, I agree with their decision... because the results are simply night and day. Also, in Alex's interview with Nixxes about the TLOUP2 port, they specifically say the PC version can load much faster than the PS5 version, but it's the shader compilation which happens during loading that often prevents it.
 
Yes, I did already elaborate. I showed a game where AMD definitely does have the same issues.
Apparently you edited the message while I was replying.

Considering the mess MHW is, can you be sure those issues are caused by GPU Decompression?
You can't just assume it's the cause without some sort of test showing it is. Supposedly the "Special K" mod allows one to disable GPU Decompression in MHW, but I can't find any benchmarks comparing it.
If someone provides me with the game I'm more than happy to test it myself, but I'm not paying for that steaming pile of unoptimized crap.
 
I think fundamentally the issue here is that at the cost of a bit more storage space, you can provide something which we know works better across all GPU vendors...
It's not just the storage space though. GDeflate allows streaming to GPU VRAM in a compressed format, which gives you 2-4x the effective PCIe bandwidth. With CPU decompression you're reading the compressed data but uploading it to the GPU in decompressed form, which means you lose that half of the bandwidth savings.
And again, it is completely system-configuration specific whether you'd prefer one or the other.
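Roughly, the compressed path looks like this at the API level - a sketch only, with made-up handle names and purely illustrative sizes:

```cpp
// Minimal sketch: a DirectStorage request that keeps the data
// GDeflate-compressed across the PCIe bus and has the runtime expand it
// into a GPU buffer. All handles are assumed to exist elsewhere.
#include <dstorage.h>
#include <d3d12.h>

void EnqueueCompressedRead(IDStorageQueue* queue,
                           IDStorageFile*  file,
                           ID3D12Resource* destBuffer,
                           UINT32 compressedSize,    // e.g. ~40 MB read from disk and sent over PCIe
                           UINT32 uncompressedSize)  // e.g. ~100 MB once expanded in VRAM
{
    DSTORAGE_REQUEST request = {};
    request.Options.SourceType        = DSTORAGE_REQUEST_SOURCE_FILE;
    request.Options.DestinationType   = DSTORAGE_REQUEST_DESTINATION_BUFFER;
    request.Options.CompressionFormat = DSTORAGE_COMPRESSION_FORMAT_GDEFLATE;

    request.Source.File.Source = file;
    request.Source.File.Offset = 0;
    request.Source.File.Size   = compressedSize;

    request.Destination.Buffer.Resource = destBuffer;
    request.Destination.Buffer.Offset   = 0;
    request.Destination.Buffer.Size     = uncompressedSize;

    request.UncompressedSize = uncompressedSize;

    queue->EnqueueRequest(&request);
    queue->Submit();
}
```

With CPU decompression, the transfer over the bus is the uncompressed size instead, which is exactly the saving being discussed.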
 
It's not just the storage space though. GDeflate allows streaming to the GPU VRAM in a compressed format which gives you 2-4X the effective PCIE bandwidth. With CPU decompression you're reading the compressed data but uploading it to the GPU in a decompressed format which means that you lose half of bandwidth savings.
And again it is completely system configuration specific whether you'd want to prefer one or the other.
Yes, but that bandwidth savings is for nothing when game frametimes go to shit just by turning the camera.

For things such as decompression it shouldn't be the player choosing. Games should be doing this stuff invisibly to the player. One of the points MS was touting when they were talking up DirectStorage before it released was how it (Windows) would always choose the best path given the hardware that was being run. Games need to do the same thing. If a player has an overabundance of RAM and VRAM, a game should saturate all available memory and improve performance and load times. If they have fast storage, but less memory, it should stress the storage bandwidth. If a person has a fast CPU but a slow GPU, it should decompress on the CPU.. or vice versa with a fast GPU and slow CPU. Games should allow players to optionally pre-compile all PSOs if they choose, or do it on background threads. If the CPU doesn't have many cores... behavior should be to pre-compile more PSOs upfront.. if there's lots of cores, it should do more background compilation... and so on and so forth.
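To give a rough idea of what I mean, here's a sketch of that kind of automatic selection; the SystemInfo fields, thresholds, and helper names are entirely made up, the point is just that the game picks the path itself based on what it detects:

```cpp
// Sketch of an automatic decompression-path / memory heuristic.
// Everything here is an illustrative placeholder, not a real API.
#include <cstddef>

struct SystemInfo {
    unsigned    cpuCores;   // physical core count
    unsigned    gpuTier;    // internal performance bucket, 1 (slow) .. 5 (fast)
    std::size_t ramBytes;
    std::size_t vramBytes;
    bool        fastNvme;
};

enum class DecompressPath { Cpu, Gpu };

DecompressPath ChooseDecompressionPath(const SystemInfo& sys)
{
    // Lots of CPU headroom, modest GPU: keep the GPU free for rendering.
    if (sys.cpuCores >= 12 && sys.gpuTier <= 2)
        return DecompressPath::Cpu;

    // Few cores, strong GPU: async-compute decompression is the lesser evil.
    if (sys.cpuCores <= 6 && sys.gpuTier >= 4)
        return DecompressPath::Gpu;

    // In between, lean on the CPU when it has spare cores, since GPU-limited
    // scenes pay directly for any extra compute work.
    return (sys.cpuCores >= 8) ? DecompressPath::Cpu : DecompressPath::Gpu;
}

std::size_t ChooseStreamingCacheBytes(const SystemInfo& sys)
{
    // Spare RAM becomes a streaming cache when storage is slow; a fast NVMe
    // drive gets leaned on instead when memory is tight.
    const std::size_t reserve = 8ull * 1024 * 1024 * 1024;  // keep ~8 GB for the game itself
    if (!sys.fastNvme && sys.ramBytes > reserve)
        return (sys.ramBytes - reserve) / 2;
    return 512ull * 1024 * 1024;
}
```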
 
For things such as decompression it shouldn't be the player choosing. Games should be doing this stuff invisibly to the player.
The current situation is what you get with this philosophy.
Consider that PCs evolve and that in the future a gaming CPU could have 4x more cores, making it an ideal candidate for any form of decompression.
Can a game have some heuristic which would suggest the optimal settings for a particular configuration? Sure. Should a game also have an explicit option in case that fails? Definitely.
Not having options is a console mindset, and it works there only because a console is 100% known, fixed h/w.
 
Yes, but that bandwidth savings is for nothing when game frametimes go to shit just by turning the camera.
Again, you have not shown this to be the case. Your example is from a game that is known to be horribly unoptimized in many ways, without any proof whatsoever that GPU Decompression, let alone DirectStorage, is the culprit for said frametime dips.

You would need to show it doesn't happen without GPU Decompression to make your argument valid.
Other games with GPU Decompression do not exhibit similar issues on Radeons.
 
Again, you have not shown this to be the case. Your example is from a game that is known to be horribly unoptimized in many ways, without any proof whatsoever that GPU Decompression, let alone DirectStorage, is the culprit for said frametime dips.

You would need to show it doesn't happen without GPU Decompression to make your argument valid.
Other games with GPU Decompression do not exhibit similar issues on Radeons.
We know it to be the case because when you decompress the textures using the tool I linked above... there's no issue on either vendor's GPUs.

Furthermore, Monster Hunter Wilds is one of the only other points of reference using GPU decompression which isn't a Nixxes port built on Insomniac's engine... the engine could just as easily be the explanation for why Nvidia has problems where AMD doesn't in those titles... so it helps us rule out a specific engine or developer issue being the cause.
 