DirectStorage GPU Decompression, RTX IO, Smart Access Storage

We know it to be the case because when you decompress the textures using the tool I linked above... there's no issue on either GPU
Why not just test with GPU Decompression disabled to put the issue to rest?

Furthermore, Monster Hunter Wilds is one of the only other points of reference using GPU decompression which isn't a port done by Nixxes utilizing Insomniac's engine.. which could just as easily be an explanation as to why Nvidia has problems where AMD doesn't in said titles... so it helps us rule out a specific engine or developer issue being the cause.
Why would Nixxes porting or Insomniac's engine be relevant?
PS5 has no relation to DirectStorage and it uses dedicated hardware for decompression (as does Xbox Series for that matter), not the AMD GPU
 
Why not just test with GPU Decompression disabled to put the issue to rest?


Why would Nixxes porting or Insomniac's engine be relevant?
PS5 has no relation to DirectStorage and it uses dedicated hardware for decompression (as does Xbox Series for that matter), not the AMD GPU
Because there's no "disable GPU decompression" in Monster Hunter's settings...

Because there could be any number of non-DirectStorage reasons as to why Nvidia has a particular issue where AMD doesn't... considering the games were originally developed around AMD hardware. Why would you assume that developer integration would not potentially be an issue in this case?
 
Because there's no "disable GPU decompression" in Monster Hunter's settings...
You can disable it with Special K mod supposedly. Or you could try just removing the relevant files which works for other games?
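For what it's worth, the toggle itself lives in the DirectStorage runtime rather than in each engine: a process can set DSTORAGE_CONFIGURATION::DisableGpuDecompression before the first factory is created, which is presumably the knob Special K flips when injected early enough. A rough sketch of that API usage (the helper function name is mine, not Capcom's or Special K's actual code):

Code:
#include <dstorage.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Hypothetical helper: route GDeflate decompression to the runtime's CPU
// fallback instead of the GPU path.
HRESULT ForceCpuDecompression()
{
    DSTORAGE_CONFIGURATION config{};        // zero-initialised = library defaults
    config.DisableGpuDecompression = TRUE;  // decompress on the CPU instead
    // config.DisableBypassIO = TRUE;       // the other override Special K exposes

    // Must be called before the first DStorageGetFactory() call, otherwise it
    // fails - which is exactly why this needs developer support (an in-game
    // toggle) or early DLL injection rather than an end-user setting.
    HRESULT hr = DStorageSetConfiguration(&config);
    if (FAILED(hr))
        return hr;

    ComPtr<IDStorageFactory> factory;       // factories created after this obey the config
    return DStorageGetFactory(IID_PPV_ARGS(&factory));
}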

Because there could be any number of non-DirectStorage reasons as to why Nvidia has a particular issue where AMD doesn't... considering the games were originally developed around AMD hardware. Why would you assume that developer integration would not potentially be an issue in this case?
Because disabling GPU Decompression fixes the issues for NVIDIA in said titles, and the same function has nothing to do with the GPU on consoles?
 
You can disable it with Special K mod supposedly. Or you could try just removing the relevant files which works for other games?


Because disabling GPU Decompression fixes the issues for NVIDIA in said titles, and the same function has nothing to do with the GPU on consoles?
It doesn't work for these games. Unlike Ratchet and Clank you need the .dlls otherwise it doesn't run. Why is what I provided not enough for you? Literally decompressing the files using the tools prevents the issue from happening in MH:W.. the conclusion is that GPU decompression is the culprit.. and AMD is affected in this title as well, and equally fixed when the files are decompressed beforehand.

Yet the entire way memory is handled is different..... which means the implementation of Nixxes' memory management could be an issue, one which affects Nvidia and AMD differently. All your points of reference to Nvidia having a problem specific to them are from Nixxes and Insomniac's engine. You're so adamant on blaming Nvidia, claiming it's an IHV issue.. why couldn't it just as easily be a developer implementation/engine issue?

If you can, point me to ANY other example you may have of Nvidia having this issue which isn't from Nixxes, or using Insomniac's engine.
 
It doesn't work for these games. Unlike Ratchet and Clank you need the .dlls otherwise it doesn't run. Why is what I provided not enough for you? Literally decompressing the files using the tools prevents the issue from happening in MH:W.. the conclusion is that GPU decompression is the culprit.. and AMD is affected in this title as well, and equally fixed when the files are decompressed beforehand.
Because it's not GPU vs CPU decompression, it's GPU decompression vs already decompressed data, and they are not the same thing. Even if the game doesn't allow removing the .dlls, Special K allows you to disable GPU Decompression.

Yet the entire way memory is handled is different..... which means the implementation of Nixxes' memory management could be an issue, one which affects Nvidia and AMD differently. All your points of reference to Nvidia having a problem specific to them are from Nixxes and Insomniac's engine. You're so adamant on blaming Nvidia, claiming it's an IHV issue.. why couldn't it just as easily be a developer implementation/engine issue?
Of course it could be, but it would be curious why DirectStorage GPU Decompression on/off would fix the issue if the issue wasn't with DS GPU Decompression. The DS DLLs aren't by Nixxes after all, are they?
If you can, point me to ANY other example you may have of Nvidia having this issue which isn't from Nixxes, or using Insomniac's engine.
Are there even any other games besides MHW and Nixxes ports yet?
 
Ok, I just finished trying out two other games: Final Fantasy 16, which according to SpecialK utilizes GPU decompression with DirectStorage, and which does NOT drop frames at all when spinning the camera, nor stutter when rapidly flinging the mouse wildly... and Ghost of Tsushima.. which also, interestingly, has no frame drops or stuttering whatsoever when rapidly flinging the mouse around... which is notable considering it's ported by Nixxes.

I'm beginning to think this is an implementation/engine issue more than an inherent API or IHV issue... and I think we have some good data points thus far.

We have:
Nixxes games with Insomniac's engine (Ratchet and Clank, Spider-Man 2) - which exhibit massive frame drops and hitching when spinning the camera or flinging rapidly with the mouse on Nvidia
A Nixxes game with Sucker Punch's engine (Ghost of Tsushima) - which does not exhibit any issues when spinning the camera or flinging rapidly with the mouse on Nvidia
A SquareEnix game (Final Fantasy 16) - which does not exhibit any issues when spinning the camera or flinging rapidly with the mouse on Nvidia
A Capcom game (Monster Hunter Wilds) - which does exhibit issues on both vendors, and which can be fixed on both with a tool that decompresses the assets beforehand

To me, this data suggests:
Nvidia doesn't have an inherent issue with DirectStorage that AMD doesn't.. but rather Nixxes' implementation within Insomniac's engine is causing an issue specifically on Nvidia hardware.. which doesn't happen with Sucker Punch's engine... and the issue can happen to both IHVs, as evidenced by Monster Hunter Wilds by Capcom.
 
Because it's not GPU vs CPU decompression, it's GPU decompression vs already decompressed data, and they are not the same thing. Even if the game doesn't allow removing the .dlls, Special K allows you to disable GPU Decompression.
Regardless.. it would prove that decompression on the GPU at runtime is the issue.
Of course it could be, but it would be curious why DirectStorage GPU Decompression on/off would fix the issue if the issue wasn't with DS GPU Decompression. The DS DLLs aren't by Nixxes after all, are they?
But the engine falling back to standard decompression is handled by Nixxes... When you remove the dlls with Ratchet and Clank, the game reverts to using the traditional path.
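To be clear about the split in responsibility there: the title's side of the contract is basically a request saying "this blob is GDeflate, put the inflated result in this resource", while the choice between the GPU compute/metacommand path and the runtime's built-in CPU fallback is made inside dstorage.dll. A minimal sketch of such a request (generic API usage with my own function and parameter names, not Nixxes' actual code):

Code:
#include <dstorage.h>

// Hypothetical helper: enqueue a GDeflate-compressed read. The engine only
// declares the on-disk format and the destination; where the inflation happens
// (GPU or CPU fallback) is the DirectStorage runtime's decision.
void EnqueueCompressedRead(IDStorageQueue* queue,
                           IDStorageFile*  file,
                           ID3D12Resource* destBuffer,
                           UINT32 compressedSize,
                           UINT32 uncompressedSize)
{
    DSTORAGE_REQUEST request{};
    request.Options.SourceType        = DSTORAGE_REQUEST_SOURCE_FILE;
    request.Options.DestinationType   = DSTORAGE_REQUEST_DESTINATION_BUFFER;
    request.Options.CompressionFormat = DSTORAGE_COMPRESSION_FORMAT_GDEFLATE;

    request.Source.File.Source = file;                 // compressed bytes on disk
    request.Source.File.Offset = 0;
    request.Source.File.Size   = compressedSize;

    request.Destination.Buffer.Resource = destBuffer;  // GPU-visible result
    request.Destination.Buffer.Offset   = 0;
    request.Destination.Buffer.Size     = uncompressedSize;

    request.UncompressedSize = uncompressedSize;

    queue->EnqueueRequest(&request);
    queue->Submit();
}
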
Are there even any other games besides MHW and Nixxes ports yet?
See my post above.

If you look on SteamDB you can search by SDK, and filter by DirectStorage


This doesn't discern whether a game uses GPU or CPU based DirectStorage, but with a little research you can find out easily enough.

Another game which I never knew supported DirectStorage is Assassin's Creed Shadows.. however I don't have that game and can't test it. I don't believe Alex noted any issue in his analysis, which I imagine he would have.. so I'm inclined to believe there's no issue in that game either, which is another good data point, as it's a completely different developer/engine.
 
I don't think it's using GPU decompression.
I don't think that GoT does either btw. Special K is hardly a precise tool.
Yea, I'm just finding that out now haha. GoT doesn't support it according to SpecialK.. and even in Final Fantasy 16, which it does say supports it, disabling GPU decompression or BypassIO altogether does literally nothing.. lol I'm inclined to believe SpecialK may accurately report whether GDeflate is being used, but the overrides are all just nonsense.

Which then pushes me back to Monster Hunter Wilds and the tool to decompress the textures beforehand.. which we know actually works and does something. SpecialK does report MH:W as using GDeflate GPU decompression, and overriding it does nothing there as well.
 
I was about to list the games the way you did, but here you are doing the work for me, thanks for the good analysis!
Spider-Man 2
The game was updated with 8 patches so far, some of them addressing DirectStorage performance directly. On my end it works perfectly now, didn't notice any issues. I think Digital Foundry is planning an analysis of this port once it becomes stable enough.
Another game which I never knew supported DirectStorage is Assassin's Creed Shadows.. however I don't have that game and can't test it. I don't believe Alex noted any issue in his analysis
There are also Portal RTX and Half-Life 2 RTX. They run fine.
 
Didn't the 40-series and older GPUs struggle with decompression even at lower resolutions? Meanwhile, AMD GPUs did not lose any performance, even at 4K.
No, it is the same as on the 50 series - the less you are limited by the GPU, the lower the impact of adding another workload onto it.
It is the same with all GPUs essentially, AMD's just have a higher async compute ceiling which they can tap into, when it isn't otherwise used, to hide the costs better.
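A crude illustration of that with made-up numbers: if a GPU-bound frame costs 10 ms and decompression adds 1 ms of compute that can't be hidden, frame time goes to 11 ms, roughly a 9% hit; if the same GPU is CPU-limited and sits idle (or has spare async compute capacity) for a few milliseconds of every frame, that same 1 ms slots into the gap and the measured framerate barely moves.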
 
No, it is the same as on the 50 series - the less you are limited by the GPU, the lower the impact of adding another workload onto it.
It is the same with all GPUs essentially, AMD's just have a higher async compute ceiling which they can tap into, when it isn't otherwise used, to hide the costs better.
It's not really the same.

Look at how much performance the 4090 loses in Ratchet and Clank:


4080 loses a lot of performance as well: https://www.computerbase.de/news/ga...rage-gibt-es-mehr-fps-auf-geforce-gpus.85052/

Meanwhile, the average framerate on the 5090 at native 4K remains almost the same.

Here is a benchmark at low resolution on an A770. It also loses a ton of performance.

 
Meanwhile, the average framerate on the 5090 at native 4K remains almost the same.
Wasn't that expected? AFAIK Blackwell has an ASIC for the Huffman decoder part of GDeflate. (Maybe it does the LZ77 part too - don't know. It's easy enough and doesn't take up much silicon to "only" match PCIe speeds.) It has been ported over from the data center architectures where it was needed to handle Deflate (without the G) as part of JPEG.

GDeflate was just an attempt to push the introduction of Deflate into on-GPU decompression by forcing a segmentation of the Huffman encoded bitstream which would supposedly be "fast enough" to do on the GPU despite the woefully bad mismatch in processor core architecture.

Apples and pears comparison. Yes, you can do GDeflate with its short Huffman encoded segments in parallel. And you can match the performance of the PCIe bus, so it's "fast enough" for a synthetic benchmark. But currently only one architecture isn't taking a hit. We had this subject before in this thread though - the fair warning that GDeflate (as opposed to a pure LZ family decompression without the problematic Huffman coding) was a poison pill to begin with.
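To make the segmentation point concrete, here's a toy illustration - not the actual GDeflate bitstream, which as I understand it carves data into 64 KiB tiles laid out for 32-wide SIMD decode. With a plain serial variable-length code you don't know where symbol N starts until you've decoded symbols 0..N-1, which is exactly the dependency a GPU can't parallelize; record segment start offsets up front and every segment becomes independent work:

Code:
#include <cstdint>
#include <cstdio>
#include <vector>

// Toy variable-length code (LEB128-style varints), standing in for any
// bitstream where a symbol's length isn't known until it is decoded.
static void put_varint(std::vector<uint8_t>& out, uint32_t v) {
    while (v >= 0x80) { out.push_back(uint8_t(v) | 0x80); v >>= 7; }
    out.push_back(uint8_t(v));
}

static uint32_t get_varint(const std::vector<uint8_t>& in, size_t& pos) {
    uint32_t v = 0; int shift = 0;
    for (;;) {
        uint8_t b = in[pos++];
        v |= uint32_t(b & 0x7F) << shift;
        if (!(b & 0x80)) return v;
        shift += 7;
    }
}

int main() {
    // One stream of 1000 symbols, with a segment index recorded every 100
    // symbols - the analogue of carving the stream into short chunks.
    std::vector<uint8_t> stream;
    std::vector<size_t>  segment_start;          // byte offset of each segment
    for (int i = 0; i < 1000; ++i) {
        if (i % 100 == 0) segment_start.push_back(stream.size());
        put_varint(stream, uint32_t(i) * 300u);  // 1-3 bytes per symbol
    }

    // (a) Serial decode: symbol N's byte position is only known after decoding
    //     symbols 0..N-1 - the dependency that makes plain Deflate GPU-hostile.
    uint64_t serial_sum = 0;
    for (size_t pos = 0; pos < stream.size();)
        serial_sum += get_varint(stream, pos);

    // (b) Segmented decode: each segment has a known start offset, so each one
    //     could be handed to a different thread or SIMD lane and decoded alone.
    uint64_t segmented_sum = 0;
    for (size_t s = 0; s < segment_start.size(); ++s) {
        size_t pos = segment_start[s];
        size_t end = (s + 1 < segment_start.size()) ? segment_start[s + 1] : stream.size();
        while (pos < end) segmented_sum += get_varint(stream, pos);  // independent work
    }

    std::printf("serial=%llu segmented=%llu\n",
                (unsigned long long)serial_sum, (unsigned long long)segmented_sum);
}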

However, don't be too quick to generalize assumptions here. There's more than one pitfall with DirectStorage in its current form, and it appears like different games are currently tapping into different ones, with the effects correspondingly varying from gradual performance degradation to more extreme situations like freezing for several frames' worth of time.
 
Wasn't that expected? AFAIK Blackwell has an ASIC for the Huffman decoder part of GDeflate. (Maybe it does the LZ77 part too - don't know. It's easy enough and doesn't take up much silicon to "only" match PCIe speeds.) It has been ported over from the data center architectures where it was needed to handle Deflate (without the G) as part of JPEG.

Do you have a source for this? I didn't see this anywhere in the whitepaper.
 
Wasn't that expected? AFAIK Blackwell has an ASIC for the Huffman decoder part of GDeflate.
Nothing on that was said anywhere and there are no signs of that in whatever little performance testing we have.
5090 is still losing performance with DS when run in a high GPU load scenario (4K res). As I've said already, this isn't any different to how Lovelace handles this - if you're CPU limited then GPU decompression is "free" (obviously).
 
I'd have said that if Blackwell had what sounds to be a hardware decompression block for real-time decompression then Nvidia would have been shouting this from the rooftops as a major selling point.

Do we have any tests on lower end Blackwells?
 
Has anyone tried older drivers vs newer drivers on the 40 series and reported any changes/improvements on it? What about the newest DirectStorage DLLs?

I just recently tried Ratchet and Clank again and noticed that since getting my 9800X3D, the portal transition sequence loads so much faster than on my previous processor (12-core 3900X).. whether DirectStorage is enabled or not. My god is it fast. Also, I swapped out the DirectStorage DLLs for the latest ones and I don't know if it's placebo or not (probably is) but I can spin the camera around very fast in that game now and it doesn't stutter, but I can still make it stutter if I REAAAALLLY spin that thing crazily. Something that would never happen during any type of normal play. I don't remember it being like that. It used to stutter quite a bit more easily than that, from what I remember.

Anyway, I still think that what Nixxes is doing with TLOUP2 is the right way for DirectStorage.. by just doing it on the CPU. There's no stuttering at all no matter how crazily you spin around the camera, no framerate drops. What DesgustatoR said is right though.. give the player the option.

Another quick thing I noticed about TLOUP2 on PC though, which caught my attention since Nixxes said in Alex's interview that the PC version can potentially load even faster than the PS5 version, is that the engine seems to have a hard cap of 240fps specifically when loading, which is funny considering the game lets you go up to 360fps during gameplay. So when the frame limiter disengages for loading screens, the loading is still being limited by framerate. I wonder why they decided on a 240fps cap? It would be very interesting if there was a way to mod and remove that limit and test the loading.
 