Ratchet & Clank technical analysis *spawn

Maybe a stupid question, but if you delete the DirectStorage DLLs, aren't the DLLs from system32 being used then?
The other thing might be: if DirectStorage GPU decompression is used, the GPU has more to do. That might have a negative impact, since your CPU is just sitting idle (after all, the PS5 CPU isn't that great) and your SSD can provide enough read speed. The old path might just have loaded more data up front, but since bandwidth was not a problem here and the CPU has enough headroom, GPU decompression is simply extra work for the GPU.
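On the DLL question: unlike d3d12.dll, dstorage.dll is a redistributable that ships next to the game's executable rather than a Windows component, so there should be no system32 copy to fall back on; deleting it makes the runtime fail to load at all. A minimal sketch to check which copy (if any) resolves, run from the game's install directory (error handling kept minimal):

```cpp
#include <windows.h>
#include <cstdio>

int main() {
    // Try to resolve dstorage.dll the same way the game's loader would.
    HMODULE h = LoadLibraryW(L"dstorage.dll");
    if (!h) {
        // No app-local copy and no system copy: DirectStorage can't initialise.
        std::printf("dstorage.dll failed to load (error %lu)\n", GetLastError());
        return 1;
    }
    wchar_t path[MAX_PATH];
    GetModuleFileNameW(h, path, MAX_PATH);
    std::wprintf(L"dstorage.dll resolved from: %s\n", path);
    FreeLibrary(h);
    return 0;
}
```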
 
Seems like it's disabled for good.

[Image attachment: F2QQ8phXoAAlB9H.png]

 
All of this PS5 boasting because a single DirectStorage implementation takes a performance hit... or because it currently loads 1 SECOND slower than the PS5 version?

C'mon now... I'm all for this discussion, and indeed it should be happening, but people are drawing conclusions WAYYYYY too soon off of this.

If anything, Ratchet confirms to me that the PC's current architecture *without DirectStorage* on a Gen3 drive... is already damn close to PS5's custom I/O, and can already handle what is likely the most demanding game from an I/O standpoint. While it is slower, it doesn't compromise the game's vision and integrity. This game clearly came in a bit hot: AMD's RT issues, bugs, missing audio and visual effects, and so on... I think they wanted to have the "DirectStorage" marketing point on the box, and perhaps both Nixxes and the hardware vendors rushed to get it out, and there are some bugs?

Alex mentioned it in his video, and I'll reiterate: if this game were designed to pre-cache more data into RAM in the background ahead of time than it already does... it wouldn't even be a question. The architecture itself is still clearly sufficient. All of this talk about PS5's I/O being so superior (which in many respects it is, to be undoubtedly clear) always comes down to games being designed around console paradigms first and foremost. I mean, give me a break... if Insomniac and Nixxes designed a PC game around a 32GB RAM / 16GB+ VRAM PC from the ground up, they'd be able to do far more. Then try to port that to console and see how it would fare... it would take a massive amount of work to run properly (if at all) as well.

There's still plenty of work to do on this port, so let's see where it settles after a couple of patches.
 

See 2 pages ago; the DirectStorage DLL question was all covered there.
 
Now, I was wrong. The PS5 I/O system is superior to any PC storage subsystem at the moment, from the looks of things running R&C, even with much more RAM available for caching.

However, this lack of 1:1 performance does not in any way make the PS5 the preferred way to play. Faster framerates, higher resolution, and the added RTAO + RT shadows still make this game preferable on PC.
 
It feels like most of the income Sony gets from ports is from people doing research to prove PC is better than PS :D
IMHO Kraken is giving many people an interdimensional beating, to the point of moving them around interdimensionally. It's beating them to a pulp. I mean, they expected it to be magic and it's not that magic; a PC without closed hardware can compete with it pretty well.

There is this article on the Kraken numbers:

https://cbloomrants.blogspot.com/2020/07/performance-of-various-compressors-on.html

Dataset 1 (Texture data BC1,3,4,5, and 7. Mix of diffuse, normals, etc.)
Kraken ratio: 1.76:1 (PS5 perf will be: 5.5GB/sec * 1.76 = 9.68GB/sec)
Kraken + RDO ratio: 3.13:1 (PS5 perf will be: 5.5GB/sec * 3.13 = 17.2GB/sec)

Dataset 2 (Texture data BC6 and 7. Mix of diffuse, normals, etc. Very much what MSFT expects of BCpack data)
Kraken ratio: 1.85:1 (PS5 perf will be: 5.5GB/sec * 1.85 = 10.2GB/sec)
Kraken + RDO ratio: 3.99:1 (PS5 perf will be: 5.5GB/sec * 3.99 = 21.9GB/sec)
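For what it's worth, the arithmetic behind those figures is just raw drive speed multiplied by compression ratio. A trivial sketch reproducing them (5.5GB/s is the PS5's quoted raw read speed; the ratios are the ones from the blog post above):

```cpp
#include <cstdio>

int main() {
    const double ps5_raw_gbps = 5.5; // PS5 raw SSD read speed, GB/s
    // Per-dataset Kraken compression ratios from the cbloomrants post above.
    const struct { const char* name; double ratio; } sets[] = {
        {"Dataset 1, Kraken",       1.76},
        {"Dataset 1, Kraken + RDO", 3.13},
        {"Dataset 2, Kraken",       1.85},
        {"Dataset 2, Kraken + RDO", 3.99},
    };
    for (const auto& s : sets)
        std::printf("%-24s %.2f:1 -> %.2f GB/s effective\n",
                    s.name, s.ratio, ps5_raw_gbps * s.ratio);
    return 0;
}
```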
 

Except PC can use Kraken on drives with much higher raw throughput, so the numbers will still be higher on PC.

But they're useless, as we're still nowhere near needing that much throughput 99% of the time.
 

I think that's the first game I've seen where texture quality and AF impact performance to such a degree.

Going from trilinear to 16x costs a whopping 8% performance.

Very High, as we now know, causes frame pacing issues.

Wow, in some scenes, RTAO absolutely TANKS performance. Here we go from 89fps with SSAO down to 44fps with RTAO set to Very High. Your performance gets chopped in half in some scenes. That's insane.

[Image attachment: R&C Rift Apart AO.png]
 
My guess: the geometry wasn't designed for high-end RT, and the RT effects are using overly complex geometry not suited to them. The game would need to be designed with RT in mind to optimise for it, which games with PS5 as the lead platform won't be.
 

Very interesting, especially since you can't tell the difference in these image comparisons.
 

So, this seems to only affect NVIDIA GPUs, which reminds me: does RTX IO work on AMD GPUs? I assume it does not, and that AMD simply uses DirectStorage 1.3 GPU decompression.

[Image attachment: F2O2t4pWYAIi_kg.png]


The second image is from a Discord channel. I thought it was CapFrameX, but it's not, and I don't have the source.

Also interesting to note that getting rid of the DirectStorage DLL stops the crashes that happen upon exiting the game after manually enabling ReBAR using NVIDIA Inspector.

Seems like an issue with RTX IO rather than DirectStorage. AMD GPUs do not seem to suffer from the same problems.

As far as I understand it, RTX IO is just Nvidia's name for GPU decompression on their GPUs. I don't think it's an alternative to DirectStorage GPU decompression; rather, it's the necessary vendor-specific half of the pipeline that takes the input from DirectStorage and does the actual GPU decompression work. AMD should have their own equivalent solution.

The fact that we aren't seeing the performance degradation when using higher texture settings on AMD GPUs means one of two things: either they aren't using GPU decompression at all, or their solution works much better than Nvidia's, at either the hardware or software level. Either option is pretty interesting, so I'm looking forward to finding out what's happening here. Does anyone here own an AMD GPU they could test the game with?
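For reference, the split looks like this from the application side: the game only talks to the vendor-neutral DirectStorage API and tags each request with its compression format; whether the GDeflate decompression then goes through Nvidia's RTX IO path or AMD's equivalent is decided inside the runtime/driver. A rough sketch of a single GPU-decompressed read (the "assets.pak" name and the sizes are hypothetical; error handling omitted; assumes an existing D3D12 device and destination buffer):

```cpp
#include <cstdint>
#include <d3d12.h>
#include <dstorage.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Issue one GPU-decompressed read into a pre-created D3D12 buffer.
void LoadCompressedAsset(ID3D12Device* device, ID3D12Resource* destBuffer,
                         uint32_t compressedSize, uint32_t uncompressedSize) {
    ComPtr<IDStorageFactory> factory;
    DStorageGetFactory(IID_PPV_ARGS(&factory));

    ComPtr<IDStorageFile> file;
    factory->OpenFile(L"assets.pak", IID_PPV_ARGS(&file)); // hypothetical archive

    DSTORAGE_QUEUE_DESC queueDesc{};
    queueDesc.SourceType = DSTORAGE_REQUEST_SOURCE_FILE;
    queueDesc.Capacity   = DSTORAGE_MAX_QUEUE_CAPACITY;
    queueDesc.Priority   = DSTORAGE_PRIORITY_NORMAL;
    queueDesc.Device     = device;
    ComPtr<IDStorageQueue> queue;
    factory->CreateQueue(&queueDesc, IID_PPV_ARGS(&queue));

    // The request names a compression format, not a vendor: the runtime
    // decides how the GDeflate decompression actually runs on the GPU.
    DSTORAGE_REQUEST request{};
    request.Options.SourceType        = DSTORAGE_REQUEST_SOURCE_FILE;
    request.Options.DestinationType   = DSTORAGE_REQUEST_DESTINATION_BUFFER;
    request.Options.CompressionFormat = DSTORAGE_COMPRESSION_FORMAT_GDEFLATE;
    request.Source.File.Source        = file.Get();
    request.Source.File.Offset        = 0;
    request.Source.File.Size          = compressedSize;   // bytes on disk
    request.UncompressedSize          = uncompressedSize; // bytes after decompression
    request.Destination.Buffer.Resource = destBuffer;
    request.Destination.Buffer.Offset   = 0;
    request.Destination.Buffer.Size     = uncompressedSize;

    queue->EnqueueRequest(&request);
    queue->Submit(); // completion would normally be tracked with a status array/fence
}
```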
 

That's not how those Kraken numbers work. Those are compression ratios on specific texture sets that compress well. Different data sets compress to different degrees. What matters is the average compression ratio for the entire game, which for Kraken is around 1.5x without RDO and 2x with RDO. Those numbers come from the Oodle website and are corroborated by Sony's own throughput figures for the PS5, which were 9GB/s without RDO and 11GB/s with it.

The compression ratio of Kraken is very similar to GDeflate's.
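To make the whole-game-average point concrete: the package-wide ratio is the size-weighted harmonic mean of the per-asset-class ratios, so already-compressed data like audio drags the average well below the best-case texture figures. A toy sketch with entirely made-up proportions:

```cpp
#include <cstdio>

int main() {
    // {uncompressed GB in the package, Kraken ratio} -- hypothetical numbers,
    // only meant to show how the blend works.
    const struct { const char* kind; double gb; double ratio; } assets[] = {
        {"BC textures (RDO)", 30.0, 3.0},
        {"geometry",          10.0, 1.5},
        {"audio",             10.0, 1.1}, // already codec-compressed
        {"misc/animation",     5.0, 1.6},
    };
    double uncompressed = 0.0, compressed = 0.0;
    for (const auto& a : assets) {
        uncompressed += a.gb;
        compressed   += a.gb / a.ratio; // bytes actually stored on disk
    }
    const double overall = uncompressed / compressed; // size-weighted harmonic mean
    std::printf("overall ratio %.2f:1 -> %.1f GB/s at 5.5 GB/s raw\n",
                overall, 5.5 * overall);
    return 0;
}
```

With these made-up proportions the result lands around 1.9:1, or roughly 10.5GB/s, which is in the same ballpark as Sony's quoted ~11GB/s with RDO.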
 
Yeah, that's what I'm saying. AMD and NVIDIA both use DirectStorage, but only NVIDIA uses RTX IO, which is why I attribute the performance penalty to RTX IO. If DS 1.3 were the problem, both NVIDIA and AMD would exhibit the same issue.
I was thinking the same: perhaps AMD doesn't use GPU decompression. From what people have gathered, deleting the DirectStorage DLLs doesn't impact load times or rift sequences much. They say what they notice is a slight texture fade-in rather than the instantaneous loading of DirectStorage/RTX IO.
 
[Attachment 9309: the PS5 I/O pipeline slide discussed below]
Here is everything there is to it. Those 30GB/s synthetic benchmarks were never going to prove anything.

Do you actually understand what any of those steps mean, beyond the decompression step at least? Can you explain what unique hardware in the PS5 makes them faster than they would be on a high-end PC, for example?

The fact is that all of these activities on a PC will run on either the CPU or the GPU, passing through system memory in most cases. So can you explain why speeding up any or all three of those components in the PC would have no impact on the overall throughput?

You keep dismissing the synthetic benchmarks, but they prove that that level of throughput from disk to GPU memory is absolutely possible. There is no drive bottleneck, bus bottleneck, processing bottleneck or GPU decompression bottleneck that will prevent 20-30GB/s of decompressed texture data being moved from disk to GPU memory and rendered on screen on a high-end PC.

Where a real game differs from those benchmarks is that you have to set up and initialise the entire game world at the same time as loading textures, which is a CPU-side activity. The problem is, faster CPUs are not speeding anything up on the PC side, so the bottleneck must be elsewhere.

Cerny and his team figured out that ALL THOSE STEPS in the slide are absolutely necessary to guarantee an I/O data stream of 22GB/s.

There is no 22GB/s. That is simply the peak capacity of the decompression unit, for dealing with extreme corner cases: very compressible data sets that can achieve a 4:1 compression ratio. On average, the compression ratio, and thus the throughput, is half that. Both Oodle and Sony have confirmed this.

(((oh yeah, I know 🙄 8-9GB/s is the better-known number from that presentation, but since he explicitly states that the Kraken unit's output is 22GB/s, I think the rest of the I/O block is able to keep up)))
Otherwise they would have excluded something to save silicon space...

No, 8-9GB/s is the real-world throughput based on the average compression ratio of a full game packaged without RDO. With RDO in the form of Oodle Texture, that average compression ratio goes up to 2:1, or 11GB/s.
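Dividing each quoted figure by the 5.5GB/s raw read speed shows the compression ratio it implies, which is the whole argument in a couple of lines of arithmetic:

```cpp
#include <cstdio>

int main() {
    const double raw_gbps = 5.5; // PS5 raw SSD read speed, GB/s
    // Quoted PS5 throughput figures; implied ratio = throughput / raw speed.
    const struct { const char* figure; double gbps; } quotes[] = {
        {"decompressor peak (22 GB/s)", 22.0}, // needs ~4:1 data, a corner case
        {"typical, no RDO (9 GB/s)",     9.0},
        {"typical with RDO (11 GB/s)",  11.0},
    };
    for (const auto& q : quotes)
        std::printf("%-30s implies %.2f:1\n", q.figure, q.gbps / raw_gbps);
    return 0;
}
```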
 
Well, it seems like the CPUs are fast enough anyway; there's more CPU availability than GPU in this title.
 