Ratchet & Clank technical analysis *spawn

Here's the rift sequence with CPU/GPU counters and an unlocked framerate. You can see that during the transitions the 4090's GPU usage drops, so it's not a case of the GPU being unable to keep up with the decompression demands; there's some other bottleneck:

The GPU usage drops because it's not really rendering anything... just Ratchet/Rivet and a flat textured background of some sort.

I think they've basically capped how many threads the decompression will run on so that the animation can at least play out, keeping the transition around the PS5 level?

They wouldn't want the PC to completely outclass the console, would they :sneaky:
 
The other thing I noticed is that all three SSDs will routinely hit 100% active time yet they never come anywhere close to their bandwidth limits. Though I will note that I had to disable BypassIO to get these readings from the NVMe drives which might be a confound.

You wouldn't get anywhere near the peak read limits on an SSD unless it's a sustained large-block sequential read operation.

I'm not sure if anyone's actually profiled these DirectStorage games to see what exact read operations (or even write operations?) are being done. If it's still predominantly low-QD 4K random reads, I'm not even sure why all the discussion seems to just throw around the peak read speed numbers.
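For anyone curious, it's easy to see why the headline numbers don't apply by timing the two access patterns yourself. A minimal Win32 sketch (the test file path is a placeholder, and this is obviously not the game's actual I/O path) comparing QD1 4K random reads against large sequential reads, with the file cache bypassed so the drive itself is what gets measured:

```cpp
// qd1_pattern_test.cpp - crude illustration only (not the game's actual I/O path).
// Compares QD1 4K random reads vs 1 MiB sequential reads on one file, with the
// Windows file cache bypassed so the drive itself is what gets measured.
#include <windows.h>
#include <malloc.h>
#include <chrono>
#include <cstdio>
#include <random>

static double RunMiBps(HANDLE file, LONGLONG fileSize, DWORD blockSize, bool randomAccess)
{
    // FILE_FLAG_NO_BUFFERING requires sector-aligned buffers, offsets and sizes.
    void* buf = _aligned_malloc(blockSize, 4096);
    std::mt19937_64 rng(42);
    const int reads = 2048;
    LONGLONG offset = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < reads; ++i)
    {
        if (randomAccess)
            offset = (LONGLONG)(rng() % (unsigned long long)(fileSize / blockSize)) * blockSize;
        LARGE_INTEGER li; li.QuadPart = offset;
        SetFilePointerEx(file, li, nullptr, FILE_BEGIN);
        DWORD got = 0;
        ReadFile(file, buf, blockSize, &got, nullptr);
        if (!randomAccess)
        {
            offset += blockSize;
            if (offset + blockSize > fileSize) offset = 0;   // wrap if the test file is small
        }
    }
    double secs = std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
    _aligned_free(buf);
    return (double)reads * blockSize / secs / (1024.0 * 1024.0);
}

int main()
{
    // Placeholder: point this at any large file on the drive you want to test.
    HANDLE file = CreateFileW(L"D:\\testfile.bin", GENERIC_READ, FILE_SHARE_READ, nullptr,
                              OPEN_EXISTING, FILE_FLAG_NO_BUFFERING, nullptr);
    if (file == INVALID_HANDLE_VALUE) { std::printf("open failed\n"); return 1; }
    LARGE_INTEGER size{};
    GetFileSizeEx(file, &size);
    std::printf("4K random, QD1 : %.1f MiB/s\n", RunMiBps(file, size.QuadPart, 4096, true));
    std::printf("1MiB sequential: %.1f MiB/s\n", RunMiBps(file, size.QuadPart, 1 << 20, false));
    CloseHandle(file);
    return 0;
}
```

On a typical NVMe drive the QD1 4K random figure comes out at a tiny fraction of the spec-sheet sequential number, which is the point.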
 
The GPU usage drops because it's not really rendering anything... just Ratchet/Rivet and a flat textured background of some sort.

The framerate is unlocked - if it's 'not really rendering anything' and there's consequently no demand on the GPU or CPU, then the fps would skyrocket. So clearly the CPU/GPU are still doing considerable work during that short period.

I think they've basically capped how many threads the decompression will run on so that the animation can at least play out, keeping the transition around the PS5 level?

I think that's likely the case. My point is that I'm just noticing the game doesn't seem bottlenecked by GPU resources, at least in that example on that rig, when using GDeflate on the larger textures. At the moment, the stress points for these transitions look more likely to be CPU threads.
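We obviously have no visibility into what Nixxes does internally, but for reference these are the kinds of knobs the public DirectStorage 1.1 API exposes to a title: the runtime's own submit and CPU-decompression worker threads can be capped, and GPU decompression can be disabled entirely, before the factory is created. A hedged sketch (assuming the DirectStorage SDK headers), not the game's code:

```cpp
// directstorage_config_sketch.cpp - shows the public DirectStorage 1.1 configuration
// knobs only; we have no idea how (or whether) the game actually tunes these.
#include <dstorage.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

bool ConfigureDirectStorage()
{
    // Configuration has to be set before the first factory is created.
    DSTORAGE_CONFIGURATION config{};
    config.NumSubmitThreads = 1;                   // threads the runtime uses to submit I/O
    config.NumBuiltInCpuDecompressionThreads = 2;  // cap the built-in CPU decompression pool
    config.DisableGpuDecompression = FALSE;        // TRUE would force the CPU fallback path
    if (FAILED(DStorageSetConfiguration(&config)))
        return false;

    ComPtr<IDStorageFactory> factory;
    if (FAILED(DStorageGetFactory(IID_PPV_ARGS(&factory))))
        return false;

    // ... open files, create queues and enqueue GDeflate requests as usual ...
    return true;
}
```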
 
So looking at the DF video... yeah, pretty bad how lots of textures on the PC version aren't loading in properly. Nixxes never fixed the issue in Spider-Man... pretty sad to see it make a return here... and it's not like the higher quality textures are just coming in late... they're just not coming in, period.

Hopefully Alex really calls that out and brings heavy attention to it, because it needs to be fixed. There's NO reason why PCs with 16/24GB of VRAM and 32/64GB of system RAM, along with DirectStorage, shouldn't have all this stuff loaded.
 
You wouldn't get anywhere near the peak read limits on an SSD unless it's a sustained large-block sequential read operation.

I'm not sure if anyone's actually profiled these DirectStorage games to see what exact read operations (or even write operations?) are being done. If it's still predominantly low-QD 4K random reads, I'm not even sure why all the discussion seems to just throw around the peak read speed numbers.
Not in traditional computing, no, but DirectStorage is all about changing that. Intel's SFS demo in particular is purely random 64K reads and gets quite close to fully maxing out my SATA and NVMe drives' bandwidths. I wasn't really expecting that in R&C though and mostly just wrote that for posterity.

On a different note, I just tried removing the DirectStorage DLLs from the game's directory and it still works fine and loads fast. Running the game from SSD now gives me that same 7GB read count as when I ran it from HDD earlier only with much better performance since it's no longer limited by the HDD. I'd be curious if others can replicate that.
 
Not in traditional computing, no, but DirectStorage is all about changing that. Intel's SFS demo in particular is purely random 64K reads and gets quite close to fully maxing out my SATA and NVMe drives' bandwidths. I wasn't really expecting that in R&C though and mostly just wrote that for posterity.

On a different note, I just tried removing the DirectStorage DLLs from the game's directory and it still works fine and loads fast. Running the game from SSD now gives me that same 7GB read count as when I ran it from HDD earlier only with much better performance since it's no longer limited by the HDD. I'd be curious if others can replicate that.

Doesn't it just fall back to using the Windows default DirectStorage DLL if you remove the game's one? So perhaps a slightly different implementation, but still DirectStorage.

Are you able to try a CPU downclock or disabling some cores on the portal sequence to see if it impacts the load speed?
 
The RTX 2070 Super is decimated due to its 8GB VRAM buffer; in fact all 8GB GPUs are decimated with max RT settings, as the game is VRAM heavy even at 1080p, needing around 10GB with RT.

The 2080Ti is really the minimum GPU to run this game with RT, and it's faster than the PS5 even when running at much higher quality raster and RT settings.


This also shows the game doesn't really scale much after 4 cores.

My 7600 beating a 12900k? That should never happen if all cores are loaded and scaling as expected.
 
Here's the rift sequence with CPU/GPU counters and an unlocked framerate. You can see that during the transitions the 4090's GPU usage drops, so it's not a case of the GPU being unable to keep up with the decompression demands; there's some other bottleneck:


Very interesting....

He's using a 13900k, so I timed it against the earlier posted Nvidia video and your PS5 capture. Obviously we don't know what CPU Nvidia was using, but the Bang for Buck video definitely seemed faster to me.

Obviously there would be some variation in when I start and stop the timer, but I tried to start it the moment the purple "load screen" filled the screen and stop it the moment the new world filled the screen. I ran the videos at 25% speed for additional accuracy and this is what I got:

Transition times in seconds:

Run | Bang for Buck (13900k) | PS5  | Nvidia (CPU unknown)
 1  | 5.05                   | 4.59 | 5.96
 2  | 6.8                    | 7.87 | 8.19
 3  | 6.73                   | 4.46 | 7.42
 4  | 4.78                   | 5.06 | 5.83
 5  | 7.36                   | 7.38 | 7.54

Assuming any errors in my timing average out, you can clearly see the BfB system is loading a bit faster than the NV one and is basically a wash with the PS5, winning some and losing some.
 
Doesn't it just fall back to using the Windows default DirectStorage DLL if you remove the game's one? So perhaps a slightly different implementation, but still DirectStorage.

Are you able to try a CPU downclock or disabling some cores on the portal sequence to see if it impacts the load speed?
I also thought it might do that, but I used Process Explorer to check and there was no dstorage.dll or dstoragecore.dll being loaded by the process. Those were plainly visible before, yet the game seemed to run just fine without them.
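Process Explorer is the easy way to check, but if anyone wants to script the same thing, here's a minimal sketch (you pass the game's PID yourself; nothing game-specific is assumed) that snapshots a process's module list and flags the DirectStorage DLLs:

```cpp
// check_dstorage_loaded.cpp - lists a process's loaded modules and flags the
// DirectStorage DLLs; a scripted equivalent of eyeballing Process Explorer.
#include <windows.h>
#include <tlhelp32.h>
#include <cstdio>
#include <cstdlib>
#include <wchar.h>

int main(int argc, char** argv)
{
    DWORD pid = (argc > 1) ? (DWORD)std::atoi(argv[1]) : 0;  // pass the game's PID
    if (pid == 0) { std::printf("usage: check_dstorage_loaded <pid>\n"); return 1; }

    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPMODULE | TH32CS_SNAPMODULE32, pid);
    if (snap == INVALID_HANDLE_VALUE) { std::printf("snapshot failed (try elevated)\n"); return 1; }

    MODULEENTRY32W me{ sizeof(me) };
    bool found = false;
    for (BOOL ok = Module32FirstW(snap, &me); ok; ok = Module32NextW(snap, &me))
    {
        if (_wcsicmp(me.szModule, L"dstorage.dll") == 0 ||
            _wcsicmp(me.szModule, L"dstoragecore.dll") == 0)
        {
            std::wprintf(L"loaded: %s (%s)\n", me.szModule, me.szExePath);
            found = true;
        }
    }
    CloseHandle(snap);
    if (!found) std::printf("no DirectStorage DLLs loaded in that process\n");
    return 0;
}
```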

There's also some supporting evidence, in that DirectStorage games will normally bypass Windows' file cache. You can see which files are cached in tools like RAMMap, and running the game with DS enabled will only show the executable and some DLLs being cached. But when I remove these DLLs, that list gets much larger as the game now starts caching all of the actual game data.

[RAMMap screenshot: DS enabled]

[RAMMap screenshot: DS disabled]
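That would line up with how unbuffered I/O behaves in general: as far as I understand it, the DirectStorage runtime reads its sources unbuffered, so the data never lands in the standby/file cache that RAMMap shows, whereas ordinary buffered reads do. A tiny sketch of just that difference (the file path is a placeholder, and this illustrates the Win32 mechanism, not the game's code):

```cpp
// cache_bypass_sketch.cpp - why unbuffered reads never show up in RAMMap's file
// cache view while normal buffered reads do. The path is a placeholder.
#include <windows.h>
#include <malloc.h>

int main()
{
    // Buffered read: Windows keeps the data in the standby/file cache afterwards.
    HANDLE cached = CreateFileW(L"D:\\game\\asset.bin", GENERIC_READ, FILE_SHARE_READ,
                                nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);

    // Unbuffered read: data goes straight into our buffer and nothing is cached.
    // Requires sector-aligned buffer, offset and size.
    HANDLE uncached = CreateFileW(L"D:\\game\\asset.bin", GENERIC_READ, FILE_SHARE_READ,
                                  nullptr, OPEN_EXISTING, FILE_FLAG_NO_BUFFERING, nullptr);
    if (cached == INVALID_HANDLE_VALUE || uncached == INVALID_HANDLE_VALUE)
        return 1;

    static char plainBuf[64 * 1024];
    void* alignedBuf = _aligned_malloc(64 * 1024, 4096);
    DWORD got = 0;

    ReadFile(cached, plainBuf, sizeof(plainBuf), &got, nullptr);   // populates the cache
    ReadFile(uncached, alignedBuf, 64 * 1024, &got, nullptr);      // bypasses the cache

    _aligned_free(alignedBuf);
    CloseHandle(cached);
    CloseHandle(uncached);
    return 0;
}
```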

I think this is basically how the game works when it detects that it's running from an HDD. Removing the files then forces it to run that way on SSDs as well. I'll need to confirm that though.
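If it is doing an HDD vs SSD check, the usual mechanism on Windows is the seek-penalty storage property query; purely as an assumption about how such detection might work (the drive letter is just an example), a minimal sketch:

```cpp
// seek_penalty_check.cpp - typical way to ask Windows whether a volume sits on a
// drive with a seek penalty (i.e. a spinning HDD) rather than an SSD.
#include <windows.h>
#include <winioctl.h>
#include <cstdio>

int main()
{
    // \\.\C: opens the volume itself; the drive letter here is just an example.
    HANDLE vol = CreateFileW(L"\\\\.\\C:", 0, FILE_SHARE_READ | FILE_SHARE_WRITE,
                             nullptr, OPEN_EXISTING, 0, nullptr);
    if (vol == INVALID_HANDLE_VALUE) { std::printf("open volume failed\n"); return 1; }

    STORAGE_PROPERTY_QUERY query{};
    query.PropertyId = StorageDeviceSeekPenaltyProperty;
    query.QueryType  = PropertyStandardQuery;

    DEVICE_SEEK_PENALTY_DESCRIPTOR desc{};
    DWORD bytes = 0;
    if (DeviceIoControl(vol, IOCTL_STORAGE_QUERY_PROPERTY, &query, sizeof(query),
                        &desc, sizeof(desc), &bytes, nullptr))
    {
        std::printf(desc.IncursSeekPenalty ? "looks like an HDD\n" : "looks like an SSD\n");
    }
    CloseHandle(vol);
    return 0;
}
```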
 

As expected, the usual angle from NXG. I just have a couple of observations:

  • He starts by trying to validate his results by claiming that this is an excellent port/best case scenario for the PC. This despite the bugged RT for AMD, BVH issues and texture issues. I agree the port is decent, but it quite obviously needs more time to optimise if they haven't even managed to resolve some serious bugs (broken RT) for launch. So making this out to be a best case for the PC in terms of optimisation is disingenuous. He ignores that the PS5's performance has improved over time through patching, and so it's only fair to expect the same on the PC side (which, let's be honest, is basically a given for all major games these days, sadly). Essentially we are comparing a couple of years' worth of post-launch optimisation here to a game that clearly launched a little too early.
  • He compares performance to the 2070 using an HDD, with no mention of the CPU used. Granted, his commentary there is primarily about the HDD not being good enough to run the game (more on that below), but he also comments on the GPU's performance in relation to the low resolution despite the GPU being underutilised most of the time.
  • At one point he makes an offhand comment along the lines of "Spoiler, you can't get close to the PS5 even on top end hardware". He doesn't clarify what aspect of the game he's talking about, but he says it while running a comparison video in the background which ironically shows pretty much identical loading performance between the PS5 and two different PC drives. No mention of the other PC hardware in use though.
  • At the end, again in typical NXG fashion and without any technical data to back it up, he delivers the throwaway comment that you will "probably" need a 16GB GPU to "get close" to the PS5. Obviously total BS.
On the whole "SSD - is it needed or is it not" debate, I think the argument is a little silly tbh. Of course it's needed to achieve the results seen on the PS5 within the other constraints of that console. Is it also needed on PC? The answer for the most part seems to be yes, although nothing like the speed of the PS5's drive is required. However, it also seems very possible to get a great experience on PC with an HDD, provided you have great hardware elsewhere. I assume that's because the game will simply stream what it needs into VRAM or system RAM in the background before it's needed.
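To be clearer about what I mean by that last point, here's a purely illustrative sketch of the general idea (the asset paths and cache structure are invented; this has nothing to do with Insomniac's actual streaming code): a background thread reads the assets the next area will need into system RAM while the player is still in the current one, so even a slow HDD has whole seconds to deliver them rather than milliseconds.

```cpp
// prefetch_sketch.cpp - toy illustration of background asset prefetching into RAM;
// asset names and the cache structure are invented for the example.
#include <fstream>
#include <iterator>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

class PrefetchCache
{
public:
    // Kick off a background thread that loads every listed file into RAM.
    void PrefetchAsync(std::vector<std::string> paths)
    {
        worker_ = std::thread([this, paths = std::move(paths)]
        {
            for (const auto& path : paths)
            {
                std::ifstream file(path, std::ios::binary);
                std::vector<char> data((std::istreambuf_iterator<char>(file)),
                                        std::istreambuf_iterator<char>());
                std::lock_guard<std::mutex> lock(mutex_);
                cache_[path] = std::move(data);          // now a RAM hit at level-load time
            }
        });
    }

    // Called at level-load time: returns the prefetched bytes if we have them.
    const std::vector<char>* Get(const std::string& path)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = cache_.find(path);
        return it != cache_.end() ? &it->second : nullptr;
    }

    ~PrefetchCache() { if (worker_.joinable()) worker_.join(); }

private:
    std::thread worker_;
    std::mutex mutex_;
    std::unordered_map<std::string, std::vector<char>> cache_;
};

int main()
{
    PrefetchCache cache;
    // Hypothetical list of assets the next area will need.
    cache.PrefetchAsync({ "levels/next_area/geometry.bin", "levels/next_area/textures.bin" });
    // ... gameplay continues; by the time the rift opens these are (hopefully) in RAM ...
}
```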

 
I also thought it might do that, but I used Process Explorer to check and there was no dstorage.dll or dstoragecore.dll being loaded by the process. Those were plainly visible before, yet the game seemed to run just fine without them.

There's also some supporting evidence, in that DirectStorage games will normally bypass Windows' file cache. You can see which files are cached in tools like RAMMap, and running the game with DS enabled will only show the executable and some DLLs being cached. But when I remove these DLLs, that list gets much larger as the game now starts caching all of the actual game data.

[RAMMap screenshot: DS enabled]

[RAMMap screenshot: DS disabled]

That's really interesting then. Did you notice if CPU utilisation went up and GPU utilisation down during loading compared to when it's using DS? Obviously if it's not using DS it would need to decompress the GDeflate data on the CPU, which I understand is pretty suboptimal.
 
That's really interesting then. Did you notice if CPU utilisation went up and GPU utilisation down during loading compared to when it's using DS? Obviously if it's not using DS it would need to decompress the GDeflate data on the CPU, which I understand is pretty suboptimal.
I didn't really pay attention to that but I'll check for it later. It's also possible that there are visual differences even though I didn't notice any.
 
The RTX 2070 Super is decimated due to its 8GB VRAM buffer; in fact all 8GB GPUs are decimated with max RT settings, as the game is VRAM heavy even at 1080p, needing around 10GB with RT.

The 2080Ti is really the minimum GPU to run this game with RT, and it's faster than the PS5 even when running at much higher quality raster and RT settings.


Some really interesting results there. Amazing to see the 3060 12GB matching the 3070Ti 8GB at 1080p and beating it at 1440p! Massive VRAM limitation there even without RT (which makes it even worse).

I would point out though that this is at Max settings, which includes Very High textures. Those textures really should be designed for larger-than-8GB frame buffers these days and may even be larger than what the PS5 is using (putting aside the fact that some of them don't load properly). So I'm not sure this tells us much about the obsolescence of 8GB GPUs (does a simple texture setting drop give a big boost in performance?) or even how those GPUs perform vs the PS5. I'm sure Alex's video will illuminate all of that though.

It's also notable that even at 4K Max with RT, there is no significant performance drop-off even with "only" 10GB on the 3080 (as compared to the 4070 12GB), so a certain YT commentator's (who shall go unnamed) claim that you "probably need a 16GB GPU to get near the PS5's performance" is clearly untrue.

EDIT: This also shows how extremely difficult getting a proper comparison to the PS5 is going to be in this game, thanks to the varying settings and use of DRS. It would be quite possible to interpret those results as the PS5 being faster than a 4070Ti based on the 4K RT performance (quality mode can run *up to* 4K at 40fps with RT). Of course the PC max RT settings are going to be hugely more taxing, and the PS5 could be dropping well below 4K in that test scene, but that won't stop many such comparisons being made, which highlights the importance of properly settings-, scene- and resolution-matched tests.
 