Ratchet & Clank technical analysis *spawn

Thanks, interesting. I'll play around with it a bit.



It is perhaps both, meaning that there is a higher load, but it maybe shouldn't be affecting it this much - I gotta figure (hope) that the game being rendered properly with very high textures is not intended to swamp a 12GB card without RT, but that's what it does when I do this fix - my 3060 will eventually choke at 4K DLSS Performance after playing for a while, and that's just very high textures with a mix of high and medium settings. 😬

Did you notice the LOD change when switching DLSS modes after boot though? There are definitely more objects being drawn/higher quality assets when you switch DLSS modes, and comparing to the PS5, that is actually how it should look - switching DLSS modes just makes it render the world properly.

This is what I see, zooming into a section when booting up the game with DLSS performance, then switching to another DLSS mode, then back to performance. The video just swaps back and forth between the original and the 'corrected' version, exact same settings:


The vram cost of this is always around 1 extra GB, which basically makes very high textures (eventually) unplayable on my 12GB 3060 with 4K DLSS Performance. Whether that's due to a streaming system bug or not, who knows at this point, but my suspicion is that at least some of these benchmarks are basically being run with lower-than-console settings in a rather significant aspect.

The GPU load from this change, aside from vram, is highly variable too. In enclosed areas, it's maybe 1-2 fps. But in areas like this, I can get a 7+ fps hit as it's simply drawing far more in the distance. That said, this game's performance is highly tied to vram usage even if you're not technically 'at your limit', so who knows what it's doing with this extra detail - it could already be spilling over into main memory.

DLSS Performance at boot, compared to switching to DLSS Ultra Performance. Despite rendering internally at 720p compared to 1080p, DLSS Ultra Performance resolves more near/mid-range detail, simply due to the switch being performed and resetting the LOD. Even with DLSS Ultra Performance, vram usage shoots up ~800MB.

There was nothing obvious to me from eyeballing it across runs in my test area but that's likely just me not being that good at noticing those things without a side by side. Clearly there is a difference as you've shown. I need to take some comparison shots and flip between them.
 
Does anyone know where the photomode screenshots are stored from the Steam version of the game? What I've read online is that they're in the [user]\documents\ratchet&clank\screenshots folder but they're not there for me.
 
There was nothing obvious to me from eyeballing it across runs in my test area but that's likely just me not being that good at noticing those things without a side by side. Clearly there is a difference as you've shown. I need to take some comparison shots and flip between them.

The best locations are in areas with vegetation and, of course, long draw distances. It's hard to eyeball at first glance as it doesn't affect every surface/object, and the distance varies. In some scenes the only difference is seen at medium to long distances; in others the higher-res texture doesn't pop in right in front of you until you do this switch. Also, being on a monitor vs. 5ft from a 55" TV is a factor, not to mention that even with the wrong mips, the game's textures at the VH setting are in such high resolution that they still look decent.

The only thing that clued me into this in the first place was actually an enclosed area that I had just visited in the PS5 version. I was flipping back and forth and thought "wait, why does this wall texture under that poster in performance mode on the PS5 look so much sharper? It was sharper in DLSS before" - and then I started playing with settings and discovered this.

The performance cost, even before vram starts being strangled on a 12GB card (which it will be - but again, not saying this isn't a bug itself even with proper lod), can be very significant in these areas with lots of distant objects:

Before: [screenshot]

After: [screenshot]
 
Some things that might be worth looking into with respect to DirectStorage/compression are the impact of:

- Resizable BAR, including across architectures

- Hardware Accelerated GPU Scheduling, including across architectures (a quick way to check the HAGS state between runs is sketched below)
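
For the HAGS part, a tiny sketch of how a test harness could log the scheduler state so runs across drivers/architectures stay comparable. This reads the commonly documented HwSchMode registry value (2 = HAGS on, 1 = off); purely illustrative, nothing the game itself exposes:

```cpp
#include <windows.h>
#include <cstdio>

// Log whether Hardware Accelerated GPU Scheduling (HAGS) is enabled, so
// DirectStorage test runs can be compared with the same scheduler state.
// Link against Advapi32.lib for the registry call.
int main()
{
    DWORD value = 0;
    DWORD size = sizeof(value);
    LSTATUS rc = RegGetValueW(
        HKEY_LOCAL_MACHINE,
        L"SYSTEM\\CurrentControlSet\\Control\\GraphicsDrivers",
        L"HwSchMode",                 // 2 = HAGS enabled, 1 = disabled
        RRF_RT_REG_DWORD, nullptr, &value, &size);

    if (rc != ERROR_SUCCESS)
        std::printf("HwSchMode not set (driver/OS default)\n");
    else
        std::printf("HAGS: %s\n", value == 2 ? "enabled" : "disabled");
    return 0;
}
```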
 
Computerbase has added some DS tests backing up @Flappy Pannus' data. Interestingly though, the slower the drive, the more performance you lose with DS - SATA drives show an incredible deficit. All Nvidia users should delete the DLL files. They make no difference on AMD.

Would be very interesting to learn if Radeon is actually using GPU decompression in this when DS is enabled. If they are, and they show no difference, then that's kinda good news - it means there's no rendering performance hit when using GPU decompression, it's more of an Nvidia bug. Mind you, the actual advantages it brings even then are still in question - there are no CPU measurements taken with DS on/off in this article, which would have been helpful.

So potentially, when working as it should, GPU decompression could be giving a little boost to CPU performance with no cost to GPU performance, at least with the load in this game. Maybe that's why there never was a CPU/GPU toggle to begin with: Nixxes felt there wasn't a point, as there wasn't a downside when they were testing it initially.
 
I think it's almost certain they aren't.
 

I think that's likely, yes, but even if there isn't a driver hook by the IHV into their most optimized (cough) solution like RTX I/O, won't any other GPU have the decompression done by DirectCompute by default? The way I read it is that that's the fallback: the decompression will still happen on the GPU if DS 1.1+ is used (see the config sketch below the quote).

Anandtech said:
DirectStorage, in turn, will be implementing GDeflate support in two different manners. The first (and preferred) manner is to pass things off to the GPU drivers and have the GPU vendor take care of it as they see fit. This will allow hardware vendors optimize for the specific hardware/architecture used, and leverage any special hardware processing blocks if they’re available. All three companies are eager to get the show on the road, and it's likely some (if not all) of them will have DirectStorage 1.1-capable drivers ready before the API even ships to game developers.

Failing that, Microsoft is also providing a generic (but optimized) DirectCompute GDeflate decompressor, which can be run on any DirectX12 Shader Model 6.0-compliant GPU. Which means that, in some form or another, GDeflate will be available with virtually any PC GPU made in the last 10 years – though more recent GPUs are expected to offer much better performance.
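
To make the driver-path vs. generic-DirectCompute split above a bit more concrete: DirectStorage 1.1 exposes it through its configuration API, which the application (not the end user - whether Nixxes hooks this up to anything is unknown) can use to force one path or the other. A minimal sketch, assuming the stock dstorage.h from the DirectStorage SDK:

```cpp
#include <dstorage.h>     // DirectStorage 1.1 SDK headers; link dstorage.lib
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Sketch: pick the GDeflate decompression path before creating the factory.
// A zero-initialized DSTORAGE_CONFIGURATION means "defaults": the IHV
// metacommand if the driver provides one, else the DirectCompute fallback.
void InitDirectStorageForTesting()
{
    DSTORAGE_CONFIGURATION config = {};

    // Skip any vendor-optimized metacommand and use Microsoft's generic
    // DirectCompute GDeflate decompressor instead.
    config.DisableGpuDecompressionMetacommand = TRUE;

    // Or push decompression back onto the CPU entirely:
    // config.DisableGpuDecompression = TRUE;

    DStorageSetConfiguration(&config);  // must be called before DStorageGetFactory

    ComPtr<IDStorageFactory> factory;
    DStorageGetFactory(IID_PPV_ARGS(&factory));
    // ... create queues and enqueue GDeflate-compressed requests as usual ...
}
```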
 
Would be very interesting to learn if Radeon is actually using GPU decompression in this when DS is enabled. If they are, and they show no difference, then that's kinda good news - it means there's no rendering performance hit when using GPU decompression, it's more of an Nvidia bug.

I'm not sure it's that simple, as there are other GPU architectural as well as hardware/software stack differences between Nvidia and AMD in this case.

For instance, Resizable BAR (ReBAR) in theory could have some sort of interaction with DirectStorage and GPU decompression. ReBAR is defaulted on with AMD GPUs, I believe (at the very least on newish AMD platforms), while it's not with Nvidia (manual driver whitelist regardless of hardware), as the impact on the latter is mixed (positive/negative). I'm reading some anecdotal reports that forcing ReBAR on might affect performance (both positively and negatively) as well as cause stability issues. Which is why I'm interested in some more documented tests on this, which could also have broader implications going forward on the Nvidia side, for the time being at least.

Not to mention other potential implications, such as the impact of having an AMD CPU and GPU vs. an Intel CPU and GPU.
 
I think that's likely, yes, but even if there isn't a driver hook by the IHV into their most optimized (cough) solution like RTX I/O, won't any other GPU have the decompression done by DirectCompute by default? The way I read it is that that's the fallback: the decompression will still happen on the GPU if DS 1.1+ is used.
I imagine the generic Microsoft decompressor has not yet been supplied and AMD doesn't have their own optimized solution yet.
 
Ok, after just playing around and making more direct comparisons to the PS5 (made more difficult by the fact the PS5 version doesn't have a manual save and just saves the last 10 auto slots, argh) - it appears the 'fixed' LOD you get by resetting the DLSS settings may not reflect what it's supposed to be either.

There are some cases where surfaces on the PS5 seem to have slightly more detail, and resetting the DLSS settings solves it. The issue, though, is that you get abnormally sharp LODs in other areas that differ from the PS5 far more than they match it. On the whole, the PC image after a fresh boot seems far closer to the PS5's LOD, with some minor exceptions.

What seems to be happening is that while resetting the DLSS settings may correct some minor mipmap/LOD issues in certain areas, it also may be pulling in incorrect mips for other objects - ones intended to be viewed when your camera is 2ft in front of them instead of 50. This would explain, on the mining planet for example, how there's suddenly far more specular aliasing with DLSS when it's reset - it behaves basically like a too-aggressive negative lodbias setting. It doesn't make much sense for 12GB to be swamped without RT, so the wrong mips being pulled in would explain why the engine shits the bed at that point.

So basically it just seems to be streaming bugs and the usual settings change->engine goes haywire thing. So uh yeah, maybe ignore that 'reset your DLSS settings after every game load' recommendation. 😬
 


Great find, yeah this game really doesn't behave well with live setting changes.
 
In mid-2023 it's just Darwinism for anyone buying an 8GB GPU.
yup, that's why I got a 16GB GPU almost a year ago.


In fact it's quite sad that a small handheld PC like the GPD Win 4 and the like can have up to 3 to 4 times more VRAM available for games - the 32GB version - than most flagship GPUs on the market. :rolleyes: :sneaky:
 
Indeed it is. The PC platform is outdated and every damn port released in the last couple of months proves this.

The PC needs to switch to unified memory ASAP. I know it's hard to get everyone on board, but it needs to happen.
 
The crippled BW of unified memory is better than not??
The memory bus design and RAM configuration dictates the bandwidth. Apple's M2 MAX has 800Gbps memory bandwidth, and that's a chip not even designed to drive high-end graphics. :runaway:
 
Indeed it is. The PC platform is outdated and every damn port released in the last couple of months proves this.

The PC needs to switch to unified memory ASAP. I know it's hard to get everyone on board, but it needs to happen.

Switching the PC to unified memory would mean, among other things:
  • Performance decrease on the CPU side thanks to higher-latency memory
  • Higher cost for the same amount of memory (assuming GDDR5)
  • Less bandwidth available to the GPU due to memory contention (at least at the high end where memory speeds are already pushed to their limits), or higher cost for the same effective memory bandwidth in the lower-end tiers
  • Upgrading is more complex and potentially more constrained
And for what? Interconnect standards between CPU and GPU are more than fast enough to handle modern workloads, and we're on the verge of doubling that bandwidth with the move to PCIe 5.0. Granted, there is an overhead involved in having to shuffle data between CPU and GPU memory, but thanks to advances like Resizable BAR, GPU upload heaps, and AMD's Smart Access Storage, separate memory pools can be treated more like a single memory pool than ever, without the need to swap as much data between those pools. Granted, it may take some time for these new features to be used widely and consistently, but likely far less time than it would take for the entire PC market to transition to a unified memory architecture, and with none of the associated disadvantages listed above.
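
On the GPU upload heaps point, a rough sketch of what that looks like in D3D12 with a recent Agility SDK - treat the details (feature struct, initial state) as from-memory assumptions rather than gospel. The app asks whether CPU-visible VRAM heaps are available (which in practice requires ReBAR) and, if so, places a buffer there so the CPU can write into it directly instead of going through a separate upload copy:

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Returns true if a buffer was placed in a GPU upload heap (CPU-visible VRAM).
bool TryCreateGpuUploadBuffer(ID3D12Device* device, UINT64 size,
                              ComPtr<ID3D12Resource>& outBuffer)
{
    // Check whether the driver/OS expose GPU upload heaps at all.
    D3D12_FEATURE_DATA_D3D12_OPTIONS16 opts16 = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS16,
                                           &opts16, sizeof(opts16))) ||
        !opts16.GPUUploadHeapSupported)
        return false;  // fall back to the classic upload heap + copy path

    D3D12_HEAP_PROPERTIES heapProps = {};
    heapProps.Type = D3D12_HEAP_TYPE_GPU_UPLOAD;  // CPU-writable VRAM

    D3D12_RESOURCE_DESC desc = {};
    desc.Dimension = D3D12_RESOURCE_DIMENSION_BUFFER;
    desc.Width = size;
    desc.Height = 1;
    desc.DepthOrArraySize = 1;
    desc.MipLevels = 1;
    desc.SampleDesc.Count = 1;
    desc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;

    return SUCCEEDED(device->CreateCommittedResource(
        &heapProps, D3D12_HEAP_FLAG_NONE, &desc,
        D3D12_RESOURCE_STATE_COMMON, nullptr, IID_PPV_ARGS(&outBuffer)));
}
```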

Also I don't see any reason to associate a few recent bad ports with the PC's NUMA architecture. I don't believe there has been any evidence of such a link.
 
The memory bus design and RAM configuration dictates the bandwidth. Apple's M2 MAX has 800Gbps memory bandwidth, and that's a chip not even designed to drive high-end graphics. :runaway:
That's not what we're getting with the unified RAM configs being talked about though. The GPD Win 4 isn't challenging any 8GB GPU for BW. My point really being: is BW less of a limiting factor than capacity for current graphics? What's the ballpark 'good enough' BW per resolution, or BW:capacity ratio?
 
Great find, yeah this game really doesn't behave well with live setting changes.

Christ, I'm ping-ponging back and forth on this like the hamster on V from The Boys.

Then I remembered NvInspector and negative LOD bias, which is the go-to for games that fuck up their mip settings for DLSS and base them off the internal res and not the output. So I set it to -1.5 for DLSS Performance, and... boom. Upon a fresh game load, same quality as resetting the DLSS settings - except this time, the vram doesn't skyrocket, basically no increase at all. The game is perfectly playable with VH textures displayed in their proper full glory, no stuttering.
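
For context on why -1.5 is in the right ballpark (this is just the generic upscaler rule of thumb, not anything confirmed about what Nixxes sets internally): the usual guidance is to bias mip selection by roughly log2(render res / output res), so textures are picked for the output resolution rather than the internal one. A quick sketch:

```cpp
#include <cmath>
#include <cstdio>

// Rule-of-thumb texture LOD bias for an upscaler: choose mips as if rendering
// at the output resolution, i.e. bias by log2(render width / output width).
// Illustrative only - the exact offset a given DLSS integration uses can vary,
// and many titles add a further small negative tweak on top.
float UpscalerMipBias(float renderWidth, float outputWidth)
{
    return std::log2(renderWidth / outputWidth);  // negative when upscaling
}

int main()
{
    // 4K output: DLSS Performance renders at 1080p, Ultra Performance at 720p.
    std::printf("DLSS Performance @ 4K:       %.2f\n", UpscalerMipBias(1920.0f, 3840.0f));  // -1.00
    std::printf("DLSS Ultra Performance @ 4K: %.2f\n", UpscalerMipBias(1280.0f, 3840.0f));  // about -1.58
    return 0;
}
```

So -1.5 at 4K DLSS Performance is a bit more aggressive than the plain log2 value of -1.0.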

So, for my 150th variant of "what I think is happening": Nixxes didn't forget about the proper LOD bias for DLSS, but it's not set properly on game load. Resetting DLSS does indeed fix it, but that just triggers the settings bug that has affected the game since launch: it causes the streaming system to go haywire. So the game can handle the VH textures just fine, displayed at their proper LOD in 4K, on 12GB cards - their vram cost at the proper LOD is not the issue.

However, there are two caveats to forcing the LOD bias. There's a performance cost, as it is rendering more detail than before (mostly in areas with vegetation) - even more than the DLSS reset trick, albeit without the vram thrashing, so it's still a win. And while the detail wrt object LOD and textures in most areas seems identical to the DLSS fix, one weird addition is reflectivity - it's massively increased for surfaces that employ screen space reflections, or at least with some materials. Every metal surface in the game becomes buffed to a mirror finish. It's actually kind of cool in some cutscenes, as you can see so much detail of the world reflected in Clank, but I don't think it's intended as this doesn't happen with native TAA/DLAA or on the PS5. Maybe that's the reason this is more costly than the DLSS reset fix, which doesn't do this.

As for why this wasn't picked up before: the degree to which this is noticeable depends highly upon your starting internal res, your screen size, and having a PS5 nearby to use as a base for comparison. As the game was reviewed mostly on higher-end cards using DLSS Quality or DLAA, it's very hard if not impossible to spot there, as the LOD for 1440p and up is going to be higher than the 1080p of DLSS Performance. At DLSS Performance, even viewing single objects in the game's model viewer can make it evident, but only if you have either a PS5 to flip to on the same display, or do side-by-side comparisons as I show below.

The wrong LOD bias on boot does kind of resemble the PS5 in performance mode more closely for scenes with vegetation, like I said, but now I think that's just a case of that mode on the PS5 often being sub-4K, and the weakness of IGTI compared to DLSS when resolving fine detail. The PS5's 4K Fidelity mode more closely resembles the LOD of DLSS with this fix than its Performance mode does.

The enhanced shimmering in some scenes was also likely partly due to me keeping the default sharpening at 10, which helps the textures a little with the wrong LOD bias set at boot. When the bias is fixed though, its aggressiveness just exacerbates DLSS trying to deal with the increased detail. Cutting it in half or turning it off helps.

PS5 vs PC DLSS Perf (fresh boot)

PS5 vs PC DLSS Perf (DLSS reset, LOD fixed)

4K Native AA vs. DLSS Perf (DLSS reset, LOD fixed)

DLSS Perf with DLSS Reset Fix vs. DLSS Perf with LOD Bias Fix on Fresh boot (look how shiny! Also note the vram decrease with the lod bias fix vs. the dlss reset)

Showing that it's not just distant detail that's affected:

DLSS Perf vs DLSS reset fix, single object in model viewer


 
Apple's M2 MAX has 800Gbps memory bandwidth, and that's a chip not even designed to drive high-end graphics. :runaway:
M2 Max has just 400GB/s bandwidth, using a gigantic 512-bit bus and LPDDR5 (to lower power consumption). M2 Ultra achieves 800GB/s using an enormously gigantic 1024-bit bus, totally crazy. The cost alone is insanely prohibitive; only Apple can do this because they put these SoCs into very expensive devices.
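
For the arithmetic behind those figures (assuming the LPDDR5-6400 Apple is generally reported to use, so treat the transfer rate as an assumption): bandwidth = (bus width / 8) × transfer rate. A 512-bit bus moves 64 bytes per transfer, so 64 B × 6400 MT/s ≈ 409.6 GB/s for M2 Max, and doubling the bus to 1024-bit gives 128 B × 6400 MT/s ≈ 819.2 GB/s for M2 Ultra.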
 