Ratchet & Clank technical analysis *spawn

Is GPU decompression enabled? I'd assume so, and if so that could be part of the speed difference on that 990: we'd expect more stutters from PCIe/DDR saturation, but a straight, smooth drop from decompression load. It's too bad the PCIe stats aren't presented as a detailed graph.
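For context, "enabled" in DirectStorage terms is a per-request thing: a read only goes through GPU decompression if the request is tagged with a GPU-capable format like GDeflate. A minimal sketch of the public API (not Nixxes' actual code; the offsets and sizes are placeholders):

```cpp
// Minimal sketch of the public DirectStorage API (not Nixxes' actual code).
// A request is only GPU-decompressed if it's tagged with a GPU-capable
// format such as GDeflate; offsets/sizes here are placeholders.
#include <dstorage.h>

void EnqueueCompressedRead(IDStorageQueue* queue, IDStorageFile* file,
                           ID3D12Resource* destBuffer,
                           UINT32 compressedSize, UINT32 uncompressedSize)
{
    DSTORAGE_REQUEST request = {};
    request.Options.SourceType = DSTORAGE_REQUEST_SOURCE_FILE;
    request.Options.DestinationType = DSTORAGE_REQUEST_DESTINATION_BUFFER;
    // This is the switch that routes the payload through GPU decompression:
    request.Options.CompressionFormat = DSTORAGE_COMPRESSION_FORMAT_GDEFLATE;
    request.Source.File.Source = file;
    request.Source.File.Offset = 0;            // placeholder offset
    request.Source.File.Size = compressedSize;
    request.UncompressedSize = uncompressedSize;
    request.Destination.Buffer.Resource = destBuffer;
    request.Destination.Buffer.Offset = 0;
    request.Destination.Buffer.Size = uncompressedSize;

    queue->EnqueueRequest(&request);
    queue->Submit();                           // kick off the batch
}
```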

AMD seems to know what it's doing in trying to copy Apple's big-SoC strategy ASAP. The first "big" CPU/GPU combo (well, modest sized, but anyway) comes out next year with RDNA 3.5, apparently around 40 CUs for an APU. And they're retiring the AM5 socket ASAP, claiming minor refreshes count as supporting multiple generations so they can "technically" have fulfilled their promised generational support, in order to get a much bigger and more powerful socket out ASAP as well. This saves them money, means customers buy their CPU/GPU combo direct from AMD (suck it, board partners, and most of all Nvidia/Intel), and gets them the same super popular SoC format that Apple has with its M-series line.

But relevant to this discussion, it also delivers the numerous advantages an APU has over a separate GPU/CPU setup, such as the CPU and GPU being able to read and write the same memory at the same speed. That means this decompression could happen on a CPU core and be written directly to GDDR for the GPU to use, without any PCIe bandwidth restrictions or latency issues. And with architectures far more advanced than the PS5's Zen 2, modern CPUs can certainly spare a core while still running games twice as fast or more.
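Interestingly, the DirectStorage runtime already exposes knobs pointing in that direction: you can turn off GPU decompression and hand the built-in CPU codec some worker threads. A sketch, assuming the standard dstorage.h configuration struct (the thread count is an arbitrary example, not a recommendation):

```cpp
// Sketch: steering DirectStorage onto CPU cores instead of the GPU.
// DSTORAGE_CONFIGURATION must be set before the first factory is created.
// On a unified-memory APU the decompressed output wouldn't have to cross
// PCIe at all, which is the appeal described above.
#include <dstorage.h>

HRESULT UseCpuDecompression()
{
    DSTORAGE_CONFIGURATION config = {};
    config.DisableGpuDecompression = TRUE;        // force the CPU fallback path
    config.NumBuiltInCpuDecompressionThreads = 2; // "spare a core (or two)"
    return DStorageSetConfiguration(&config);
    // ...then create the factory/queues as usual.
}
```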
 


😊



Compusemble's video also shows what I was curious about: high-framerate situations, and still DirectStorage loses. Yes, they have a 7700X, but that's not a ridiculously high-end CPU. Pretty surprising.

On the plus side, shining a spotlight on this will likely drive improvements; as others have said, this is the first implementation of GPU decompression in a shipping game, and I'm sure there is a lot of room to optimize it further. We'd need longer tests across various systems to come to a firm conclusion about the worth of GPU decompression, both overall and in just this game at this early stage, but it is kind of a funny PR snafu that the narrative being spread now is "Hey, the best way to improve performance in this game is to disable the technology that's been hyped up for a year." :)

Edit: Yowza

 
To be fair, architecturally Arc is much better at RT than RDNA2/3.
The game runs natively on an RDNA2 machine, which makes this even more confusing, considering that AMD RDNA2/3 GPUs have some advantages over the PS5 graphics logic.

Also, RT is more and more essential now. It's one of those effects where your GPU does all the work and doesn't need external hardware to look good. Unlike HDR, where you need an HDR 10000 (whatever) TV or monitor to appreciate its benefits; same with VRR and the like.
 
Game is a beaut. They just need to polish up some issues and it will be really great.

20230729182405-1.jpg


20230729183841-1.jpg


20230729182009-1.jpg


20230729182723-1.jpg


20230729183849-1.jpg
 
Well, Nixxes/Insomniac can't exactly use AGC on PC like they can on PS5. D3D12 on PC is quite far removed from the consoles in terms of ray tracing features, and to make things worse they use ray tracing pipelines instead of inline ray tracing; the latter is easier for vendors to implement in their drivers than the former...
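To illustrate the distinction: ray tracing pipelines need the whole CreateStateObject/shader-table machinery on the host side, while inline ray tracing (DXR 1.1's RayQuery in HLSL) runs from ordinary compute or pixel shaders and only needs Tier 1.1 support. A quick host-side check:

```cpp
// Sketch: the host-side difference between the two DXR flavours.
// Ray tracing pipelines (DXR 1.0) need CreateStateObject + shader tables;
// inline ray tracing (DXR 1.1 RayQuery) only needs Tier 1.1 support and
// works from ordinary compute/pixel shaders with no state objects at all.
#include <d3d12.h>

bool SupportsInlineRaytracing(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS5 opts5 = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS5,
                                           &opts5, sizeof(opts5))))
        return false;
    return opts5.RaytracingTier >= D3D12_RAYTRACING_TIER_1_1;
}
```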
 


FWIW, I don't see nearly as big a difference on Very High settings. More like 95 fps vs 100 fps on average, and margin-of-error territory on the minimums.

It also seems to need more CPU grunt. I didn't test this thoroughly though.

9ZAatOe.png
 
So this seems to only affect NVIDIA GPUs, which reminds me: does RTX IO work on AMD GPUs? I assume it does not, and that AMD simply uses DirectStorage 1.2 GPU decompression.

F2O2t4pWYAIi_kg.png


The second image is from a Discord channel. I thought it was from CapFrameX, but it's not, and I don't have the source.
That image comes from Twitter user Sebastian. He's been testing the game and contributing to the discussion alongside CapFrameX and Compusemble.

 
Also interesting to note: getting rid of the DirectStorage DLL stops the crashes that happen upon exiting the game after manually enabling ReBAR using NVIDIA Inspector.
 
That's actually very interesting, and I stand corrected on the "PCIe 3.0 being enough" argument, at least in that portal segment. It seems to be exceeding the read speed of PCIe 3.0 on several occasions, although notably it's nowhere near PCIe 4.0 limits. This could certainly impact Turing-class GPUs during that sequence and result in some pretty heavy frame drops. That said, this sequence is far from representative of the game's average gameplay, where I doubt PCIe 3.0 would prove to be a limiting factor. So this is quite a good example of why using that sequence to judge the average performance of Turing GPUs in this game, especially using 1% lows, is a pretty terrible idea.
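For reference, the theoretical x16 link numbers that paragraph is implicitly comparing against (peak figures; real-world throughput lands a bit below these):

```cpp
// Back-of-envelope PCIe x16 bandwidth, to put those transfer spikes
// in context (theoretical peaks; real-world is somewhat lower).
#include <cstdio>

int main()
{
    // PCIe 3.0: 8 GT/s per lane, 128b/130b encoding, 16 lanes
    constexpr double gen3_gbs = 8.0 * 16 * (128.0 / 130.0) / 8; // ~15.8 GB/s
    // PCIe 4.0 doubles the signalling rate
    constexpr double gen4_gbs = 2 * gen3_gbs;                   // ~31.5 GB/s
    std::printf("PCIe 3.0 x16: %.1f GB/s, PCIe 4.0 x16: %.1f GB/s\n",
                gen3_gbs, gen4_gbs);
}
```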



I'm responding to this one here to keep the R&C technical discussion in one place.

So what Rich says here is obviously correct. But what I want to know is: where do you think the bottleneck on PC lies?

The DS benchmarks are not "drive speed" benchmarks. They are end-to-end data throughput benchmarks. They measure drive speed, decompression speed and, indirectly, bus capacity.

What they don't really measure is CPU limits because those benchmarks are very light on the CPU.

That's why I initially suspected the CPU might be the bottleneck on PC. But we've now seen, thanks to Alex, that the CPU makes no difference beyond a base performance level. Neither does GPU performance (i.e. decompression speed), RAM speed, PCIe width, etc.

So where do you think the bottleneck is? What component of the PS5 I/O chain do you specifically think is faster than what the PC is able to deliver? Personally, I have no idea now. All the major components seem to have been ruled out, but I'm not yet convinced it's a cap imposed in the game engine; if it were, it could have been done a lot better.
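To make "end-to-end" concrete, here's roughly how such a measurement works: enqueue a batch of reads, have the queue signal a D3D12 fence once everything has landed, and divide bytes delivered by wall time. A sketch only; error handling is omitted and the enqueue step is elided:

```cpp
// Sketch of an end-to-end throughput measurement: time from Submit()
// until the fence signals, i.e. drive read + decompression + bus transfer.
#include <windows.h>
#include <d3d12.h>
#include <dstorage.h>
#include <wrl/client.h>
#include <chrono>
using Microsoft::WRL::ComPtr;

double MeasureThroughputGBs(IDStorageQueue* queue, ID3D12Device* device,
                            UINT64 totalUncompressedBytes /* of the batch */)
{
    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));
    HANDLE event = CreateEvent(nullptr, FALSE, FALSE, nullptr);
    fence->SetEventOnCompletion(1, event);

    auto t0 = std::chrono::steady_clock::now();
    // ... enqueue all the compressed reads in the batch here ...
    queue->EnqueueSignal(fence.Get(), 1); // fires after all reads complete
    queue->Submit();
    WaitForSingleObject(event, INFINITE);
    auto t1 = std::chrono::steady_clock::now();
    CloseHandle(event);

    double seconds = std::chrono::duration<double>(t1 - t0).count();
    return totalUncompressedBytes / seconds / 1e9; // GB/s actually delivered
}
```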
 
I think it's safe to assume that the DirectStorage implementation isn't as good as initially thought, and that there's work to do both on the game side and on the GPU driver side.
 

PS5-Road-to-Kopie-2.jpg
Here is everything there is to it. Those 30 GB/s synthetic benchmarks were never going to prove anything.

Cerny and his team figured out that ALL THOSE STEPS in the slide are absolutely necessary to guarantee an I/O data stream of 22 GB/s.

(Oh yeah, I know 🙄 8-9 GB/s is the better-known number from that presentation, but since he explicitly states that the Kraken unit's output is 22 GB/s, I think the rest of the I/O block is able to keep up.)
Otherwise they would have cut something to save silicon space...

The Road to PS5 talk was not a marketing exercise (although it may function like one anyway).
It was an honest presentation of the PS5's design philosophy and the specific problem they set out to solve.

It was meant to inform developers about the paradigm change that's now possible...
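A quick sanity check on how those numbers fit together (the compression ratios here are my assumptions; the 5.5 GB/s raw figure and the 22 GB/s decompressor peak are from the talk):

```cpp
// Reconciling the Road to PS5 numbers: 8-9 GB/s is what you'd see at a
// typical compression ratio, while 22 GB/s is the decompressor's peak
// output on ideally compressible data. Ratios below are my assumptions.
#include <cstdio>

int main()
{
    constexpr double raw_gbs = 5.5;        // SSD raw read speed (from the talk)
    constexpr double typical_ratio = 1.6;  // assumed ~Kraken on typical game data
    constexpr double peak_ratio = 4.0;     // assumed best case the unit can emit
    std::printf("typical output: ~%.1f GB/s\n", raw_gbs * typical_ratio); // ~8.8
    std::printf("peak output:    ~%.1f GB/s\n", raw_gbs * peak_ratio);    // 22.0
}
```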
 
Seems like an issue with RTX IO rather than DirectStorage. AMD GPUs do not seem to suffer from the same problems.
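If I'm remembering the DirectStorage 1.2 API right, the runtime can actually report which decompression path it will take, which is how you'd tell a vendor-optimized metacommand path (what RTX IO plugs into, as I understand it) apart from the generic compute or CPU fallbacks. Treat the exact names below as from memory:

```cpp
// Sketch (from memory of the DirectStorage 1.2 headers): ask the runtime
// which GDeflate decompression path it will actually use on this system.
#include <dstorage.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

const char* ReportGDeflatePath(IDStorageQueue* queue)
{
    ComPtr<IDStorageQueue2> queue2;
    if (FAILED(queue->QueryInterface(IID_PPV_ARGS(&queue2))))
        return "pre-1.2 runtime, no query available";

    DSTORAGE_COMPRESSION_SUPPORT support =
        queue2->GetCompressionSupport(DSTORAGE_COMPRESSION_FORMAT_GDEFLATE);
    if (support & DSTORAGE_COMPRESSION_SUPPORT_GPU_OPTIMIZED)
        return "vendor-optimized GPU path (driver metacommand)";
    if (support & DSTORAGE_COMPRESSION_SUPPORT_GPU_FALLBACK)
        return "generic compute-shader GPU fallback";
    if (support & DSTORAGE_COMPRESSION_SUPPORT_CPU_FALLBACK)
        return "built-in CPU decompression";
    return "unknown";
}
```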
 

Sorry for the double post, but that 10-minute editing limit is so stupid...

About the topic:
That's rather interesting... AMD might be in trouble here, because its GPUs may share tech with the PS5's GPU but lack both the PS5's I/O block AND Nvidia's RTX IO.

Maybe they can't handle rendering and GPU decompression at the same time very well, especially with ray tracing...
Even RTX IO adds an extra burden to game rendering.
 

So, how many GB/s do you think a PC with DirectStorage can manage before the I/O side of things starts being a really big limit?
 
Don't know; we need to wait for the next wave of PS5 exclusives to be ported to PC...
665d9778-ps5-playstation-5-xbox-series-x-anatomie-vr4player-002-1140x641.jpg
If any future PS5 exclusive even remotely mimics what that slide suggests, that will be the big test for PC architecture / DirectStorage:
continuous, heavy I/O streaming demands, not just some portal stuff...
That will bring the truth out, including on a topic nobody thinks about right now.

Thermals: the PS5's SSD and the Kraken controller are encompassed by the PS5's cooling solution.

On PC, that's up to the user, and if it's not done properly an SSD might throttle down... That would be a problem.
 
A PS5 port is likely not the easiest way to showcase DirectStorage, as you're coming from a different API and I/O system.
 