Digital Foundry Article Technical Discussion [2025]

Digital Foundry interview with Naughty Dog and Nixxes about The Last of Us Part 2 port.


It had been a while since I've looked at it, but I remembered Jason Gregory's (Naughty Dog) Game Engine Architecture book had a section on fibers, and when I checked it out I was surprised to see it referring to Windows fibers versus threads. I wonder what it is about the Windows implementation of fibers that made them unsuitable for the port from PS4/5.

The best I can think of is that maybe there are fewer blocking system calls on PlayStation.
 
Rather undeserved self-praise for a port where a 3060 is unable to double PS4 performance even at lower settings.
It can if paired with a fast enough CPU. Stolen from NeoGAF.



[Benchmark screenshots]


The optimized settings are very similar to the Pro settings in the DF video, I would even say higher overall. Worth noting that Motion Blur is completely disabled here, but on the flip side, Volumetrics are Medium instead of Low, Textures are Very High instead of High, AF is 16x, and it uses DLAA which has a performance penalty over TAA (but Alex also uses DLAA).
 
I would have to see more tests. In GameGPU, the difference is only 4%.
Here are more tests at native 4K: the 5080 is again 15% faster than the 9070 XT, and the 3080 ties the 6900 XT (even though it should be slower).


At any rate, Part 2 runs way better than Part 1 and doesn't need as much VRAM. In fact, it requires far less VRAM than before, which suggests that Part 1 had serious issues and that Nixxes did indeed help fix them in Part 2.
 
Rather undeserved self-praise for a port where a 3060 is unable to double PS4 performance even at lower settings.
Interesting bits: async compute on PC doesn't level the playing field with the PS5; async is way faster on PS5.

it doesn't necessarily level the playing field with the PS5. Certainly using async compute with the PS5, where you know exactly what the hardware is and what things pair well together, and there's less driver in the middle, we've always found it to be a lot more beneficial on consoles than it is on PC, unfortunately.
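
(For context: on PC, "async compute" just means submitting to a separate compute queue and letting the driver and hardware decide how much actually overlaps with graphics work. A minimal D3D12 sketch, assuming an already-created device, and purely illustrative rather than anything from the actual port:)

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Create a dedicated compute queue alongside the graphics queue. Whether
// the GPU actually executes the two queues concurrently is up to the
// driver and hardware -- exactly the uncertainty described above.
ComPtr<ID3D12CommandQueue> CreateAsyncComputeQueue(ID3D12Device* device)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE; // compute-only queue
    desc.Priority = D3D12_COMMAND_QUEUE_PRIORITY_NORMAL;

    ComPtr<ID3D12CommandQueue> queue;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
    return queue; // synchronise against the graphics queue with fences
}
```

On PS5 the developer knows exactly which units are idle at each point in the frame and can schedule compute into those bubbles deliberately; on PC that pairing is guesswork across many GPU generations.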

Also, spin locking threads is more expensive on PC, whereas it's cheap on PS5. The PS5 version also uses more of the CPU to help with performance; they couldn't do that on PC, as it would make the game CPU-limited and reduce performance. In general, CPU performance remains an unsolved problem on PC to this day, so achieving equal performance remains difficult.

One thing is the spin locking. That is cheap on the console, but on Windows, that can be very problematic for performance. That's one thing that we addressed for this port in particular.
on PS5, no one cares what the CPU utilisation is. The job system was originally constructed to just always use everything, every second, and so moving that to PC, Nixxes was super helpful in helping to optimise utilisation as people on PC do care about it. It was challenging to reduce that as we never had to worry about it on console.
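
To illustrate the spin-lock point, here's a toy sketch (my own illustration, not Nixxes' actual code). On console you own the cores, so brief spinning is cheap; on Windows the lock holder can be pre-empted at any moment, so a port typically has to back off to the OS scheduler:

```cpp
#include <atomic>
#include <thread>

// Bare test-and-set spin lock. On a console with known core allocation and
// no competing processes, a short spin is cheap. On Windows, the holder can
// be pre-empted at any time, and every waiter then burns a full core until
// the holder is rescheduled.
struct SpinLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;

    void lock() {
        int spins = 0;
        while (flag.test_and_set(std::memory_order_acquire)) {
            // Typical PC-side mitigation: spin briefly, then hand the core
            // back to the OS instead of busy-waiting indefinitely.
            if (++spins > 1000)
                std::this_thread::yield();
        }
    }
    void unlock() { flag.clear(std::memory_order_release); }
};
```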

I would say the praise is relatively justified: they improved tremendously over the Part 1 release. Part 2 runs way faster, uses far less VRAM, uses fewer CPU resources, and loads much, much faster. There are limits to what they can do with an engine that is extremely customized to the PS5's strengths: gigantic shader/material permutations, a shared memory pool, faster async compute, cheaper CPU multithreading, etc.
 
It can if paired with a fast enough CPU. Stolen from NeoGAF.
That is not even the same view as in my video... So it is not debunking anything. Different views in that forest run differently, better and worse.

The notion that I was CPU-limited there is rather presumptuous; the GPU was 9X% utilised and the CPU was not being tapped out, unlike in other scenes. When I get to my PC I will post the OSD statistics, which I otherwise blurred out in that video section for aesthetic purposes.

Why not try lining up the camera at the same location and do that test again?


Regarding optimisation in general, I kind of have a bugbear about this. So let us say at face value, yeah, you are getting less GPU utilisation on PC; of course you are. But there are also *other* transformational performance enhancements that could be added but probably are not, for budget and timetable reasons. These ports are done with minimal resources for maximum ROI. For example: how is culling done? Is it using mesh shaders, which PC GPUs have had since 2018?

No, they won't add that even if it could greatly boost triangle throughput and perf.
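
(For reference, checking whether the hardware could even do this is trivial in D3D12; a minimal sketch, illustrative only, and not implying anything about how the port's renderer is actually structured:)

```cpp
#include <d3d12.h>

// Query mesh shader support: a DirectX 12 Ultimate feature, exposed on
// NVIDIA Turing GPUs (2018) onward and on AMD RDNA2 and later.
bool SupportsMeshShaders(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS7 opts7 = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS7,
                                           &opts7, sizeof(opts7))))
        return false;
    return opts7.MeshShaderTier >= D3D12_MESH_SHADER_TIER_1;
}
```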
 
That is not even the same view as in my video... So it is not debunking anything. Different views in that forest run differently, better and worse.
He makes an entire pass through the forest. You can watch it here. Timestamped.


It never drops anywhere near as low as 57fps and certainly more than doubles PS4's performance.

The notion that I was CPU-limited there is rather presumptuous; the GPU was 9X% utilised and the CPU was not being tapped out, unlike in other scenes. When I get to my PC I will post the OSD statistics, which I otherwise blurred out in that video section for aesthetic purposes.
Well, could you try with a different setup to confirm? Say, throw the 3060 into the 9800X3D machine and see if the results differ.
Why not try lining up the camera at the same location and do that test again?
Oh, not my video. I was just browsing NeoGAF and they were talking about it, and I found the link. I thought it was interesting that, using very similar settings, that video was getting far better performance. The one thing that stands out is the 13600K vs the Ryzen 5 3600. There's also the memory configuration to consider. It could be more than just the Ryzen 3600 holding things back. However, I only have one PC and can't really test. The best I can do is compare videos, but it's tricky because the computers can differ a lot and the benchmark passes are never 1:1.
Regarding optimisation in general, I kind of have a bugbear about this. So let us say at face value, yeah, you are getting less GPU utilisation on PC; of course you are. But there are also *other* transformational performance enhancements that could be added but probably are not, for budget and timetable reasons. These ports are done with minimal resources for maximum ROI. For example: how is culling done? Is it using mesh shaders, which PC GPUs have had since 2018?

No, they won't add that even if it could greatly boost triangle throughput and perf.
There's certainly time and budget constraints, but I was looking around online and found quite a bit of performance variations. Regarding your 9800X3D video limited to 85fps, there's also this:


I also saw on the Steam forum that even people with Intel CPUs sometimes experience the same issue. In another video, a 5090 and 9800X3D were running at 130fps but then dropped to 85fps, just like yours.
 
Sure, on Monday I will drop the RTX 3060 into the Ryzen 7 9800X3D PC. I would be surprised to see major differences.

For example, the way we check whether a game is GPU-limited on PC is by looking at GPU utilisation. The scenes where I show sub-60 performance show very high GPU utilisation (98%). Also take a look at the frame-time graph in those areas where I am GPU-limited with Vsync off: notice how the frame-time graph is super flat. Also look at the CPU utilisation; it is always below 80%. Given such statistics, that is what a GPU limit typically looks like.

[OSD screenshots: GPU-limited scenes]


Now take a look at a CPU limit for statistical differences:

[OSD screenshot: CPU-limited scene]


Notice how the GPU utilisation in the shot above is not 98%+; it is 74%. So the statistics tell us the GPU is being underutilised, according to Windows. Then look at the CPU utilisation: each thread is at 92%+. Then look at the frame-time graph: it is no longer a smooth line, it is erratic. That is typically what a CPU limit looks like.

A GPU limit means I could throw in a bigger GPU and get a better frame-rate. A CPU limit means I could throw in any GPU and that frame-time graph and even raw frame-rate would be mostly the same.
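
To put that diagnostic reasoning into code form, here is a toy classifier with made-up thresholds mirroring the numbers above (98%+ GPU with flat frame-times versus ~74% GPU with saturated threads and erratic frame-times); it's a sketch of the method, not a rigorous tool:

```cpp
#include <cmath>
#include <string>
#include <vector>

// Toy bottleneck classifier: high GPU utilisation plus a flat frame-time
// graph suggests a GPU limit; an underutilised GPU plus saturated CPU
// threads and erratic frame-times suggests a CPU limit.
std::string ClassifyBottleneck(const std::vector<double>& gpuUtil,  // % per sample
                               const std::vector<double>& cpuUtil,  // busiest thread %
                               const std::vector<double>& frameMs)  // frame times (ms)
{
    auto mean = [](const std::vector<double>& v) {
        double s = 0.0;
        for (double x : v) s += x;
        return s / v.size();
    };

    // "Flatness" of the frame-time graph: standard deviation in ms.
    double m = mean(frameMs), var = 0.0;
    for (double t : frameMs) var += (t - m) * (t - m);
    double jitter = std::sqrt(var / frameMs.size());

    if (mean(gpuUtil) >= 95.0 && jitter < 1.0)
        return "GPU-limited"; // a faster GPU would raise the frame-rate
    if (mean(gpuUtil) <= 80.0 && mean(cpuUtil) >= 90.0)
        return "CPU-limited"; // any GPU would give much the same result
    return "indeterminate";
}
```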
He makes an entire pass through the forest. You can watch it here. Timestamped.
He actually does not make an entire pass through the forest; importantly, he is missing the section from earlier on where my footage is from. The forest starts in a confined path in the sunlight, not in the middle under the trees where it is dark. It is also important to note that his RTX 3060 typically runs at 2100 MHz. That is a souped-up aftermarket model, or one that is manually overclocked; mine is a stock (reference) model going to 1920 MHz while playing. His GPU is ~10% faster than mine in terms of core speed, and who knows how much faster in terms of memory clock (his is not listed).

I will upload the full playthrough of that area that I did and showcased in my video, to show you exactly what I saw on the Ryzen 5 3600 + RTX 3060.
 
Here is an upload of the full playthrough of that section @Below2D. This is the playthrough of the section that I highlighted in my review. Take note of my GPU clock vs. his, and also take note of my GPU utilisation and CPU utilisation in the sections that are below 60 fps. If you watch, you can also see a moment or two where the game does a load and causes a frame-time hitch, followed by high CPU utilisation (something I talk about in my video).

I have no problem at all uploading performance videos for people at Beyond3D since you are good folks here, but one thing I should say is that I am not trying to deceive my audience or anything. If I record something, I write and talk about it in a video in a way that describes it, while highlighting what I am saying visually on screen. My script for the video, talking about 3060 performance, says the following:

"When punching in the settings that give you the greatest equivalency to the base PlayStation 4's quality, you might be surprised what level of GPU and CPU are necessary just to even double that level of performance. Playing through the game on an RTX 3060 targeting 1080p and 60 fps, it was not hard to find many scenes that are below 60 fps on that GPU due to the GPU being tapped out. It could hold 60 over extended periods for sure in a nice way, but it definitely faltered more often than one would hope in reaching 60 fps at a mere 1080p."

I purposefully say that my RTX 3060 (which is a reference model) can hit 60 fps for large stretches of time, but that it also commonly enough falls below 60 fps while GPU-limited in scenes. You should assuredly see that in the video above.

That person's video of an RTX 3060 at 2100 MHz does not at all debunk my statement or the footage I present on screen.
 
Here is an upload of the full playthrough of that section @Below2D.
Thanks. This was a hypothesis I had based on the video I saw, which seemed to perform significantly better than your GPU, but yours seems fully tapped out. If you can still test it on your high-end system, it'd be appreciated.

What are your thoughts on the ReBAR issue? This reminds me of Spider-Man back when it was first ported: a lot of people reported much better performance and CPU scaling by disabling it, and the video I linked above showed this. This doesn't seem to happen all the time though, so I'm not sure if ReBAR is the real culprit.
 
Interesting bits: async compute on PC doesn't level the playing field with the PS5; async is way faster on PS5.

Also, spin locking threads is more expensive on PC, whereas it's cheap on PS5. The PS5 version also uses more of the CPU to help with performance; they couldn't do that on PC, as it would make the game CPU-limited and reduce performance. In general, CPU performance remains an unsolved problem on PC to this day, so achieving equal performance remains difficult.

I would say the praise is relatively justified: they improved tremendously over the Part 1 release. Part 2 runs way faster, uses far less VRAM, uses fewer CPU resources, and loads much, much faster. There are limits to what they can do with an engine that is extremely customized to the PS5's strengths: gigantic shader/material permutations, a shared memory pool, faster async compute, cheaper CPU multithreading, etc.
Async has always been more beneficial on console, where knowing which resources will be underutilized is of great importance. This can't be known when having to code for many hardware combinations through an API that is much more abstracted from the hardware. No one expected PC to match console performance efficiency, but this is a GPU very near PS5 level failing to double PS4 performance.

Does it run way faster? Performance seems quite comparable. It’s improved in other areas but actual framerate performance seems unchanged.
 
It had been a while since I've looked at it, but I remembered Jason Gregory's (Naughty Dog) Game Engine Architecture book had a section on fibers, and when I checked it out I was surprised to see it referring to Windows fibers versus threads. I wonder what it is about the Windows implementation of fibers that made them unsuitable for the port from PS4/5.

The best I can think of is that maybe there are fewer blocking system calls on PlayStation.
On PS consoles, custom C++ fiber support is implemented at the OS/kernel level with the intent of avoiding context switching overhead ...

With Windows, you could only implement user-space C++ fibers by making use of the now-deprecated user-mode scheduling feature to write your own task scheduler. The problem on Windows is that the OS itself doesn't understand the concept of fibers, so the OS uses threads to apply the pre-emptive scheduling model to them. Yielding/suspending the execution of a fiber (which is backed by an OS thread there) will cause context switching!
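
To make that concrete, here is a minimal sketch of the real Win32 fiber API (ConvertThreadToFiber / CreateFiber / SwitchToFiber). The switch itself is a user-space jump, but the kernel only ever sees the backing thread, which it pre-empts like any other thread, and a blocking system call inside any fiber stalls every fiber queued behind it on that thread:

```cpp
#include <windows.h>
#include <cstdio>

static void* g_schedulerFiber = nullptr;

// A "job" running on a fiber. SwitchToFiber is a cooperative, user-space
// jump, so the fiber itself is cheap to suspend and resume. But the kernel
// schedules only the backing thread: if this job makes a blocking system
// call, the whole thread (and every fiber waiting on it) stalls.
void CALLBACK JobFiber(void* param)
{
    const char* name = static_cast<const char*>(param);
    std::printf("job %s: step 1\n", name);
    SwitchToFiber(g_schedulerFiber); // yield back to the scheduler fiber
    std::printf("job %s: step 2\n", name);
    SwitchToFiber(g_schedulerFiber); // done; hand control back for good
}

int main()
{
    // The current thread must itself become a fiber before it can switch.
    g_schedulerFiber = ConvertThreadToFiber(nullptr);

    static char name[] = "A";
    void* job = CreateFiber(64 * 1024, JobFiber, name); // 64 KiB stack
    SwitchToFiber(job); // run the job until its first yield
    SwitchToFiber(job); // resume it through step 2
    DeleteFiber(job);
    return 0;
}
```

A PS-style job system multiplexes thousands of such fibers over a handful of worker threads, which only works well when almost nothing on those threads blocks in the kernel.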
 
The price would have been determined well before the tariffs announced last week, and while it may have been chosen in anticipation of tariffs, Nintendo clearly didn't expect the rates to be that high, hence the pre-order delay. US prices will likely end up even higher than $450.
 
If it's 8nm and an LCD without true HDR, $450 is a bit high. I wonder if the target was $399 and the recent economic conditions bumped it up by $50.
They purposely diversified their production away from China so as not to be so reliant on a single production area post-COVID (2019). This was the price point without tariffs.

They didn't expect Vietnam and Cambodia, which is where they diversified to, to be hit so hard.
 
If it's 8nm and an LCD without true HDR, $450 is a bit high. I wonder if the target was $399 and the recent economic conditions bumped it up by $50.

The price is what Nintendo would expect after such a huge success.

It's worth remembering the 3DS, which arrived at $250 when the DS cost $150; the backlash over the price made Nintendo lower it a few months later and create the Ambassador program, giving free games to those who had bought the console at the original high price.

Or the Wii U, which came in at $350 and seemed expensive compared to the PS4 at $400.
 