Digital Foundry Article Technical Discussion [2025]

How is culling done? Is it using mesh shaders which GPUs have had on PC since 2018?

No, they won't add that even if it could greatly boost triangle throughput and perf.
I'm not going to broadly disagree that more work could be done to make some of these ports better. In fact that should be obvious from the differences in part 1 and part 2, yes? They clearly did more work porting part 2 and deserve praise for that. But I think we PC gamers (broadly generalizing... many of us play things on both these days) also need to avoid the temptation to fall into the same sort of "moving target" comparisons that we criticize console gamers for. It's very easy to throw out notions like "oh this DirectX feature or IHV-specific API would make things so much better!" without really having good data for a given game to back that up.

DirectStorage is actually a perfect example here... based purely on the advertising from Microsoft and others people got really excited and somehow got the idea that this was going to revolutionize PC gaming and so on, but I think it's pretty fair to say that for the few people that have used it (such as here), it doesn't exactly show off anything particularly revolutionary that can't be done without it.

Another example is sampler feedback and related streaming stuff; this is the stuff that gets marketed to consumers as to why they should get a new GPU or operating system or so on but in reality we are pretty deep into the point of diminishing returns on many of these API/hardware features (raytracing aside, since so much of that is still hardware black box on PC).

Mesh shaders specifically are a tougher one to argue because they imply a fundamental change to the content pipeline. You can't just "implement mesh shaders" and get a benefit. The important part of such a shift is the content pipeline work that restructures meshes into appropriate clusters with associated LOD logic and so on. The actual mesh shader "hardware" feature is way less relevant than those changes, even if you implement them without mesh shaders at all (as previous GPU culling systems did, Nanite on systems without mesh shaders, etc). So while it's easy to say "just add mesh shaders and your geometry culling will get so much better", we don't really know here that geometry is a huge bottleneck in the first place, and the ask is potentially a rather huge one depending on their current geometry pipeline.
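
To make the content-pipeline point a bit more concrete, here is a minimal C++ sketch (not anything taken from the game or from Nixxes). It shows an offline step that chops a triangle list into clusters with bounding spheres, and a runtime step that culls whole clusters against frustum planes. The names `Cluster`, `buildClusters` and `cullClusters` are made up for illustration, the clustering is a naive split in index order (a real pipeline would group triangles spatially and add LOD logic), and the culling runs on the CPU only to keep the example self-contained; the same test could just as well live in a compute shader or a mesh/amplification shader.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

// Plane stored as a unit normal n and offset d; a point p counts as
// inside when dot(n, p) + d >= 0.
struct Plane { Vec3 n; float d; };

struct Cluster {
    std::uint32_t firstIndex;  // offset of the cluster's first index
    std::uint32_t indexCount;  // number of indices (3 per triangle)
    Vec3 center;               // bounding sphere center
    float radius;              // bounding sphere radius
};

// Offline ("content pipeline") step: split the index buffer into runs of
// at most maxTriangles triangles and compute a conservative bounding
// sphere for each run. Hypothetical helper, naive on purpose.
std::vector<Cluster> buildClusters(const std::vector<Vec3>& positions,
                                   const std::vector<std::uint32_t>& indices,
                                   std::uint32_t maxTriangles)
{
    assert(maxTriangles > 0 && indices.size() % 3 == 0);
    std::vector<Cluster> clusters;
    const std::size_t step = static_cast<std::size_t>(maxTriangles) * 3;
    for (std::size_t first = 0; first < indices.size(); first += step) {
        const std::size_t count = std::min(step, indices.size() - first);

        // Sphere center: average of the referenced vertices.
        Vec3 c{0.0f, 0.0f, 0.0f};
        for (std::size_t i = 0; i < count; ++i) {
            const Vec3& p = positions[indices[first + i]];
            c.x += p.x; c.y += p.y; c.z += p.z;
        }
        const float inv = 1.0f / static_cast<float>(count);
        c.x *= inv; c.y *= inv; c.z *= inv;

        // Sphere radius: distance to the farthest referenced vertex.
        float r2 = 0.0f;
        for (std::size_t i = 0; i < count; ++i) {
            const Vec3& p = positions[indices[first + i]];
            const float dx = p.x - c.x, dy = p.y - c.y, dz = p.z - c.z;
            r2 = std::max(r2, dx * dx + dy * dy + dz * dz);
        }
        clusters.push_back({static_cast<std::uint32_t>(first),
                            static_cast<std::uint32_t>(count),
                            c, std::sqrt(r2)});
    }
    return clusters;
}

// Runtime step: keep the indices of clusters whose bounding sphere is not
// entirely behind any of the given planes.
std::vector<std::uint32_t> cullClusters(const std::vector<Cluster>& clusters,
                                        const std::vector<Plane>& planes)
{
    std::vector<std::uint32_t> visible;
    for (std::size_t i = 0; i < clusters.size(); ++i) {
        const Cluster& c = clusters[i];
        bool inside = true;
        for (const Plane& pl : planes) {
            const float dist = pl.n.x * c.center.x + pl.n.y * c.center.y +
                               pl.n.z * c.center.z + pl.d;
            if (dist < -c.radius) { inside = false; break; }
        }
        if (inside) visible.push_back(static_cast<std::uint32_t>(i));
    }
    return visible;
}
```

Nothing in this sketch needs mesh shader hardware. The expensive part for a port is getting every mesh in the game through something like `buildClusters` and reworking the renderer to draw per cluster.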

I'd certainly be curious for someone to do a deeper dive into some of these frames to see where the time is going, but I don't know that I would broadly say this is a "minimal effort port" for something that came from a single-platform engine. Even games that target both Xbox and PS5 are much easier to port because they fundamentally have to deal with more than one platform in the first place. Single platform stuff can just make a lot of simplifying assumptions to the point that "porting" them can sometimes stray into territory more like writing a PS5 emulator than just optimizing some code.

None of this is to say that bringing criticisms is not useful; I think a lot of the improvements in part 2 were probably motivated by the criticisms of part 1, and presumably if PC ports continue to be a thing for the series, future engine iterations will keep this in mind even if they come out first on the consoles. But I think it's worth giving some leeway to folks who are porting a single-platform game that was never designed to run on anything other than a PlayStation, long past the time where any amount of art or content pipeline iteration is realistic.
 
With Windows, you could only implement user-space C++ fibers by making use of the now deprecated user-mode scheduling feature to write your own task scheduler. The problem on Windows is that the OS itself doesn't understand the concept of fibers, so the OS uses threads and applies the pre-emptive scheduling model to them. Yielding/suspending the execution of a fiber (which is backed by an OS thread there) will cause context switching!
You can of course implement fibers entirely in user space and not involve the OS scheduler or UMS at all, which is how most people do it on PC. That said, there are some significant tradeoffs especially in C++, and it's not the sort of thing you could easily just slip in under an implementation that uses full OS-level thread/fiber contexts as a base. It's certainly reasonable for an engine to cite this as a pain point moving to PC from a single platform.
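
For readers wondering what "entirely in user space" looks like, here is a rough sketch using POSIX `ucontext` on Linux as a stand-in. It is not available on Windows, where you would typically hand-write the context switch or use a library, and as far as I know glibc's `swapcontext` also saves the signal mask, which is one reason production fiber systems tend to use custom assembly instead. `Fiber`, `fiberInit`, `fiberResume` and `fiberYield` are hypothetical names, not any engine's API. Each fiber gets its own heap-allocated stack, and the OS thread scheduler never sees the switch.

```cpp
#include <ucontext.h>

#include <cstddef>
#include <vector>

// One cooperative fiber with its own stack. Must not be moved or copied
// after fiberInit, because the saved contexts point into this object.
struct Fiber {
    ucontext_t context{};          // where the fiber itself is suspended
    ucontext_t caller{};           // where the resuming code is suspended
    std::vector<char> stack;       // heap-allocated fiber stack
    void (*entry)(Fiber&) = nullptr;
    bool finished = false;
    int lastValue = 0;             // value passed to the last fiberYield
};

// Fiber currently being started; only used to hand `this` to the trampoline.
static Fiber* g_current = nullptr;

static void fiberTrampoline()
{
    Fiber* self = g_current;
    self->entry(*self);
    self->finished = true;
    // Returning here continues at uc_link, i.e. back in fiberResume.
}

void fiberInit(Fiber& f, void (*entry)(Fiber&), std::size_t stackSize = 64 * 1024)
{
    f.stack.resize(stackSize);
    f.entry = entry;
    getcontext(&f.context);
    f.context.uc_stack.ss_sp = f.stack.data();
    f.context.uc_stack.ss_size = f.stack.size();
    f.context.uc_link = &f.caller;
    makecontext(&f.context, fiberTrampoline, 0);
}

// Run the fiber until it yields or finishes.
void fiberResume(Fiber& f)
{
    Fiber* previous = g_current;
    g_current = &f;
    swapcontext(&f.caller, &f.context);
    g_current = previous;
}

// Called from inside the fiber: suspend and return control to the resumer.
void fiberYield(Fiber& f, int value)
{
    f.lastValue = value;
    swapcontext(&f.context, &f.caller);
}
```

Even this toy version hints at the tradeoffs: stack sizes have to be chosen up front, debuggers and profilers may not understand the custom stacks, and anything that assumes "one stack per OS thread" needs care.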

The busy-wait comment is a little weird to me because while it's somewhat less harmful on PS5, there's still some amount of dynamic clocking there AFAIK, so you'd still at least want to use some sort of processor idle instruction (e.g. _mm_pause) in busy waits, even if you are not giving control back to the OS (maybe this comment was more targeted at PS4?). On PC you can do similar things, although that still ties up the thread of course and appears to users like the CPU is "busy" even though it can be in a somewhat lower power state. Again I think it's a valid pain point, but it seems like the console/PC use cases can be somewhat closer to each other going forward than they necessarily were here.
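
As a small illustration of the kind of busy wait described above, here is a hedged C++ sketch. `cpuRelax` and `spinUntilSet` are made-up names, the spin budget is an arbitrary parameter, and the non-x86 fallback to `std::this_thread::yield` is just a placeholder assumption rather than a tuned choice.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>
// Hint to the core that we are in a spin loop without leaving user space.
inline void cpuRelax() { _mm_pause(); }
#else
// Placeholder for other architectures: hand the time slice back instead.
inline void cpuRelax() { std::this_thread::yield(); }
#endif

// Spin until `flag` becomes true or the spin budget runs out.
// Returns true if the flag was observed set. The calling thread stays
// occupied the whole time, so this only suits very short waits.
inline bool spinUntilSet(const std::atomic<bool>& flag, std::uint64_t maxSpins)
{
    for (std::uint64_t i = 0; i < maxSpins; ++i) {
        if (flag.load(std::memory_order_acquire)) {
            return true;
        }
        cpuRelax();
    }
    return flag.load(std::memory_order_acquire);
}
```

A caller would typically fall back to a blocking primitive (a semaphore or condition variable) when `spinUntilSet` returns false, which is where the console and PC behaviour start to diverge.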

Async compute is definitely not nearly as useful on PC or any non-fixed platform, as you effectively have to statically schedule with fake barriers to get a good benefit, which is often actively harmful on non-fixed platforms. In theory task graphs will eventually provide a superior model, but we're seemingly quite a ways away from even tech demos demonstrating good wins there.
 

I'm really interested to see what was presented in the work graphs talks at GDC, but I imagine we're still years from really useful implementations for games.
 
I know some Sony games like using shader resource tables on consoles to create/copy resource descriptors on the GPU timeline, but on the PC port of Ghost of Tsushima they do expensive emulation of said feature with SM6.6 dynamic resources. However, that is still a regression compared to prior plans for D3D12 being able to offer the functionality for the GPU to create/copy resource descriptors before the idea was scrapped altogether!
 
@Mamoniem on Twitter would do an amazing job of a deeper dive into some of these frames. He's done a few Sony games already and has stated some interest in potentially doing God of War: Ragnarok to compare it to the previous God of War 2018 PC port. He goes super deep though, so it can take months and months of his free time, and sadly they're few and far between. But I'm definitely going to suggest that he take a look at The Last of Us Part 2. Maybe some day he'll get around to it.

 
Hopefully Alex gets a look at South of Midnight. This is a very textbook low effort game.

- Cutscenes locked at 30fps but slowly get there and ramp back up so enjoy a stuttering mess
- Character has no interaction with the environment
- Money saved on textures lol
- pop in and lod issues galore

Gameplay is even worse than any of that so it really is the real deal of mediocrity.

Try it on xgp for the laughs but dear god, don’t buy this.
 
DF already reviewed it for both Xbox platforms and PC.
 
Yes, these days it's better to just go entirely bindless, especially with raytracing. Not sure if that's what you mean by dynamic resources, but generally bindless has a fairly low overhead these days assuming relatively coherent access, which it necessarily would be for a porting case like this. [edit] Ah yes, this is just the nicer HLSL typing they added on top of bindless stuff. This need not really be any slower in practice, but it does depend on some details... i.e. it's a good fit for textures and general random access buffers, but not always a good fit for uniforms/constant buffers on some platforms, where care needs to be taken to pass certain things with root constants explicitly instead.

While bindless is conceptually a good replacement for GPU descriptor copying it is typically somewhat intrusive into shaders - either by having to convert their sampling to explicit bindless, or requiring some amount of compiler preprocessing. Both of those are not a trivial amount of work, so agreed this could be another pitfall for a port.
 
Happy to see you respond Andrew. I agree with your appreciation of how optimising for PC and getting the most out of the platform is challenging. I also want to talk about the aspect in your quote of me that you did not mention.

"But you have *other* transformational performance enhancements that can be added but probably are not added because of Budget and time table reasons. "

What my original quote is saying is that there are always things that can be done to make a game perform better, but budgets and timetables get in the way. Regarding mesh shaders, which I threw out as an example: as you write yourself, the content pipeline change is the aspect that makes it a challenge in terms of time and money. Feasibility, in the sense of being technically possible, is most likely already there. Throwing out more triangles in a way that maps closer to the underlying hardware is always going to be nice, but for x amount of money and x amount of time before a release date, to get x amount of return in sales? Maybe not. Let us ignore for a moment the specific example of GPU performance, which I think is symptomatic of what I am talking about. In general, I think Nixxes has not been getting enough time and resources to do its ports since just after its first release, Spider-Man. The quality of releases has looked increasingly strained from a reviewer's perspective, especially in light of how polished Sony games tend to be. Why do I say that?

Nixxes had to turn this port of TLOUP2 around in 6 months. In total, Nixxes has worked on 8-9 different game releases on 5-6 different engines in the last 3 years (Spider-Man PC, Horizon Zero Dawn patches of the Virtuos port, Miles Morales, Ratchet and Clank, Horizon Forbidden West, Ghost of Tsushima, Horizon Zero Dawn Remastered PC, Horizon Zero Dawn Remastered PS5, Spider-Man 2, The Last of Us Part 2). I think the core thing limiting Sony game quality and stability is not the technical feasibility of something (can we make X work at all?) but rather: can we make X work at all with the amount of budget and time Sony gives us? When you look at the sheer volume and diversity of titles in that time scale, you can start imagining why cracks appear in Sony products at launch. It started to get really obvious to me with Ratchet and Clank that Sony is not giving them enough time before release, as surprising things were being left in products for launch. The pattern has become: the game comes out with issues, a flurry of patches follows near launch, and a larger share of the issues is fixed or remedied within roughly the 3 months after launch. This is how Sony does it on PC as of late; on console, it is not like that at all. Sony games on console are typically seeded 2 weeks before launch in a gold state, having undergone an extensive polishing phase. Based on what I can see in comparison to other game releases I cover, Nixxes is not being given commensurate polish time before release for the task at hand.

As I said when Ratchet and Clank launched, I think Sony needs to give Nixxes more time before release in general, both for polishing and for fundamental aspects of the ports that come out. With the last batch of Sony releases on PC I have had to do a lot of gremlin chasing at launch, which filters into my reviews as "report on x bug the user might see" instead of spending time talking about optimised settings. With other games that publish in more polished states, I do not have to go bug hunting to try and figure something out for large parts of my review process, so I can spend more time on the normal aspects of a review. Great examples are Microsoft's Senua's Saga, Kingdom Come: Deliverance 2, Ubisoft Massive's Avatar, etc.

Sony's strategic decisions are having a negative effect on their ports on PC. Regarding The Last of Us Part 2, this strategic "moneyXtime" aspect is directly referenced by Naughty Dog and Nixxes in the interview I had with them. Some examples,
1. Where ND say they originally wanted Nixxes to port The Last of Us Part 1, but due to timetables, they got Iron Galaxy to do it.
2. Same with Part 2: ND wanted Nixxes on it, but instead they used Iron Galaxy and only transitioned to Nixxes after about 11 months of work.
3. When Nixxes mentions how they had to work with Iron Galaxy's base port, instead of rolling their own.

With number 3 there, Nixxes has to deal with the sunk cost decisions referenced in numbers 1 and 2. All of those things are considerations of "time and money" imposed by Sony, not technical things. The end situation is one where Nixxes had to doctor up fixes to issues from a problematic foundation from Iron Galaxy, much as they did with the original patches to Horizon Zero Dawn as ported by Virtuos. With TLOUP2 we are not looking at Nixxes' work in isolation, but at them applying fixes and adjustments to someone else's work. This is the opposite of what they did with Horizon Forbidden West, where they started the engine port themselves, ignoring the Virtuos port. I really think we would be looking at a different product if Nixxes had done all of the TLOUP1 and 2 porting from the outset instead of what we got in the end.

The above is just me spitballing about money and product planning strategy and how I feel it is affecting Sony's ports. In terms of concrete things in TLOUP2, you can see some of the effects of all that in the game.

1. When reviewing The Last of Us Part 2 pre-launch, we saw a number of visual issues in the game that had been in every single Naughty Dog release on PC, we had to tell Nixxes about them in a listed form. (some of these have since been patched)
2. TLOUP2 has nearly the same graphical options as TLOUP1.
3. TLOUP2 had the same forced sharpening that was present in Iron Galaxy ports (since patched)
4. GPU performance vis-à-vis console is largely the same

A long response to you here Andrew. lol

In general, the disappointment referenced in the video title is not at all with what Nixxes is capable of. Given enough time, they always do great things. Rather, my disappointment with TLOUP2 is with the effect of Sony's decisions on the quality of their PC releases, and TLOUP2 has many hallmarks of that at release.
 
A long response to you here Andrew. lol
Thanks for the reply! And just to quote this part first - long replies are awesome and I appreciate you taking the time. One of the reasons I still engage a bit on B3D while I've left basically all other social media is because it offers the opportunity to actually dive in and get at the specifics even more than one can do in a youtube video or similar. Obviously there's pros and cons to every medium but I just wanted to say up front I am very aware of the amount of time it takes to read and engage with us on these sorts of topics and the fact that you take the time to write up stuff like this is one of the reasons I appreciate Digital Foundry and your work.

What my original quote is saying is that there are always things that can be done to make a game perform better, but budget and time tables are getting in the way. Regarding mesh shaders, which was thrown out as an example by me, as you write yourself, the content pipeline change is the aspect that makes it a challenge in terms of time and money.
Indeed, I fully agree with this and retrospectively I hope you didn't think I was trying to quote you out of context or misrepresent what you were saying.

I should have perhaps considered the way I phrased that more carefully; it was just a good example opportunity for me to speak to one of the peripherally-related topics that sort of bugs me on occasion, namely folks putting too much emphasis on marketed hardware or API features' influence on performance vs. the time and skill to do the optimization work that is needed. A decade or two ago this was more true for GPUs, but we're much closer to the CPU world now in terms of software being the primary driver of the results (ex. no one thinks a game performing poorly on the CPU is because they aren't using the latest AVX512 alphabet-soup feature). Even for big new features like raytracing, I think folks would be surprised at how relatively little hardware and how much software there is to support those implementations.

Anyways, not to reiterate that again as I think we're very aligned on the stuff you are saying: there's not really a silver bullet for optimization in a lot of these ports, especially where they come from platform-exclusive games. Indeed I'm sure Nixxes has a long list of things they would continue to do if they were given time. Hopefully the work they are spending on these things can at least continue to be reused in future ports rather than thrown out and started again.

So first, fully agreed that these PC ports need more time to bake and should really be given it. There are obviously some issues that you don't run into until you're in the wild with a zillion different PC configurations, but I would be surprised if the majority of stuff you called out in your video wasn't known by Nixxes and Sony... they were just fixing even bigger issues until the last minute. Given our agreement here, permit me to broaden the scope, because one of the threads you mentioned in your reply I think warrants some further discussion here and it's something I don't really have much insight into, so I'd be curious what other folks think:
Nixxes had to turn this port around of TLOUP2 in 6 months. In total, Nixxes have worked on 8-9 different game releases on 5-6 different engines in the last 3 years (Spider-Man PC, Horizon Zero Dawn patches of Virtuos Port, Miles Morales, Ratchet and Clank, Horizon Forbidden West, Ghost of Tsushima, Horizon Zero Dawn Remastered PC, Horizon Zero Dawn Remastered PS5, Spider-Man 2, The Last of Us Part 2). I think the core thing limiting Sony game quality and stability is not technical feasibility of something (can we make X work at all?), rather, can we make x work at all with the amount of budget and time Sony gives us?
...
As I said when Ratchet and Clank launched, I think Sony needs to give Nixxes more time before release in general. Both for polishing and for fundamental aspects of the ports that come out.
...
1. Where ND say they originally wanted Nixxes to port The Last of Us Part 1, but due to timetables, they got Iron Galaxy to do it.
...
I really think we would be looking at a different product if Nixxes had done all of the TLOUP1 and 2 porting from the outset instead of what we got in the end.
So I think this may actually point to a deeper problem than unwillingness to spend time/money on these ports from Sony's part; there might just not be enough people with the fairly specific skills (in the world...) for them to do all these ports in the time they are trying to. With respect to the above notes, I suspect Sony would be more than happy to throw more money at Nixxes, and are mostly turning to other folks because there's only so much they can take on. And the more time they give folks to polish a specific title, the fewer titles they can port.

Now many of us would probably argue for quality over quantity, but it's harder for me to say they are somehow wrong to try and bring a broader range of their titles over. Certainly of the list you quoted there, I personally only care about maybe 1/3 of them, so it would be a bummer if those ones never got ported due to time constraints or something. But of course, pick different people and the titles they care about will be different.

So I guess at a certain point the question is this: if you have a fixed amount of people available to work on these ports at the quality you want them at, how do you balance polish vs. throughput?

With other games that publish in more polished states, I do not have to go bug hunting to try and figure something out for large parts of my review process, so I can spend more time on the normal aspects of a review. A great example is Microsoft's Senua's Saga, Kingdom Come: Deliverance 2, Ubisoft Massive's Avatar, etc etc...
True, but - not to undermine the effort spent making good PC versions of these other games - it's a very different situation if you develop something with multiplatform in mind from the start rather than revisit that years after a game has already shipped with a different development team. Moreover it's definitely easier to port something from Xbox to PC than something tailored a lot for PS5 to PC. I know you're not saying that these are analogous situations and are instead just pointing out that polished games do get released on PC, but just wanted to be clear for other readers.

In general my disappointment referenced in the video title is not at all with what Nixxes is capable of. Given enough time, they always do great things. Rather my disappointment with TLOUP2 is an effect of Sony's decisions on the quality of their PC releases and TLOUP2 has many hallmarks of that at release.
That's fair, and I agree. I think Nixxes is doing a great job with the time and resources they have. I just suspect that even if Sony had infinite money to throw at this, we might be a bit supply-limited on folks who can do this work well, given the examples you posted of other studios doing ports. In that situation, the only real alternative is for Sony to significantly cut down on the number of ports they are doing and spend more time on each, but I suspect that's where the ROI math gets harder to justify.
 

I think Cyberpunk is a pretty solid way to compare the CPU performance between Switch 2 and the other consoles. There are a few segments in that game that are really CPU heavy, like Cherry Blossom Market. I know it's CPU limited because my Core i7 9750H is pretty similar in terms of performance to the consoles and it's also dropping frames a little there, regardless of resolution.

If you compare the crowd density between the consoles and Switch 2, and then the framerate in Cherry Blossom Market, you should be able to get a good approximation of the performance. Not the raw CPU performance of course, but what skilled developers can squeeze out of it using the NVN API.
 