Current Generation Hardware Speculation with a Technical Spin [post launch 2021] [XBSX, PS5]

Status
Not open for further replies.

pjbliverpool

B3D Scallywag
Legend
Congratulations to NXG!

So he does confirm PS5’s smaller game install size is due to Oodle compression. Nice!

If it is due to Oodle compression, then PS5 giving up a few secs of load speed for tighter compression would be a great tradeoff for PS5's limited storage size.

Not to derail the topic of misaligned reflections but I found the above to be more interesting.

Specifically because this would imply that the XSX version is not using BCPACK, since BCPACK should produce very similar results to Oodle Texture + Kraken. By extension that means the XSX may not be using its hardware decompression block while the PS5 is, thus putting a greater load on the CPU, which may have to stand in for the decompression. This could explain why the IO is being blamed for the stutters. Rather than it being down to the PS5 having a faster IO system (the XSX should be more than sufficient for anything this game needs), it's more a case of the XSX system not being fully utilised.
 

Shortbread

Island Hopper
Legend
Not to derail the topic of misaligned reflections but I found the above to be more interesting.

Specifically because this would imply that the XSX version is not using BCPACK, since BCPACK should produce very similar results to Oodle Texture + Kraken. By extension that means the XSX may not be using its hardware decompression block while the PS5 is, thus putting a greater load on the CPU, which may have to stand in for the decompression. This could explain why the IO is being blamed for the stutters. Rather than it being down to the PS5 having a faster IO system (the XSX should be more than sufficient for anything this game needs), it's more a case of the XSX system not being fully utilised.

I don't know, anything is possible. Are there any other third-party titles on XBSX that exhibit similar stuttering issues even with similar installation sizes? If the answer is yes, then it's something else that's hardware related.
 

Globalisateur

Globby
Veteran
Supporter
I don't know, anything is possible. Are there any other third-party titles on XBSX that exhibit similar stuttering issues even with similar installation sizes? If the answer is yes, then it's something else that's hardware related.
Since when are framerate drops caused by a CPU bottleneck during data streaming bugs? We have had those odd framerate drops since the PS360 generation in almost all games that need to stream new data (PS5 and PC included), and this is the first time in years that I have heard people calling them that.

Those framerate drops are not bugs; they are performance issues caused by a specific bottleneck (usually a combination of software and hardware limitations; at some point any software job will be limited by something). Like the performance problems caused by GPU limits, they can be alleviated if the developers improve their code to make better use of the available hardware and software. We usually call that optimizing the code.

And as this game is not using PS5 custom I/O hardware for the loadings, I doubt it is using it elsewhere.
 

pjbliverpool

B3D Scallywag
Legend
And as this game is not using PS5 custom I/O hardware for the loadings, I doubt it is using it elsewhere.

Just because a game isn't coded in such a way as to take advantage of a high speed IO system for loading doesn't mean it still can't decompress its optical drive files on a hardware decompression unit rather than a CPU. It's possible the PS4 version of the game already does that.
 

thicc_gaf

Regular
So uh, this isn't really related to the Control discussion but, came across a curious post on Era with someone saying the PS5 has one TMU per CU while Series X has four TMUs per CU. Can anyone here verify if that's true or not?

Because I've always assumed that both systems had four TMUs per CU, and looking over some of the RDNA 2 GPU specs you can basically work out those having four TMUs per CU and I figure that would be standardized in the RDNA 2 spec except maybe for very small laptop/mobile APUs. But it almost sounds too wild to be true, that'd create an absolutely massive gap between PS5 and Series X (in favor of the latter) when it comes to texel data and texture fillrate. We're talking 36 vs. 208 here!

Again though, it's just someone else's post and I can't even verify if that person is a dev or has access to devkits for these systems. But I'm curious if anyone here knows about this and can verify or debunk it. Really can't picture a TMU disparity that huge between these systems but hey, if so, it is what it is :/.
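
For scale, here is a rough sketch of what the two TMU counts in that claim would mean for peak texture fill rate. The CU counts and clocks are the publicly quoted ones (36 CUs at up to 2.23 GHz, 52 CUs at 1.825 GHz). The function name and the one-texel-per-TMU-per-clock model are assumptions for illustration, not measured figures.

```python
# Peak texture fill rate under a simple model: one texel per TMU per clock.
# The TMU-per-CU value is exactly the quantity the rumour disagrees about.

def texel_rate_gtexels(cus: int, clock_ghz: float, tmus_per_cu: int) -> float:
    """Return the theoretical peak texel rate in GTexels/s."""
    return cus * tmus_per_cu * clock_ghz

# Both consoles with the usual 4 TMUs per CU.
ps5_4tmu = texel_rate_gtexels(36, 2.23, 4)
xsx_4tmu = texel_rate_gtexels(52, 1.825, 4)

# The rumoured case: PS5 with only 1 TMU per CU.
ps5_1tmu = texel_rate_gtexels(36, 2.23, 1)

print(f"4 TMUs/CU on both: PS5 {ps5_4tmu:.1f}, XSX {xsx_4tmu:.1f} GTexels/s")
print(f"Rumoured 1 TMU/CU: PS5 {ps5_1tmu:.1f} GTexels/s")
```

With four TMUs per CU on both machines the theoretical gap is under 20%. The rumoured one-TMU layout would put the Series X at well over four times the PS5's rate, which is why the claim looks implausible.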
 

DSoup

Series Soup
Legend
Supporter
Just because a game isn't coded in such a way as to take advantage of a high speed IO system for loading doesn't mean it still can't decompress its optical drive files on a hardware decompression unit rather than a CPU. It's possible the PS4 version of the game already does that.

You don't want the I/O system decompressing data willy-nilly. Too much check-in code is written on the basis that it will load compressed data from disk, allocating only as much memory as needed to load that compressed data into before decompressing it (whether by CPU or hardware). If the data was already decompressed during transfer it won't fit into the allocated memory.

To get the best of the new I/O systems developers need to rethink how assets are stored from the ground up. This isn't something you can just shoe-horn in, this is literally changing the structure and storage of multiple gigabytes of data in most games.
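
A minimal sketch of that failure mode, assuming a hypothetical legacy loader that sizes its staging buffer from the compressed size stored in a file header. Here zlib stands in for whatever codec a real game uses, and `transparent_read` is an invented stand-in for an I/O path that decompresses in transit. None of this reflects an actual console API.

```python
import zlib

RAW = b"texture-data " * 1000          # pretend asset, 13,000 bytes
PACKED = zlib.compress(RAW)            # what is actually stored on disk

def disk_read():
    """Classic path: returns (compressed size from header, compressed payload)."""
    return len(PACKED), PACKED

def transparent_read():
    """Hypothetical path: the I/O layer has already decompressed the payload."""
    return len(PACKED), zlib.decompress(PACKED)

def load_asset(read):
    """Legacy loader: staging buffer is sized for the compressed payload only."""
    compressed_size, payload = read()
    staging = bytearray(compressed_size)
    if len(payload) > len(staging):
        raise BufferError(
            f"payload is {len(payload)} bytes, staging buffer is {len(staging)} bytes"
        )
    staging[: len(payload)] = payload
    return zlib.decompress(bytes(staging))

assert load_asset(disk_read) == RAW    # works as the old code expects

try:
    load_asset(transparent_read)       # same loader, data arrives already expanded
except BufferError as err:
    print("legacy loader breaks:", err)
```

The loader itself is unchanged between the two calls. Only the assumption about what arrives from storage differs, which is the point about needing to restructure asset handling rather than bolt hardware decompression on.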
 

function

None functional
Legend
You don't want the I/O system decompressing data willy-nilly. Too much check-in code is written on the basis that it will load compressed data from disk, allocating only as much memory as needed to load that compressed data into before decompressing it (whether by CPU or hardware). If the data was already decompressed during transfer it won't fit into the allocated memory.

To get the best of the new I/O systems developers need to rethink how assets are stored from the ground up. This isn't something you can just shoe-horn in, this is literally changing the structure and storage of multiple gigabytes of data in most games.

One of my suggestions about Control on XSX was that - possibly - the CPU side hitches weren't caused by high IO overhead as such, but more by the CPU being briefly deluged with work stemming from reads that would have previously been limited by natural bottlenecks on console and PC.

If your CPU is five times as fast, but your new storage system is effectively serving up data in the same way but ten or twenty times faster than even an SSD was last gen, perhaps what was an easily manageable instantaneous workload in the past becomes something that you start to choke on due to some element of how you manage data and operate upon it.

Even once you've got something in RAM, there can be quite a bit of work involved in getting it ready to be used by the game (and DX).
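
As a toy model of that idea: if storage gets 20x faster but the CPU only 5x faster, the per-frame cost of preparing newly arrived assets goes up about 4x during a burst. Every number below is made up for illustration (throughput, asset size, per-asset preparation cost), and the model assumes the loader is saturating storage for the duration of the burst.

```python
def prep_ms_per_frame(throughput_mb_s, asset_mb, prep_ms_per_asset, fps=60):
    """CPU milliseconds per frame spent preparing assets that arrived that frame."""
    assets_per_frame = throughput_mb_s / asset_mb / fps
    return assets_per_frame * prep_ms_per_asset

# Last gen: slow storage, slow CPU (2.0 ms to prepare one 2 MB asset).
last_gen = prep_ms_per_frame(throughput_mb_s=100, asset_mb=2, prep_ms_per_asset=2.0)

# This gen: storage 20x faster, CPU 5x faster (0.4 ms per asset).
this_gen = prep_ms_per_frame(throughput_mb_s=2000, asset_mb=2, prep_ms_per_asset=0.4)

frame_budget_ms = 1000 / 60
print(f"last gen: {last_gen:.2f} ms/frame, this gen: {this_gen:.2f} ms/frame "
      f"of a {frame_budget_ms:.2f} ms budget")
```

In this sketch the same streaming code goes from a small slice of the frame to a large one, without any change in how the game manages data.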
 

DSoup

Series Soup
Legend
Supporter
If your CPU is five times as fast, but your new storage system is effectively serving up data in the same way but ten or twenty times faster than even an SSD was last gen, perhaps what was an easily manageable instantaneous workload in the past becomes something that you start to choke on due to some element of how you manage data and operate upon it.
If it's this alone, it should be more pronounced on PS5 where the I/O is faster and the CPU is slower than Series X.
 

function

None functional
Legend
If it's this alone, it should be more pronounced on PS5 where the I/O is faster and the CPU is slower than Series X.

Indeed, which is why I was thinking about something like DX resource binding (just as an example) and the cost of getting assets ready for use once they're in memory. I believe certain API operations can really start to add up if you do too many per frame, and heavily impact frame rate.

At its heart Control for MS platforms is a 2019 DX11 game, designed around mechanical HDDs. The PS5 version is definitely doing something better, probably on the CPU side, and I think it's most likely to be something to do with how the game is able to use data once it's in memory.

I understand that it's compelling to see stutters and hitches related to area transitions and say "Cerny IO block!!", but getting stuff into memory is only a small part of what it takes to get assets rendered on screen without hiccups.
 

DSoup

Series Soup
Legend
Supporter
I understand that it's compelling to see stutters and hitches related to area transitions and say "Cerny IO block!!", but getting stuff into memory is only a small part of what it takes to get assets rendered on screen without hiccups.
I don't think anybody is saying this but I do have a fair number of people on my ignore list.
 

function

None functional
Legend
I don't think anybody is saying this but I do have a fair number of people on my ignore list.

Well I'm probably getting a bit carried away and bringing in baggage from other conversations and other places, especially after seeing how NXGamer's comments have been unfairly used (not his fault, he makes great videos).

I know you're not saying that, and I should have kept my reply more focused on what you were. I shouldn't have tossed that in, so sorry about that.
 

goonergaz

Veteran
No probs. Like I said I have a lot of people on my ignore list. Some threads look weird, like folks arguing with themselves. :yep2:

HHmmm...I don't think I'm on your ignore list :D, but I am curious what cache scrubbers bring to the table and why these are not a part of any discussion around the PS5 closing the gap in Control.

...I'm also interested to see, if I'm on your ignore list and the fact that I've quoted you, does this cause a rift in the space time continuum that ultimately gets me a ban!? :runaway:
 

DSoup

Series Soup
Legend
Supporter
HHmmm...I don't think I'm on your ignore list :D, but I am curious what cache scrubbers bring to the table and why these are not a part of any discussion around the PS5 closing the gap in Control.
I think the answer to that is that nobody able to post knows which is probably why there is little discussion. The speculation : fact ratio is already bad in the Console Technical forums! Too many folks participate on the premise of wanting to learn but they don't want to learn, they want their view to be correct.

Assuming Sony even have devtools to measure the effectiveness of the cache scrubbers (and how you would even measure it) maybe at some future GDC some dev will include this.
 

iroboto

Daft Funk
Legend
Supporter
cache scrubbers bring to the table
The hardest issue with Sony in general is how tight-lipped they are about their hardware. Even finding something on PS3 is extremely challenging at this point in time. For whatever reason it's always been easier to discuss MS consoles; either the information flows easily, or there are just leaks everywhere. Perhaps this creates a bias against PS because there's no real talk happening there, but if cache scrubbers do something, we're not likely to hear about it until a developer interview mentions it.

Right now I think the bulk of the PS5's advantage probably sits with its geometry engine. After giving it some time, and sort of aligning with Matt H's comments about it versus VRS, I do believe the geometry is culled so early, and the hardware seems so biased towards culling significantly more triangles than it can rasterize, that there is significantly less workload going forward. Typically back-face culling happens very late in the 3D pipeline, so you're doing lots of work on a lot of triangles and then tossing them very late. I think this can explain some of the performance boosts we're seeing with RDNA 2 (6800 series) and PS5 for some titles (just peeking at other threads on triangle generation etc.).

At least I think it plays a larger role than cache scrubbers, which, from what I've read, seem like a general non-issue. However, when Mesh Shaders do finally come into play (I don't think cross-gen titles will write a separate path for them), this advantage would go away in theory. But right now, I believe the compilers are making full use of converting calls into primitive shaders, and they are culling triangles very quickly. This looks like a significant pain point for XSX if (a) they aren't set up to do this (can't convert 3D shaders to primitive shaders) or (b) they don't have the fixed-function pipelines of RDNA 2 for it, in favour of having more compute and feature sets (aligning a bit more with Nvidia in this case). Correct me if I'm wrong, but I haven't heard any hubbub around how well XSX can cull triangles, so it's possible that it's not big on FF triangle discard. According to Hot Chips, though, (b) is very improbable: they indicate a unified Geometry Engine that also supports mesh shading. That leaves (a): when we last looked at the documentation back in June, they were not capable of leveraging NGG yet.
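
To make the "cull early, do less work later" argument concrete, here is a small sketch of back-face culling by winding order. It is not a description of how the PS5 Geometry Engine or primitive shaders are implemented. It assumes 2D screen-space vertices in a y-up coordinate system with counter-clockwise triangles treated as front-facing, and `downstream_cost` is a made-up stand-in for all the later pipeline work.

```python
def signed_area2(a, b, c):
    """Twice the signed area of triangle abc; positive for counter-clockwise winding."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])

def cull_early(triangles):
    """Discard back-facing (clockwise) and zero-area triangles before later stages."""
    return [t for t in triangles if signed_area2(*t) > 0]

def downstream_cost(triangles, cost_per_triangle=1.0):
    """Hypothetical cost of everything that happens after the cull point."""
    return len(triangles) * cost_per_triangle

scene = [
    ((0, 0), (1, 0), (0, 1)),   # counter-clockwise: front-facing, kept
    ((0, 0), (0, 1), (1, 0)),   # clockwise: back-facing, culled
    ((0, 0), (1, 1), (2, 2)),   # collinear: zero area, culled
]

visible = cull_early(scene)
print(f"late cull pays for {downstream_cost(scene):.0f} triangles, "
      f"early cull pays for {downstream_cost(visible):.0f}")
```

The earlier in the pipeline this test runs, the less work is spent on triangles that will never reach the screen.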

tldr; Geometry Engine and Primitive Shaders are my main focus for investigation for PS5. If there is statistical bias moving PS5 ahead of regression, this is where I would investigate given the information that is available.

I recall this one moment in DMC 5 demo where XSX completely tanked and PS5 held steady but we're literally rotating in a near empty room with a statue during a cutscene.

Just spitballing, that was a possible situation where PS5 was obliterating triangles out of view/blocked/etc. and, because it discarded so much so effectively, all of its triangle generation could be put towards visible triangles. And we saw XSX drop really badly there, meaning it was wasting triangle generation on triangles that would later be discarded, and thus we saw a huge frame drop. This post here on B3D, where Voxilla was doing some benchmarking, was the inspiration to look for these moments:
https://forum.beyond3d.com/posts/2191463/

For perspective XSX is 7.3GTris/s maximum rate.
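
That figure is consistent with 4 primitives per clock at the Series X's 1.825 GHz GPU clock. The per-clock value is an inference from the number above, not a quoted spec, and applying the same per-clock rate to the PS5's 2.23 GHz peak clock is purely an assumption for comparison.

```python
# Peak triangle rate = primitives per clock * GPU clock.
ASSUMED_PRIMS_PER_CLOCK = 4   # inferred from 7.3 GTris/s at 1.825 GHz, not a quoted spec

def peak_gtris_per_s(clock_ghz, prims_per_clock=ASSUMED_PRIMS_PER_CLOCK):
    """Theoretical peak triangle rate in GTris/s."""
    return prims_per_clock * clock_ghz

xsx_rate = peak_gtris_per_s(1.825)
ps5_rate = peak_gtris_per_s(2.23)   # assumes the same per-clock rate as XSX

print(f"XSX {xsx_rate:.2f} GTris/s, PS5 {ps5_rate:.2f} GTris/s (same per-clock assumption)")
```

Under that assumption the PS5's higher clock would give it a higher theoretical peak, but peak rates say nothing about how many of those triangles are wasted on geometry that is later discarded.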
 

cwjs

Regular
tldr; Geometry Engine and Primitive Shaders are my main focus for investigation for PS5. If there is statistical bias moving PS5 ahead of regression, this is where I would investigate given the information that is available.

I recall this one moment in DMC 5 demo where XSX completely tanked and PS5 held steady but we're literally rotating in a near empty room with a statue during a cutscene

Are geometry shaders something that could be generated automatically from regular primitive shaders? I haven't read much about them. Mesh shaders certainly cannot, especially not with any performance advantage (culling is manual in them, you'd need to process the meshes and go in and write the code -- maybe not a huge undertaking, but definitely not a free update.)

Regarding DMC5 (and control, which runs better than I expect if this is true) if what posters were saying in this thread about some xsx games being fast, presumably low budget dx11->dx12 updates, I think basically anything is on the table as far as performance dips go. Dx12 requires (careful) manual memory management -- the developer has to design the pipeline, manage gpu parallelism, control when resources are accessed simultaneously, and schedule things so the whole renderer doesn't grind to a halt (while it say, waits on one whole shader getting finished before any more code on the next one can start up due to memory barriers between resources). If that's what control is, without a significant amount of work to re-do the renderer, it's a miracle it works as well as it does.
 

iroboto

Daft Funk
Legend
Supporter
Are geometry shaders something that could be generated automatically from regular primitive shaders? I haven't read much about them. Mesh shaders certainly cannot, especially not with any performance advantage (culling is manual in them, you'd need to process the meshes and go in and write the code -- maybe not a huge undertaking, but definitely not a free update.)
This post here on B3D: https://forum.beyond3d.com/posts/2180107/
I'll post the tweet however. Unfortunately the follow up tweets were deleted, and I wish we took a screenshot of them. But it was very clear how effective the drivers were at taking raw front end shaders and converting them to primitive shaders.

It's about converting front end shaders into primitive shaders (near equivalent to Mesh shaders). I believe geometry shaders would be part of that, but no one really uses them as far as I recall. The geometry engine would be responsible for this task.

Regarding DMC5 (and control, which runs better than I expect if this is true) if what posters were saying in this thread about some xsx games being fast, presumably low budget dx11->dx12 updates, I think basically anything is on the table as far as performance dips go. Dx12 requires (careful) manual memory management -- the developer has to design the pipeline, manage gpu parallelism, control when resources are accessed simultaneously, and schedule things so the whole renderer doesn't grind to a halt (while it say, waits on one whole shader getting finished before any more code on the next one can start up due to memory barriers between resources). If that's what control is, without a significant amount of work to re-do the renderer, it's a miracle it works as well as it does.
Agreed, all sorts of possibilities, but I'm specifically looking for situations where there's nothing much really going on and XSX is tanking and PS5 is holding. Just seems to happen more often than not. Inside the room of death for Valhalla, for instance, it's just dying. When you zoom in with the sniper rifle in Hitman 3, for instance, it must keep all the scene geometry in memory; is it being drawn but not culled fast enough? The flowers obstruct all views of any geometry, but it still needs to be rendered, perhaps another culling problem. Dirt 5 may very well be a similar issue: it's just unable to discard the triangles, or it's processing too many triangles that will later be discarded anyway. Dirt 5 has a very tessellated track floor! All of that triangle generation cannot be wasted!

Just playing through Demon's Souls, for instance, was also something I really spent time looking at. The amount of tessellation and geometry everywhere.

I've been giving this a lot of thought. Control's photo mode showed me it's not an alpha issue (PS5 should have won if it were), and therefore not a bandwidth issue either (ROPs are largely bandwidth limited, so PS5 would have won in that case too), so I really needed to look elsewhere.
 

iroboto

Daft Funk
Legend
Supporter
So uh, this isn't really related to the Control discussion but, came across a curious post on Era with someone saying the PS5 has one TMU per CU while Series X has four TMUs per CU. Can anyone here verify if that's true or not?

Because I've always assumed that both systems had four TMUs per CU, and looking over some of the RDNA 2 GPU specs you can basically work out those having four TMUs per CU and I figure that would be standardized in the RDNA 2 spec except maybe for very small laptop/mobile APUs. But it almost sounds too wild to be true, that'd create an absolutely massive gap between PS5 and Series X (in favor of the latter) when it comes to texel data and texture fillrate. We're talking 36 vs. 208 here!

Again though, it's just someone else's post and I can't even verify if that person is a dev or has access to devkits for these systems. But I'm curious if anyone here knows about this and can verify or debunk it. Really can't picture a TMU disparity that huge between these systems but hey, if so, it is what it is :/.
Not true as far as I know. It's the first I've heard of it, if I'm honest.
 

cwjs

Regular
This post here on B3D: https://forum.beyond3d.com/posts/2180107/
I'll post the tweet however. Unfortunately the follow up tweets were deleted, and I wish we took a screenshot of them. But it was very clear how effective the drivers were at taking raw front end shaders and converting them to primitive shaders.

It's about converting front end shaders into primitive shaders (near equivalent to Mesh shaders). I believe geometry shaders would be part of that, but no one really uses them as far as I recall. The geometry engine would be responsible for this task.
Thanks a lot for the info -- this is something where my knowledge of the hardware side is way below my knowledge of the software side so bear with me if this is a stupid question: Automatically converting standard fixed function shaders (vertex, etc) into something that the hardware is better designed for (more compute, etc) is one thing, but where would culling get introduced here? Does the hardware have some way to know, or is something happening on the developer side? With Mesh Shaders, for example, they're not necessarily faster than traditional fixed function geometry at all, at least for simple cases -- but because of how they're structured (Task shaders -- specialized compute shaders that dispatch Mesh Shaders, specialized compute shaders that take the place of Vertex Shaders) you can relatively simply introduce huge culling benefits that just aren't practical in a straightforward way on the old pipeline. (There are some recent Xbox developer YouTube videos about this on the DX12 side)
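
As an illustration of the kind of culling a task-shader stage makes practical, here is a sketch of a per-meshlet normal-cone test written as plain Python rather than shader code. The meshlet layout (`center`, `axis`, `half_angle`) and all function names are assumptions for the example. The test treats each meshlet as a point, so it ignores the meshlet's spatial extent, which a production test would have to account for.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def meshlet_is_backfacing(camera, center, cone_axis, half_angle_rad):
    """True if every normal within the cone points away from the camera.

    A normal n is back-facing when dot(n, view) > 0, with view pointing from
    the camera to the meshlet. That holds for the whole cone when the angle
    between axis and view is below 90 degrees minus the cone half-angle.
    """
    view = normalize(tuple(c - e for c, e in zip(center, camera)))
    return dot(normalize(cone_axis), view) > math.sin(half_angle_rad)

def task_stage(camera, meshlets):
    """Return the ids of meshlets that would be dispatched to the mesh stage."""
    return [
        m["id"] for m in meshlets
        if not meshlet_is_backfacing(camera, m["center"], m["axis"], m["half_angle"])
    ]

camera = (0.0, 0.0, 0.0)
meshlets = [
    {"id": "front", "center": (0, 0, -5), "axis": (0, 0, 1), "half_angle": math.radians(30)},
    {"id": "back", "center": (0, 0, -5), "axis": (0, 0, -1), "half_angle": math.radians(30)},
]
print(task_stage(camera, meshlets))
```

One cheap test per meshlet removes a whole batch of triangles before any per-vertex work runs, but someone has to build the meshlets and cone data offline, which is why this is not a free update.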
Agreed, all sorts of possibilities, but I'm specifically looking for situations where there's nothing much really going on and XSX is tanking and PS5 is holding. Just seems to happen more often than not. Inside the room of death for Valhalla, for instance, it's just dying. When you zoom in with the sniper rifle in Hitman 3, for instance, it must keep all the scene geometry in memory; is it being drawn but not culled fast enough? The flowers obstruct all views of any geometry, but it still needs to be rendered, perhaps another culling problem. Dirt 5 may very well be a similar issue: it's just unable to discard the triangles, or it's processing too many triangles that will later be discarded anyway. Dirt 5 has a very tessellated track floor! All of that triangle generation cannot be wasted!

I mean, I think the "nothing much going on" tells us that something shady is going on (bad dx12 renderer constantly hanging on fences, serious tool problem, something wrong with the hardware) rather than a performance difference. With the hitman example: I don't think that's actually very easy to cull. The kind of thorough geometric tests that can guarantee 'the flowers obstruct all views' are expensive and hard to get right -- I think the safer guess is that one has more to do with: 1- the way the hitman renderer works (maybe a relatively 'straightforward' forward renderer or something?) 2- the xbox running at a way higher resolution and 3- yeah, a ps5 hardware advantage on fill rate would make sense, but not that big.
 