Current Generation Games Analysis Technical Discussion [2022] [XBSX|S, PS5, PC]

iroboto · Feb 21, 2022

PSman1700 said:
Is that what a 5700 XT/RX 6700 etc. is doing?

5700 XT has 64 single ROPs @ 1800 MHz
PS5 has 64 single ROPs @ 2230 MHz
XSX has 32 double ROPs @ 1800 MHz
6700 XT has 64 double ROPs @ 2400 MHz

The doubling only affects certain formats, not all, so having more ROPs is still advantageous: 64 single ROPs are better than 32 double ROPs.
Known Fixed Quantities

When PS5 and XSX are using formats that can be doubled, PS5 still has up to a 24% advantage over XSX (pending clock speed variation).
When they aren't using formats that can be doubled, PS5 has up to a 147% advantage (pending clock speed variation).
If neither mesh shaders nor compute-based micro-geometry engines are being used, PS5 has up to a 24% triangle-generation advantage (pending clock speed variation), possibly more if it supports the NGG pipeline on the driver side (likely, IMO, based on Cerny's talks).
If neither mesh shaders nor compute-based micro-geometry engines are being used, it also has up to 24% more discard throughput than XSX (pending clock speed variation), possibly significantly more if PS5 supports the NGG pipeline on the driver side (likely, IMO, based on Cerny's talks).
CPU is largely the same. DX12 is a heavier and clunkier API than GNM, and PS5 doesn't need to run games in a VM like Xbox does, so the clock speed differential may be a wash.
In general there is a lot less hiccupping and fewer issues on PS5, likely due to API differences.
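The two ROP percentages at the top of that list follow from peak fill rate = ROP count × clock. A minimal sketch using the post's own figures (XSX at the rounded 1800 MHz); the function name is illustrative, and real throughput also depends on format, blending and bandwidth. At XSX's officially quoted 1825 MHz the doubled-format gap shrinks to about 22%.

```python
# Peak ROP throughput (pixels per second) is ROP count x clock.
# XSX's 32 "double" ROPs emit 64 pixels/clock only for formats that can be doubled.

def fill_rate_gpix(rops: int, clock_mhz: float) -> float:
    """Peak fill rate in gigapixels per second."""
    return rops * clock_mhz / 1000.0

ps5 = fill_rate_gpix(64, 2230)            # 64 single ROPs
xsx_doubled = fill_rate_gpix(64, 1800)    # 32 double ROPs, doubled format
xsx_single = fill_rate_gpix(32, 1800)     # 32 double ROPs, other formats

adv_doubled = ps5 / xsx_doubled - 1.0
adv_single = ps5 / xsx_single - 1.0
print(f"PS5 advantage, doubled formats: {adv_doubled:.1%}")
print(f"PS5 advantage, other formats:   {adv_single:.1%}")

# Same doubled-format comparison at XSX's officially quoted 1825 MHz.
adv_doubled_1825 = ps5 / fill_rate_gpix(64, 1825) - 1.0
print(f"Doubled formats at 1825 MHz:    {adv_doubled_1825:.1%}")
```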

Unknown Quantities (which, IMO, are over-attributed when you consider the above):

Memory pool splitting on Series consoles
I/O latency / bandwidth
Kraken decompressor / Oodle
Cache scrubbers
VRS
RT units on XSX

TLDR:
If PS5 and XSX are performing roughly the same, then XSX is catching up on the ALU side of things, which, depending on the size of the workload, can expose at most a 20% TF advantage (CUs are better utilized in the compute pipeline than in the 3D pipeline).
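For reference, the theoretical TF gap comes from CUs × 64 lanes × 2 FLOPs per FMA × clock. A rough sketch using the publicly quoted clocks (1825 MHz for XSX; 2230 MHz is PS5's variable-clock cap); it gives roughly 18%, in line with the "at most ~20%" ceiling above, and says nothing about how well either GPU is actually utilized.

```python
# Peak FP32 throughput for an RDNA-style GPU:
# CUs x 64 lanes x 2 FLOPs per FMA x clock (MHz -> TFLOPS via / 1e6).
def peak_tflops(cus: int, clock_mhz: float) -> float:
    return cus * 64 * 2 * clock_mhz / 1e6

xsx = peak_tflops(52, 1825)   # officially quoted XSX GPU clock
ps5 = peak_tflops(36, 2230)   # PS5 clock is variable; 2230 MHz is its cap

print(f"XSX {xsx:.2f} TF, PS5 {ps5:.2f} TF, XSX ahead by {xsx / ps5 - 1:.1%}")
```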

When you consider this, you can see why XSX has such a wild time with consistency: developers are free to, and continue to, make heavy use of the 3D pipeline. Despite being ahead on ALU, XSX can simply run into more bottlenecks at the beginning or end of the 3D pipeline that may not cap out PS5, so it has to rely on its compute advantage to make up that shortfall.

There are more RT units on XSX, and that quantity is fixed; however, the type of ray tracing will determine how well utilized these units are. The more incoherent the algorithm, the lower the thread saturation, and it becomes a serialized race that depends on how fast data is brought back from memory. Parallelism is very low in RT, so the advantage is not clear-cut.
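A toy model of why incoherence hurts: a SIMD wave keeps issuing until its slowest lane finishes traversal, so lanes that finish early sit idle. The wave size of 32 and the per-lane step counts below are hypothetical, chosen only to illustrate the effect; this is not how either console's RT hardware is documented to behave in detail.

```python
# Toy model of SIMD utilization during ray traversal.
# Utilization = useful lane-steps / lane-steps issued by the whole wave.

def wave_utilization(steps_per_lane: list[int]) -> float:
    wave_size = len(steps_per_lane)
    issued = wave_size * max(steps_per_lane)   # wave runs until slowest lane ends
    return sum(steps_per_lane) / issued

coherent = [20] * 32              # every ray takes a similar path
incoherent = [10] * 31 + [40]     # one ray wanders much deeper into the BVH

print(f"coherent:   {wave_utilization(coherent):.0%}")
print(f"incoherent: {wave_utilization(incoherent):.0%}")
```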

Globalisateur · Feb 21, 2022

@iroboto In this game, Dictator thinks the drops are CPU-bound. And we can probably ignore ROPs in this game and in the UE5 demo, where the drops happen during traversal.

Anyway, you completely forgot the actual most important new feature of RDNA (compared to GCN): the new L1 cache, which, ignoring ROPs, is by far the weakest part of XSX compared to PS5. Compared to PS5, XSX is L1-cache starved. This is the biggest problem with the XSX architecture, in my opinion.

Both machines have 4MB of L1 cache, but the PS5's cache is clocked 22% faster, and XSX has to feed 44% more CUs from its L1 cache. So the more games are coded specifically for the next-gen consoles (where the main work will actually be L1 cache optimization, IMO), the more PS5 will be "pushing above its weight".
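The 22% and 44% figures combine multiplicatively into per-CU L1 bandwidth. A back-of-envelope sketch, assuming (as the post implies, but which is not confirmed for XSX) that both GPUs deliver the same total L1 bytes per clock, and using the public clocks and CU counts; it ignores L2 entirely.

```python
# Relative L1 bandwidth per CU, assuming equal total L1 bytes per clock
# on both GPUs (an assumption, not a confirmed spec).
PS5_CLOCK_MHZ, PS5_CUS = 2230, 36
XSX_CLOCK_MHZ, XSX_CUS = 1825, 52

clock_ratio = PS5_CLOCK_MHZ / XSX_CLOCK_MHZ   # ~1.22 -> "22% faster"
cu_ratio = XSX_CUS / PS5_CUS                  # ~1.44 -> "44% more CUs"
per_cu_l1_ratio = clock_ratio * cu_ratio      # PS5 vs XSX, per CU

print(f"clock x{clock_ratio:.2f}, CUs x{cu_ratio:.2f}, "
      f"per-CU L1 bandwidth PS5/XSX x{per_cu_l1_ratio:.2f}")
```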

Nesh · Feb 21, 2022

iroboto said:
TLDR:
If PS5 and XSX are performing roughly the same, then XSX is catching up on the ALU side of things, which, depending on the size of the workload, can expose at most a 20% TF advantage (CUs are better utilized in the compute pipeline than in the 3D pipeline).

So when does the Series X have an advantage? This sounds surprising to me.

iroboto · Feb 21, 2022

Globalisateur said:
Both machines have 4MB of L1 cache, but the PS5's cache is clocked 22% faster, and XSX has to feed 44% more CUs from its L1 cache. So the more games are coded specifically for the next-gen consoles (where the main work will actually be L1 cache optimization, IMO), the more PS5 will be "pushing above its weight".

*Small correction for you here, though:
GL1 is 256 KB per shader array, and 4 shader arrays = 1 MB.
I don't have any information on the L1 cache on XSX. I only know that the default RDNA 2 setup is 1 MB of L1 cache, or 256 KB per shader array. And all RDNA 2 devices ship with 4 MB of L2 cache, while XSX ships with 5 MB. The largest gaming GPU that Nvidia pushes has 6 MB, so that extra MB has a significant performance impact when you don't have Infinity Cache.

Unless you have information otherwise, we don't have exact figures for the L1 cache size on XSX. But it's likely the same as PS5, as you say: 1 MB. Even so, XSX still has the larger L2 cache that PS5 lacks; on PS5, those accesses result in a round trip to memory.
*Regardless, L1 cache is read-only on RDNA 2. The compute units write back to L2, where XSX has a formidable advantage, so a larger L1 may be unnecessary. But who knows. L2 management is what Sony developers will need to optimize for, not L1 management.

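Cache amount/speed is generally less important for the slower-clocked XSX here: a cache miss penalizes XSX less than it does PS5, because a trip to memory takes roughly the same wall-clock time on both, which is fewer wasted cycles at a lower clock.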
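A quick way to see the miss-penalty argument: a DRAM round trip takes roughly fixed wall-clock time, so the number of GPU cycles it spans scales with clock. The 400 ns latency below is a hypothetical round number, not a measured figure for either console; only the ratio between the two results matters.

```python
# Cycles spanned by one memory round trip = latency (ns) x clock (cycles per ns).
# MISS_LATENCY_NS is hypothetical; real latency varies with load and access pattern.
MISS_LATENCY_NS = 400

def cycles_per_miss(clock_mhz: float, latency_ns: float = MISS_LATENCY_NS) -> float:
    return latency_ns * clock_mhz / 1000.0   # MHz / 1000 = cycles per ns

xsx = cycles_per_miss(1825)
ps5 = cycles_per_miss(2230)
print(f"XSX: {xsx:.0f} cycles per miss, PS5: {ps5:.0f} cycles per miss")
```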

The flip side is that PS5 runs fast, but to really take advantage of all those additional cycles without wasting them, it needs Infinity Cache, which it doesn't have. Nor does it have any additional cache that we know of.

Cache hit rates will be better at lower resolutions and lighter workloads. But at high resolutions like 4K, or with large data jobs, you need bandwidth, and PS5 is well short of XSX here.

iroboto · Feb 21, 2022

Nesh said:
So when does the Series X have an advantage? This sounds surprising to me.

The bulk of the work is done on the ALUs/compute units for both the 3D pipeline and compute. That's where the actual computation takes place, which is why you typically see higher resolutions on XSX: there is more compute available. But frame rate is about frame time, and if you are not compute-bound, reducing resolution won't improve your frame rate. So bottlenecks elsewhere in your pipeline can still slow the console down, despite it being able to do more math.
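A simplified illustration of "frame rate is about frame time": if pipeline stages overlap, the frame takes as long as the slowest stage, and only some stages scale with resolution. This is a sketch under that idealized assumption; the millisecond figures are hypothetical.

```python
# Idealized pipelined model: frame time = slowest stage.
# Only shading cost scales with pixel count here; all ms values are hypothetical.

def frame_time_ms(fixed_ms: float, shading_ms_native: float, res_scale: float) -> float:
    """res_scale is the per-axis render scale (1.0 = native)."""
    shading_ms = shading_ms_native * res_scale ** 2   # pixel count scales with area
    return max(fixed_ms, shading_ms)

# Bound by a resolution-independent stage (e.g. geometry front end or CPU):
print(frame_time_ms(14.0, 12.0, 1.0), frame_time_ms(14.0, 12.0, 0.75))  # 14.0 14.0
# Compute-bound: dropping resolution helps.
print(frame_time_ms(8.0, 20.0, 1.0), frame_time_ms(8.0, 20.0, 0.75))    # 20.0 11.25
```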

And bandwidth: XSX has a lot more of it, so it can work through a lot more data.
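To put the bandwidth gap in per-frame terms, divide peak bandwidth by frame rate. The sketch uses the published peaks (XSX 560 GB/s for its 10 GB GPU-optimal pool, the other 6 GB being 336 GB/s; PS5 448 GB/s); achievable bandwidth in practice is lower on both.

```python
# Peak bytes the memory bus can move per frame at a given frame rate.
def gb_per_frame(bandwidth_gb_s: float, fps: float) -> float:
    return bandwidth_gb_s / fps

for name, bw in (("XSX", 560.0), ("PS5", 448.0)):
    print(f"{name}: {gb_per_frame(bw, 60):.2f} GB per frame at 60 fps")
```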

Riddlewire · Feb 21, 2022

Is there anything particular about AMD's architecture that prevents an implementation of 48 ROPs?
Considering the low power consumption of games on Series X thus far, I don't think 48 double ROPs would have burned up the chip.
The Xbox One X could have benefited from 48 ROPs.

BRiT · Feb 21, 2022

Riddlewire said:
Is there anything particular about AMD's architecture that prevents an implementation of 48 ROPS?

I think the limiting factor was more SoC size and less thermals or power consumption. From what I recall, their Hot Chips presentation stresses silicon savings more than other factors.

Allandor · Feb 21, 2022

Nesh said:
So when does the Series X have an advantage? This sounds surprising to me.

In compute-heavy situations, and when new features such as mesh shading are used, the Xbox should have the advantage. Anything ROP-bound isn't that great.

Riddlewire · Feb 21, 2022

BRiT said:
I think the limiting factor was more SoC size and less thermals or power consumption. From what I recall, their Hot Chips presentation stresses silicon savings more than other factors.

If that's the case, then I gotta say, it seems like a mistake, given how relatively tiny those units are.

BRiT · Feb 21, 2022

Riddlewire said:
If that's the case, then I gotta say, it seems like a mistake, given how relatively tiny those units are.

Try to find the actual Hot Chips presentation, as it gives the actual savings, instead of looking at third-party interpretations of images. That's not to say that interpretation was bad, just that their decision-making logic is better understood from the source content.

iroboto · Feb 21, 2022

Riddlewire said:
Is there anything particular about AMD's architecture that prevents an implementation of 48 ROPs?
Considering the low power consumption of games on Series X thus far, I don't think 48 double ROPs would have burned up the chip.
The Xbox One X could have benefited from 48 ROPs.

I believe so, yes. There is some optimal number to do with the number of pixels per triangle and the number of ROPs available to push out triangles.
IIRC, RDNA is capable of 4 triangles per clock after culling.
64/4 = 16 pixels per triangle, which is the optimal smallest triangle for this particular architecture. Once you shrink to 8 pixels per triangle, performance gets significantly worse. At 4 px, 2 px, and 1 px per triangle it falls off a cliff entirely.

So 48 ROPs would be 3 triangles per clock, or 48 pixels per clock.
64 ROPs is 4 triangles per clock, or 64 pixels drawn per clock.
Double-pumped would be 128 pixels drawn, but you're still limited by triangles per clock.
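The arithmetic above can be written as a simple min() of two caps: ROP count, and triangles per clock × pixels per triangle. This sketch encodes only the post's simplified model (the 4 triangles/clock is the post's IIRC figure) and ignores other small-triangle costs.

```python
# Pixels written per clock, capped by ROPs and by triangle setup rate.
def pixels_per_clock(rops: int, tris_per_clock: int, px_per_tri: int) -> int:
    return min(rops, tris_per_clock * px_per_tri)

print(pixels_per_clock(64, 4, 16))   # 64: balanced
print(pixels_per_clock(128, 4, 16))  # 64: double-pumped, triangle-limited
print(pixels_per_clock(64, 4, 8))    # 32: small triangles halve output
print(pixels_per_clock(48, 3, 16))   # 48: hypothetical 48-ROP part
```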

PSman1700 · Feb 21, 2022

iroboto said:
5700 XT has 64 single ROPs @ 1800 MHz
PS5 has 64 single ROPs @ 2230 MHz
XSX has 32 double ROPs @ 1800 MHz
6700 XT has 64 double ROPs @ 2400 MHz

OK, thanks for explaining; a good read for anyone interested in the differences between these consoles and what they are doing.
The RX 6700 XT looks quite beasty there.

Spec-wise, it basically has all the advantages: ROP count (double at that), high clocks, memory bandwidth. It would have been a very good choice for gamers if it had sold at its MSRP.
It's also quite clear the PS5 GPU is akin to the 5700 XT in design (not in all areas, of course).

iroboto said:
Cache amount/speed is generally less important for the slower clocked XSX here, there is less penalization for XSX for a cache miss over PS5.

Seems both MS and Sony made trade-offs, just in different areas. Neither is bad by any means, the consoles are close enough.

Allandor · Feb 21, 2022

PSman1700 said:
Seems both MS and Sony made trade-offs, just in different areas. Neither is bad by any means, the consoles are close enough.

There are always compromises to be made. Currently the Xbox chip is more or less filled with "useless" features, but as soon as these features are used, the extra area they need should pay off. But this needs time. The PlayStation takes a more dynamic approach. It uses power optimally now to get the chip to high clock speeds. As long as the old feature set is used, this shouldn't be a problem. But as soon as the real optimization process begins, there will be a limit to how far you can "optimize". If you push the workload too far, the GPU might need more power, so the clock speed will decrease by itself.

It always seems that MS may have played it a bit too safe on the chip side. I guess the GPU could run much faster (RDNA 2 is quite clock-friendly) and the cooling system would be good enough for the task. But on the other hand, there might then be a problem getting working chips at the given clock speeds.
But in the end, both are quite fast consoles and should be able to deliver good games. And that's what counts. But from a technical perspective, it would just be interesting to know how fast the consoles could have been.

snc · Feb 21, 2022

PSman1700 said:
The RX 6700 XT looks quite beasty there. Spec-wise, it basically has all the advantages: ROP count (double at that), high clocks, memory bandwidth.

Little misunderstanding here.

https://twitter.com/x/status/1431716742292721664

It's doubled in the sense that an RB+ now has 8 color + 16 depth ROPs (vs 4 color + 16 depth in RDNA 1). But that means when we say PS5 has 64 color ROPs, it also has 2x more depth ROPs than the 6700 XT and XSX, which also have 64 color ROPs.
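The color/depth split works out as follows. A sketch using the per-unit figures from the tweet as summarized above (RDNA 1 RB: 4 color + 16 depth; RDNA 2 RB+: 8 color + 16 depth); the unit counts (16 RBs on PS5, 8 RB+ on XSX and the 6700 XT) are inferred from each GPU's 64 color ROPs rather than taken from a spec sheet.

```python
# Per-render-backend ROP figures, as summarized in the post above.
RB = {"color": 4, "depth": 16}        # RDNA 1-style render backend
RB_PLUS = {"color": 8, "depth": 16}   # RDNA 2 RB+

def rop_totals(units: int, per_unit: dict[str, int]) -> tuple[int, int]:
    """Return (color ROPs, depth ROPs) for a number of render backends."""
    return units * per_unit["color"], units * per_unit["depth"]

# Unit counts inferred from 64 color ROPs per GPU (assumption).
configs = {
    "PS5 (16 x RB)": (16, RB),
    "XSX (8 x RB+)": (8, RB_PLUS),
    "6700 XT (8 x RB+)": (8, RB_PLUS),
}

for name, (units, per_unit) in configs.items():
    color, depth = rop_totals(units, per_unit)
    print(f"{name}: {color} color ROPs, {depth} depth ROPs")
```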

PSman1700 · Feb 22, 2022

snc said:
Little misunderstanding here.

Interesting. Reading his further comments, the 6700 XT is the more modern approach; it's probably better-specced in just about every other way as well. I'd have to log in to Twitter to read further comments, which I don't want to do now, but it's an interesting flow of tweets (mainly on PS5/XSX).

BRiT · Feb 22, 2022

PSman1700 said:
Interesting. Reading his further comments, the 6700 XT is the more modern approach; it's probably better-specced in just about every other way as well.

The other difference is that the legacy ROPs don't have hardware support for VRS.

mr magoo · Feb 22, 2022

PSman1700 said:
OK, thanks for explaining; a good read for anyone interested in the differences between these consoles and what they are doing.

Yes, I really appreciate the time and effort @iroboto puts into his comments. A very good read indeed.

Shortbread · Feb 22, 2022

The versions tested were 1.500.000 on PS5 and 1.5.0.2 on Xbox Series X|S. The graphics options such as Depth of Field and Motion Blur were enabled for this test.

00:00 - Ray Tracing Mode and Xbox Series S
06:31 - Performance Mode

PS5 in Ray Tracing Mode uses a dynamic resolution with the highest resolution found being 2560x1440 and the lowest resolution found being approximately 2368x1332. Pixel counts at 2560x1440 seem to be common on PS5 in Ray Tracing Mode.

Xbox Series X in Ray Tracing Mode uses a dynamic resolution with the highest resolution found being 2560x1440 and the lowest resolution found being approximately 2304x1296. Pixel counts at 2560x1440 seem to be common on Xbox Series X in Ray Tracing Mode.

Xbox Series S uses a dynamic resolution with the highest resolution found being 2560x1440 and the lowest resolution found being approximately 2062x1160. Pixel counts at 2560x1440 seem to be common on Xbox Series S.

PS5 in Performance Mode uses a dynamic resolution with the highest resolution found being 3840x2160 and the lowest resolution found being approximately 2062x1160. Pixel counts at 3840x2160 seem to be very rare on PS5 in Performance Mode.

Xbox Series X in Performance Mode uses a dynamic resolution with the highest resolution found being 3840x2160 and the lowest resolution found being approximately 2062x1160. Pixel counts at 3840x2160 seem to be very rare on Xbox Series X in Performance Mode.

Below are some example pixel counts for certain scenes on PS5 and Xbox Series X in Performance Mode. Note that these figures are approximate and not necessarily representative of how the entirety of a given area will render.

Kabuki Entrance - PS5: 2176x1224, Series X: 2304x1296
Near Police Station - PS5: 2435x1370, Series X: 2560x1440
Outside Tom's Diner - PS5: 2506x1410, Series X: 2631x1480
Corpo Start Building - PS5: 2656x1494, Series X: 2744x1544
Streetkid Start - PS5: 2062x1160, Series X: 2062x1160
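Since resolution differences compound on both axes, it can help to compare total pixel counts. A small sketch over the approximate Performance Mode figures listed above (like the counts themselves, the results are per-scene snapshots, not averages).

```python
# Approximate Performance Mode pixel counts from the list above: (PS5, Series X).
scenes = {
    "Kabuki Entrance": ((2176, 1224), (2304, 1296)),
    "Near Police Station": ((2435, 1370), (2560, 1440)),
    "Outside Tom's Diner": ((2506, 1410), (2631, 1480)),
    "Corpo Start Building": ((2656, 1494), (2744, 1544)),
    "Streetkid Start": ((2062, 1160), (2062, 1160)),
}

def pixel_ratio(ps5: tuple[int, int], xsx: tuple[int, int]) -> float:
    """Series X pixel count divided by PS5 pixel count."""
    return (xsx[0] * xsx[1]) / (ps5[0] * ps5[1])

for scene, (ps5, xsx) in scenes.items():
    print(f"{scene}: Series X renders {pixel_ratio(ps5, xsx) - 1:+.1%} pixels vs PS5")
```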

The Xbox Series consoles appear to be using VRS.

FSR appears to be used to upscale the image to 3840x2160 on PS5 and Series X in both modes and 2560x1440 on Series S. The UI resolution is also rendered at 3840x2160 on PS5 and Series X in both modes and 2560x1440 on Series S.

Ray Tracing Mode adds Ray Traced Local Shadows. Ray Tracing Mode also improves Screen Space Reflections quality and also seems to improve Ambient Occlusion quality.

There aren't any selectable modes on Xbox Series S.

Stats: https://bit.ly/3LXNJZl
Frames Pixel Counted: https://bit.ly/3gTpwoH

mr magoo · Feb 22, 2022

Alex on VRS, on his Twitter:

https://twitter.com/x/status/1495790931341103112

Good stuff.

PSman1700 · Feb 22, 2022

mr magoo said:
Yes, I really appreciate the time and effort @iroboto puts into his comments. A very good read indeed.

Absolutely. Even those without much insight into tech can generally understand his explanations.
