Xbox Series X [XBSX] [Release November 10 2020]

That was my impression about this CU count talk without clocks and bandwidth ;)
I think there was a question earlier about RT performance and whether clockspeed or more CUs would matter more. Each CU contains its own RT block, so naturally, more CUs would get through the RT portion of the pipeline faster.
However.

From the other thread though in particular WRT the RE8 comparison, and Oleg's all seeing eye: https://forum.beyond3d.com/posts/2202188/

This would have been unexpected for an RT-heavy game with per-pixel ray tracing.
But in this case it's likely that BVH building takes the most time, and its cost is fixed and independent of resolution. If they are also doing something like RTXGI for GI, then the ray tracing portion also doesn't depend on resolution. So increasing the resolution makes the raster portion of the frame larger and the RT portion a smaller share of the frame, hence the scaling you see in the gamegpu results.

Just checked the numbers and yep, that's the case: rasterization-only is 65% faster at 1080p, and the RT workload takes 2.32 ms.
At 4K, rasterization-only mode is just 20% faster, and RT takes just 1.9 ms. That's for the 3090.

On the 6900 XT, the RT portion takes 4.47 ms at 1080p and 3.9 ms at 4K.

RT is probably slightly CPU-limited at 1080p (this would explain the slightly higher RT cost at 1080p).

So even if you have more RT units, you may still be CPU limited on the workload, meaning the RT units may not necessarily be fully utilized.
i.e. RE8 is probably not an ideal benchmark for RT performance, since both versions are likely CPU limited given they both use some form of checkerboard rendering.
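To make the scaling argument quoted above concrete, here's a rough back-of-envelope sketch. It assumes the RT portion (BVH build plus probe-style GI) is roughly resolution-independent while the raster portion scales with pixel count; the RT times are the 3090 figures quoted above, but the 1080p raster-only time is a made-up placeholder, not measured data.

```python
# Sketch: a (mostly) fixed RT cost shrinks as a share of the frame at higher
# resolutions, because only the raster portion scales with pixel count.
# The RT times are the quoted 3090 figures; the 1080p raster time is hypothetical.
RASTER_MS_1080P = 4.0  # hypothetical raster-only frame time at 1080p

cases = [("1080p", 1.0, 2.32), ("4K", 4.0, 1.90)]  # (label, pixel scale, RT ms)
for label, pixel_scale, rt_ms in cases:
    raster_ms = RASTER_MS_1080P * pixel_scale  # assume raster scales with pixels
    total_ms = raster_ms + rt_ms
    print(f"{label}: raster {raster_ms:.1f} ms + RT {rt_ms:.2f} ms "
          f"-> RT is {rt_ms / total_ms:.0%} of the frame")
```

Under those assumptions the RT share drops from roughly a third of the frame at 1080p to about a tenth at 4K, which matches the shape of the gamegpu scaling.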
 
RT capabilities on AMD cards are tied to the TMU units and also scale with clocks.
 
Yeah, but a 23% clock difference isn't going to make up for 44% fewer RT units.
If PS5 and XSX are running the same RT frame times, then the additional RT units aren't being used, or they are bottlenecked somewhere down the line. Which isn't impossible; we see this happen with ROPs often (bandwidth bottleneck). But I think, as a general statement, more RT units should outperform fewer RT units at the same function. There isn't enough clockspeed to make up the differential, so there's nothing controversial in what's been stated above.
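For what it's worth, the naive arithmetic behind that, assuming one RT block per CU and that RT throughput scales linearly with both unit count and clock (which is exactly the assumption being argued over), using the publicly stated peak clocks:

```python
# Naive theoretical RT throughput ~ RT units * clock, assuming one RT block per
# CU and linear scaling with both count and clock (the assumption under debate).
xsx = {"rt_units": 52, "clock_ghz": 1.825}
ps5 = {"rt_units": 36, "clock_ghz": 2.23}

def rt_throughput(gpu):
    return gpu["rt_units"] * gpu["clock_ghz"]

print(f"PS5 clock advantage:   {ps5['clock_ghz'] / xsx['clock_ghz'] - 1:.0%}")
print(f"XSX RT unit advantage: {xsx['rt_units'] / ps5['rt_units'] - 1:.0%}")
print(f"Naive XSX throughput advantage: "
      f"{rt_throughput(xsx) / rt_throughput(ps5) - 1:.0%}")
```

On that naive model the XSX still comes out roughly 18% ahead, i.e. the clock difference offsets only part of the unit count difference.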
 
When did I write that it will? I just pointed out that talking about CU count without clocks, rather than simply using TFLOPS and bandwidth as the indicator, is misleading.
 
It's not really "hoping"; things are moving to favor parallelism and wider designs in the GPU space. MS wanted a design that was future-proofed in this respect, and we have benchmarks on mesh shader routines showing incredible degrees of performance uplift over the traditional 3D pipeline process that operates mainly off fixed-function hardware. So perhaps in some way it's a cost-cutting measure but it's no more of one than, say, Sony settling for pseudo-programmable logic with the primitive shaders in their design...and I'd say Microsoft's choices are easily the more future-proofed of the two even in spite of compromises.
I personally think it's simpler than this.
They wanted a specific level of performance, and this was the most efficient and cost-effective way to achieve it.
They wanted raw performance of twice the One X, which is 12 TF.

It's not particularly slow or wide; it's just considered that when compared to the PS5.
 
Weird way to write 'reaction youtuber'.
Yan Chernikov worked at DICE on the Frostbite engine for 5 years, has been developing his own game engine called Hazel and has a ~10 year-old channel with hundreds of video tutorials on how to develop each stage of a videogame engine.

"Reaction youtuber" is a weird way to write "guy who knows more about videogame rendering than all the other graphics analysis youtubers/journalists combined".



What do you mean by better performance on the back end?
PS5 and SeriesX have the same number of Render Output Units, but the former runs its ROPs at >20% higher clocks, therefore it has >20% higher pixel fillrate.
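For reference, the arithmetic behind that figure, assuming 64 ROPs on both consoles and the publicly stated peak clocks:

```python
# Theoretical peak pixel fillrate = ROPs * clock (assuming 64 ROPs on both).
ROPS = 64
ps5_gpix = ROPS * 2.23   # ~142.7 Gpixels/s at 2.23 GHz
xsx_gpix = ROPS * 1.825  # ~116.8 Gpixels/s at 1.825 GHz
print(f"PS5 {ps5_gpix:.1f} vs XSX {xsx_gpix:.1f} Gpix/s "
      f"(+{ps5_gpix / xsx_gpix - 1:.0%} for PS5)")
```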


Please correct me if I'm wrong, but every argument I've seen in favour of the PS5's architecture seems to indicate that its higher clocks mean it will consistently be somewhat superior when it comes to traditional rasterisation.
All games are using traditional rasterization still (even Dreams uses the ROPs at some points AFAIK). There are now games using a hybrid raster where they mix real time raytracing with rasterization, but they're still using a rasterizer in the pipeline.
 
So if we look at the rendering pipeline as a whole, the only thing we can be sure of for now is that the PS5 and XSX will complete different stages of the rendering process at different speeds, and both have advantages in specific areas. As long as the load may differ between steps, some applications at specific times may favour one over the other. Right?
 
From what I've been reading from developer statements, it's exactly this.
 
One thing that's great for those of us interested in rendering load differences is that many games ship with graphics modes and high refresh options that allow us to see differences that may not have been visible at lower framerates or resolutions. That's not something we got in generations past.
 
So if we look at the rendering pipeline as a whole, the only thing we can be sure of for now is that the PS5 and XSX will complete different stages of the rendering process at different speeds, and both have advantages in specific areas. As long as the load may differ between steps, some applications at specific times may favour one over the other. Right?

Pretty much. However, from the data we have on both systems right now we can infer there are areas where each has advantages over the other. PS5's main GPU rendering advantages are pixel fillrate, triangle culling, triangle rasterization and faster L0$ on the CUs. Series X's main GPU rendering advantages are texture/texel fillrate, wider L0$ bandwidth and BVH traversal/intersection tests for RT, though some of these depend on how well the GPU is saturated.

Some engines work great saturating wider designs (in the context of full RDNA 2 GPUs, Series X's GPU isn't really that "wide", and at one point 36 CUs were also considered "wide" under some versions of GCN), some don't. But most engines just tend to "automatically" benefit from faster clocks, up to a limit anyway (thinking about certain game logic that might get thrown out of whack).

PS5 and SeriesX have the same number of Render Output Units, but the former runs its ROPs at >20% higher clocks, therefore it has >20% higher pixel fillrate.

That does have an effect WRT pixel fillrate but the config of the ROP backend between the two systems differs; PS5's is older while Series X's is more recent (adhering to what the RDNA 2 GPUs offer on that front).
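To put rough numbers on those respective advantages, here's a quick sketch of the theoretical peaks only, assuming 4 TMUs and 64 shader ALUs per CU at 2 FLOPs per ALU per clock; cache bandwidth and RT rates are left out since they're much harder to pin down:

```python
# Theoretical peak texel fillrate and FP32 compute, assuming 4 TMUs per CU,
# 64 ALUs per CU and 2 FLOPs per ALU per clock. Peaks only, not sustained rates.
def peaks(cus, clock_ghz):
    texel_gtex = cus * 4 * clock_ghz                # Gtexels/s
    fp32_tflops = cus * 64 * 2 * clock_ghz / 1000   # TFLOPs
    return texel_gtex, fp32_tflops

for name, cus, clk in [("XSX", 52, 1.825), ("PS5", 36, 2.23)]:
    texel, fp32 = peaks(cus, clk)
    print(f"{name}: {texel:.0f} Gtex/s, {fp32:.2f} TFLOPs")
```

Whether those peaks are reachable in practice is, of course, the saturation question above.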
 
Being more recent doesn't automatically mean it's more performant.
IIRC the PS5 has more die area dedicated to ROPs but the SeriesX has ROPs that closely resemble RDNA2 PC GPUs. It could be that the new arrangement simply offers die area savings, or area savings with a small cost in performance.
As an example, one Kepler SM (192sp) provides higher compute performance than one Maxwell SM (128sp), yet the die area savings made it a worthwhile trade-off.
 
PS5 and SeriesX have the same number of Render Output Units, but the former runs its ROPs at >20% higher clocks, therefore it has >20% higher pixel fillrate.

Thanks.

All games are using traditional rasterization still (even Dreams uses the ROPs at some points AFAIK). There are now games using a hybrid raster where they mix real time raytracing with rasterization, but they're still using a rasterizer in the pipeline.

Right. This is why I think the XSX will perform increasingly better than the PS5 as the generation goes on: the PS5's higher clocks make it a superior rasterisation machine, the XSX's wider GPU and higher bandwidth make it a superior ray tracing machine. As the generation goes on, we're likely to see more games move from full rasterisation, to hybrid, to a point where ray tracing is a major part of the rendering pipeline.

I suppose the Metro Exodus update will be quite a good early barometer of this.
 
One final thought on the concept of clockspeed vs CU count; perhaps this hypothetical case will make it clearer.
Say you design a synthetic benchmark which leverages only the fixed-function portions of the hardware and skips the unified shader pipeline entirely:
if XSX ran this benchmark at 100 fps, PS5 would run it at 123 fps, or 23% faster, as per their clock speed difference.
There is absolutely nothing XSX can do to mitigate this difference, because they have exactly the same FF hardware but PS5 runs it 23% faster.
Which means, in any benchmark where XSX and PS5 are pretty much identical, XSX essentially made up that deficit on the back half of the frame, where compute and the unified shaders do their work, despite this part also being clocked 23% slower.
And so the CU advantage there is putting in work to make up the clock speed differential twice.
I hope that makes sense.

Effectively, the larger the back half of the frame is, or the further the XSX can get away from the fixed-function pipeline, the more it will leverage its silicon strengths.
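A minimal numeric sketch of that hypothetical, assuming the fixed-function part of a frame scales only with clock while the shader part scales with CUs × clock; the 50/50 split at the PS5 baseline is purely illustrative, not measured data:

```python
# Toy frame model: a fixed-function (FF) portion that scales only with clock,
# and a shader/compute portion that scales with CUs * clock.
def frame_ms(ff_work, shader_work, cus, clock_ghz):
    ff_ms = ff_work / clock_ghz                  # identical FF hardware, clock-bound
    shader_ms = shader_work / (cus * clock_ghz)  # compute scales with width * clock
    return ff_ms, shader_ms

# Work units picked (arbitrarily) so both halves are ~8 ms on PS5.
FF_WORK, SHADER_WORK = 8.0 * 2.23, 8.0 * 36 * 2.23
for name, cus, clk in [("PS5", 36, 2.23), ("XSX", 52, 1.825)]:
    ff, sh = frame_ms(FF_WORK, SHADER_WORK, cus, clk)
    print(f"{name}: FF {ff:.1f} ms + shader {sh:.1f} ms = {ff + sh:.1f} ms")
```

With a 50/50 split the XSX ends up slightly behind in this toy model; the larger the shader half of the frame gets, the more the extra CUs claw back the fixed-function deficit.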

I think typically this isn't the case with other GPU families. Often larger GPUs also ship with larger front ends, so you don't run into this scenario of clockspeed vs core count. It's more like you have it all, so there is no way the more expensive card will perform worse than a lower model in the family.

Thinking about it this way, XSX is actually the one that is an outlier, as its front-end performance and general compute performance are not well matched. Increasing the clock speed would help narrow that particular gap with respect to PS5, but it wouldn't make up for the fact that it's mismatched with respect to itself.

I don't know if MS went this route hoping the transition to mesh shaders would happen sooner, since CUs are what mesh shaders use to process geometry, and it's not clear whether they hoped developers would skip the 3D pipeline altogether in favour of just using compute shaders to render out pixels instead of the ROPs. It does come across, again, as a cost-cutting measure.
However, real-world tests never show that. Digital Foundry did tests with two GPUs at the same TFLOPs, one clocked narrow and fast and the other wider and slower, and performance was better on the wider and slower GPU. Also, a website did tests on RDNA 1 cards, if I recall correctly, where they overclocked the GPU by 18% or so and the game's performance only improved by around 10%; in other words, performance didn't scale with the clock increases.
 
Looks like Fable is using ForzaTech

Do you want to have a major impact on 3 AAA titles in development across 2 beloved Xbox franchises? ForzaTech is the engine, tools, and pipelines that drive both the Forza Motorsport and Forza Horizon series of games. In addition to adding new features like raytracing to support the next console generation, we are also enriching the toolset to support an open world action RPG – Fable.

The Tech Share team at Turn 10 builds new systems and tools that benefit all games running on ForzaTech, helps teams design features and tools in a shareable way, and provides support for our cross-studio content teams. We are looking for a generalist who is comfortable working across many different areas and enjoys learning new systems to help us craft a better ForzaTech.

https://careers.microsoft.com/us/en/job/1039800/Software-Engineer-Turn10-Studios
 
However, real-world tests never show that. Digital Foundry did tests with two GPUs at the same TFLOPs, one clocked narrow and fast and the other wider and slower, and performance was better on the wider and slower GPU. Also, a website did tests on RDNA 1 cards, if I recall correctly, where they overclocked the GPU by 18% or so and the game's performance only improved by around 10%; in other words, performance didn't scale with the clock increases.
I remember seeing a test (can't remember what site or where I saw it) where they tested CU count scaling on RDNA 2 GPUs and found that they scale in performance pretty well up to 60 CU, but above that the scaling drops off dramatically.
 
However, real-world tests never show that. Digital Foundry did tests with two GPUs at the same TFLOPs, one clocked narrow and fast and the other wider and slower, and performance was better on the wider and slower GPU. Also, a website did tests on RDNA 1 cards, if I recall correctly, where they overclocked the GPU by 18% or so and the game's performance only improved by around 10%; in other words, performance didn't scale with the clock increases.

I remember seeing a test (can't remember what site or where I saw it) where they tested CU count scaling on RDNA 2 GPUs and found that they scale in performance pretty well up to 60 CU, but above that the scaling drops off dramatically.

Not surprising. Designing GPUs is as much art as science. It's about finding a balance of components within the limitations of transistor budget and ancillary technologies (like memory speeds).

No matter what balance is struck, any given GPU will perform better or worse depending on what parts/components of the GPU are stressed in any given game/scenario.

Because GPUs are a balancing act, scaling one factor up without also scaling up other facets of the GPU is unlikely to see linear scaling (as that factor will become increasingly limited by other factors). And in the rare occasions you do have linear scaling, it won't infinitely scale linearly (eventually the factor you are scaling will become limited by other factors).
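That's essentially Amdahl's law applied to a frame: only the fraction of the work bound by the resource you're scaling actually speeds up. A hedged sketch, with a made-up 70/30 split:

```python
# Amdahl-style scaling: only the fraction p of the frame that is limited by the
# scaled resource benefits from the scale factor s. p here is made up.
def overall_speedup(p, s):
    return 1.0 / ((1.0 - p) + p / s)

P = 0.7                       # assume 70% of the frame is bound by that resource
for s in (1.18, 1.5, 2.0):    # e.g. an 18% overclock, then bigger scale factors
    print(f"scale x{s:.2f} -> overall x{overall_speedup(P, s):.2f}")
```

An 18% bump to one resource only yields about 12% overall under that split, which is roughly the shape of the RDNA 1 overclocking result mentioned earlier.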

Both the PS5 and XBS systems are engineered with a certain balance of features that each engineering team felt was the best way to spend their transistors combined with the level and cost of ancillary technology to be used. Each are likely targeting different ideas about how game rendering will evolve over the generation. Neither architecture will be perfectly utilized because each individual game developer has different ideas about how they want to render their games or even what their game rendering budget requires.

In other words, overclocking an existing GPU (to use an easy example) is unlikely to show how effective the increased clocks of the PS5 are. Likewise, examining increased CU counts on PC GPUs is unlikely to show how effective the wider architecture of the XBS-X is. Especially if the comparison is between a fully enabled GPU versus a cut-down salvage GPU. And even less informative if the comparison is between different GPUs of the same family. It may or may not give some hints of how the different choices will impact rendering, but it's unlikely to illuminate us on how they actually affect rendering.

What I find most interesting thus far is how evenly the different architectures perform despite the differences in engineering choices made. Much of that comes down to the fact that the architectures are far more similar than they are different. But a part of it is also the fact that for multiplatform games (the only ones where direct comparisons are possible) it's in the developers' best interest to have the game run well on all platforms it releases on.

It does make me wonder if we'll see distinctly different approaches to how a scene is rendered in platform exclusive games towards the mid to end of the generation. Basically the exclusives that will start development after the launch of the current generation of consoles.

Regards,
SB
 
And the other factor is that both the XSX and PS5 are using some RDNA 1 and some RDNA 2 parts, whether it's the front end, the back end or the ROPs.
So that introduces another variable.

I think from a typical GPU and CPU point of view there isn't much between them, and that's being shown so far in the actual games.
What I am interested to see is whether other features such as Mesh Shaders, VRS, SFS and ML on the XSX, as well as Primitive Shaders on PS5 (I don't have much more to add on the PS5 side, as Sony just won't talk about their console tech), get adopted by devs and incorporated into engines, and whether this may give an advantage beyond just the raw GPU difference.
But there's no guarantee that they will be. Devs are generally slow to adopt new features; see the slow uptake of DLSS and ray tracing, for instance.

From everything you hear, I expect that Turn 10 have incorporated a lot of these features into the ForzaTech engine, and you would expect id Tech 7 to be built around showcasing DX12U features.

Fun times ahead.
 
Being more recent doesn't automatically mean it's more performant.
IIRC the PS5 has more die area dedicated to ROPs but the SeriesX has ROPs that closely resemble RDNA2 PC GPUs. It could be that the new arrangement simply offers die area savings, or area savings with a small cost in performance.
As an example, one Kepler SM (192sp) provides higher compute performance than one Maxwell SM (128sp), yet the die area savings made it a worthwhile trade-off.

If the PS5 has more die area dedicated to ROPs, then that's because, proportionally speaking, it has a smaller APU than Series X; ROP units aren't going to change in size to scale with the CU counts. I mean, they're ROPs: they have their function and a set silicon/transistor budget that's going to stay more or less fixed.

It would only seem like Series X's are smaller because it has a larger APU its ROPs are contained in (due to higher CU count).

However, real-world tests never show that. Digital Foundry did tests with two GPUs at the same TFLOPs, one clocked narrow and fast and the other wider and slower, and performance was better on the wider and slower GPU. Also, a website did tests on RDNA 1 cards, if I recall correctly, where they overclocked the GPU by 18% or so and the game's performance only improved by around 10%; in other words, performance didn't scale with the clock increases.

Performance actually can scale with clock increases; however, the real point is that the scaling is not linear, and at some point you run into a wall where you're expending a lot more power for minimal performance gains. On some GPUs, this actually starts to crater performance long-term.

PS5 still has to adhere to these laws of physics, even if it's using supplementary features like SmartShift to handle distribution of the power load between the system's CPU and GPU.
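For context on why pushing clocks gets expensive: dynamic power scales roughly with frequency × voltage², and voltage has to rise with frequency near the top of the curve, so power grows much faster than performance. A rough sketch; the linear voltage-vs-frequency relationship below is made up purely for illustration:

```python
# Rough illustration: dynamic power ~ f * V^2, and V must rise with f near the
# limit of the curve. The V(f) slope here is an arbitrary illustrative value.
def relative_power(f_rel, v_slope=0.6):
    v_rel = 1.0 + v_slope * (f_rel - 1.0)  # assumed voltage increase with clock
    return f_rel * v_rel ** 2

for f in (1.0, 1.1, 1.2, 1.3):
    print(f"+{(f - 1) * 100:.0f}% clock -> ~{(relative_power(f) - 1) * 100:.0f}% "
          f"more power (best case +{(f - 1) * 100:.0f}% perf)")
```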
 
If the PS5 has more die area dedicated to ROPs, then that's because, proportionally speaking, it has a smaller APU than Series X; ROP units aren't going to change in size to scale with the CU counts.

The ROP implementations changed between RDNA1 and RDNA2. I believe that is what is being referenced there.
 