> I don't think anyone here thinks it's going to be a 44% advantage due to CUs. At least I haven't read it.
That was my impression about this CU count talk without clock and bandwidth.
> That was my impression about this CU count talk without clock and bandwidth.
I think there was an ask earlier on RT performance, on whether clock speed or more CUs would matter. Each CU contains its own RT block, so naturally, it would get through the RT portion of the pipeline faster.
This would have been unexpected for an RT-heavy game with per-pixel ray tracing.
But in this case it's likely that BVH building takes the most time, and its cost is fixed and independent of resolution. If they are also doing something like RTXGI for GI, then the ray tracing portion is also independent of resolution. So increasing the resolution makes the raster portion of the frame larger and the RT portion smaller, hence the scaling you see in the gamegpu results.
Just checked the numbers and yep, that's the case: the rasterization-only mode is 65% faster at 1080p, where the RT workload takes 2.32 ms.
At 4K, the rasterization-only mode is just 20% faster, and RT takes just 1.9 ms. That's for the 3090.
On the 6900 XT, the RT portion takes 4.47 ms at 1080p and 3.9 ms at 4K.
RT is probably slightly CPU-limited at 1080p (which would explain the slightly higher RT cost there).
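A quick back-of-the-envelope sketch of that arithmetic, in Python, using only the 3090 figures quoted above; treating the RT workload as a roughly fixed per-frame cost is the assumption being illustrated here, not something the data proves:

```python
# "Fixed RT cost" check: if RT adds a roughly fixed cost T on top of a
# raster-only frame time R, then "raster-only is X% faster" means
# (R + T) / R = 1 + X, so R = T / X. Inputs are the 3090 numbers quoted above.

def rt_share(rt_cost_ms: float, raster_only_speedup: float) -> dict:
    """raster_only_speedup: 0.65 means 'the raster-only mode is 65% faster'."""
    raster_ms = rt_cost_ms / raster_only_speedup   # implied raster-only frame time
    frame_ms = raster_ms + rt_cost_ms              # frame time with RT enabled
    return {"raster_ms": round(raster_ms, 2),
            "frame_ms": round(frame_ms, 2),
            "rt_share_pct": round(100 * rt_cost_ms / frame_ms, 1)}

print("1080p:", rt_share(2.32, 0.65))  # RT is a big slice of a very short frame
print("4K:   ", rt_share(1.90, 0.20))  # similar RT cost, much smaller slice of a longer frame
```

Under that assumption the RT slice drops from roughly 40% of the frame at 1080p to under 20% at 4K, which is the scaling described above.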
> I think there was an ask earlier on RT performance, on whether clock speed or more CUs would matter. Each CU contains its own RT block, so naturally, it would get through the RT portion of the pipeline faster.
RT capabilities on AMD cards are tied to the TMUs, and they also scale with clocks.
However.
From the other thread though, in particular WRT the RE8 comparison and Oleg's all-seeing eye: https://forum.beyond3d.com/posts/2202188/
So even if you have more RT units, you may still be CPU limited on the workload, meaning the RT units may not necessarily be fully utilized.
i.e. RE8 is probably not an ideal benchmark for RT performance, since both are likely CPU-limited given they're both using some form of checkerboard rendering.
> RT capabilities on AMD cards are tied to the TMUs, and they also scale with clocks.
Yeah, but a 23% clock difference isn't going to make up for 44% fewer RT units.
> Yeah, but a 23% clock difference isn't going to make up for 44% fewer RT units.
When did I write that it will? I just pointed out that talking about CU count without clocks, rather than simply using TFLOPs and bandwidth as the indicator, is misleading.
If PS5 and XSX are running the same RT frame times, then the additional RT units aren't being used, or they are bottlenecked somewhere down the line. Which isn't impossible; we see this happen with ROPs often (bandwidth bottleneck). But I think as a general statement more RT units should outperform fewer RT units at the same function. There isn't enough clock speed to make up the differential; nothing controversial from those stating that above.
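For what it's worth, the naive arithmetic behind the 23%/44% figures looks like this. It's only a sketch: it assumes one RT intersection block per CU, the commonly cited console CU counts and clocks, and RT throughput scaling linearly with units × clock, which real workloads (BVH build, CPU limits, bandwidth) won't respect:

```python
# Naive scaling sketch: assume one RT block per CU and throughput ~ units * clock.
ps5 = {"cu": 36, "clock_ghz": 2.23}
xsx = {"cu": 52, "clock_ghz": 1.825}

def rt_throughput(gpu: dict) -> float:
    return gpu["cu"] * gpu["clock_ghz"]  # arbitrary units

print(f"XSX RT units vs PS5:  +{xsx['cu'] / ps5['cu'] - 1:.0%}")                  # ~ +44%
print(f"PS5 clock vs XSX:     +{ps5['clock_ghz'] / xsx['clock_ghz'] - 1:.0%}")    # ~ +22%
print(f"Net XSX advantage under this model: +{rt_throughput(xsx) / rt_throughput(ps5) - 1:.0%}")  # ~ +18%
```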
> It's not really "hoping"; things are moving to favor parallelism and wider designs in the GPU space. MS wanted a design that was future-proofed in this respect, and we have benchmarks on mesh shader routines showing incredible degrees of performance uplift over the traditional 3D pipeline process that operates mainly off fixed-function hardware. So perhaps in some way it's a cost-cutting measure, but it's no more of one than, say, Sony settling for pseudo-programmable logic with the primitive shaders in their design... and I'd say Microsoft's choices are easily the more future-proofed of the two, even in spite of compromises.
I personally think it's simpler than this.
> Weird way to write 'reaction youtuber'.
Yan Chernikov worked at DICE on the Frostbite engine for 5 years, has been developing his own game engine called Hazel, and has a roughly 10-year-old channel with hundreds of video tutorials on how to develop each stage of a video game engine.
> What do you mean by better performance on the back end?
PS5 and Series X have the same number of Render Output Units, but the former runs its ROPs at >20% higher clocks, therefore it has >20% higher pixel fillrate.
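For reference, the fillrate arithmetic is straightforward. This sketch assumes the commonly cited 64 ROPs on each console and peak clocks; sustained fillrate also depends on bandwidth and blend modes:

```python
# Peak pixel fillrate ~ ROPs * clock; 64 ROPs assumed for both consoles.
ROPS = 64
ps5_fill = ROPS * 2.23    # Gpixels/s at 2.23 GHz
xsx_fill = ROPS * 1.825   # Gpixels/s at 1.825 GHz
print(f"PS5 {ps5_fill:.1f} Gpix/s vs XSX {xsx_fill:.1f} Gpix/s "
      f"-> {ps5_fill / xsx_fill - 1:.0%} in PS5's favour")
```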
> Please correct me if I'm wrong, but every argument I've seen in favour of the PS5's architecture seems to indicate that its higher clocks mean it will consistently be somewhat superior when it comes to traditional rasterisation.
All games are still using traditional rasterization (even Dreams uses the ROPs at some points, AFAIK). There are now games using a hybrid approach where they mix real-time ray tracing with rasterization, but they're still using a rasterizer in the pipeline.
> So if we look at the rendering pipeline as a whole, the only thing we can be sure of for now is that PS5 and XSX will complete different stages of the rendering process at different speeds, and both have advantages in specific areas. As long as the load differs between the different steps, some applications at specific times may favour one over the other. Right?
From what I've been reading in developer statements, it's exactly this.
> That does have an effect WRT pixel fillrate, but the config of the ROP backend between the two systems differs; PS5's is older while Series X's is more recent (adhering to what the RDNA 2 GPUs offer on that front).
Being more recent doesn't automatically mean it's more performant.
> However, real-world tests never show that. Digital Foundry did tests with two GPUs at the same TFLOPs, one clocked narrow and fast and the other wider and slower, and performance was better on the wider and slower GPU. Also, a website did tests on RDNA 1 cards, if I recall correctly, where they overclocked the GPU by around 18% and performance only improved by around 10%; in other words, performance didn't scale with the clock increase.
Perhaps one final thought on the concept of clock speed and CU count. Perhaps this hypothetical case may make more sense.
If you designed a synthetic benchmark which leveraged only the fixed-function portions of the hardware and skipped the unified shader pipeline entirely:
if XSX ran this benchmark at 100fps, PS5 would run this benchmark at 123fps, or 23% faster as per their clock speed difference.
There is absolutely nothing XSX can do to mitigate this difference because they have exactly the same FF hardware but PS5 runs 23% faster here.
Which means that in any benchmark where XSX and PS5 come out pretty much identical, XSX essentially made up that deficit on the back half of the frame, where compute and the unified shaders do their work, despite that part also being clocked 23% lower.
And so that CU advantage there is putting in work to make up the clock speed differential twice.
I hope that makes sense.
Effectively, the larger that back half of the frame is, or the further the XSX can get away from the fixed-function pipeline, the more it will leverage its silicon strengths.
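To put rough numbers on that framing, here's a toy model. The millisecond splits below are invented purely for illustration, and "compute throughput scales with CUs × clock" ignores bandwidth, occupancy and everything else:

```python
# Toy model of the argument above: split an XSX frame into a fixed-function part
# (throughput ~ clock, so PS5 runs it ~22% faster) and a compute/shader part
# (throughput ~ CUs * clock, so XSX has ~18% more on paper), then compare totals.
PS5_CLK, XSX_CLK = 2.23, 1.825
PS5_CU,  XSX_CU  = 36, 52

ff_ratio      = XSX_CLK / PS5_CLK                        # ~0.82: PS5 finishes FF work sooner
compute_ratio = (XSX_CLK * XSX_CU) / (PS5_CLK * PS5_CU)  # ~1.18: PS5 takes longer on compute

for ff_ms, compute_ms in [(8.0, 8.0), (4.0, 12.0), (2.0, 14.0)]:   # hypothetical XSX frame splits
    ps5_total = ff_ms * ff_ratio + compute_ms * compute_ratio
    print(f"XSX frame: {ff_ms:.0f} ms FF + {compute_ms:.0f} ms compute = {ff_ms + compute_ms:.0f} ms"
          f" -> same frame on PS5: {ps5_total:.2f} ms")
```

Under these made-up splits the two come out roughly even when the frame is half fixed-function, and the wider GPU pulls ahead as the compute share grows, which is the point being argued.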
I think typically this isn't the case with other GPU families. Often larger GPUs also ship with larger front ends, so you don't run into this clock speed vs core count scenario. It's more like you get it all, so there is no way the more expensive card will perform worse than a model lower in the family.
Thinking about it this way, XSX is actually the outlier, as its front-end performance and general compute performance are not well matched. Increasing the clock speed would help narrow that particular gap with respect to PS5, but it wouldn't make up for the fact that it's mismatched with respect to itself.
I don't know if MS went this route hoping the transition to mesh shaders would happen sooner, since CUs are what they use to process geometry, and it's not clear whether they intended for developers to skip the 3D pipeline altogether in favour of just using compute shaders to render out pixels instead of the ROPs. It does come across, again, as a cost-cutting measure.
Do you want to have a major impact on 3 AAA titles in development across 2 beloved Xbox franchises? ForzaTech is the engine, tools, and pipelines that drive both the Forza Motorsport and Forza Horizon series of games. In addition to adding new features like raytracing to support the next console generation, we are also enriching the toolset to support an open world action RPG – Fable.
The Tech Share team at Turn 10 builds new systems and tools that benefit all games running on ForzaTech, helps teams design features and tools in a shareable way, and provides support for our cross-studio content teams. We are looking for a generalist who is comfortable working across many different areas and enjoys learning new systems to help us craft a better ForzaTech.
> However, real-world tests never show that. Digital Foundry did tests with two GPUs at the same TFLOPs, one clocked narrow and fast and the other wider and slower, and performance was better on the wider and slower GPU. Also, a website did tests on RDNA 1 cards, if I recall correctly, where they overclocked the GPU by around 18% and performance only improved by around 10%; in other words, performance didn't scale with the clock increase.
I remember seeing a test (can't remember what site or where I saw it) where they tested CU count scaling on RDNA 2 GPUs and found that they scale pretty well in performance up to 60 CUs, but above that the scaling drops off dramatically.
And the other factor is that both the XSX and PS5 are using some RDNA 1 and some RDNA 2 parts, whether it's the front end, the back end or the ROPs.
Not surprising. Designing GPUs is as much art as science. It's about finding a balance of components within the limitations of the transistor budget and ancillary technologies (like memory speeds).
No matter what balance is struck, any given GPU will perform better or worse depending on what parts/components of the GPU are stressed in any given game/scenario.
Because GPUs are a balancing act, scaling one factor up without also scaling up other facets of the GPU is unlikely to see linear scaling (as that factor will become increasingly limited by other factors). And in the rare occasions you do have linear scaling, it won't infinitely scale linearly (eventually the factor you are scaling will become limited by other factors).
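A compact way to see that is the usual Amdahl's-law style argument; the fractions below are arbitrary illustration values, not measurements of any real game:

```python
# If only a fraction p of the frame is bound by the unit you scale up (say, CUs),
# scaling that unit by S speeds the whole frame up by 1 / ((1 - p) + p / S).
def overall_speedup(p: float, s: float) -> float:
    return 1.0 / ((1.0 - p) + p / s)

for p in (0.9, 0.7, 0.5):
    print(f"{p:.0%} of the frame bound by the scaled unit: "
          f"1.44x more units -> {overall_speedup(p, 1.44):.2f}x overall")
```

So even a 44% increase in one resource buys well under 44% at the frame level unless nearly everything else keeps up.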
Both the PS5 and XBS systems are engineered with a certain balance of features that each engineering team felt was the best way to spend their transistors combined with the level and cost of ancillary technology to be used. Each are likely targeting different ideas about how game rendering will evolve over the generation. Neither architecture will be perfectly utilized because each individual game developer has different ideas about how they want to render their games or even what their game rendering budget requires.
In other words, overclocking an existing GPU (to use an easy example) is unlikely to show how effective the increased clocks of the PS5 are. Likewise, examining increased CU counts on PC GPUs is unlikely to show how effective the wider architecture of the XBS-X is. Especially if the comparison is between a fully enabled GPU versus a cut-down salvage GPU. And even less informative if the comparison is between different GPUs of the same family. It may or may not give some hints of how the different choices will impact rendering, but it's unlikely to illuminate us on how they actually affect rendering.
What I find most interesting thus far is how evenly the different architectures perform despite the differences in engineering choices made. Much of that comes down to the fact that the architectures are far more similar than they are different. But part of it is also the fact that for multiplatform games (the only ones where direct comparisons are possible) it's in the developers' best interest to have the game run well on all platforms it releases on.
It does make me wonder if we'll see distinctly different approaches to how a scene is rendered in platform exclusive games towards the mid to end of the generation. Basically the exclusives that will start development after the launch of the current generation of consoles.
Regards,
SB
IIRC the PS5 has more die area dedicated to ROPs but the SeriesX has ROPs that closely resemble RDNA2 PC GPUs. It could be that the new arrangement simply offers die area savings, or area savings with a small cost in performance.
As an example, one Kepler SM (192sp) provides higher compute performance than one Maxwell SM (128sp), yet the die area savings made it a worthwhile trade-off.
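To make that trade-off concrete: per-SM FP32 throughput is just shader count × 2 FLOPs/clock × clock, so at equal clocks a 192-SP Kepler SMX out-muscles a 128-SP Maxwell SMM, but a smaller SM lets you pack more of them into the same area. The relative area figure below is a made-up placeholder, and Maxwell's improved utilization isn't modelled at all:

```python
# Per-SM compute vs. (hypothetical) area efficiency, Kepler SMX vs. Maxwell SMM.
CLOCK_GHZ = 1.0  # same clock for both, since this is a per-SM comparison
sms = {
    "Kepler SMX (192 SP)":  {"sp": 192, "rel_area": 1.00},
    "Maxwell SMM (128 SP)": {"sp": 128, "rel_area": 0.65},  # placeholder area ratio, not a measured value
}
for name, sm in sms.items():
    gflops = sm["sp"] * 2 * CLOCK_GHZ
    print(f"{name}: {gflops:.0f} GFLOPS per SM, {gflops / sm['rel_area']:.0f} per unit area")
```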
If the PS5 has more die area dedicated to ROPs, then that's because, proportionally speaking, it has a smaller APU than Series X; ROP units aren't going to change in size to scale with the CU count. I mean, they're ROPs: they have their function and a set silicon/transistor budget that's going to stay more or less fixed.