Digital Foundry Article Technical Discussion [2021]

Or provide the tools to developers, which I'm sure they do.
Do you know how many games are being released and in development? Not a small number.
If a developer needs help, I'm sure Xbox provides it if they reach out.

What MS needs to do is keep improving the dev environment and tools, and produce good POCs, etc.

Something I have been thinking about, though:
MS needs their own DF-style team for internal studios that advocates for technologies and analyses in-development games; pretty much exactly what DF does.
Microsoft has profiling tools; they said in interviews that they used information from profiling games running on the One/S to work out how to make the Xbox One X better.

In my experience they offer this already. They develop examples and tutorials on how to leverage things. Large publishers are generally the ones providing additional support in the way you are describing. It's rare that MS will send their own teams to resolve issues unless there is a profitable reason to do so.

i.e. XGS will do this for their own first-party titles, but they are not likely to engage in 3P support in this manner unless they are invested.

I think they need it for larger titles. Elden Ring is a big release; they should have been there making sure it was the best it could possibly be on their hardware. Maybe they just need to buy even more studios lol to make sure games run best on their hardware.
 
Microsoft has profiling tools; they said in interviews that they used information from profiling games running on the One/S to work out how to make the Xbox One X better.
PIX tools are available to all developers, if that is what you're referring to, just like Sony offers SN Tools to all developers as well.

I get the dissatisfaction with the lack of optimization, but this is sometimes the way the cookie crumbles.

The Series consoles are lopsided towards compute/shader performance; the PS5 is comparatively well balanced across everything.
 

Not surprising that the PS4 does much better than the Xbox One. It shows the age of these systems, with the original Xbox One at 900p and enemies updating at 15fps. You can see that while the Series S has a slight resolution deficit, the rest of the image makes it a solid upgrade over the One X and a huge upgrade over the One. I mean, the Series S takes 12.33 seconds to load vs the 22.58 seconds of the One.

I really think Microsoft needs to build some teams that go around and help developers optimize for the Series consoles. I think they would get a lot of bang for their buck.

I think, more than anything, this is an excellent example of why large console customizations (which require heavy developer involvement and expertise to use) that won't exist in future hardware are an unwise investment within the concept of "rolling" generations.

Specifically, I'm talking about the large amount of die space allocated to the ESRAM on the base XBO. At this point, pretty much no one outside of possibly a small number of MS first-party studios is optimizing for ESRAM, which is essential to the architecture to help mitigate its anemic bandwidth to main memory. While there might be some automated optimizations that the MS SDK will attempt, it's not remotely on the level of developers explicitly using the ESRAM. Thus performance will suffer more or less depending on how well (or not) the automatic use of the ESRAM when compiling with the Xbox SDK aligns with what the game engine is doing.

It'd be fine if something is going to be carried forward and inherited by the immediate next console, but if it isn't and it requires extensive optimization to use, then it just isn't going to get used or optimized for once a new "generation" of consoles is out. ESRAM in particular was such a huge silicon investment, existing almost purely to allow the use of cheaper commodity RAM, and it also required such a huge investment from developers to optimize for in order to mitigate the impact of that cheap commodity RAM, that it's a perfect storm of what you want to avoid in a console going forward.

Regards,
SB
 
The Series consoles are lopsided towards compute/shader performance; the PS5 is comparatively well balanced across everything.
Given the direction of engine development in the last 10 years, I wouldn't say it's lopsided at all.
Or maybe, I guess, I could see it framed that way if we're talking about cross-gen, early-gen games.
At this point, pretty much no one outside of possibly a small number of MS first-party studios is optimizing for ESRAM, which is essential to the architecture to help mitigate its anemic bandwidth to main memory.
If devs weren't optimising for it, the games would be running a lot worse.
If this is all automatic and taken care of in the SDK without optimisation, then Xbox worked wonders and I'm amazed.
In my eyes that would prove the exact opposite of what you say about having customised hardware.

Most differences between the PS4 and XO seem to come down to compute. I will say that the ESRAM is an added issue, though.
 
If devs weren't optimising for it, the games would be running a lot worse.
If this is all automatic and taken care of in the SDK without optimisation, then Xbox worked wonders and I'm amazed.
In my eyes that would prove the exact opposite of what you say about having customised hardware.

Most differences between the PS4 and XO seem to come down to compute. I will say that the ESRAM is an added issue, though.

No, what I'm saying is that if the devs were explicitly coding for, and more importantly optimizing their code to use, the ESRAM, then performance in Elden Ring would be better on the base XBO. The gap between the XBO version (ESRAM optimizations needed) and the XBO-X version (no ESRAM optimizations needed) is, IMO, far larger than we usually saw before the launch of the current-gen consoles; basically much larger than when the XBO was still being optimized for.

Any automatic use of the ESRAM by the SDK at compile time will vary in effectiveness from small to decent, depending on how well the game code aligns with how the compiler is designed to leverage the ESRAM, but it would never match explicit developer optimizations for ESRAM.

Regards,
SB
 
Given the direction of engine development in the last 10 years, I wouldn't say it's lopsided at all.
Or maybe, I guess, I could see it framed that way if we're talking about cross-gen, early-gen games.
I think it's a good time to bring back a statement that @sebbbi put here a long time ago. Paraphrased: when we market TFLOPs as a catch-all term for the power of a GPU, we make a lot of assumptions about the whole GPU. TFLOPs are a measure of theoretical compute and unified shader throughput; we are largely assuming that the fixed-function pipelines and the bandwidth are scaled proportionally to support the increase in compute and shader performance. But that's not always the case.
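To make that concrete, here's a quick back-of-the-envelope sketch (using the publicly quoted CU counts and peak clocks) showing that the headline TFLOP number is derived purely from the shader array and clock, with nothing in it about ROPs, geometry throughput, or bandwidth:

```python
# Theoretical FP32 throughput: CUs x 64 lanes x 2 FLOPs/cycle (FMA) x clock.
# Figures are the publicly quoted peak specs; nothing here reflects ROPs,
# geometry engines, or memory bandwidth.
def tflops(cus: int, clock_ghz: float) -> float:
    return cus * 64 * 2 * clock_ghz / 1000.0

print(f"XSX: {tflops(52, 1.825):.2f} TFLOPs")  # ~12.15
print(f"PS5: {tflops(36, 2.23):.2f} TFLOPs")   # ~10.28 (at peak variable clock)
```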

We see here that the Series consoles have an expanded ALU and the bandwidth to support it, but the fixed-function hardware did not scale to meet the increase in compute; quite the opposite, it frankly went below what you'd find on a 5700, for instance. That is lopsided, because they used that silicon space to increase the CUs at the cost of ROPs and geometry. Among all AMD GPUs released to date, the XSX has the highest number of CUs per shader engine; not even the 6900 XT can compare, which has a full 10 dual CUs per shader engine, while the XSX has 14 with 2 dual CUs disabled. Now couple this with half the number of RBEs per shader engine you'd find in any current-generation AMD GPU! The issue with MS's setup is that poor code will expose obvious bottlenecks in the system. It doesn't really have a way to get around potential bottlenecks because of the fixed clocks; it requires optimization that suits its needs.

Whereas the PS5 actually finds itself in perfect balance for CUs per SE. It follows precisely what AMD determined would be the optimal number of CUs for the available FF hardware within a shader engine: 10 dual CUs, with 2 dual CUs then disabled. It has the same RBEs and geometry you'd find in any current GPU. To make up for the compute performance lost to the disabled units, they moved to variable clocks and maximized performance across a spectrum of well-optimized and not-so-well-optimized code; this is more or less in line with AMD's performance in general on PC this gen (it looks like).

Overall the ceiling is likely higher on the XSX, as more complicated graphics will require more and more shader/compute power, but if you're looking over a broad spectrum of different types of games and studio talents, it's not really a wonder why the PS5 can perform equally to the XSX. Developers, engines and games are not the same; there's just too much variation, and the PS5 is more forgiving in this respect.
 
Specifically, I'm talking about the large amount of die space allocated to the ESRAM on the base XBO. At this point, pretty much no one outside of possibly a small number of MS first-party studios is optimizing for ESRAM, which is essential to the architecture to help mitigate its anemic bandwidth to main memory.
Given the age of the XBO and the kind of graphical output we've seen on it, I'm fairly certain the ESRAM is being used extensively. You can't get away with that level of graphics on a 68 GB/s bus that is shared between CPU and GPU, and that bandwidth would be cut down even further in mixed read/write scenarios. The ESRAM is a simultaneous read/write beast; its main benefit goes towards read-modify-write work, where the ROPs are likely to eat the majority of ESRAM bandwidth during blending operations. If you know you no longer need a buffer in ESRAM, you write the results out to main memory; the ESRAM is where the majority of the 'work in progress' happens.

Optimizing for ESRAM should not be nearly as difficult as optimizing for Cell was.
Take X from main memory to the ALU, perform calculations on it -> write the results to ESRAM if you need them again soon, or write them back to main memory if you won't need them for a long time.

Take Y from ESRAM to the ALU, perform calculations on it -> write the results back to ESRAM if you need them again, or back to main memory if you no longer need them.

Developers are not forced to move data from main memory into ESRAM before doing work on it.

The GPU consumes data and writes results wherever the developer wants them to go.

I don't see this being an issue for optimization after nearly 10 years of development; this should be trivial.
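Purely to illustrate the placement rule described above, here's a minimal sketch of the decision logic. The allocator, surface names and sizes here are hypothetical, not an actual Xbox SDK API; it's just the heuristic of keeping buffers that will be read/modified again soon (and are bandwidth-heavy) in the 32 MB of ESRAM and spilling everything else to main memory:

```python
# Hypothetical sketch of the ESRAM-vs-DRAM placement heuristic described above.
# None of these names correspond to a real Xbox SDK API.

ESRAM_BUDGET_MB = 32

def choose_pool(surface, esram_used_mb):
    """Return 'esram' or 'dram' for a render target / buffer."""
    fits = esram_used_mb + surface["size_mb"] <= ESRAM_BUDGET_MB
    hot = surface["reused_this_frame"] and surface["bandwidth_heavy"]
    return "esram" if (fits and hot) else "dram"

# Example frame: blend-heavy targets stay resident in ESRAM, while a final
# buffer that is only written once goes straight to DRAM.
surfaces = [
    {"name": "gbuffer_albedo", "size_mb": 8, "reused_this_frame": True,  "bandwidth_heavy": True},
    {"name": "depth",          "size_mb": 4, "reused_this_frame": True,  "bandwidth_heavy": True},
    {"name": "final_output",   "size_mb": 8, "reused_this_frame": False, "bandwidth_heavy": False},
]

used = 0
for s in surfaces:
    pool = choose_pool(s, used)
    if pool == "esram":
        used += s["size_mb"]
    print(s["name"], "->", pool)
```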
 
Overall the ceiling is likely higher on the XSX, as more complicated graphics will require more and more shader/compute power, but if you're looking over a broad spectrum of different types of games and studio talents, it's not really a wonder why the PS5 can perform equally to the XSX. Developers, engines and games are not the same; there's just too much variation, and the PS5 is more forgiving in this respect.
I agree that the ceiling is probably higher, as engines are moving more of what was done in ROPs to compute.
In terms of studio talent, I don't have a view on this, as I think Forza Horizon 5, Flight Sim and Gears show more than enough talent, and that's not even taking into account the new studios like ID. But sure, Sony has produced some amazing games.
I'm far from surprised by how the PS5 is performing in comparison to the XSX in general.
But I also expected that, due to the decisions made on the PS5, it would perform better relative to its specs than the XSX does. I just think that it will take longer on the XSX as more things are moved to compute.
I also don't think the XSX is performing badly in general either, to be fair.

And what you were saying in your reply is that the XSX needs more optimization; sure, that's the nature of consoles. The question is how hard it is to do and how much overhead it gives you, which is why I don't see it as unbalanced.

Regarding the general discussion of the XSX needing a lot of support from MS going out to 3P studios: I'd also like to point out that a lot of the time the framerate probably could have been improved just by lowering the resolution, which they chose not to do.
 
I think it's a good time to bring back a statement that @sebbbi put here a long time ago. Paraphrased: when we market TFLOPs as a catch-all term for the power of a GPU, we make a lot of assumptions about the whole GPU. TFLOPs are a measure of theoretical compute and unified shader throughput; we are largely assuming that the fixed-function pipelines and the bandwidth are scaled proportionally to support the increase in compute and shader performance. But that's not always the case.

We see here that the Series consoles have an expanded ALU and the bandwidth to support it, but the fixed-function hardware did not scale to meet the increase in compute; quite the opposite, it frankly went below what you'd find on a 5700, for instance. That is lopsided, because they used that silicon space to increase the CUs at the cost of ROPs and geometry. Among all AMD GPUs released to date, the XSX has the highest number of CUs per shader engine; not even the 6900 XT can compare, which has a full 10 dual CUs per shader engine, while the XSX has 14 with 2 dual CUs disabled. Now couple this with half the number of RBEs per shader engine you'd find in any current-generation AMD GPU! The issue with MS's setup is that poor code will expose obvious bottlenecks in the system. It doesn't really have a way to get around potential bottlenecks because of the fixed clocks; it requires optimization that suits its needs.

Whereas the PS5 actually finds itself in perfect balance for CUs per SE. It follows precisely what AMD determined would be the optimal number of CUs for the available FF hardware within a shader engine: 10 dual CUs, with 2 dual CUs then disabled. It has the same RBEs and geometry you'd find in any current GPU. To make up for the compute performance lost to the disabled units, they moved to variable clocks and maximized performance across a spectrum of well-optimized and not-so-well-optimized code; this is more or less in line with AMD's performance in general on PC this gen (it looks like).

Overall the ceiling is likely higher on the XSX, as more complicated graphics will require more and more shader/compute power, but if you're looking over a broad spectrum of different types of games and studio talents, it's not really a wonder why the PS5 can perform equally to the XSX. Developers, engines and games are not the same; there's just too much variation, and the PS5 is more forgiving in this respect.

I would also like to add that, when it comes to bandwidth per CU, the XSX has the lowest of any RDNA2-based GPU.
 
Sometimes "Lead Development Platform" explains everything.

Mass Effect Legendary Edition, for example: identical 1440p resolutions with minimal scaling and identical visual settings on both PS4 Pro and One X, yet the Pro constantly outperforms the X, and not by a small margin.

I suspect this development-focus shift increased in PlayStation's favor as the 8th generation continued and their sales lead grew.
 
We see here that the Series consoles have an expanded ALU and the bandwidth to support it, but the fixed-function hardware did not scale to meet the increase in compute; quite the opposite, it frankly went below what you'd find on a 5700, for instance. That is lopsided, because they used that silicon space to increase the CUs at the cost of ROPs and geometry. Among all AMD GPUs released to date, the XSX has the highest number of CUs per shader engine; not even the 6900 XT can compare, which has a full 10 dual CUs per shader engine, while the XSX has 14 with 2 dual CUs disabled. Now couple this with half the number of RBEs per shader engine you'd find in any current-generation AMD GPU! The issue with MS's setup is that poor code will expose obvious bottlenecks in the system. It doesn't really have a way to get around potential bottlenecks because of the fixed clocks; it requires optimization that suits its needs.

Whereas the PS5 actually finds itself in perfect balance for CUs per SE. It follows precisely what AMD determined would be the optimal number of CUs for the available FF hardware within a shader engine: 10 dual CUs, with 2 dual CUs then disabled. It has the same RBEs and geometry you'd find in any current GPU. To make up for the compute performance lost to the disabled units, they moved to variable clocks and maximized performance across a spectrum of well-optimized and not-so-well-optimized code; this is more or less in line with AMD's performance in general on PC this gen (it looks like).
It's fun to look back at past consoles and then at the current ones to try to guess why choices were made and what lessons were learned. With the Xbox One compared to the PS4, its GPU had 66% of the shading units but only 50% of the ROPs. It's somewhat hard to compare because the Xbox One had so many deficiencies relative to the PS4, but it's interesting that they seem to favor more TFLOPs and fewer ROPs, as I don't think they've released an Xbox that matches the ROP count of the direct PlayStation competitor. It's always been half, right? At least with the Series consoles there are half as many, but they're (often) twice as fast.
 
But isn't bandwidth per TFLOP a better metric? Why not include clocks here?

It would be, but CUs are a constant; the XSX will always have 52 CUs.

It's also impossible to know exactly what clock, and thus what TFLOP rating, the PS5 is running at in any given frame to make the comparison.

Even though I believe the PS5 isn't downclocking at all, which is why it's competing so well.
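For reference, a quick calculation of both metrics for the two consoles using the publicly quoted peak figures. Note the caveats: the XSX's 560 GB/s applies only to its 10 GB fast partition, and the PS5 figure assumes its peak variable clock, so treat these as upper bounds rather than sustained numbers:

```python
# Peak bandwidth per CU and per TFLOP, using publicly quoted figures.
# XSX: 560 GB/s on the 10 GB fast partition (336 GB/s on the remaining 6 GB),
# 52 CUs, ~12.15 TFLOPs. PS5: 448 GB/s, 36 CUs, ~10.28 TFLOPs at peak clock.
consoles = {
    "XSX": {"bw_gbs": 560, "cus": 52, "tflops": 12.15},
    "PS5": {"bw_gbs": 448, "cus": 36, "tflops": 10.28},
}

for name, c in consoles.items():
    per_cu = c["bw_gbs"] / c["cus"]
    per_tf = c["bw_gbs"] / c["tflops"]
    print(f"{name}: {per_cu:.1f} GB/s per CU, {per_tf:.1f} GB/s per TFLOP")
# XSX: ~10.8 GB/s per CU, ~46.1 GB/s per TFLOP
# PS5: ~12.4 GB/s per CU, ~43.6 GB/s per TFLOP
```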
 
One reason to go low on ROPs is the belief that, over the generation, they will become less important as algorithms develop around compute that requires less rasterising.

Even today pixel shaders are only used for a few things, right? Depth pass, primary visibility, shadows. I don't think the first two are going away anytime soon.
 
I don't think we'll be able to give up on ROPs for the foreseeable future unless we ditch the traditional HW blending pipeline ...

The traditional graphics pipeline has always enforced the rule that blending must happen in API order. It essentially means that, in the case where all triangles output the exact same depth value, the last submitted triangle will draw over geometry that was submitted before it. Developers have taken this behaviour for granted for decades in order to do UI rendering, alpha blending, and possibly even decal rendering as well ...
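A minimal sketch of why that ordering guarantee matters: the standard "over" blend is not commutative, so compositing the same two fragments in a different submission order produces a different pixel:

```python
# The standard "over" operator: src drawn on top of dst.
# It is not commutative, which is why the API-order guarantee matters.
def over(src_rgb, src_a, dst_rgb):
    return tuple(s * src_a + d * (1.0 - src_a) for s, d in zip(src_rgb, dst_rgb))

red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
background = (0.0, 0.0, 0.0)

# Red submitted first, blue second (blue ends up "on top"):
a = over(blue, 0.5, over(red, 0.5, background))
# Blue submitted first, red second (red ends up "on top"):
b = over(red, 0.5, over(blue, 0.5, background))

print(a)  # (0.25, 0.0, 0.5)
print(b)  # (0.5, 0.0, 0.25)
```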

Any solution that stays consistent with the last-submitted-geometry rule without ROPs would see massive performance hits in current applications, and possibly in many future applications as well ...

Using ROVs (D3D11/12 Rasterizer Ordered Views) is a really bad idea, since it severely limits the parallel execution of GPUs. No developer thinks it's a sane concept to potentially stall thousands of threads, which is why the feature hasn't been used in any games so far. ROVs are never going to scale on future GPU designs, since they would constrain their parallelism ...

Implementing tile-based rendering just to gain programmable blending functionality is crazy. Sorting tons of geometry per tile earlier in the pipeline would involve too much overhead for existing content ...

The most realistic solution might involve looking into per-pixel sorting (linked-list OIT), but bandwidth consumption gets out of control as we start to require more layers ...
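For illustration, a CPU-side sketch of the per-pixel linked-list idea (just the data structure and resolve logic, not a GPU implementation): every translucent fragment is appended to its pixel's list, then sorted by depth and composited, so storage and bandwidth grow linearly with the number of layers per pixel:

```python
# Per-pixel fragment lists (linked-list OIT), sketched on the CPU.
# Fragments are appended in arbitrary order, then each pixel's list is sorted
# back-to-front and composited with "over". Memory and bandwidth scale with
# layers-per-pixel, which is the drawback noted above.
from collections import defaultdict

fragments = defaultdict(list)  # pixel (x, y) -> [(depth, rgb, alpha), ...]

def submit(x, y, depth, rgb, alpha):
    fragments[(x, y)].append((depth, rgb, alpha))

def resolve(pixel, background=(0.0, 0.0, 0.0)):
    colour = background
    # Sort far-to-near, then blend each layer over the accumulated result.
    for depth, rgb, alpha in sorted(fragments[pixel], key=lambda f: -f[0]):
        colour = tuple(s * alpha + d * (1.0 - alpha) for s, d in zip(rgb, colour))
    return colour

submit(0, 0, depth=0.8, rgb=(0.0, 0.0, 1.0), alpha=0.5)  # far blue layer
submit(0, 0, depth=0.2, rgb=(1.0, 0.0, 0.0), alpha=0.5)  # near red layer
print(resolve((0, 0)))  # (0.5, 0.0, 0.25), regardless of submission order
```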

The first two solutions I suggested above are purely irresponsible from the standpoint of constraining the hardware vendors' freedom to innovate in their designs. If we realistically want to get rid of the HW blending pipeline, then graphics programmers must absolutely start doing their part to define a new graphics pipeline that frees HW vendors from the burden of providing behaviour consistent with the old one. Nanite's software rasterizer, as an example, is free from such a constraint by virtue of not being compatible with translucency ...
 
I'd rather like it if DF did a Kid A Mnesia Exhibition video. I don't think that, musically, it's any of their cups of tea, but it's a novel use of UE4.
 