Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]


Using the PS5 (very similar architecture) as a comparison point, you can see that in Watch Dogs it's basically the same amount of maths being done and more or less the same amount of bandwidth being used. This means that there is more CU time going unused, and more bandwidth going unused, on XSX.

These games are significantly underutilising XSX execution units and available bandwidth. Why is up for discussion, but that this is happening is not.

My first thought would be to use higher native resolution (rather than the current dynamic solution) and other higher-IQ settings to make use of that untapped/underutilized CU time and bandwidth, but of course that isn't happening. And this underutilization of rendering units feels very familiar from PC GPUs running into memory bandwidth limitations (usually not enough VRAM), or bottlenecks around system resources: a small cache (too many GPU rendering units hammering the cache), I/O limitations, and/or CPU resource/bandwidth limitations.

But of course, I'm only thinking of these potential issues from a PC perspective, and not from a console development environment perspective. As such, none of these issues may relate to why XBSX GPU CUs are being underutilized in certain situations where it could/should have an advantage over PS5. I suspect I know the reasons why, but I will wait for more next-generation games before making such calls.
 
The idea of the GDK is: code once, deploy to multiple platforms. From what we can see, it's pretty successful at doing this; how well optimized the result is, is a big unknown. It's definitely successful at shipping the games without crashing, but we can see there are some bleed-over issues with settings being shared/carried over.
Do we know which of the PC versions of these games are built on the GDK?

Something I was curious about from the Dirt 5 RDNA2 video: it shows the engine using things like VRS, but do we know if it's using it on XSX|S, since the engine obviously supports it?
 
Do we know which of the PC versions of these games are built on the GDK?

Something I was curious about from the Dirt 5 RDNA2 video: it shows the engine using things like VRS, but do we know if it's using it on XSX|S, since the engine obviously supports it?
Unfortunately no, we do not know. I suspect it's possible. I'm not sure why a developer would make separate builds for so many platforms these days unless they had to.
 
Unfortunately no, we do not know. I suspect it's possible. I'm not sure why a developer would make separate builds for so many platforms these days unless they had to.
I can think of a few reasons:
  • PC development started before GDK availability.
  • Toolset maturity
  • Only need to port and worry about console builds
  • Why port a working PC build when timeframes are compressed
If the Xbox version doesn't have VRS, maybe it will get patched in; then the questions are why it didn't have it at launch and how much difference it actually makes. Obviously that's assuming it doesn't currently use it.
 
My first thought would be to use higher native resolution (rather than the current dynamic solution) and other higher-IQ settings to make use of that untapped/underutilized CU time and bandwidth, but of course that isn't happening.

Higher resolution could slow down an already rasterisation-limited pipeline. You can end up in a situation where you're cutting more and more resolution to save increasingly small amounts of frame time. I'd guess this is why Forza Horizon 4 on X1X is 4K at 30 fps but 1080p at 60 fps.
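To illustrate that diminishing-returns point, here's a toy frame-time model. The fixed and per-pixel costs are invented numbers purely for illustration, not measurements from any game: a fixed per-frame cost (geometry, setup, CPU-side work) plus a cost that scales with pixel count.

```python
# Toy model, illustrative only: fixed_ms and pixel_ms_at_4k are made-up numbers.
PIXELS_4K = 3840 * 2160

def frame_time_ms(width, height, fixed_ms=8.0, pixel_ms_at_4k=24.0):
    # fixed_ms: geometry/setup/CPU-side work that doesn't shrink with resolution
    # pixel_ms_at_4k: pixel-bound work at full 4K, scaled by the actual pixel count
    return fixed_ms + pixel_ms_at_4k * (width * height) / PIXELS_4K

for w, h in [(3840, 2160), (2560, 1440), (1920, 1080)]:
    print(f"{w}x{h}: {frame_time_ms(w, h):.1f} ms")
# 3840x2160: 32.0 ms  (fits a 33.3 ms / 30 fps budget)
# 2560x1440: 18.7 ms  (still misses the 16.7 ms / 60 fps budget)
# 1920x1080: 14.0 ms  (finally under the 60 fps budget)
```

Quartering the pixel count only saves a bit over half the frame time in this sketch, because the fixed cost doesn't shrink with resolution, which is the diminishing-returns effect described above.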

And this underutilization of rendering units feels very familiar from PC GPUs running into memory bandwidth limitations (usually not enough VRAM), or bottlenecks around system resources: a small cache (too many GPU rendering units hammering the cache), I/O limitations, and/or CPU resource/bandwidth limitations.

Doesn't seem like BW to me. At least not DRAM, anyway. The amount of data being read and written is the same as, or less than, what something like a PS5 or 5700 is comfortable with. For it to be a bandwidth or cache issue would indicate a catastrophic failure of design by AMD and MS.

Which is of course possible, but I consider it unlikely. Losing ~20% IPC due to an enormous cache or memory bus fuck-up on an otherwise efficient architecture would be a stunningly big misstep.

More likely that workloads don't suit it currently IMO.

But of course, I'm only thinking of these potential issues from a PC perspective, and not from a console development environment perspective. As such, none of these issues may relate to why XBSX GPU CUs are being underutilized in certain situations where it could/should have an advantage over PS5. I suspect I know the reasons why, but I will wait for more next-generation games before making such calls.

The PC perspective makes me think it's more likely something in the fixed-function units - per-pixel and per-triangle workloads that favour relatively less maths [but more pixels or triangles] rather than more maths. That, and sharing an undercooked dev environment built with every possible PC configuration in mind.

But why wait to put your ideas forward? That's the fun of a speculation thread! Besides, you're already sort of saying what you think... I think. Might as well enjoy the discussion before the answers are known!
 
There's a lot of focus on the tools right now but that's mainly because people are looking at direct comparisons between XSX and PS5.

The problem is, as you know, the maximum theoretical number of teraflops of a given piece of hardware is not indicative of what to expect in terms of performance. Too many people have bought into the higher-number-is-better mindset. NX Gamer's January video explains, and literally demonstrates, why a hardware configuration with a lower theoretical number of teraflops can out-perform a hardware configuration with a higher theoretical number of teraflops.


As you would expect, there are a lot of factors that contribute to this: configuration of the GPU, functional computation units (regardless of the architecture), clock speeds, cache, memory, bandwidth, CPU and APIs. None of these things are equal between PS5 and Series X, so why are people focussing on just one metric and expecting the higher number to result in more performance?
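For context, here's the one metric everyone fixates on, worked out from the publicly stated specs (52 CUs at 1.825 GHz for Series X, 36 CUs at up to 2.23 GHz for PS5, 64 FP32 lanes per CU, two ops per FMA). This is just the paper number, not a performance claim:

```python
def fp32_tflops(cus, clock_ghz, lanes_per_cu=64, ops_per_clock=2):
    # Peak FP32 throughput: CUs x lanes x 2 ops (FMA) x clock, in TFLOPS
    return cus * lanes_per_cu * ops_per_clock * clock_ghz / 1000

xsx = fp32_tflops(52, 1.825)  # ~12.15 TF (fixed clock)
ps5 = fp32_tflops(36, 2.23)   # ~10.28 TF (variable clock, up to 2.23 GHz)
print(f"XSX {xsx:.2f} TF vs PS5 {ps5:.2f} TF ({xsx / ps5:.2f}x)")
```

That ~1.18x on paper says nothing about how the rest of the list above (front end, clocks, cache, memory, CPU, APIs) shakes out, which is exactly the point being made.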

The wattage numbers are concerning for XSX. For basically 35 more watts than the PS5's menu, it's running a 30 fps ray-traced title. Seems off.

How are you comparing two different pieces of hardware and software and assessing performance based on the amount of power draw? You know that PS5 runs higher clocks, and that power draw scales super-linearly with clock speed. There is a reason that PS5 has a higher-rated PSU than Series X.
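To illustrate why higher clocks disproportionately raise power draw, here's a rough sketch of the usual dynamic-power approximation (power ∝ voltage² × frequency, with voltage having to rise alongside frequency). The coefficients are invented for illustration; neither console's actual voltage/frequency curve is public:

```python
# Illustrative only: the voltage slope below is made up, not measured from either console.
def relative_power(clock_scale, v_slope=0.6):
    # Assume voltage rises with frequency: V = 1 + v_slope * (f - 1)
    # Dynamic power ~ C * V^2 * f, so relative power = f * V^2
    v = 1.0 + v_slope * (clock_scale - 1.0)
    return clock_scale * v * v

for f in (0.9, 1.0, 1.1, 1.2):
    print(f"clock x{f:.1f} -> power x{relative_power(f):.2f}")
# clock x0.9 -> power x0.80
# clock x1.0 -> power x1.00
# clock x1.1 -> power x1.24
# clock x1.2 -> power x1.51
```

Under these assumed numbers, a 10% clock bump costs roughly 24% more power, which is why raw wattage comparisons between machines running at very different clocks don't map cleanly onto performance.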

The idea of the GDK is: code once, deploy to multiple platforms. From what we can see, it's pretty successful at doing this; how well optimized the result is, is a big unknown.

Nobody knows how optimized PS5's tools are either. The thing about optimizing is that effective techniques only come with experience, and both consoles are brand new. Which techniques work better than others, and how the tools will adapt to help developers exploit them, is something that will take a while to settle. What we do know, because Dirt 5's technical director said so, is that the Xbox tools are easy to use and mature.
 
You make a good point, but it's two different sites using different testing methodologies and tools that could alter the results, so it may not be apples to apples. I would be curious to see a DF test on WDL.

A bit of both. I read here about the benefits of moving I/O onto the same die, and a quick DuckDuckGo search brought up results showing them on par in single-core tests:
https://www.cpu-monkey.com/en/compare_cpu-amd_ryzen_9_4900hs-1285-vs-amd_ryzen_7_3700x-929
https://nanoreview.net/en/cpu-compare/amd-ryzen-9-4900hs-vs-amd-ryzen-7-3700x
https://www.notebookcheck.net/AMD-R...in-startling-UserBenchmark-test.458960.0.html
I'm not particularly familiar with these sites, so I'm not sure how reputable they are.

Not all benchmarks will be cache-size limited. Or rather, if the benchmark is already largely able to run within the Renoir cache, then a larger cache on Matisse isn't going to make much difference. You're more likely to see that kind of behavior on single-core tests, which naturally use a smaller dataset. The first two links show the 3700X performing on par (at more or less the same clock speed) with the 4900HS, but they also show it comfortably beating the 4900HS by 10-15% in the multi-core tests, which are likely to be more limited by cache size. That's despite the 3700X only having a 2% clock speed advantage. And I imagine the more a workload spills out of the 8MB cache on Renoir, the more Matisse would extend its lead. Games are notoriously cache sensitive.

Obviously on consoles, developers will be able to mitigate that somewhat by targeting the application at the available cache size to minimize misses. But that reduced flexibility is likely to come with its own performance penalty, and simply won't be possible to achieve with 100% success.
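A crude way to see why a workload that fits in Renoir's 8 MB L3 looks the same on both parts, while a larger one favours Matisse's 32 MB: model average memory latency from an assumed hit rate. The latencies and the hit-rate model here are invented for illustration, not measured figures:

```python
# Toy model: assumed latencies in cycles; hit rate = fraction of working set that fits in L3.
L3_HIT_CYCLES = 40
DRAM_MISS_CYCLES = 300

def avg_latency_cycles(working_set_mb, l3_mb):
    hit_rate = min(1.0, l3_mb / working_set_mb)
    return hit_rate * L3_HIT_CYCLES + (1.0 - hit_rate) * DRAM_MISS_CYCLES

for ws in (4, 16, 64):
    renoir = avg_latency_cycles(ws, 8)    # Renoir (4900HS): 8 MB L3
    matisse = avg_latency_cycles(ws, 32)  # Matisse (3700X): 32 MB L3
    print(f"{ws:>2} MB working set: Renoir {renoir:.0f} cyc, Matisse {matisse:.0f} cyc")
# 4 MB: identical; 16 MB: Matisse ~4x lower average latency; 64 MB: both suffer, Matisse less so
```

Real caches don't behave this linearly, but it's enough to show why multi-core runs with bigger working sets open up a gap that single-core runs don't.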

Keep in mind the off-die I/O, which affects the 3700X.
If Renoir's IPC is on par or close, sharing the L3 will further increase its IPC.

I'm not sure the separate I/O die will be having a negative impact on the 3700X, given it's a single-chiplet design. Multi-chiplet designs that have to communicate through the I/O die could be negatively impacted, but not so much single-chiplet designs.

What do you mean? Explain.

Ridiculous was probably a bit too strong, apologies. However, it seems a very big stretch to assume that there is an Infinity Cache equivalent in the PS5 I/O complex that can effectively be used as a GPU L3 cache, when it sits outside the GPU, we have no idea how large it is or what latencies it has, and there has been no previous suggestion from any source of it being used in that way. This sounds quite reminiscent of the never-ending secret sauce debates.
 
The problem is, as you know, the maximum theoretical number of teraflops of a given piece of hardware is not indicative of what to expect in terms of performance. Too many people have bought into the higher-number-is-better mindset. NX Gamer's January video explains, and literally demonstrates, why a hardware configuration with a lower theoretical number of teraflops can out-perform a hardware configuration with a higher theoretical number of teraflops.


I think that most of the things discussed in that video don't apply to the current XSX vs PS5 comparisons, though. NX is comparing very different architectures with very different memory bandwidths and capacities, running with different CPUs and very different APIs. I understand that's the entire point of his video - to show these things matter too - but I think you'd have to be pretty naive to look at TF in complete isolation when comparing across such different setups (although I know some do!). In the PC space specifically, one of the biggest determining factors of performance that doesn't apply at all to consoles, and that would certainly be impacting the 750 Ti today, is driver and developer support. Being Maxwell 1.0 (already a niche architecture at the time of launch), it will have long since dropped off Nvidia's and game developers' support radar.

And while I absolutely agree that comparisons of TF alone don't tell the full picture with these consoles, I do think they can offer some valuable insight into their relative capabilities, especially when taken in the context of other components - far more so than in the NX comparison, as with the consoles we're talking about near-identical architectures with the same amount of VRAM, connected to near-identical CPUs, and both running low-level console APIs where developers can specifically target the game code at the hardware. Their relative memory bandwidths are also largely in line with the TFLOP difference.

In terms of APIs and development environment, I'd also expect them to be pretty comparable over the long term (although not necessarily right now). That really only leaves the differences in the front end, ROPs, cache and interconnect speeds (all clock-speed related) as the major performance differentiators outside of "TFLOPS". People need to understand that the PS5 is faster than the XSX in those areas, and it's likely each system will be faster than the other under different scenarios that stress different parts of the system.
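As a quick sanity check on that bandwidth point, using the publicly stated figures (560 GB/s for the Series X 10 GB fast pool, 448 GB/s for PS5's unified 16 GB) and the peak TF numbers worked out earlier in the thread:

```python
xsx_bw, ps5_bw = 560, 448      # GB/s: XSX 10 GB "GPU optimal" pool vs PS5 unified pool
xsx_tf, ps5_tf = 12.15, 10.28  # peak FP32 TFLOPS from the earlier calculation

print(f"bandwidth ratio: {xsx_bw / ps5_bw:.2f}x")  # 1.25x
print(f"TFLOP ratio:     {xsx_tf / ps5_tf:.2f}x")  # 1.18x
```

So the paper bandwidth advantage (for data in the fast pool) is actually a touch larger than the paper compute advantage, i.e. "largely in line" as stated above.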
 
I think that most of the things discussed in that video don't apply to the current XSX vs PS5 comparisons, though. NX is comparing very different architectures with very different memory bandwidths and capacities, running with different CPUs and very different APIs.

NX Gamer disagrees.


Mark Cerny spent a chunk of his Road to PS5 talk explaining why they went narrow and fast versus wide and slower. We could just ignore him though and wonder weirdly why fast/narrow can be better than wide/slower even though he explained it.

This whole weird denial of PS5 being competitive with a machine that is faster on paper, but isn't materially faster in practice, is tedious at this point. We've had credible sources like Jason Schreier saying this is what he's hearing from multiplatform developers. But no, we have to find some reason for it: broken tools, buggy APIs. If we keep digging we'll find it!

Let's ignore what Mark Cerny said, let's ignore analysis from folks who break down technological differences, let's ignore what journalists have heard back from developers, let's ignore all the enthusiasm from devs about PS5 being super cool to work with. Let's go with... there must be something else.
 
I imagine he's just being flippant. I can't believe he really thinks comparing a nearly 7-year-old Maxwell 1.0 GPU in a PC to GCN in a console is a good proxy for comparing RDNA 2 to RDNA 2 in contemporary consoles.

NXGamer is very biased to begin with, and very much so towards everything PlayStation. It's a bad source, and even the tweet dsoup posted has comments accusing NXGamer of console warring. Tweets like that even appearing here is astounding.
 
I imagine he's just being flippant. I can't believe he really thinks comparing a nearly 7-year-old Maxwell 1.0 GPU in a PC to GCN in a console is a good proxy for comparing RDNA 2 to RDNA 2 in contemporary consoles.
It was a demonstration, nothing more. A single number on its own means nothing. Series X and PS5 are more different than alike, even though they share a common APU architecture. Microsoft, as they've heavily hinted, have an RDNA2 chip. Sony have something else.
 
NX Gamer disagrees.


Mark Cerny spent a chunk of his Road to PS5 talk explaining why they went narrow and fast versus wide and slower. We could just ignore him though and wonder weirdly why fast/narrow can be better than wide/slower even though he explained it.

This whole weird denial of PS5 being competitive with a machine that is faster on paper, but isn't materially faster in practice, is tedious at this point. We've had credible sources like Jason Schreier saying this is what he's hearing from multiplatform developers. But no, we have to find some reason for it: broken tools, buggy APIs. If we keep digging we'll find it!

Let's ignore what Mark Cerny said, let's ignore analysis from folks who break down technological differences, let's ignore what journalists have heard back from developers, let's ignore all the enthusiasm from devs about PS5 being super cool to work with. Let's go with... there must be something else.
Who is NXGamer and why should we care? I looked at that "TF don't matter" video and it does not describe the current situation at all. Narrow and fast is better, yes, if we are talking about what Cerny said (identical GPU architecture, same TF, different ways of getting there - clock speed vs additional CUs).

I imagine he's just being flippant. I can't believe he really thinks comparing a nearly 7-year-old Maxwell 1.0 GPU in a PC to GCN in a console is a good proxy for comparing RDNA 2 to RDNA 2 in contemporary consoles.
On top of that, that is not even what Cerny was comparing. Cerny compared two theoretical GPUs with the same TF number, one achieving it with faster clock speeds and the other with more CUs.
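If I remember the Road to PS5 example correctly, it was along the lines of 36 CUs at 1 GHz versus 48 CUs at 0.75 GHz (treat those exact figures as my recollection rather than a quote). The point is easy to reproduce:

```python
def fp32_tflops(cus, clock_ghz):
    return cus * 64 * 2 * clock_ghz / 1000  # CUs x 64 lanes x 2 ops (FMA) x clock

narrow_fast = fp32_tflops(36, 1.00)  # ~4.61 TF
wide_slow   = fp32_tflops(48, 0.75)  # ~4.61 TF -- identical peak compute
print(narrow_fast, wide_slow, 1.00 / 0.75)
# Same TF on paper, but everything tied to clock on the narrow/fast part
# (rasterizer, command processor, caches) runs ~1.33x faster.
```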

These consoles do not have the same TF, nor do they have the same BW or even the same amount of L2, so they are not the same, although I think clocking anything higher while staying under TDP limits is the way forward, due to the price per transistor on smaller nodes.

It's a funny thing, because many B3D members, not to mention other fans and enthusiasts around the internet, were shutting down narrow-and-fast because apparently Cerny likes wider GPU designs, and a GPU clocking at 2.0GHz was considered way too fast. Enter 2.2GHz :)
 
It was a demonstration, nothing more. A single number on its own means nothing. Series X and PS5 are more different than alike, even though they share a common APU architecture. Microsoft, as they've heavily hinted, have an RDNA2 chip. Sony have something else.

A better source: a dev with 24 AAA game releases.



https://medium.com/@mattphillips/te...-of-comparing-videogame-consoles-4207d3216523

He probably knew about the surprising performance of PS5 compared to XSX when he wrote this tweet and the Medium article. Maybe in two years XSX will be a bit above PS5, but in the end the gap will be close.

Since console GPUs started being based on PC GPUs, we have never had a generation where both platform holders decided to make a console that is easy for developers to use and centered on games. And they targeted the same MSRP.

No CPU that's difficult to program like the Cell paired with a last-minute GPU change, no hardware built around Kinect, and nothing like the PS4 Pro and XB1X with a one-year gap between releases and a 100-dollar difference in MSRP.

And the APUs are made by the same supplier, so there is nothing that surprising about the PS5 and XSX situation.
 
Who is NXGamer and why should we care?

You should only care if you value technical analysis. If you don't, why are you here? Who is "we"?

I looked at that "TF don't matter" video and it does not describe the current situation at all. Narrow and fast is better, yes, if we are talking about what Cerny said (identical GPU architecture, same TF, different ways of getting there - clock speed vs additional CUs).

Of course it doesn't; it's there to demonstrate the principle that teraflops are not the be-all and end-all of metrics when it comes to performance.

On top of that, that is not even what Cerny was comparing. Cerny compared two theoretical GPUs with the same TF number, one achieving it with faster clock speeds and the other with more CUs.
Firstly, it was two theoretical GPUs. Secondly, the purpose of comparing two theoretical GPUs with the same theoretical teraflops number was to demonstrate that even when the TF numbers are the same, they're not comparable.

It's a funny thing, because many B3D members, not to mention other fans and enthusiasts around the internet, were shutting down narrow-and-fast because apparently Cerny likes wider GPU designs, and a GPU clocking at 2.0GHz was considered way too fast. Enter 2.2GHz :)
I don't know what this is a reference to.
 

This take always galls me. I like it in spirit - it's good to push back on the audience's incessant demands for constant, expensive tech progress - but it's also pretty clear you're not closely involved in art or graphics if you think we're anywhere close to any kind of ceiling.

Sure, there's a ton more R&D left to optimize and to invent new ways around problems on current hardware, but it's also very easy to think of essential things to spend extra rendering budget on. (And it's not rare at all to see smaller devs with fewer staff make games that push hardware limits!)
 
NX Gamer perhaps shouldn't be used as a great source.

Why?

NXGamer is very biased to begin with, and very much so towards everything PlayStation. It's a bad source, and even the tweet dsoup posted has comments accusing NXGamer of console warring. Tweets like that even appearing here is astounding.

Are you serious? Please stop projecting your own behaviour onto others. DF, NX, Cherno, VG, ElAnalistaDeBits and the rest who provide quality analysis and/or opinions shouldn't be turned away because of your own personal biases.
 
Well, he is not neutral, and he's always trying to explain stuff he did not really understand that well ;)

It is nice that he makes res/framerate videos to spot differences, but he does not really know much about what he is talking about. DF is much more professional in that regard.

This is the same garbage talk levelled against DF and many others. Please stop with the nonsense. Reality is quickly sinking in with these consoles, and there's no need to kill the messengers.
 