Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

That would be a valid way to look at things. We don't actually have an RDNA card with 52 CUs though, so it's hard to say how much the additional CUs are adding to performance here vs. console advantages.

I would guess very little. They add some texture sampling performance and ALU for the most part. I guess it depends on whether Gears is TMU-bound or ALU-bound; I'd guess ALU-bound for the most part, like most games.
 
Considering that the Radeon VII has 60 CUs running at ~1800 MHz, with 1 TB/s of bandwidth and 16 GB of capacity, I'm bound to agree. I'm not sure what the bottleneck is, to be honest.

They definitely have an occupancy problem on the Radeon VII though; RDNA 1 runs circles around it, and RDNA 2 much more so.
 
What does it even mean? Total cache? It's useless, and a clever PR statement. They probably count every bit of memory on the APU (registers included). Bullshit statement. They were much more specific (and proud) in the case of the XBX's L2 cache.

Here with the Series X they are proud of their locked clocks, the number of CUs, the total TFLOPS count, and finally the presence of integer silicon for AI (I am talking about hardware features, not the software side). Not the L1 or L2 cache, or they would have stated it: "We have twice as much L2 cache as the 5700XT" (meaning twice as much L2 cache as the PS5), just like they did with "we have locked clocks" (the PS5 doesn't), "we have 52 CUs" (the PS5 only 36), etc.

You seem a tad emotional here over something that may not be worth discussing. You're comparing two different products. Scorpio was meant to make XBO games run at 4K; they had to take an existing architecture and make it work at 4K resolution, so there were going to be changes to make that happen.
If the cache levels are sufficient for 4K given the number of CUs, why change things? The silicon could be useful elsewhere. GCN 2 was never designed around 4K, but RDNA 2 probably is.

MS is going through the same build process as they did with Scorpio: they simulate live code on the device well before they burn the chip. They released ray tracing earlier, and I'm sure that gave them a fairly real look at RT performance on their console as well, all before they started burning chips. And I'm sure that once the first few chips were made, they invited developers in to test their game code on them. They talk about that; you can see different existing games running on Scarlett, and it's hitting the performance marks where they want them to.

They (DF) literally wrote about Gears 5 hitting 2080 performance (whoops, wrote Ti earlier). They talked about Minecraft RTX, which is not quite at 2080 performance. It's all there. We don't need to guess at the Xbox's performance; we already have some benchmarks that tell us approximately where it sits.

If they didn't care to beef up the cache, maybe there were bigger items to resolve. I don't think it's bullshit; maybe it's just not needed this time around.

I suspect they might have thought Sony could go the mobile route of 8 MB L3, which would make their number look very big in comparison if they retained the full 32 MB L3. I think that if you add up the various register files, they'll pale in comparison to the L2 and L3 sizes. Someone did the math with a full Zen 2 chiplet's caches and RDNA 1's caches and still came up several MB short, so perhaps there's a DMA buffer or some sort of ML or audio processor cache on die as well?
 
It may not have been a worthwhile talking point for either of them. It's such a small detail that only a small cache of the population would care.
 
Mm... I wouldn’t put too much faith in my numbers tbh. :p

@3dilettante brought up some good points about previous attempts over the years to check PR comments like this coming up unable to account for a large chunk of the SRAM, while the server-vs-console discussion does make it dubious.
 
You're both right. When we remove the marketing talk and the potential of the features, and just look at raw power, I find the Gears 5 benchmark to be useful. But it's not indicative of potential performance in the future.

I think the easiest way (only because we have one XSX benchmark here) is to just compare against a 5700XT running Gears 5 at 4K Ultra settings. DF reports the XSX is a flawless 60fps while running higher-than-Ultra settings, though that might be console advantage, or the testing might not have been thorough enough; I'm not sure. But the 5700XT here is woefully behind. The 5700XT 50th Anniversary edition is a fairly close match for the PS5, running similar memory and compute performance. However, in benchmarks the 50th Anniversary edition doesn't really pull all that far away from the regular 5700XT.
[Benchmark chart: Gears 5 at 1080p, 1440p and 4K; the RTX 2080 Ti pushes 60 fps]

I'm pretty sure they clarified, possibly here on B3D, that the benchmarked version did not include the above PC Ultra enhancements, and definitely not ray tracing at least. They also stated it was basically 2080 performance at a solid 60fps, which means they're using a different benchmark from the one whose results you've posted above, where the 2080 doesn't come close to 60fps.

Based on the 5700XT's gaming clock of 1755 MHz, the XSX has 30% more fill rate, 35% more ALU and 25% more memory bandwidth than the 5700XT. Let's hypothetically say that results in 35% more performance (ignoring console efficiencies but assuming at least a little uplift from RDNA2). That would equate to around 53 fps in the above benchmark. So 2080/2080S-level performance seems to be in the right ballpark based on that example (again ignoring console optimisations). Obviously there are too many variables to draw too much of a conclusion from that, though. The RDNA2 uplift over RDNA1 and memory contention could swing things down to a 2070 or up past a 2080 Ti. And of course this is just one game.
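To make the arithmetic explicit (a back-of-the-envelope sketch, not a measurement; the ~39 fps input is simply what a 53 fps estimate at +35% implies for the 5700XT in that chart, not a figure read from it):

Code:
// Back-of-the-envelope scaling of a 5700XT 4K Gears 5 result by the assumed XSX advantages.
#include <cstdio>

int main() {
    const float rx5700xt_fps = 39.0f; // hypothetical 5700XT 4K Ultra result (implied, not measured)
    const float alu_uplift   = 1.35f; // +35% ALU throughput at game clock
    const float bw_uplift    = 1.25f; // +25% memory bandwidth

    // If performance scales with ALU throughput:
    printf("ALU-limited estimate: %.1f fps\n", rx5700xt_fps * alu_uplift); // ~53 fps
    // If it scales with memory bandwidth instead:
    printf("Bandwidth-limited estimate: %.1f fps\n", rx5700xt_fps * bw_uplift); // ~49 fps
    return 0;
}

Same ~53 fps figure, with the bandwidth-limited case showing how far the estimate drifts if the game leans on memory rather than ALU.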
 
Indeed, I don't disagree. Too many variables, too many unknowns. I would like to say it's easy to allow 10-15% of uncertainty either way, but that is the difference in both price and performance between some of these products. We will need to wait for the official Gears 5 XSX edition to see where things are really landing.
 
All we can realistically state right now is the percentage difference in flops. It's a simple calculation to understand. No need to loosely abstract it by comparing to a different company's architecture.

True comparisons can surely only be made when the devices are released and a third-party game ships with an unlocked framerate. I suspect we'll see exactly the calculated difference, as resolution or framerate, all else being equal. Anyone suggesting it'll be significantly more or less than that amount is only fooling themselves.
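For reference, that "simple calculation" with the publicly announced figures (52 CUs at 1.825 GHz for the Series X, 36 CUs at up to 2.23 GHz for the PS5) looks like this:

Code:
#include <cstdio>

// Peak FP32 = CUs * 64 lanes * 2 ops per clock (FMA) * clock in GHz, expressed in TFLOPS.
static float tflops(int cus, float ghz) { return cus * 64 * 2 * ghz / 1000.0f; }

int main() {
    const float ps5 = tflops(36, 2.23f);  // ~10.28 TFLOPS (at the "up to" clock)
    const float xsx = tflops(52, 1.825f); // ~12.15 TFLOPS (fixed clock)
    printf("PS5 %.2f TF, XSX %.2f TF, difference %.0f%%\n",
           ps5, xsx, (xsx / ps5 - 1.0f) * 100.0f); // ~18%
    return 0;
}

Roughly an 18% gap on paper, before any of the efficiency arguments above.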
 
What underlying basis and figures are these based on?
Current 4Pro games, future engines, how much of the overall rendering is it referring to?

There will be scenarios suited to anything from a low number of CUs to a high number. The optimum number of CUs, were such a thing to exist, could be many more than Series X has. That's just the nature of solving some problems: some parallelise well and some do not. But as I said above, and speaking from a good ten years spent re-writing many conventional single-core algorithms to work efficiently on wide-parallel architectures before Core2Duo was a glint in Intel's eye, sometimes going wider is detrimental because of other considerations, like whether there is enough cache before you have to hit slow memory.

The issue of core utilisation redundancy is an issue many folks encountered nigh on 20 years ago. Sometimes trying to do too much on a parallel architecture is worse than idling the cores and doing some tasks in a sequential, linear way - old skool.
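Purely as an illustration of that diminishing-returns point (an Amdahl's-law sketch with a made-up 10% serial fraction, nothing specific to either console):

Code:
#include <cstdio>

// Classic Amdahl's law: speedup over one unit when a fraction of the work cannot be spread wider.
static float speedup(float serial, int units) {
    return 1.0f / (serial + (1.0f - serial) / units);
}

int main() {
    const float serial = 0.10f; // assume 10% of the frame's work doesn't parallelise
    const int units[] = {36, 52, 64, 80};
    for (int n : units)
        printf("%d units: %.2fx\n", n, speedup(serial, n));
    // With 10% serial work, going from 52 to 80 units only buys about 5% more throughput.
    return 0;
}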

Otherwise it's just as simple to say 52 is the optimal amount compared to 36 or 60, for the exact same reasons you gave.
Yeah, and I said that would happen. It's in my post! :yes: And to repeat, there will be things that PS5's approach is better suited to, and there will be things that Series X's approach is better suited to. Because they're different.

I am oft reminded of places in Fallout 4's Far Harbour DLC running faster on Xbox One than PS4, because it was speculated that the One's ESRAM setup was really good for the way Bethesda initially implemented the fog effects.
 
The issue of core utilisation redundancy is an issue many folks encountered nigh on 20 years ago.
CPUs and GPUs are pretty different in this regard. GPUs are literally designed and built for parallel workloads (which graphics is well suited to); that's the kind of task they're for. It's not the same as trying to make sequential code run in parallel, or on a transputer.
But yeah, we don't actually know the optimal amount. I'm sure, all things being equal, that we're not hitting it just yet, although that's obviously dependent on architecture and workloads. What are the odds that AMD releases a bigger RDNA 2 chip?
Sorry, should've been clearer: that was a general statement, not directed at you.
 
Yup. It's super easy for me to call in as many threads and thread blocks as I need in CUDA - literally changing two values in the launch, kernel<<<numBlocks, threadsPerBlock>>>(the stuff I need to work on).
And it's super fast.
It's a complete and total PITA to do the same on a CPU. Thread pools are supposed to be the easiest, then you've got to do async calls and functions and whatnot. So messy and painful.
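For anyone who hasn't used CUDA, a minimal sketch of the kind of launch being described (the kernel and sizes here are illustrative, not from any real codebase):

Code:
#include <cstdio>
#include <cuda_runtime.h>

// Each thread squares one element; the grid is sized from the data, not from the kernel code.
__global__ void square(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= data[i];
}

int main() {
    const int n = 1 << 20;
    float* d = nullptr;
    cudaMallocManaged(&d, n * sizeof(float));
    for (int i = 0; i < n; ++i) d[i] = float(i);

    // The "<<<blocks, threads>>>" launch mentioned above: widen or shrink the grid
    // by changing these two values; the kernel itself doesn't change.
    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    square<<<blocks, threadsPerBlock>>>(d, n);
    cudaDeviceSynchronize();

    printf("d[3] = %.1f\n", d[3]); // 9.0
    cudaFree(d);
    return 0;
}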
 
Based on the 5700XT's gaming clock of 1755 MHz, the XSX has 30% more fill rate, 35% more ALU and 25% more memory bandwidth than the 5700XT. Let's hypothetically say that results in 35% more performance (ignoring console efficiencies but assuming at least a little uplift from RDNA2). That would equate to around 53 fps in the above benchmark. So 2080/2080S-level performance seems to be in the right ballpark based on that example

2080/2080S performance for XSX right?

And to repeat, there will be things that PS5's approach is better suited to, and there will be things that Series X's approach is better suited to. Because they're different.

In their respective situations, yes. The XSX still has a GPU advantage in raw power.
 
No one's debating this man ;)

We covered a lot of ground on the potential of performance for both these devices; it was a good discussion.

I do happen to think that the XSX is, in my mind, a little more 'locked in' in terms of knowing what to expect from its performance. It can only go up (and not really down, if you think about it) from when DF reported on it, so at least we have a baseline figure to work with. While I tried to profile the PS5, at the end of the day we don't have any metric to really compare it against, except drawing some further-out conclusions from an older architecture running on PC.

I don't think there's more to discuss on this front until more information is unveiled. I'm currently focusing my attention back on texture compression to understand more about BCPack and Kraken; I do have questions and am researching some answers right now.
 
Yes, we need the full specs of all the hardware to see how the systems perform.
 
2080/2080S performance for XSX right?

Yes, in this one example at least. But TechPowerUp lists the average performance advantage of the 2080 over the 5700XT at 4K, across a range of modern games, at only 23%, suggesting the XSX is going to be faster than a 2080 in current games even before it starts seeing any architecture- and console-specific optimisations.
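Putting those two figures together (pure arithmetic, with all the earlier caveats about paper specs):

Code:
#include <cstdio>

int main() {
    const float xsx_over_5700xt   = 1.35f; // paper spec advantage used earlier in the thread
    const float r2080_over_5700xt = 1.23f; // TechPowerUp 4K average cited above
    printf("Implied XSX advantage over the RTX 2080: +%.0f%%\n",
           (xsx_over_5700xt / r2080_over_5700xt - 1.0f) * 100.0f); // ~10%
    return 0;
}

So roughly 10% ahead of a 2080 on that basis, before any console-specific gains.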
 
2080/2080S performance for XSX right?
It's somewhere around a 2080S (or was it even slightly higher?) if we assume the chip is as fast as Navi 10 clock for clock and scales perfectly with FLOPS. However, we do know that AMD is claiming IPC improvements and whatnot, so that could push its performance higher too; then again, if there's some bottleneck it could end up slower too.
 
However, we do know that AMD is claiming IPC improvements and whatnot,
Those IPC improvements will mostly be used to drive power consumption down, which is what AMD stated in the first place: 50% more performance per watt.

Looking at the preliminary Series X Gears 5 performance, we see that is indeed the case: the Series X is about 25% faster than the 5700XT flops vs flops, and it's delivering almost exactly that advantage (a couple of percent more) by matching the RTX 2080's fps at 4K.
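For what it's worth, the 25% here and the 35% used earlier in the thread are both consistent with the advertised numbers; they just assume different 5700XT clocks (boost vs game clock):

Code:
#include <cstdio>

static float tflops(int cus, float ghz) { return cus * 64 * 2 * ghz / 1000.0f; }

int main() {
    const float xsx      = tflops(52, 1.825f); // 12.15 TFLOPS
    const float xt_boost = tflops(40, 1.905f); // ~9.75 TFLOPS at the advertised boost clock
    const float xt_game  = tflops(40, 1.755f); // ~8.99 TFLOPS at the game clock
    printf("vs boost clock: +%.0f%%\n", (xsx / xt_boost - 1.0f) * 100.0f); // ~25%
    printf("vs game clock:  +%.0f%%\n", (xsx / xt_game  - 1.0f) * 100.0f); // ~35%
    return 0;
}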
 