No, you can't bench the ROPs using napkin-math formulas. I know the MS side is known to use theoretical numbers under this or that ideal condition, but it really depends on what kind of alpha blending is used. In some cases fillrate will be higher on PS5, as is already the case on the Pro. On the Pro, a typical benchmark Sony uses shows a measured fillrate higher than the maximum theoretical fillrate possible on XBX, and that bench was using about 160GB/s of bandwidth.
That's fair; I can't account for everything, like delta colour compression or the different ways memory is accessed.
Unfortunately, the counterpoint here falls in favour of the X1X in terms of overall performance, with resolution differences in the range of 40-100% that far exceed the difference in compute capacity. And the 4Pro has double the ROPs, so that only leaves bandwidth as the largest limiting factor.
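To put rough numbers on why napkin math misleads here (purely back-of-envelope, ignoring delta colour compression and the ROP caches, and using the public clock/ROP figures): theoretical fillrate is just ROPs x clock, but sustaining it on an alpha-blended RGBA8 target needs a read plus a write per pixel, which is more bandwidth than either console has.
XSX: 64 ROPs x 1.825 GHz ≈ 116.8 Gpixels/s theoretical
PS5: 64 ROPs x 2.23 GHz ≈ 142.7 Gpixels/s theoretical
RGBA8 blend = 4 bytes read + 4 bytes write = 8 bytes/pixel
116.8 Gpixels/s x 8 bytes ≈ 934 GB/s needed, vs ~560 GB/s (XSX fast pool) or ~448 GB/s (PS5) available
So with blending both end up bandwidth-bound long before the ROPs are, which is why measured fillrate can land almost anywhere relative to the theoretical numbers.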
Would you say a 9TF 2070S (PS5) vs an 11TF 2080S (XSX) is the most comparable matchup here? The 2080S has more CUDA cores, more memory bandwidth, and faster memory and boost clocks. Despite having a ~2TF compute lead, in practice it's only 4-8fps faster at 4K; the extra fps wouldn't even warrant a decent increase in resolution.
https://www.digitaltrends.com/computing/rtx-2080-super-vs-rtx-2080-vs-rtx-2070-super/
RDNA 2 might scale a bit differently tho, who knows.
No, I won't compare the two RDNA 2 GPUs to the Turing architecture. There are nuances to simplicity: simplicity is supposed to provide a general view of things, but it's not supposed to look at exception cases, and in the case of games, if the exception is too good to pass up, the exception becomes the norm -- see Cell. Comparing the same architecture, however, you can sort of start crossing things out.
Sebbbi did build a performance analyzer here for types of shader workloads vs bandwidth:
https://github.com/sebbbi/perftest
These are all done on DX11 btw. So not a complete view of things.
But before we go further, I'm just going to reiterate sebbbi's caveat before fanboys use this as fodder:
The purpose of this application is not to benchmark different brand GPUs against each other. Its purpose is to help rendering programmers to choose right types of resources when optimizing their compute shader performance.
He states:
All results are compared to Buffer<RGBA8>.Load random result (=1.0x) on the same GPU.
Random loads: I add a random start offset of 0-15 elements for each thread (still aligned).
This prevents GPU coalescing, and provides more realistic view of performance for common case (non-linear) memory accessing. This benchmark is as cache efficient as the previous. All data still comes from the L1 cache.
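To make those three access patterns concrete, here's a minimal HLSL compute sketch of what uniform/linear/random addressing looks like (my own illustration, not sebbbi's actual shader; the buffer names and the offset pattern are made up):
Buffer<float4> srcBuffer;      // bound with an RGBA8 (R8G8B8A8_UNORM) view, like Buffer<RGBA8> above
RWBuffer<float4> dstBuffer;

[numthreads(256, 1, 1)]
void main(uint3 tid : SV_DispatchThreadID)
{
    // Uniform: every thread reads the same element (the address doesn't vary per thread).
    float4 u = srcBuffer.Load(0);

    // Linear: thread i reads element i, so the loads coalesce perfectly.
    float4 l = srcBuffer.Load(tid.x);

    // Random: a per-thread 0-15 element start offset defeats coalescing
    // while the data still comes from the L1 cache.
    uint offset = (tid.x * 7u) & 15u;   // stand-in for the benchmark's random offset
    float4 r = srcBuffer.Load(tid.x + offset);

    dstBuffer[tid.x] = u + l + r;       // keep the loads live so they aren't optimized away
}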
Benchmarks are unfortunately all over the place when you compare across architectures, but they are quite similar when you compare within an architecture (see GCN2-5). Both Intel and Nvidia show massive improvements in certain areas of these benchmarks.
See here for example: Navi 5700XT
Buffer<RGBA8>.Load uniform: 12.519ms 1.008x
Buffer<RGBA8>.Load linear: 12.985ms 0.972x
Buffer<RGBA8>.Load random: 12.617ms 1.000x
Compared to Maxwell 980TI
Buffer<RGBA8>.Load uniform: 2.452ms 14.680x
Buffer<RGBA8>.Load linear: 35.773ms 1.006x
Buffer<RGBA8>.Load random: 35.996ms 1.000x
Comparing to Kepler 600/700 series
Buffer<RGBA8>.Load uniform: 3.598ms 53.329x
Buffer<RGBA8>.Load linear: 193.676ms 0.991x
Buffer<RGBA8>.Load random: 191.866ms 1.000x
So those driver improvements on uniform memory address loads make a massive boost in performance whenever those types of workloads come up. I don't know what the driver/API performance situation is like on console (for obvious reasons sebbbi cannot show it), but whenever I think about developers trying to optimize for Nvidia, I mean, yeah, I'd try to take advantage of uniform loads if the opportunity arises.
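As a hedged sketch of what "taking advantage of uniform loads" means in practice (the names here are hypothetical): if the load address genuinely doesn't vary across the wave, e.g. an index coming from a constant buffer rather than from per-thread data, the driver can apply that uniform-address optimization (or, on GCN, route the load through the scalar unit):
Buffer<float4> materialTable;
RWBuffer<float4> output;

cbuffer DrawConstants
{
    uint materialIndex;   // same value for every thread in the dispatch
};

[numthreads(64, 1, 1)]
void main(uint3 tid : SV_DispatchThreadID)
{
    // Uniform address: the driver can see materialIndex doesn't vary per thread,
    // so it can issue one broadcast/scalar load instead of one load per thread.
    float4 material = materialTable.Load(materialIndex);

    // If the index instead came from per-thread data (e.g. something read with tid.x),
    // the address would be divergent and the fast path above is lost.
    output[tid.x] = material;
}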
But here is Volta:
Buffer<RGBA8>.Load uniform: 5.155ms 3.538x
Buffer<RGBA8>.Load linear: 16.726ms 1.090x
Buffer<RGBA8>.Load random: 18.236ms 1.000x
So from architecture to architecture things can change, and throughput benchmarks don't tell the whole story either.
sebbbi says:
NVIDIA Volta results (ratios) of most common load/sample operations are identical to Pascal. However there are some huge changes in raw load performance. Raw loads: 1d ~2x faster, 2d-4d ~4x faster (slightly more on 3d and 4d). Nvidia definitely seems to now use a faster direct memory path for raw loads.
Raw loads are now the best choice on Nvidia hardware (which is a direct opposite of their last gen hardware). Independent studies of Volta architecture show that their raw load L1$ latency also dropped from 85 cycles (Pascal) down to 28 cycles (Volta). This should make raw loads even more viable in real applications.
My benchmark measures only throughput, so latency improvement isn't visible.
Uniform address optimization: Uniform address optimization no longer affects StructuredBuffers. My educated guess is that StructuredBuffers (like raw buffers) now use the same lower latency direct memory path. Nvidia most likely hasn't yet implemented uniform address optimization for these new memory operations. Another curiosity is that Volta also has much lower performance advantage in the uniform address optimized cases (versus any other Nvidia GPU, including Turing).
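In HLSL terms, acting on that Volta finding would roughly mean swapping typed buffer loads for raw ByteAddressBuffer loads where the data layout allows it (a sketch, assuming a 16-byte float4 payload; the names are mine):
Buffer<float4> typedBuf;        // typed load path
ByteAddressBuffer rawBuf;       // raw load path
RWBuffer<float4> result;

[numthreads(64, 1, 1)]
void main(uint3 tid : SV_DispatchThreadID)
{
    // Typed 4d load (the Buffer<RGBA8>-style rows above).
    float4 typed = typedBuf.Load(tid.x);

    // Raw 4d load: Load4 fetches 16 bytes from a byte offset.
    // Per sebbbi's Volta numbers this path is ~4x faster there, and the
    // lower L1$ latency (85 -> 28 cycles) isn't even visible in a throughput test.
    float4 raw = asfloat(rawBuf.Load4(tid.x * 16));

    result[tid.x] = typed + raw;
}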
And here is Turing 2080ti:
Buffer<RGBA8>.Load uniform: 1.336ms 12.247x
Buffer<RGBA8>.Load linear: 16.825ms 0.973x
Buffer<RGBA8>.Load random: 16.364ms 1.000x
NVIDIA Turing results (ratios) of most common load/sample operations are identical to Volta, except wide raw buffer load performance is closer to Maxwell/Pascal. In Volta, Nvidia used one large 128KB shared L1$ (freely configurable between groupshared mem and L1$), while in Turing they have 96KB shared L1$ which can be configured only as 64/32 or 32/64. This benchmark seems to point out that this halves their L1$ bandwidth for raw loads.
Uniform address optimization: Like Volta, the new uniform address optimization no longer affects StructuredBuffers. My educated guess is that StructuredBuffers (like raw buffers) now use the same lower latency direct memory path. Nvidia most likely hasn't yet implemented uniform address optimization for these new memory operations. Turing uniform address optimization performance however (in other cases) returns to similar 20x+ figures as Maxwell/Pascal.
TL;DR: As you can see, coding for the hardware makes a dramatic difference in performance. In this case, having a single profile for console makes things straightforward. I think it's easier to make general statements about two pieces of hardware of the exact same architecture, but comparing different architectures is not going to work.
I did purposefully leave out the benchmarks between the 5700 XT and the 2080 Ti (Turing), lol, just to ensure we aren't going off topic from what I was trying to point out. But yeah, I mean, the 5700 XT is a good piece of hardware under a variety of workloads, as is GCN in general. I think you'll find some developers here that really praise its compute ability, and I think the benchmarks showcase how effective it is at different workloads.