I think those results suggest it is quite heavy.
Do we have any info on the size of XESS and DLSS?
You can probably open the model and take a peek at how many layers and inputs it has. But DLSS uses more than one NN, with several steps in between: at least one for AA and at least one for upscaling. Then there's optical flow and probably interim steps as well, and I believe one more NN for frame gen.
I recall a very old Super Resolution demo from MS, available for download, that leverages Nvidia's model. That might actually be a starting point. But there's just no comparison point here between PSSR and DLSS: if we can never run PSSR on PC, we will never know how big or small it is.
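If you do get your hands on a network you can actually load (say, the weights from that old demo, or any upscaler exported to ONNX; DLSS itself ships inside the driver, so it won't work there), a few lines of Python will answer the layer/input question. A minimal sketch, with a hypothetical file name:

```python
import numpy as np
import onnx

# Hypothetical file name; substitute whatever exported model you have.
model = onnx.load("upscaler.onnx")
graph = model.graph

print("Inputs: ", [i.name for i in graph.input])
print("Outputs:", [o.name for o in graph.output])
print("Layers (graph nodes):", len(graph.node))

# Rough parameter count from the stored weight tensors.
params = sum(int(np.prod(t.dims)) for t in graph.initializer)
print(f"Approx. parameters: {params:,}")
```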
If you want to go deep into how Tensor Cores work and the math behind them this is the best place for it:
Here, I provide an in-depth analysis of GPUs for deep learning/machine learning and explain what is the best GPU for your use-case and budget.
timdettmers.com
I don't think the 5Pro is bad at it, per se; I do believe that if they hit 300 TOPS with the CUs, those would be the customizations. It saves a TON of silicon and brings costs down significantly compared to having separate silicon that may never be leveraged. But as you pointed out, the CUs need to share this with rendering, so it cannot do the work in parallel, perhaps at best via async compute, and it's not the type of task you can just switch into and out of.
But also, as you say, it's extremely bandwidth heavy. So the bottleneck here is more likely to be bandwidth than compute.
That being said, the bandwidth demands of running a model and training one are very different. There is a whole section in the article on memory bandwidth and cache bandwidth as limiting factors. It's important to leverage the knowledge here, but the 5Pro only has to run the model, not train it, and that's significantly less work to do.
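A rough back-of-envelope sketch of that gap, with purely illustrative numbers (the parameter count and the training multiplier are assumptions, not PSSR measurements):

```python
# Illustrative comparison of memory traffic: inference vs. training.
params = 50e6        # assume a ~50M-parameter upscaling network
bytes_per_param = 2  # FP16 weights

# Inference: roughly one pass over the weights per frame
# (plus activations, ignored here for simplicity).
inference_traffic = params * bytes_per_param

# Training: forward pass + backward pass + gradient and optimizer-state
# reads/writes; a common rough rule of thumb is ~3-4x the inference traffic.
training_traffic = inference_traffic * 4

print(f"Inference weight traffic per frame: {inference_traffic / 1e6:.0f} MB")
print(f"Training traffic per step (rough):  {training_traffic / 1e6:.0f} MB")
```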
But for those of you who aren't interested in reading the whole thing (believe me, it is well worth the read; everyone here seems to have a very strong understanding of graphics pipelines, so take some time to learn the ML ones too!), here's the key section:
Memory Bandwidth
From the previous section, we have seen that Tensor Cores are very fast. So fast, in fact, that they are idle most of the time as they are waiting for memory to arrive from global memory. For example, during GPT-3-sized training, which uses huge matrices — the larger, the better for Tensor Cores — we have a Tensor Core TFLOPS utilization of about 45-65%, meaning that even for large neural networks, Tensor Cores are idle about 50% of the time.
This means that when comparing two GPUs with Tensor Cores, one of the single best indicators of each GPU’s performance is its memory bandwidth. For example, the A100 GPU has 1,555 GB/s memory bandwidth vs the 900 GB/s of the V100. As such, a basic estimate of the speedup of an A100 vs V100 is 1555/900 = 1.73x.
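To put that rule of thumb into a couple of lines (this just restates the article's arithmetic, nothing more):

```python
# When Tensor Cores are memory-bound, the bandwidth ratio is a
# first-order estimate of the speedup between two GPUs.
a100_bw = 1555  # GB/s
v100_bw = 900   # GB/s
print(f"Estimated A100 vs V100 speedup: {a100_bw / v100_bw:.2f}x")  # ~1.73x
```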