Digital Foundry Article Technical Discussion [2023]

<snip>
  • The same awful screen space reflections as in all RE Engine games, with no option for RT to fix them. Alex just recommends turning them off.

  • That, and of course, turn off SSR, which looks like crap anyway.

I tried talking about this a few pages back, but it got lost in the noise created by a certain poster. Well, that or my post was boring and nobody cared about what I had to say.

It really is quite something how much SSR can spoil SF6. To quote a bit of my previous post:

I increasingly find SSR to be a double-edged sword. Even taking into account its relatively low cost, the occlusion artefacts can lead to some really nasty looking scenes. The Battle Hub in Street Fighter 6 has huge shiny floor areas and a large player character walking around. Reflections disappear and reappear below your character's limbs as they animate. It affects a large part of the screen and is very distracting, like your character is dripping white spirit that erases detail from existence. Not my video, but these grabs hopefully show what I mean:
[Attached image: SSR nope 1.png]

And here is the PC walking over an inverted ghost of himself, while the desk in front, at its right edge, is also nicely breaking SSR:

[Attached image: SSR nope 2.png]

It's one of those things that my brain can't un-see now, and the SSR ghost haunts my dreams.

Clearly, the Series S version wasn't bad enough, so they used its extra power over last gen to turn on SSR. :/
 
I think what XeSS showed is that it isn't ML alone that matters most; rather, it is dedicated hardware-accelerated ML that is important. The DP4a version is slower, uses a simpler implementation, and produces worse image quality than the XMX version.

I did some very rough back-of-the-envelope calculations of the number of operations required for DLSS/FSR2/DP4a XeSS, and I'd gladly accept feedback from anyone with domain knowledge (I'm an engineer in an unrelated field):

The 3090 has 142 16-bit tensor TFLOPS and, per the DLSS programming guide, requires 0.79 ms to upscale 1080p to 4K. DLSS is a convolutional autoencoder, and typical utilization rates for similar types of models are 33-75%. To be a bit conservative, I'll go with 60%. That implies that DLSS requires ~67 16-bit GFLOP per frame.

The 6950 XT has 47 16-bit TFLOPS and, per the FSR2 GitHub, requires 0.8 ms to upscale 1080p to 4K. Frame capture graphs on Twitter show the upscale component to have utilization rates >50%, so I'll go with 60% as well. That implies FSR2 requires ~23 16-bit GFLOP per frame.

I can't find official numbers for XeSS, nor any frame captures, so I took a frame capture with my 1070 in The Witcher 3. The 1070 has 6.5 32-bit TFLOPS and required 9.2 ms for the upscale compute pass according to the PIX capture. Again assuming 60% utilization, we get 36 DP4a GFLOP per frame. I guess, though am not sure, that this implies ~144 actual GFLOP per frame because DP4a packs four 8-bit operations. I think this is an overestimate, because not every operation of the upscale would be performed on four values at once. For completeness, FSR2 runs in 5.3 ms for the same scene, implying ~21 GFLOP per frame (so my estimates are roughly well calibrated against empirical measurements).

This suggests that DLSS and XeSS require more than three times, and maybe up to seven times, as many operations as FSR2, implying that current ML models would run too slowly on console to be competitive without dedicated hardware acceleration. These are, again, very rough estimates, but even if I'm off by 2x, that is still likely >2 ms more time to run an ML model on console compared to FSR2, and the question becomes what the best use for that ~2 ms is: better upscaling quality or other effects/base resolution.
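
If anyone wants to poke at the arithmetic, here it is as a quick Python sketch. The 60% utilization figure and the 4x DP4a packing factor are the assumptions described above rather than measurements, so the outputs are the same rough estimates:

```python
# Back-of-the-envelope estimate of operations per upscale pass:
# ops ~ peak throughput * pass time * assumed utilization.
def gflop_per_frame(tflops, time_ms, utilization=0.6):
    return tflops * 1e12 * (time_ms / 1e3) * utilization / 1e9

dlss = gflop_per_frame(142, 0.79)      # RTX 3090 FP16 tensor, DLSS 1080p -> 4K: ~67
fsr2 = gflop_per_frame(47, 0.8)        # RX 6950 XT FP16, FSR2 1080p -> 4K: ~23
xess = gflop_per_frame(6.5, 9.2)       # GTX 1070, XeSS DP4a pass from my PIX capture: ~36
xess_unpacked = xess * 4               # upper bound if every op were a packed 4x INT8 DP4a: ~144
fsr2_1070 = gflop_per_frame(6.5, 5.3)  # FSR2 on the same 1070 scene, sanity check: ~21

print(f"DLSS ~{dlss:.0f}, FSR2 ~{fsr2:.0f}, XeSS ~{xess:.0f}-{xess_unpacked:.0f} GFLOP/frame")
print(f"DLSS/FSR2 ~{dlss / fsr2:.1f}x, XeSS/FSR2 up to ~{xess_unpacked / fsr2_1070:.1f}x")
```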
 
Can I just say, WOW, what a post.
Even if the numbers are wrong, you did the math! A good hypothesis going in, a few tests (I can't comment on their validity, but they look OK to me) showing your work, and an attempt to draw sensible conclusions.
No platform bias or preconceived notions.

Assuming your conclusion is correct, I would say that FSR is doing an amazing job while using far less compute.
I'd love to see someone else try a similar approach and see what numbers they get.
 
Excellent post!

Just to add that an RTX 2060 offers:
  • 455 INT4 TOPS
  • 228 INT8 TOPS

Series X offers:
  • 97 INT4 TOPS
  • 49 INT8 TOPS
That is a huge gulf in potential ML upscaling performance, and Series S would be even worse off, as it's 3x lower than even Series X (assuming 3x scaling with TFLOPS as a rough guide).

So my assumption that the performance just might not be there in the real world to do ML-based upscaling might not be that far off.
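
To put that gulf in frame-time terms, here's a rough what-if that borrows the ~67-144 G-ops-per-frame estimates from the quoted post and the TOPS figures above. It assumes, purely for illustration, that such a model could run entirely at INT8 at the same 60% utilization (real models mix precisions and kernels), so treat these as order-of-magnitude numbers only:

```python
# Hypothetical frame times for an ML upscale pass: time = ops / (throughput * utilization).
def upscale_ms(g_ops, int8_tops, utilization=0.6):
    return g_ops * 1e9 / (int8_tops * 1e12 * utilization) * 1e3

int8_tops = {
    "RTX 2060": 228,
    "Series X": 49,
    "Series S": 49 / 3,  # assumed ~3x below Series X, per the TFLOPS scaling above
}

for name, tops in int8_tops.items():
    lo, hi = upscale_ms(67, tops), upscale_ms(144, tops)  # DLSS-sized to XeSS-sized model
    print(f"{name}: ~{lo:.1f}-{hi:.1f} ms per frame")
# RTX 2060: ~0.5-1.1 ms, Series X: ~2.3-4.9 ms, Series S: ~6.8-14.7 ms
```

Illustrative only, but it shows why a pass that is sub-millisecond on dedicated tensor hardware could eat several milliseconds of a console's frame budget.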

The question becomes what the best use for that ~2 ms is: better upscaling quality or other effects/base resolution.

That would ultimately depend on what using ML upscaling would allow you to do with the resolution.

It may allow devs to drop the native 1440p input to FSR2 (like in the new Avatar and Star Wars games) and feed a native 1080p input into an ML-based upscaler with no loss in IQ, or even an improvement.

So native 1080p + ML upscaling vs native 1440p + FSR2... how many ms would be freed up?
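
Nobody outside a dev team can answer that precisely, but as a toy illustration of the trade-off, assuming resolution-bound GPU work scales roughly with pixel count (every number below is made up for the sake of argument, not data from any real game):

```python
# Toy estimate of the ms freed by rendering 1080p instead of 1440p.
# Assumptions: 16.7 ms GPU budget (60 fps), 75% of that budget scales with
# pixel count, and the ML upscale costs 2-5 ms more than FSR2 (from the
# rough ranges above). All three numbers are placeholders for illustration.
frame_budget_ms = 16.7
resolution_bound_share = 0.75
pixels_1440p = 2560 * 1440
pixels_1080p = 1920 * 1080

saved_ms = frame_budget_ms * resolution_bound_share * (1 - pixels_1080p / pixels_1440p)

for extra_ml_cost_ms in (2.0, 5.0):
    net = saved_ms - extra_ml_cost_ms
    print(f"~{saved_ms:.1f} ms saved, net ~{net:+.1f} ms if the ML pass costs "
          f"{extra_ml_cost_ms} ms more than FSR2")
# ~5.5 ms saved, net ~+3.5 ms / ~+0.5 ms under these assumptions
```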
 
Assuming your conclusion is correct, I would say that FSR is doing an amazing job while using far less compute.
In my experience, FSR does a fine job considering its performance cost. It's not perfect, and it does cause some visual instabilities, of course. And I don't think I've ever seen it enhance image quality in a game the way DLSS can, but as a performance-enhancing feature it offers a tangible uplift with acceptable image quality at the higher settings in most cases. Its near-universal compatibility is also a plus in its favor; its availability in games and in apps like Lossless Scaling makes it a great tool for GPUs that don't support DLSS or that take too large a performance hit from XeSS. Yes, it's the worst of the three upscalers mentioned in terms of image quality, but it's still usually better than running at its internal resolution.
 
I do not believe that happened. What did happen was that an unfinished version of Gears 5 was demonstrated on Series X at the time using the "Ultra" settings. They even showed the benchmark running on Series X.

IIRC, John mentioned that VRS allowed The Coalition to save something like 12% of rendering time per frame, allowing Gears 5 to run at a higher average resolution and keep closer to 4K imagery.


Just to clear this up, I wasn't referring to either of these things. However, I was still mistaken (stupid old man memory).

What I was remembering was this video where Rich was talking about using Heutchy on Xbox One games instead of just original Xbox and 360 games.
And then a year later John referenced that same meeting that Rich was talking about in the first video.
 
I have to strongly disagree here: I've tested both FSR2 and DLSS back to back, and DLSS can look substantially better.

Probably not enough better to warrant AMD supporting two separate technologies, with the second solution limited to a subset of its own hardware. AMD is a business, so FSR performance alone isn't going to guide their decision.

AMD isn't Nvidia: it doesn't own 70-80% of the market, where it could expect each architecture to represent tens of millions of unit sales. AMD has to weigh the cost of investment and the probability of adoption by devs (or the cost of driving adoption) against the small market share held by its PC GPUs and the roughly one third of the console market that the Xbox Series consoles hold (the PS5 doesn't even sport matrix cores). Then there is the question of whether an ML solution is even viable on the Series S. If not, what's the value of an AMD ML solution that is limited to a small subset of gaming PCs and only a portion of Xbox Series consoles?

All the while, FSR is the fastest-adopted software tech ever for AMD.
 
Another point here is: for the average gamer, is it noticeable to the point of it mattering?

For the 'average gamer', is there a significant difference between bilinear upscaling and any form of reconstruction at all? We can hand-wave away a ton of image quality improvements over the years by referring to some hypothetical 'average gamer' who only turns on their console for CoD and Madden.
 
In this episode, as had been voiced before, the DF team speculate that games will increasingly target a 30fps frame rate on current-generation consoles (PS5/XSX in particular), yet they don't see value in a mid-generation hardware upgrade.

I'm baffled by this position. When it was accepted that most console games were 30fps, because 95% of them were, the expectation was 30fps, with some visual compromises swallowed for the few that did target 60fps. I am less convinced that console gamers who have now grown accustomed to 60fps (and/or 40fps and/or VRR) options will be as chill with a return to 30fps after years of 60fps game options.

A Plague Tale: Requiem, which DF mention a lot because of the overwhelming demand from console gamers for a 60fps mode, paints what I think is a clear picture of how receptive console gamers will be to a step backwards on the framerate side. ¯\_(ツ)_/¯
 
But if they're limited to 30fps by CPU performance, there's no way they are getting a 2x faster CPU in a mid-gen upgrade. Also, the prospect of a reasonably priced mid-gen upgrade, even one focused just on GPU improvements, seems unlikely given that both Sony and now Microsoft have been forced to increase the prices of their current-gen consoles. Clearly there are no cost savings to be had, so more performance is certainly going to come with an appreciably higher price tag.
 
I agree that there's definitely a market for a higher-performing console and that there clearly is a 'need' for more power, with a few recent releases struggling to hit a consistent 60fps (or doing so at a very low base resolution). I don't necessarily hold the view that the new consoles have been 'barely tapped into'; they're not magical devices. There is a limit to what a Zen 2 CPU + ~2080-class GPU can deliver.

However, I think a good part of their skepticism is driven by what they believe can actually be brought to market in 2024 at a reasonable price point. There's just little indication you're going to be able to get the 2x+ performance bump you would want for a marginally higher cost. Either the performance boost will be smaller, or the cost (and power draw) will be high enough that it becomes a boutique item and doesn't really matter to the majority of console buyers.

I'm not entirely sure on that either; I think most didn't expect the digital PS5 to come in at $400, and we're still ~18 months away from the rumored release date of this Pro model. But recent indications (the $350 Series S, the price hike for the PS5 outside the US still being in place, RDNA 3 providing meagre gen-on-gen improvements) warrant a fair bit of skepticism that we should expect a significant boost.
 
However, I think a good part of their skepticism is driven by what they believe can actually be brought to market in 2024 at a reasonable price point.
That was the other aspect that I didn't mention. There does not seem to be any acknowledgement that AMD may be designing unconventional APU configurations for consoles that differ from what is known of AMD's disclosed GPU roadmap.

Consoles are not PCs; the architecture and bottlenecks are different. Inserting higher-clocked CPUs and increasing core GPU resources can deliver quite a different performance profile compared to an equivalent PC.
 
I don’t imagine Sony could offer the bare-minimum 2x general performance improvement required to even begin to justify a Pro console without losing over one hundred dollars per unit at a $600 price point.
 
If we look at the PS4 to the PS4 Pro, we gained less than 1 GHz of clock speed on the same CPU cores and core count as the base model. The GPU was also just a small step up technology-wise, going from GCN 1.x to another 1.x revision. We didn't even get a RAM increase. The One X was a similar story, though we did get more RAM.

So if a PS5 Pro follows the PS4 Pro pattern, we would get another Zen 2 8-core CPU that maybe hits 4.5 GHz, a refresh of the RDNA GPU inside the PS5, and likely no RAM upgrade. I doubt that would do much to alleviate the performance issues of these consoles.

I would think it's just better to shorten the traditional console cycle and, instead of waiting until year 7, release a new console at year 5. I think moving to Zen 5 with a big/little core structure that uses the little cores for the OS and the big cores for gaming, along with more RAM and perhaps a newer RDNA (like 4 or 5), would be a much more compelling upgrade path. The timeframe of game development has increased so much that it's best just to keep rolling hardware forward.

That is my perspective.

But if they're limited to 30fps by CPU performance, there's no way they are getting a 2x faster CPU in a mid-gen upgrade. Also, the prospect of a reasonably priced mid-gen upgrade, even one focused just on GPU improvements, seems unlikely given that both Sony and now Microsoft have been forced to increase the prices of their current-gen consoles. Clearly there are no cost savings to be had, so more performance is certainly going to come with an appreciably higher price tag.
If you wait another year or so and aren't tied to the previous consoles, then you could end up with Zen 5/6 with big/little cores. Like I said above, the little cores can be used for the OS while the big cores can be reserved only for gaming. With Zen 5 or 6 we should see some really large gains in performance over Zen 2.
 