Digital Foundry Article Technical Discussion [2022]

According to NV it is. Maybe they're lying, maybe they're not. I'll take it at face value until proven otherwise.
Where have they stated that?

[Image: KwqBKDi.png (DLSS execution-time figures across RTX GPUs)]


That's all I've seen, and it suggests the scaling is not quite in line with the tensor throughput. The 2060 Super takes 50-70% longer than a 2080 Ti depending on the resolution, when it has only half the throughput.
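A quick sanity check on that, purely as a sketch assuming the 2080 Ti has ~2x the tensor throughput of the 2060 Super (the timing figures are the ones quoted above):

Code:
# Naive check: with half the tensor throughput, purely linear scaling would
# predict the 2060 Super taking ~2x as long (+100%) for the DLSS pass.
throughput_ratio = 2.0            # 2080 Ti assumed ~2x the tensor throughput
observed_slowdowns = [1.5, 1.7]   # 2060 Super reportedly takes 50-70% longer

for observed in observed_slowdowns:
    gap = throughput_ratio / observed
    print(f"observed {observed:.1f}x vs predicted {throughput_ratio:.1f}x slower "
          f"-> ~{gap:.2f}x better than pure throughput scaling")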
 
https://developer.nvidia.com/rtx/dlss

"NVIDIA DLSS is a deep learning neural network that boosts frame rates and generates sharp images. Powered by Tensor Cores, the dedicated AI processors on NVIDIA RTX™ GPUs, DLSS gives you the performance headroom to maximize ray-tracing settings and increase output resolution."

Right, so in layman's terms they're saying that DLSS is accelerated by the Tensor Cores, the dedicated AI processors. That, and the fact that DLSS isn't supported (in the same way) on non-RTX GPUs. There are dozens of other articles out there which imply DLSS runs on the tensor cores, including DF's assumption that even AMD will go the hardware-accelerated AI route going forward.

As mentioned, I'll take it at face value that these tensor cores (and the cores Intel uses) enable higher performance through hardware acceleration. It's no different in the mobile space; look at Apple: since the A11/A12, NPU hardware acceleration has been key to device performance in many ways. The A11's NPU wasn't fast enough and hence doesn't support some on-device machine learning features; according to Apple, from the A12 onward the NPU is fast enough for these new functions in iOS 15.
 
AFAIK we don't have any info on whether or not matrix math throughput is the limiting factor in DLSS performance.

As we see a decrease in DLSS frame time as you move up through the RTX series, it would indicate that it is a limiting factor to some degree, although it doesn't scale linearly.

XSS has roughly 1/6th the INT4 TOPS from what I can work out, so if a 2060 Super takes 0.736 ms for DLSS at 1080p, how long is a GPU with a sixth of that throughput going to take to do the same job?

Surely there's a point where an ML upscale simply takes up so much frame time that it can't be used in the real world, as it delays other parts of the pipeline.
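To put a very rough number on that question, here's a minimal sketch assuming the upscale cost scales linearly with INT4 throughput (which, as noted above, it probably doesn't), using the 0.736 ms figure:

Code:
# Naive estimate: scale the 2060 Super's DLSS cost by the assumed throughput gap.
rtx2060s_dlss_ms = 0.736      # reported DLSS cost at 1080p on a 2060 Super
xss_throughput_ratio = 1 / 6  # XSS assumed to have ~1/6th the INT4 TOPS

xss_estimate_ms = rtx2060s_dlss_ms / xss_throughput_ratio   # ~4.4 ms
budget_60fps_ms = 1000 / 60                                 # ~16.7 ms per frame
print(f"estimated upscale cost: {xss_estimate_ms:.2f} ms, "
      f"~{xss_estimate_ms / budget_60fps_ms:.0%} of a 60 fps frame budget")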
 
I wonder if XSS has enough performance to actually use it without killing frame times. XSX has half the INT4 TOPS of an RTX 2060, and XSS's GPU is 1/3 of XSX's.

XSS has maybe 1/6th (my maths is fuzzy) the INT4 TOPS of an RTX 2060, so is that even enough to do an ML-based upscale in a reasonable amount of frame time?
Those TOPS numbers for nVidia though.... They are for the tensor cores. On AMD, it's just the regular shaders. If you spend your entire budget per second doing upscaling, you wouldn't have any time to render anything to upscale to begin with.
 
If you spend your entire budget per second doing upscaling, you wouldn't have any time to render anything to upscale to begin with.

That's what I'm saying: does XSS even have enough performance to actually use ML-based upscaling in an actual game?

Or will it end up like ray tracing? Barely used and avoided in 90% of cases because the performance isn't there.
 
That's what I'm saying: does XSS even have enough performance to actually use ML-based upscaling in an actual game?

Or will it end up like ray tracing? Barely used and avoided in 90% of cases because the performance isn't there.

If the premium consoles don't get meaningful RT and ML upscaling, then sure, the XSS won't either.
 
Well MS has already made noise about having this hardware in XSX specifically for this use case:

Yeah, they do, and for XSX (or even XSS) it might be performant enough to warrant using it, but compared to dedicated AI cores I'd guess it's not as capable, and that's where DF was coming from in their latest DF Direct.
 
Yeah, they do, and for XSX (or even XSS) it might be performant enough to warrant using it, but compared to dedicated AI cores I'd guess it's not as capable, and that's where DF was coming from in their latest DF Direct.
Doesn't need to be super performant, really. If you can gain even 20% performance headroom with negligible image quality loss, then that's still a win and provides either more performance or more room to push the graphics harder. Obviously this needs to compete with other reconstruction techniques, but I do expect MS to use this at some point, even if it's just for 1st-party games, at the least.
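As a rough illustration of that trade-off (every number below is made up for the example, and how much of a frame actually scales with resolution varies per game):

Code:
# Hypothetical frame-budget math for ML reconstruction (illustrative numbers only).
native_frame_ms  = 20.0   # cost of rendering natively at the output resolution
internal_scale   = 0.5    # render at half the pixel count internally
resolution_bound = 0.7    # assume ~70% of the frame cost scales with pixel count
upscale_cost_ms  = 4.0    # hypothetical cost of the ML upscale pass itself

internal_frame_ms = native_frame_ms * (1 - resolution_bound + resolution_bound * internal_scale)
total_ms = internal_frame_ms + upscale_cost_ms
saved = (native_frame_ms - total_ms) / native_frame_ms
print(f"native {native_frame_ms:.1f} ms vs reconstructed {total_ms:.1f} ms ({saved:.0%} saved)")

The saving disappears entirely if the upscale pass itself gets too expensive, which is why the frame-time questions above matter.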

And I'd agree with the skepticism of XSS being able to do so as well. That thing is all kinds of hampered and I really hope developers see the XSX as the 'baseline' console and let games on XSS suffer if need be.
 
Doesn't need to be super performant, really. If you can gain even 20% performance headroom with negligible image quality loss, then that's still a win and provides either more performance or more room to push the graphics harder. Obviously this needs to compete with other reconstruction techniques, but I do expect MS to use this at some point, even if it's just for 1st-party games, at the least.

Even a 10% gain might be worth implementing; together with other techniques it could be the enabler for 60 fps in some games, for example.
I was merely agreeing with DF's finding that dedicated hardware cores for AI/ML acceleration are the more performant solution (Intel/NV). ML reconstruction on XSX and XSS might, and probably will, still be worth it going forward, and an advantage over their competitors. The XSS might not need it as much either, as I personally see the XSS as a 1080p console, and given what it costs, that's fine by me.
 
That thing is all kinds of hampered and I really hope developers see the XSX as the 'baseline' console and let games on XSS suffer if need be.

I do think it'll be a case of building for XSS and then scaling up to XSX rather than the other way around.

With multiplats it might be built for PS5, scaled up slightly for XSX, and scaled down massively for XSS.

I understand the purpose of the XSS, but I do feel Microsoft have shot themselves in the foot over the long term with it.

If Sony release a PS5 Pro in another two years, they'll have a base of 10.2 TFLOPS and a max of around 22 TFLOPS at the top end (going on PS4 Pro's scaling vs the PS4).

Compare that to 4 TFLOPS at the base (XSS) and 12 TFLOPS at the top end (XSX) for Microsoft, meaning Sony's base and top end would both be at least double Microsoft's.
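For what it's worth, that projection follows from the commonly quoted TFLOPS figures (the PS5 Pro number is obviously pure speculation):

Code:
# Projecting a hypothetical PS5 Pro from PS4 -> PS4 Pro scaling (speculative).
ps4_tf, ps4_pro_tf = 1.84, 4.2   # FP32 TFLOPS
ps5_tf = 10.28
xss_tf, xsx_tf = 4.0, 12.0       # figures quoted above

scaling = ps4_pro_tf / ps4_tf     # ~2.28x
ps5_pro_tf = ps5_tf * scaling     # ~23 TFLOPS, in the region of the ~22 above
print(f"PS4 Pro scaling: {scaling:.2f}x -> projected PS5 Pro: {ps5_pro_tf:.1f} TFLOPS")
print(f"Sony range: {ps5_tf}-{ps5_pro_tf:.0f} TF vs Microsoft range: {xss_tf}-{xsx_tf} TF")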

Would it be wise for Microsoft to release a third SKU?
 
https://developer.nvidia.com/rtx/dlss

"NVIDIA DLSS is a deep learning neural network that boosts frame rates and generates sharp images. Powered by Tensor Cores, the dedicated AI processors on NVIDIA RTX™ GPUs, DLSS gives you the performance headroom to maximize ray-tracing settings and increase output resolution."

Right, so in layman's terms they're saying that DLSS is accelerated by the Tensor Cores, the dedicated AI processors. That, and the fact that DLSS isn't supported (in the same way) on non-RTX GPUs. There are dozens of other articles out there which imply DLSS runs on the tensor cores, including DF's assumption that even AMD will go the hardware-accelerated AI route going forward.

As mentioned, I'll take it at face value that these tensor cores (and the cores Intel uses) enable higher performance through hardware acceleration. It's no different in the mobile space; look at Apple: since the A11/A12, NPU hardware acceleration has been key to device performance in many ways. The A11's NPU wasn't fast enough and hence doesn't support some on-device machine learning features; according to Apple, from the A12 onward the NPU is fast enough for these new functions in iOS 15.
None of that even touches on what the limiting factor of DLSS performance is.

As we see a decrease in DLSS frame time as you move up through the RTX series, it would indicate that it is a limiting factor to some degree, although it doesn't scale linearly.

XSS has roughly 1/6th the INT4 TOPS from what I can work out, so if a 2060 Super takes 0.736 ms for DLSS at 1080p, how long is a GPU with a sixth of that throughput going to take to do the same job?

Surely there's a point where an ML upscale simply takes up so much frame time that it can't be used in the real world, as it delays other parts of the pipeline.

I have no doubt it's a factor; we just don't have the info to make any educated guesses about where Xbox would land. There's also the additional question of how much performance is lost running INT ops in the absence of actual ML instructions.

Those TOPS numbers for nVidia though.... They are for the tensor cores. On AMD, it's just the regular shaders. If you spend your entire budget per second doing upscaling, you wouldn't have any time to render anything to upscale to begin with.

Nvidia GPUs can't use the shader core while Tensors are operating either.
 
Nvidia GPUs can't use the shader core while Tensors are operating either.
Yeah, you are right. I thought one of the new features of the 30-series cards was concurrent tensor/shader operation, but it's RT/shader. My bad.
 
nVidia claims the opposite, and with Ampere all three core types can run concurrently.
I thought I had read this as well, but when I went back to look I could only see nVidia talking about RT and shading. Do you have a link to where they say Tensor as well?
 
nVidia claims the opposite, and with Ampere all three core types can run concurrently.
This is correct.
The only reason it appears not to run concurrently with DLSS is that the pipeline is technically serial at that point. Developers can choose to issue async compute calls while DLSS is running on the tensor cores, however.
 