Unreal Engine 5, [UE5 Developer Availability 2022-04-05]

Improvements to HW Lumen in UE 5.2:
  • Ray-traced shadows for Rect Lights and lights with a source size are now more accurate and more closely resemble results from the Path Tracer.
  • Better approximation of secondary bounces in reflections with Hardware Ray Tracing (HWRT) hit lighting.
  • Two-sided foliage support in HWRT hit lighting.
  • We now support async compute for inline ray-tracing passes.
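For context on that last bullet: inline ray tracing (DXR 1.1 RayQuery) runs inside ordinary compute shaders, which is what makes it possible to schedule those passes on an async compute queue in the first place. Below is a minimal, engine-agnostic D3D12 sketch of the two prerequisites (my own illustration, not Epic's code; the function names are made up):

Code:
// Sketch: plain D3D12, not UE5 code. Checks for DXR Tier 1.1 (inline ray
// tracing / RayQuery) and creates a dedicated compute queue that an engine
// could submit such passes to so they overlap with graphics work.
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

bool SupportsInlineRayTracing(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS5 opts5 = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS5,
                                           &opts5, sizeof(opts5))))
        return false;
    // RayQuery in compute/pixel shaders requires raytracing tier 1.1.
    return opts5.RaytracingTier >= D3D12_RAYTRACING_TIER_1_1;
}

ComPtr<ID3D12CommandQueue> CreateAsyncComputeQueue(ID3D12Device* device)
{
    // Inline ray tracing lives in regular compute shaders, so the pass can be
    // recorded on a compute command list and submitted here, alongside the
    // graphics queue ("async compute").
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    ComPtr<ID3D12CommandQueue> queue;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
    return queue;
}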

 
Layers of Fear is releasing next month on UE5. It will support HW Lumen for reflections and shadows, Niagara for particles, as well as DLSS and XeSS.

Recommended spec for 1080p60 is an RTX 2070 / RX 6800 XT. For 4K, a 3080 Ti is the minimum spec.

 

Well, I guess this is indicative of how long it can take a 'less major' new API and hardware feature to work its way through to a big engine's core release. Still no guarantee others will use it (or offer it as an option) any time soon, though, I suppose. It still takes work to tune it for your particular game, and perhaps there's the chance of it interfering with the inputs to some of the third-party upscaling techs out there.

Still, positive news about Tier 2 VRS at last!

Edit: Apparently XeSS is not recommended for use with VRS, but The Coalition say they find it works well with UE5's TSR.


Edit 2:

Intel have been working on their own VRS solution to work with XeSS.


And it appears the implementation has to query the hardware for the VRS tile size and adjust accordingly, something you'd never need to do on console, as it'd be a constant for a given platform.
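To make that concrete, here is a minimal plain-D3D12 sketch of the kind of query a PC implementation has to make (my own illustration, not from the thread; the function name is made up). The tile size comes back per-GPU, typically 8, 16 or 32 pixels, whereas on console it would simply be a known constant:

Code:
#include <d3d12.h>

// Returns the Tier 2 VRS (shading-rate image) tile size in pixels, or 0 if the
// device does not support Tier 2 variable-rate shading.
unsigned QueryVrsTileSize(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS6 opts6 = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS6,
                                           &opts6, sizeof(opts6))))
        return 0;
    if (opts6.VariableShadingRateTier < D3D12_VARIABLE_SHADING_RATE_TIER_2)
        return 0;
    // Hardware-specific: the screen-space shading-rate texture must be sized
    // as (render resolution / tile size), so the renderer adjusts to this value.
    return opts6.ShadingRateImageTileSize;
}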
 
There is quite a bit of detail about TSR in this feedback thread: details on optimisations, and on changing how they measure it, since quality depends on the framerate as well as the input resolution.

Interesting, and this is perhaps the reason TSR is not used in many games at the moment.

The 5.1 release schedule compared to Chapter 4, and all the changes needed in TSR, meant it was still not stable enough, which is not great considering absolutely all 3D-rendered pixels go through it. This is where the upcoming 5.2 release is in fact a big deal for TSR: it is the first release that is both in a production-proven state, stable, and with more debugging tools.
EDIT:
Interesting too
What we can do specifically on PS5 and XSX is that, conveniently, most of their AMD GPU architecture is publicly documented ( https://www.amd.com/system/files/TechDocs/rdna2-shader-instruction-set-architecture.pdf ), so we can go really deep into hardware details and imagine and experiment with crazy ideas. And this is what TSR did in 5.1: it heavily exploits the performance characteristics of RDNA's 16-bit instructions, which can have huge performance benefits. In UE5/Main for 5.3, a shader permutation was added ( https://github.com/EpicGames/UnrealEngine/commit/c83036de30e8ffb03abe9f9040fed899ecc94422 ) to finally tap into these instructions, which are exposed in standard HLSL in Shader Model 6.2 ( 16 Bit Scalar Types · microsoft/DirectXShaderCompiler Wiki · GitHub ), and, for instance, on an AMD 5700 XT the performance savings in TSR are identical to what these consoles get from the same optimisation.
What makes TSR in 5.1+ so different from 5.0 is how many more convolutions it does using these exposed hardware capabilities. The RejectShading pass does around 15 convolutions in 5.1+, each 3x3, on current hardware, compared to 3 in 5.0, which makes TSR substantially smarter thanks to some very neat properties discovered in chaining certain convolutions. And while the number of convolutions increased massively, by a factor of 5, the runtime cost of this part of TSR didn't change; and this gain in smartness of the algorithm allowed cutting a significant amount of other costs that were no longer required in the rest of the TSR algorithm, which is the core reason behind the performance saving from 3.1 ms to 1.5 ms on these consoles. Sadly, these hardware capabilities exposed in standard HLSL do not benefit all GPUs equally, because of how each vendor decided to architect their hardware.
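For anyone wanting the PC-side mechanics spelled out: the Shader Model 6.2 16-bit types (float16_t and friends, compiled with dxc -enable-16bit-types) only pay off when the driver reports native 16-bit ALU support, so a renderer would typically gate the 16-bit shader permutation on a capability check along these lines. This is a hedged sketch in plain D3D12, not the actual UE5 permutation code, and the helper name is made up:

Code:
#include <d3d12.h>

// Decide whether to pick the float16_t shader permutation (SM 6.2,
// dxc -T cs_6_2 -enable-16bit-types) or fall back to the FP32 variant.
bool UseNative16BitPermutation(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS4 opts4 = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS4,
                                           &opts4, sizeof(opts4))))
        return false;
    // TRUE on GPUs with real 16-bit ALU paths (e.g. packed/dual-rate FP16);
    // where it is FALSE, float16_t may run at FP32 rate or worse, so the
    // FP32 permutation stays the safer default.
    return opts4.Native16BitShaderOpsSupported == TRUE;
}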
 
So they are saying TSR is super optimized for the console GPUs in the 5.2 release?

Spell it out for the dummies in the room

They use FP16 and dual FP16, but it is also used on PC with GPUs compatible with it.

How much this saves in runtime cost depends largely on whether the driver says it supports it, but also on whether the hardware is actually capable of taking advantage of this optimisation. 16-bit often reduces register pressure, which, when register bound, can yield up to a 2x performance improvement; but, for instance, RDNA GPUs also have packed instructions like v_pk_mul_f16 capable of doing two multiplications for the price of one, which is another 2x. So the use of 16-bit instructions in that shader can give almost a 4x perf improvement.
 
They use FP16 and dual FP16, but it is also used on PC with GPUs compatible with it.
Vega, RDNA1 and RDNA2 support Rapid Packed Math, where FP16 operations run twice as fast as FP32 operations.

For NVIDIA, Turing, Ampere and Ada support running FP16 ops on the tensor cores for huge throughput, while also allowing for simultaneous FP16 (on Tensors) and FP32 (on ALUs) action.
 
Vega, RDNA1 and RDNA2 support Rapid Packed Math, where FP16 operations run twice as fast as FP32 operations.

For NVIDIA, Turing, Ampere and Ada support running FP16 ops on the tensor cores for huge throughput, while also allowing for simultaneous FP16 (on Tensors) and FP32 (on ALUs) action.

It seems they don't use Tensor cores, but games can use XeSS or DLSS.

Also, one last thing. It would be great if we could switch the Unreal Engine anti-aliasing pipeline to tensor cores, Xe cores, or AMD matrix cores, for performance reasons. (A console command would let devs allow players to decide to tap into unused GPU components if DLSS, XeSS, etc. is not available.)
TSR could run on tensor cores instead of the main graphics compute units providing the game's main render frames.

Answer from Guillaume Abadie:
Would be great! But we are limited to the APIs exposed to us: for instance, DirectML requires a round trip to main memory between a programmed shader and the matrix multiplications, which is not in the interest of performance. This is where each IHV has far fewer constraints than we do, because they know exactly what their hardware can do and how, and can exploit many advantages that are not standardized, like avoiding that round trip to main memory, to squeeze out as much runtime GPU performance as possible and invest even more in quality.

What we can do specifically on PS5 and XSX is that, conveniently, most of their AMD GPU architecture is publicly documented ( https://www.amd.com/system/files/TechDocs/rdna2-shader-instruction-set-architecture.pdf ), so we can go really deep into hardware details and imagine and experiment with crazy ideas.
 