Unreal Engine 5, [UE5 Developer Availability 2022-04-05]

Such great strides on the animation front.

Must be intimidating to anyone who wants to roll their own engine. So much powerful stuff out of the box. Though it wasn't clear from the video how the new animation tools compare to established apps like Maya.
 
And perfect timing on both the software VRS and general (Nanite) improvements questions:

Props to Graham for the great presentation (slides linked there)!
I remember, nearly a decade ago, Sebbbi saying that we would finally see more games move entirely to compute, bypassing the need for the 3D pipeline and therefore ROPs.

And honestly that transition is still happening. It's crazy how long it takes, but it's great to see that progress is still being made.
 
Are the latest UE5.4 features already in Fortnite? Would be cool to see improved performance.

Also, maybe we can get a new compiled UE5.4 city sample demo.
 
Are the latest UE5.4 features already in Fortnite? Would be cool to see improved performance.

Also, maybe we can get a new compiled UE5.4 city sample demo.

Yeah, Fortnite already takes advantage of a lot of this, I think.


Fortnite looks a lot better than it used to. A lot of people see gameplay on low graphics settings, or they think of what it looked like five years ago. It's very nice now. It's also had motion matching for quite a while, I think since Chapter 5 Season 1 came out.
 
Some notes on the GPU-driven materials presentation:

This doesn't really matter anymore since UE5 is moving to compute materials, but some HW (particularly Nvidia) can't support the SV_StencilRef extension because it doesn't have independent depth/stencil planes ...
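(For reference, on PC this shows up as an optional D3D12 capability bit, so an engine can probe for it at startup. A minimal sketch using the standard D3D12 headers; the helper name is just illustrative:)

```cpp
#include <d3d12.h>

// Query whether the driver exposes pixel-shader-specified stencil ref
// (the SV_StencilRef path). Returns false on hardware that can't write
// the stencil reference from the pixel shader.
bool SupportsPixelShaderStencilRef(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                           &options, sizeof(options))))
        return false;
    return options.PSSpecifiedStencilRefSupported == TRUE;
}
```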

ExecuteBundleIndirectX API - a GPU-driven command buffer generation API is really nice in addition to the GPU-driven state-change API (Xbox ExecuteIndirect for PSO swapping) ...
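(For contrast, here's a minimal sketch of the plain PC-side D3D12 ExecuteIndirect path, which can vary draw arguments and counts from the GPU but not the PSO. The helper name is illustrative, and a real engine would cache the command signature rather than recreate it per call:)

```cpp
#include <d3d12.h>

// Issue up to maxDraws indirect draws. argBuffer is assumed to hold packed
// D3D12_DRAW_INDEXED_ARGUMENTS records written by a GPU culling pass, and
// countBuffer holds the number of draws that survived culling.
void DrawIndirect(ID3D12Device* device, ID3D12GraphicsCommandList* cmdList,
                  ID3D12Resource* argBuffer, ID3D12Resource* countBuffer,
                  UINT maxDraws)
{
    D3D12_INDIRECT_ARGUMENT_DESC arg = {};
    arg.Type = D3D12_INDIRECT_ARGUMENT_TYPE_DRAW_INDEXED;

    D3D12_COMMAND_SIGNATURE_DESC sigDesc = {};
    sigDesc.ByteStride       = sizeof(D3D12_DRAW_INDEXED_ARGUMENTS);
    sigDesc.NumArgumentDescs = 1;
    sigDesc.pArgumentDescs   = &arg;

    ID3D12CommandSignature* signature = nullptr;
    device->CreateCommandSignature(&sigDesc, nullptr,
                                   IID_ID3D12CommandSignature,
                                   reinterpret_cast<void**>(&signature));

    // The actual draw count comes from countBuffer, so the CPU never needs to
    // know how many draws the GPU kept.
    cmdList->ExecuteIndirect(signature, maxDraws, argBuffer, 0, countBuffer, 0);
    signature->Release();
}
```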

ds_ordered_count - I don't know if the team working on Nanite knows this but AMD intends to deprecate this HW functionality in future HW designs ...
 
And perfect timing on both the software VRS and general (Nanite) improvements questions:

Props to Graham for the great presentation (slides linked there)!

This is a really detailed presentation. A lot is over my head, but in terms of the PC space, the section on empty bin compaction is really interesting. Basically it's the issue of empty draw calls with ExecuteIndirect, which I've seen mentioned before. There are solutions on console which look like they bring great performance benefits, but so far there is no equivalent on PC. Work graphs look like a good candidate from the slides, and that's great to see because it means potentially some nice performance wins on PC in the future.
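(To make "compaction" concrete: the idea is to squeeze zero-sized entries out of the argument buffer and write the surviving count, so the indirect draw never touches empty bins. A CPU-side toy sketch of that step; on console or with work graphs the equivalent would run in a compute pass, and the struct/function names here are just illustrative:)

```cpp
#include <cstdint>
#include <vector>

// Mirrors the layout of an indirect draw record ("bin" = a material/raster bucket).
struct DrawArgs
{
    uint32_t indexCount, instanceCount, firstIndex;
    int32_t  baseVertex;
    uint32_t firstInstance;
};

// Copy only non-empty bins into the compacted argument list and return how many
// survived; that count is what would land in the count buffer consumed by
// ExecuteIndirect, so empty draws are never issued at all.
uint32_t CompactBins(const std::vector<DrawArgs>& bins, std::vector<DrawArgs>& compacted)
{
    compacted.clear();
    for (const DrawArgs& d : bins)
        if (d.indexCount > 0 && d.instanceCount > 0)   // skip empty bins entirely
            compacted.push_back(d);
    return static_cast<uint32_t>(compacted.size());
}
```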

It seems like it's related to this "mesh nodes" option that's coming to work graphs, as described by AMD here.

Looks like UE5 has undergone massive changes from launch up until 5.4, and that'll continue with 5.5 and onward. Really cool to see this level of detail. The performance differences in a scene like the Matrix demo are pretty pronounced by the metrics in the slide set. Really hoping to see some of that optimization on PC with work graphs sooner rather than later.
 
And perfect timing on both the software VRS and general (Nanite) improvements questions:
Is the SW VRS compatible with the temporal upscalers? I mean, are they aware of each other and can they be used together to aid each other's weaknesses? It seems like it could be more beneficial now to have a higher internal resolution and more screen areas covered with SW VRS, while having less aggressive scaling factors for temporal upscalers. This way, the most pronounced artifacts, such as low res aliasing on edges in occluded areas (or when temporal upscaling fails to accumulate details), would be less noticeable because edges would preserve higher definition due to the higher base image resolution. Meanwhile, temporal upscalers should be able to reconstruct the missing details at 4x and possibly higher upscaling factors in the low resolution regions affected by VRS.
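(A back-of-the-envelope sketch with entirely made-up numbers, just to show the kind of trade-off being suggested: a 4x-area upscale alone versus a 2x-area upscale where part of the frame runs at 2x2 SW VRS:)

```cpp
#include <cstdio>

int main()
{
    const double output = 3840.0 * 2160.0;            // 4K output pixels

    // (a) 4x-area temporal upscaling alone: 1080p internal, fully shaded.
    const double upscaleOnly = output / 4.0;

    // (b) 2x-area upscaling (higher internal res), with a hypothetical 60% of
    //     the frame shaded at 2x2 VRS (quarter-rate) and 40% at full rate.
    const double internalRes  = output / 2.0;
    const double vrsCoverage  = 0.60;
    const double withVrs      = internalRes * ((1.0 - vrsCoverage) + vrsCoverage / 4.0);

    std::printf("upscale-only shaded pixels:     %.0f\n", upscaleOnly);  // ~2.07M
    std::printf("higher-res + VRS shaded pixels: %.0f\n", withVrs);      // ~2.28M
    return 0;
}
```

With those (made-up) numbers the shading cost is roughly comparable, but the VRS case keeps full-resolution edges everywhere, which is exactly the trade being described.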
 
Is the SW VRS compatible with the temporal upscalers? I mean, are they aware of each other and can they be used together to aid each other's weaknesses? It seems like it could be more beneficial now to have a higher internal resolution and more screen areas covered with SW VRS, while having less aggressive scaling factors for temporal upscalers. This way, the most pronounced artifacts, such as low res aliasing on edges in occluded areas (or when temporal upscaling fails to accumulate details), would be less noticeable because edges would preserve higher definition due to the higher base image resolution. Meanwhile, temporal upscalers should be able to reconstruct the missing details at 4x and possibly higher upscaling factors in the low resolution regions affected by VRS.

Yes, they're compatible. Ray tracing/marching (in screen space or otherwise) has been undersampled for ages now, and TAA has been used to fix the image up afterward.

The only thing VRS does is extend this to directly shading pixels in an undersampled manner. There's interesting work on stochastic shading, filtering, etc.: https://research.nvidia.com/labs/rtr/publication/pharr2024stochtex/stochtex.pdf
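(A tiny sketch of the stochastic filtering idea from that line of work: instead of blending four texels bilinearly, pick one texel per frame with probability equal to its bilinear weight, and let the temporal accumulator average the choices out. Function and type names are illustrative:)

```cpp
#include <cmath>

struct Texel { int x, y; };

// u, v: texture coordinates in texel space; rnd: per-pixel value in [0,1) that
// changes every frame (e.g. from a blue-noise or IGN sequence). The four
// bilinear weights sum to 1, so they form a discrete probability distribution
// we can sample with a single random number.
Texel StochasticBilinearTap(float u, float v, float rnd)
{
    const int   x0 = static_cast<int>(std::floor(u - 0.5f));
    const int   y0 = static_cast<int>(std::floor(v - 0.5f));
    const float fx = (u - 0.5f) - static_cast<float>(x0);
    const float fy = (v - 0.5f) - static_cast<float>(y0);

    const float w00 = (1.0f - fx) * (1.0f - fy);
    const float w10 = fx * (1.0f - fy);
    const float w01 = (1.0f - fx) * fy;

    if (rnd < w00)             return { x0,     y0     };
    if (rnd < w00 + w10)       return { x0 + 1, y0     };
    if (rnd < w00 + w10 + w01) return { x0,     y0 + 1 };
    return { x0 + 1, y0 + 1 };   // remaining probability mass
}
```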

So creating a spatiotemporal noise pattern that optimally selects VRS samples over time (https://arxiv.org/pdf/2310.15364) could extend stochastic undersampling all the way through the pipeline. As long as you avoid undershading sharp transitions (mirror-like reflections, shadow edges, etc.), you could extend VRS to a large portion of shading work with minimal impact on final image quality.
 
Ray tracing/marching (in screen space or otherwise) has been undersampled for ages now and TAA has been used to fix the image up afterward
Ray tracing/marching are poor examples because this is where temporal upscalers have struggled for a while. The requirements for making them work with temporal upscalers are quite strict - jittering should be aligned, separate shader denoisers should not destroy the jittering (which they typically do), and rays should be launched per pixel, among other factors. Thus, it's a complex area, and only DLSS RR has succeeded here so far.

There's interesting work on stochastic shading, filtering, etc.
Stochastic filtering is simple. This is essentially what DLSS SR already does at the junction areas of mip levels. Since these levels are positioned farther from the camera (due to the negative LOD bias), you can have a few texels per pixel there, which will add temporal noise (or shimmering, if you prefer). DLSS typically averages the noise out and produces higher-resolution surfaces at the farther-away levels, though it sometimes results in moire, as do other temporal upscalers, unfortunately. The key point here is that you can have different samples by sampling the same texture location, and DLSS SR will average this out for you, hence the stochastic filtering. With VRS, it's the opposite - there are fewer real samples than actual pixels. The temporal upscaler should be able to integrate all the samples piece by piece and assemble them into a higher-resolution image as if it were a puzzle. If anything is wrong with the sample locations, the process would not converge.
 
Ray tracing/marching are poor examples because this is where temporal upscalers have struggled for a while. The requirements for making them work with temporal upscalers are quite strict - jittering should be aligned, separate shader denoisers should not destroy the jittering (which they typically do), and rays should be launched per pixel, among other factors. Thus, it's a complex area, and only DLSS RR has succeeded here so far.


Stochastic filtering is simple. This is essentially what DLSS SR already does at the junction areas of mip levels. Since these levels are positioned farther from the camera (due to the negative LOD bias), you can have a few texels per pixel there, which will add temporal noise (or shimmering, if you prefer). DLSS typically averages the noise out and produces higher-resolution surfaces at the farther-away levels, though it sometimes results in moire, as do other temporal upscalers, unfortunately. The key point here is that you can have different samples by sampling the same texture location, and DLSS SR will average this out for you, hence the stochastic filtering. With VRS, it's the opposite - there are fewer real samples than actual pixels. The temporal upscaler should be able to integrate all the samples piece by piece and assemble them into a higher-resolution image as if it were a puzzle. If anything is wrong with the sample locations, the process would not converge.

You just need to jointly denoise and upscale at the same time, choosing jitter and sampling patterns together, for example. It's not super hard beyond putting in the manual work to make sure the two actually work together properly, other than the fact that you need to do both at once (which is why it isn't just available in all titles). Intel got it working before Nvidia did; I'm pretty sure DLSS 3.5 is based on that paper, and the only reason Intel hasn't released XeSS 2 is that they're saving it for a Battlemage launch. AMD also has an upscaling denoiser under research, possibly launching in FSR4 this year. And ray marching has been "upscaled" since forever; that's generally how ray marching works unless you're maxing out a game that even allows you to do that. There's not a ton of difficulty there once you can account for ray marching and upscaling at the same time.

And VRS isn't that different; in fact, stochastic sampling should really be integrated regardless. A good stochastic pattern for VRS upscaling would probably look something like interleaved gradient noise, with its guaranteed coverage. For upscaling, the guaranteed coverage would now have to account for the upscaling factor, creating spatially and temporally wider coverage to account for the base pass being "undersampled" as well. But again, that's just another case of "make sure your sampling pattern/camera jitter accounts for the upscaling factor", just like everything else.
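(A sketch of what that might look like, assuming interleaved gradient noise as the per-pixel pattern. The IGN constants are the published ones; the frame-offset trick and the helper names are just one illustrative way to rotate coverage over frames:)

```cpp
#include <cmath>
#include <cstdint>

// Interleaved gradient noise (Jimenez): a cheap per-pixel value in [0,1).
static float InterleavedGradientNoise(float x, float y)
{
    const float v = 52.9829189f * std::fmod(0.06711056f * x + 0.00583715f * y, 1.0f);
    return v - std::floor(v);
}

// Decide whether this pixel gets a freshly shaded sample this frame when its
// tile runs at 2x2 VRS (i.e. roughly 1/4 of pixels should be shaded per frame).
// Shifting the pattern by a frame-dependent offset rotates which pixels are
// picked, so over a few frames every pixel gets covered.
static bool ShadeThisFrame(uint32_t px, uint32_t py, uint32_t frame)
{
    const float fx = static_cast<float>(px) + 5.588238f * static_cast<float>(frame & 63u);
    const float n  = InterleavedGradientNoise(fx, static_cast<float>(py));
    return n < 0.25f;   // keep roughly one sample per 2x2 neighbourhood per frame
}
```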

The real benefit of VRS, though, is as an alternative to upscaling at all. Assuming an average scene, you still pay for filling the G-buffer at full res, but otherwise get "smarter" shading savings and better image quality than brute-force upscaling. And dimensionally speaking, the more you undersample, the more likely you are to end up with outliers, and to not know what an outlier even is. Thus, while you 100% can account for VRS and upscaling at the same time, it's not going to be savings that stack linearly on top of each other. Heck, the higher your base resolution is, the lower your shading rate for VRS is going to be anyway, given realistic content.
 
Intel got it working before Nvidia did; I'm pretty sure DLSS 3.5 is based on that paper, and the only reason Intel hasn't released XeSS 2 is that they're saving it for a Battlemage launch. AMD also has an upscaling denoiser under research, possibly launching in FSR4 this year. And ray marching has been "upscaled" since forever; that's generally how ray marching works unless you're maxing out a game that even allows you to do that. There's not a ton of difficulty there once you can account for ray marching and upscaling at the same time.
I think the major upscaling/denoising difference is that Nvidia's incorporates AI (tensor cores). Intel and AMD (whose approach is yet to be revealed) might follow suit or take a completely different tangent.
 
You just need to jointly denoise and upscale at the same time, choosing jitter and sampling patterns together, for example
That is not as easy as it may seem. As mentioned earlier, upscaling and denoising have different requirements. The input for denoisers is sparse, so spatial blurring is a requirement. The direction of the blur is typically anisotropic and depends on normals, sample density and other parameters, which can vary per frame, so it can easily skew the jittering by blurring in different directions in different frames, leaving no good details to accumulate with the temporal upscaler. And that is just one example. I can easily see that the same problems are possible with VRS, and it's not about who first demonstrated the joint denoise/upscale (which is a different topic, since between the initial demonstrations in a small, controlled environment and a fully-featured production implementation in a game, a year or more can easily pass).
 
I think the major upscaling/denoising difference is that Nvidia's incorporates AI (tensor cores). Intel and AMD (whose approach is yet to be revealed) might follow suit or take a completely different tangent.
Intel has been using matrix accelerators ("AI"/"tensor cores", in Intel's case "XMX cores") since day one.
edit: or did you mean frame generation instead? That's something Intel hasn't released yet.
 
The real benefit of VRS, though, is as an alternative to upscaling at all. Assuming an average scene, you still pay for filling the G-buffer at full res, but otherwise get "smarter" shading savings and better image quality than brute-force upscaling.
The gains of a few milliseconds outlined in the presentation are certainly nowhere near TSR's multi-x scaling factors, so the only realistic use case for VRS in games with advanced graphics should be in conjunction with TSR. Regarding quality, I wouldn't say that VRS has any strengths, as it uses a regular pixel grid for low-res samples, essentially functioning like integer upscaling by clustering pixels, which results in a very visible quality loss compared to native resolution. That's another reason why it should work together with a temporal upscaler. However, given that you still need to render the high-resolution G-buffer with VRS, more attractive alternatives may exist, such as rendering a coverage mask at a higher resolution and guiding an upscaler with it to produce perfect high-resolution edges, potentially providing even better scaling factors.
 
Intel has been using matrix accelerators ("AI"/"tensor cores", in Intel's case "XMX cores") since day one.
edit: or did you mean frame generation instead? That's something Intel hasn't released yet.
The discussion referred to DLSS 3.5, so ray reconstruction and using tensor cores for AI-trained denoisers instead of a manually hand-tuned, game-provided denoiser. Yeah, Intel has had XMX cores since product introduction, though they've only been used for specific use cases in gaming.
 