Unreal Engine 5, [UE5 Developer Availability 2022-04-05]


The idea comes from Microsoft Research, and everyone should read this to understand how they did it. This is clever.

http://hhoppe.com/proj/gim/

Abstract: Surface geometry is often modeled with irregular triangle meshes. The process of remeshing refers to approximating such geometry using a mesh with (semi)-regular connectivity, which has advantages for many graphics applications. However, current techniques for remeshing arbitrary surfaces create only semi-regular meshes. The original mesh is typically decomposed into a set of disk-like charts, onto which the geometry is parametrized and sampled. In this paper, we propose to remesh an arbitrary surface onto a completely regular structure we call a geometry image. It captures geometry as a simple 2D array of quantized points. Surface signals like normals and colors are stored in similar 2D arrays using the same implicit surface parametrization — texture coordinates are absent. To create a geometry image, we cut an arbitrary mesh along a network of edge paths, and parametrize the resulting single chart onto a square. Geometry images can be encoded using traditional image compression algorithms, such as wavelet-based coders.
Hindsights: Geometry images have the potential to simplify the rendering pipeline, since they eliminate the "gather" operations associated with vertex indices and texture coordinates. Although the paper emphasizes the exciting possibilities of resampling mesh geometry into an image, the same parametrization scheme can also be used to construct single-chart parametrizations over irregular meshes, for seam-free texture mapping. The irregular "cruft" present in several of the parametrizations is addressed by the inverse-stretch regularization term described in the 2003 Spherical Parametrization paper.
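To make the "geometry as a 2D array" idea concrete, here is a minimal C++ sketch (my own illustration, not code from the paper): because the parametrization is a regular grid, the triangle connectivity never has to be stored, it can be regenerated from the image dimensions alone.

```cpp
#include <cstdint>
#include <vector>

// A geometry image is just an n x n grid of positions, so connectivity is
// implicit: two triangles per grid cell, no per-vertex index data needed
// beyond this regular pattern.
std::vector<uint32_t> BuildGridIndices(uint32_t n)
{
    std::vector<uint32_t> indices;
    indices.reserve(size_t(n - 1) * (n - 1) * 6);
    for (uint32_t y = 0; y + 1 < n; ++y)
    {
        for (uint32_t x = 0; x + 1 < n; ++x)
        {
            uint32_t i0 = y * n + x;  // top-left of the cell
            uint32_t i1 = i0 + 1;     // top-right
            uint32_t i2 = i0 + n;     // bottom-left
            uint32_t i3 = i2 + 1;     // bottom-right
            indices.insert(indices.end(), { i0, i2, i1, i1, i2, i3 });
        }
    }
    // The matching vertex buffer is just the image pixels read row by row:
    // position[y * n + x] = dequantize(image[y][x]).
    return indices;
}
```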


They solve the tiny triangle problem, @Dictator. The idea was there but not the hardware.

http://graphicrants.blogspot.com/2009/01/virtual-geometry-images.html

My plan on how to get this running really fast was to use instancing. With virtual textures every page is the same size. This simplifies many things. The way detail is controlled is similar to a quad tree. The same size pages just cover less of the surface and there are more of them. If we mirror this with geometry images every time we wish to use this patch of geometry it will be a fixed size grid of quads. This works perfectly with instancing if the actual position data is fetched from a texture like geometry images imply. The geometry you are instancing then is a grid of quads with the vertex data being only texture coordinates from 0 to 1. The per instance data is passed in with a stream and the appropriate frequency divider. This passes data such as patch world space position, patch texture position and scale, edge tessellation amount, etc.
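A rough sketch of the instancing setup described here (the struct layout and names are my own assumptions, not actual engine code): one shared grid-of-quads vertex buffer holding nothing but 0..1 texture coordinates, plus a small per-instance record per patch.

```cpp
// Hypothetical per-instance data for one geometry-image patch, fed in via an
// instanced vertex stream. The only per-vertex data in the shared mesh is a
// (u, v) coordinate in [0, 1] over a fixed grid of quads.
struct PatchInstance
{
    float worldPosition[3];     // where the patch sits in world space
    float texOffset[2];         // where the patch lives in the geometry-image atlas
    float texScale;             // patch size within the atlas
    float edgeTessellation[4];  // per-edge factors to stitch neighbouring patches
};

// Vertex-shader side of the idea (shown only as pseudocode in a comment):
//   float2 uv    = gridUV * instance.texScale + instance.texOffset;
//   float3 posOS = GeometryImage.SampleLevel(PointClamp, uv, 0).xyz;  // fetch position
//   float3 posWS = posOS + instance.worldPosition;
```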

If patch tessellation is tied to the texture resolution this provides the benefit that no page table needs to be maintained for the textures. This does mean that there may be a high amount of tessellation in a flat area merely because texture resolution was required. Textures and geometry can be at a different resolution but still be tied such as the texture is 2x the size as the geometry image. This doesn't affect the system really.

If the performance is there to have the two at the same resolution a new trick becomes available. Vertex density will match pixel density so all pixel work can be pushed to the vertex shader. This gets around the quad problem with tiny triangles. If you aren't familiar with this, all pixel processing on modern GPU's gets grouped into 2x2 quads. Unused pixels in the quad get processed anyways and thrown out. This means if you have many pixel size triangles your pixel performance will approach 1/4 the speed. If the processing is done in the vertex shader instead this problem goes away. At this point the pipeline is looking similar to Reyes.
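A back-of-the-envelope way to see the 2x2 quad cost (my own numbers, deliberately simplified): a triangle that covers a single pixel still occupies a full quad, so only 1 of the 4 pixel-shader lanes does useful work.

```cpp
#include <cstdio>

// Rough model: pixel shading happens in 2x2 quads, so a triangle covering
// 'coveredPixels' pixels still pays for 4 lanes in every quad it touches.
double QuadEfficiency(int coveredPixels, int quadsTouched)
{
    return double(coveredPixels) / double(quadsTouched * 4);
}

int main()
{
    // A pixel-sized triangle occupies one quad but fills only 1 of its 4 lanes,
    // which is where the "approaches 1/4 the speed" figure above comes from.
    std::printf("1-px triangle:  %.0f%% of lanes useful\n", 100.0 * QuadEfficiency(1, 1));

    // A large triangle mostly fills the quads it touches (edge quads aside),
    // so efficiency heads back toward 100%. These counts are illustrative.
    std::printf("64-px triangle: %.0f%% of lanes useful\n", 100.0 * QuadEfficiency(64, 20));
}
```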
 
@iroboto Maybe the renderer is entirely compute driven and the rasterizer is skipped entirely. If you generate some data structure that's culled down to 1 triangle per pixel, then what is the triangle coverage test for? You know which primitive covers that pixel, especially if you're doing primary ray tests. You probably only need to know which texel plus material data maps to that triangle, and the surface normal.
 

https://forum.beyond3d.com/posts/2125480/

Read my post; they don't have the tiny-triangle problem anymore.

If the performance is there to have the two at the same resolution a new trick becomes available. Vertex density will match pixel density so all pixel work can be pushed to the vertex shader. This gets around the quad problem with tiny triangles. If you aren't familiar with this, all pixel processing on modern GPU's gets grouped into 2x2 quads. Unused pixels in the quad get processed anyways and thrown out. This means if you have many pixel size triangles your pixel performance will approach 1/4 the speed. If the processing is done in the vertex shader instead this problem goes away. At this point the pipeline is looking similar to Reyes.
 
Those are still static non-rigged meshes (unlike an animated character for example).
They mentioned deformation though.


Discarding triangles I think is not the issue here.

I actually think the bottleneck is the aim for 1 triangle per pixel; that should have been a red flag for us right away.
(white paper here: https://www.amd.com/system/files/documents/rdna-whitepaper.pdf)
The RDNA rasterizer only outputs 1 triangle and emits 16 pixels per clock cycle. But if you're doing 1 triangle per pixel, you've reduced your output dramatically: you're heavily primitive bound, running at 1/16 of peak, and rasterization efficiency is going to be super low. And it doesn't matter how much you cull your triangles; if you're using the fixed-function hardware this way, you're bound. I don't know how they got around this. If the aim is 100% effective rasterization, each triangle should cover 16 pixels. Among many things that need consideration, I'm not sure if you can render 1 triangle that takes up 3 pixels and use the individual vertices to represent 1 pixel each, so you claw back 3 pixels as opposed to 1.
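To put rough numbers on that (just a sketch using the per-rasterizer figures quoted above, not measured data): if one rasterizer sets up 1 triangle and can emit 16 pixels per clock, feeding it 1-pixel triangles caps its pixel output at about 1 pixel per clock.

```cpp
#include <cstdio>

int main()
{
    const double trianglesPerClock = 1.0;   // per rasterizer, figure quoted above
    const double pixelsPerClock    = 16.0;  // per rasterizer, figure quoted above
    const double pixelsPerTriangle = 1.0;   // the "1 triangle per pixel" target

    // Pixel output is limited by whichever resource runs out first;
    // here triangle setup is the limiter: 1 tri/clk * 1 px/tri = 1 px/clk.
    double effectivePixels = trianglesPerClock * pixelsPerTriangle;
    std::printf("raster utilisation: %.2f%%\n", 100.0 * effectivePixels / pixelsPerClock); // ~6.25%
}
```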

https://frostbite-wp-prd.s3.amazonaws.com/wp-content/uploads/2016/03/29204330/GDC_2016_Compute.pdf
Graham W, who wrote this GPU culling presentation while with DICE (now with Epic), has written a lot about GPU-based culling and GPU-driven workloads.

We don't know if the PS5's raw geometry performance is the same as Navi 10's, or if they really meant 1 triangle per pixel.
But if it's 1 triangle every 16 pixels, then the PS5 at 2.2 GHz should have a 140.8 GP/s fillrate (64 pixels per clock x 2.2 GHz). Divide that by 16 and you get 8.8 GTriangles/s.
1440p is about 3.7 MPixels. At 30 FPS and 1 triangle per pixel, you need to be able to process 3.7 MPixels x 30 ≈ 110 MTriangles/s.

Unless I made a miscalculation somewhere, it seems that triangle throughput isn't the bottleneck for 1 pixel = 1 triangle, which in theory should be fine even for 4K120; the bottleneck is rather triangle culling (or of course some other factor I'm missing here).
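Spelling that arithmetic out (same assumptions as the post above, nothing new):

```cpp
#include <cstdio>

int main()
{
    const double clockGHz = 2.2;                       // assumed PS5-class clock from the post
    const double rops     = 64.0;                      // pixels per clock
    const double fillGPs  = rops * clockGHz;           // 140.8 Gpixels/s
    const double triGs    = fillGPs / 16.0;            // 8.8 Gtriangles/s if each triangle filled 16 px

    const double pixels1440p = 2560.0 * 1440.0;        // ~3.7 Mpixels
    const double neededTriS  = pixels1440p * 30.0;     // ~110 Mtriangles/s at 1 tri/px, 30 fps

    std::printf("fill: %.1f GP/s, tri budget: %.1f GT/s, needed at 1 tri/px: %.1f MT/s\n",
                fillGPs, triGs, neededTriS / 1e6);
}
```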
 
Basically this solution should provide perfect lod and eliminate edge aliasing entirely. TAA should only be needed to remove shader aliasing. Low resolutions should actually look much better than they would in a typical rasterizer.

... I think.
 
I hope on the PC side of things we are able to use main RAM as a buffer. RAM is pretty cheap; I have 32 GB of main system RAM. If a UE5 engine game used my NVMe drive plus 16 GB of DDR4, that could enable some pretty amazing things, I would think.
 
@eastmen Yeah, that's one way to overcome some of the NVMe performance differential to the PS5: keep way more stuff in RAM. Hopefully they have a number of options to make the best of various hardware configs.
 

I assume they can make a slider, like the texture one now, to affect the quality of the meshes. Or even have one that devotes system RAM. If games made use of it I would buy 64 GB of RAM: devote 32 GB as a streaming buffer plus a PCIe 4.0 NVMe drive. In 2021 I am sure we will see faster-than-6-GB/s drives in PC land, and there is nothing stopping performance mobos from offering PCIe 4.0 x8 for 16 GB/s.

Would love to see what this tech can do pushed to its maximum
 

Really curious to see if they'll demo it with Nvidia whenever consumer Ampere launches. If it supports DLSS 2.0 they'll get easy wins in terms of lowering the amount of texture and geometry data they need to stream in per frame, because with this solution it looks like pretty much everything scales somewhat linearly with resolution.
 
Seeing all those polygons, I assume they have to cull loads of out-of-sight geometry.


Using mesh shaders, which I assume is happening on next gen.

Also, the idea of loading "film" assets seems to tie in with other workflows Unreal has been pursuing.


Funny that people mention Star Wars, as I suspect a fair share of asset reuse here.
 
Would using Nanite solely for environment and non-character models potentially drastically increase the poly budget for animated character models?
Also, a next-gen jungle/forest would look so real that you might literally get lost in the level, lol.
 
That's running on a Warp 560 turbo. I went looking for YT examples but they are all on emulators and accelerators AFAICS. I remember playing Doom clones like Alien Breed 3D in tiny windows with chunky pixels to get some sort of playable framerate.
This is the best we have at the moment:
Amiga 500, 1 MB of memory.
Basically this solution should provide perfect lod and eliminate edge aliasing entirely. TAA should only be needed to remove shader aliasing. Low resolutions should actually look much better than they would in a typical rasterizer.

... I think.
If shading is moved to object space, shader aliasing will certainly be a quite different problem from what we are used to.
 
It should be noted that it currently looks like Lumen only casts hard-edged shadows, which everybody hates and which people have been trying to fix for decades with contact hardening etc.
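For reference, the usual contact-hardening trick (PCSS-style) widens the penumbra with how far the occluder sits from the receiver. A minimal sketch of that standard estimate, not anything confirmed about Lumen:

```cpp
// PCSS-style penumbra estimate (generic technique, not Lumen-specific):
// the further the average blocker is from the receiver, the softer the edge.
float PenumbraWidth(float lightSize, float receiverDepth, float avgBlockerDepth)
{
    return lightSize * (receiverDepth - avgBlockerDepth) / avgBlockerDepth;
}
```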
 
Would using Nanite solely for environment and non-character models potentially drastically increase the poly budget for animated character models?
Also, a next-gen jungle/forest would look so real that you might literally get lost in the level, lol.

You do not have infinite GPU resources, sadly, and seeing how we still see some polygon edges on the character in the demo, they are still limited by some factors.
 
The way I understand it, the artist will have a multi-million-polygon asset, or whatever the number is, and only needs to reduce it to the highest possible LOD the PS5 can take (also in terms of storage), without having to create multiples of the same asset for each LOD step. The system will simply scale the geometry down dynamically with distance, which also means there is no visible LOD swapping going on.

UE4 already has auto LOD generation, so you just plop in your high-poly model and LOD 0, 1, 2 get generated.

But with UE5, it seems the LOD will not come in distinct quality steps like LOD 0, 1, 2, 3; it will be gradual, so no weird LOD transitions or popping.
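One way to picture "gradual" LOD is the generic screen-space-error test: keep refining a cluster until its projected error drops below about a pixel. This is purely my own sketch of that general idea, not Epic's actual metric.

```cpp
#include <cmath>

// Generic screen-space-error LOD test: refine while the cluster's geometric
// error, projected to the screen, is still larger than ~1 pixel.
// 'fovY' is the vertical field of view in radians; all names are illustrative.
bool ShouldRefine(float geometricError, float distance, float fovY, float screenHeightPx)
{
    float pixelsPerWorldUnit = screenHeightPx / (2.0f * distance * std::tan(fovY * 0.5f));
    float projectedErrorPx   = geometricError * pixelsPerWorldUnit;
    return projectedErrorPx > 1.0f;
}
```

Because the test is continuous in distance, detail ramps smoothly instead of snapping between LOD 0/1/2.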
 
We don't know if the PS5's raw geometry performance is the same as Navi 10's, or if they really meant 1 triangle per pixel.
But if it's 1 triangle every 16 pixels, then the PS5 at 2.2 GHz should have a 140.8 GP/s fillrate (64 pixels per clock x 2.2 GHz). Divide that by 16 and you get 8.8 GTriangles/s.
1440p is about 3.7 MPixels. At 30 FPS and 1 triangle per pixel, you need to be able to process 3.7 MPixels x 30 ≈ 110 MTriangles/s.

Unless I made a miscalculation somewhere, it seems that triangle throughput isn't the bottleneck for 1 pixel = 1 triangle, which in theory should be fine even for 4K120; the bottleneck is rather triangle culling (or of course some other factor I'm missing here).
It's 1 triangle per primitive engine. There is 1 primitive engine per shader array, or 2 per shader engine, so 4 triangles per clock in total and 64 pixels (ROPs).

So you're actually doing 4 triangles per clock. But that is the optimal triangle/pixel emission; the number can go down, and it will go down as triangles shrink toward 1 pixel, and it gets exponentially worse once you get into sub-pixel sizes. This blog illustrates this well:

So using a simple triangle fillrate test, where the tiles are the primitives and each tile splits into two triangles, starting at one tile covering 1080p and shrinking down to 1x1 pixels: you can see that at tile size (1,1), performance has completely fallen off a cliff.
[attached chart: triangle fillrate vs. tile size]


Sub-pixel performance becomes an exponential graph the smaller the triangle gets. WRT what we saw, UE5 is aiming at 1 triangle per pixel:
[attached chart: performance with sub-pixel triangles]


So we used optimal numbers, which was wrong; it's simply not possible to get 1-triangle-per-pixel performance out of the fixed-function pipeline, at least not in the traditional sense. They must have done something entirely different, and I suspect UE5 is a very heavy compute-based pipeline.
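If it really is a compute-heavy pipeline, one common way to handle pixel-sized triangles in software is a packed 64-bit atomic depth test: depth in the high bits, triangle ID in the low bits, so a single atomic min keeps the closest triangle per pixel. Below is only a CPU-side C++ sketch of that general technique (my own illustration, not a claim about what UE5 actually does); on a GPU the same idea would use an atomic min on a UAV, with material shading done in a later pass from the surviving IDs.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Software "visibility buffer": each pixel stores (depth << 32) | triangleID,
// so one atomic min keeps the nearest triangle and its ID in a single update.
// (Smaller depth value = closer, so min == closest survives.)
struct VisBuffer
{
    std::vector<std::atomic<uint64_t>> pixels;

    explicit VisBuffer(size_t count) : pixels(count)
    {
        for (auto& p : pixels) p.store(~0ull);  // "far plane, no triangle"
    }

    void Write(size_t pixelIndex, uint32_t depthBits, uint32_t triangleId)
    {
        uint64_t packed = (uint64_t(depthBits) << 32) | triangleId;
        uint64_t prev = pixels[pixelIndex].load();
        // Classic atomic-min loop: only swap in our value while it is closer.
        while (packed < prev &&
               !pixels[pixelIndex].compare_exchange_weak(prev, packed)) {}
    }
};
```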
 