Unreal Engine 5, [UE5 Developer Availability 2022-04-05]

The demo seems pretty grounded in reality to me. Despite still being a tech demo, they did frame it like a 3rd-person game. Unlike their recent RT demos, which seemed more targeted at animation studios, this new UE5 demo seems very focused on GAME making.

I'll point out a few thoughts still running through my head:

LODing: Any game with large environments still needs LOD management. When they say they don't need LODs, I am certain they actually mean "artists don't need to worry about LODs".

How are the LOD transitions handled? Maybe with micro-polygons they feel no transition is needed at all. New polys pop in and out of screen with no attempt to smooth that out, and it still looks OK because they are all pixel-sized changes.

The implication of that is that this tech then NEEDS geometry to be near pixel-sized for it to actually look good. It's a whole paradigm shift. So I'm very curious about how it can scale down.
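Just to make the idea concrete (this is my own sketch, not anything Epic has described, and every field is made up): selection could simply pick the coarsest cluster whose simplification error projects to under a pixel, at which point no transition smoothing is needed at all.

```cpp
// Hypothetical cluster record: each LOD level of a meshlet stores the
// worst-case geometric error (in world units) introduced by simplification.
struct ClusterLOD {
    float simplificationError; // world-space error of this LOD level
    float boundsRadius;        // bounding-sphere radius
};

// Pick the coarsest LOD whose error projects to less than ~1 pixel.
// 'distance' is camera-to-cluster distance; 'pixelsPerRadian' depends on
// resolution and field of view.
int SelectLOD(const ClusterLOD* lods, int lodCount,
              float distance, float pixelsPerRadian)
{
    for (int i = lodCount - 1; i >= 0; --i) { // coarsest first
        float projectedErrorPx =
            (lods[i].simplificationError / distance) * pixelsPerRadian;
        if (projectedErrorPx < 1.0f)
            return i; // error is sub-pixel, no visible transition
    }
    return 0; // fall back to the most detailed level
}
```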

Texturing/shading: So with a REYES-like world, do they bother to texture these micro-polys traditionally? They might just use Gouraud-shaded polys at this point, because the mesh is dense enough for that to be enough, and that sounds so fucking high tech to me.
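As a purely hypothetical illustration of what "no traditional texturing" could mean (none of this is from Epic), material data could just ride along in the vertex stream:

```cpp
#include <cstdint>

// Hypothetical micro-poly vertex: no UVs, no texture fetch at shade time.
// Albedo/roughness are baked straight into the vertex stream, because at
// ~1 vertex per pixel the interpolated value is already "texture resolution".
struct MicroVertex {
    float    position[3];
    uint32_t packedNormal;   // e.g. octahedral-encoded normal
    uint32_t packedAlbedo;   // 8:8:8:8 color, what would have been a texel
    uint16_t roughnessMetal; // 8:8 packed material params
};
```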

That also changes a lot about how one can encode/store/compress that data. And for the XBSX's HW decompression to be leveraged, as I understand it, that data must be encoded as a texture, and hopefully in a way that ends up highly compressible in that format. Sony's more flexible choice of Kraken compression might be a big win here.

And shading. Do you guys actually think they went full REYES here, with object-space shading? Doing it all per vertex? Ironic how old becomes new, huh... Anyway, I don't think they are going that route. My guess is they are sticking to deferred shading. For a generalist engine like UE, it is a good idea to have a G-buffer, and it keeps stuff well separated. Object-space shading entangles lighting and material shading with geometry processing and all its culling and LODing wizardry, and makes the engine too unwieldy. I'd bet money that they are sticking to deferred.
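For what I mean by "keeps stuff well separated", here's a generic deferred layout, not UE's actual G-buffer:

```cpp
// Generic deferred layout: the geometry pass writes these per pixel, and
// lighting runs later as a screen-space pass that never touches the meshes.
struct GBufferTexel {
    float albedo[3];
    float normal[3];
    float roughness;
    float metallic;
    float depth;      // world position reconstructed from this + camera
};

// The point of the separation argued above: the geometry pipeline
// (streaming, culling, LOD) never has to know about lights or materials,
// and the lighting pass never has to know how the G-buffer was produced.
```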
 
They said a billion triangles per frame before culling down to 20 million. I don't know if they are all fetched into memory. I doubt it.

No way; in fact it would be dumb to do so. If you have fine-grained loading, which is what I imagine part of virtual geometry is, then you can load mostly only the geometry you need. That makes disk IO even better, because you don't waste much of that IO on things you won't see on screen.
 
It is not always about big numbers. Efficiency is important too.

In software, if you do things differently and save resources, you can do more.


If you put that SSD subsystem into a PS4 (along with the dependencies), devs would be able to make games differently. It would be a different and better game altogether. They may have more time to do things to surprise you. Personally I wouldn’t be looking at only the resolution. That is what the presentation is trying to tell people.

“Developers, developers, developers”
Yeah, for sure, but I'm talking specifically about this demo here.
Using REYES, the goal of the engine is to produce 1 triangle per pixel, so resolution is directly tied to the detail you see; 4K gaming will actually be head and shoulders above 1080p gaming, because we have largely been held back by the inability to do 1 triangle per pixel with FF hardware. But because this is software rasterization, there is an incredible amount of instructions and overhead deployed to do it, and normally software rasterization cannot exceed the capabilities of FF hardware (4 triangles per clock at 2.23 GHz is vastly more than the ~3.7M triangles a 1440p frame needs). Yet because sub-pixel triangles at 1 triangle per pixel make FF performance exponentially worse, this is the only method to do it.
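Putting rough numbers on that (just my own arithmetic, using the figures above):

```cpp
#include <cstdio>

int main() {
    // 1 triangle per pixel at 1440p:
    const double pixels1440p = 2560.0 * 1440.0;           // ~3.69 million
    // Fixed-function setup rate quoted above: 4 triangles/clock at 2.23 GHz.
    const double ffTrisPerSecond  = 4.0 * 2.23e9;          // ~8.9 billion/s
    const double ffTrisPerFrame30 = ffTrisPerSecond / 30;  // ~297 million/frame

    printf("Triangles needed for 1/pixel at 1440p: %.2f M\n", pixels1440p / 1e6);
    printf("FF raster peak per 30fps frame:        %.0f M\n", ffTrisPerFrame30 / 1e6);
    // The raw peak dwarfs the ~3.7M target, but it collapses once triangles
    // go sub-pixel, which is the argument for a compute/software rasterizer.
    return 0;
}
```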

So on the topic of RDNA, the question should be about compute and bandwidth before we get into any discussion about SSD speeds, because compute and bandwidth are ultimately going to be the largest determinant of the bottleneck here. You're using them to cull and draw triangles; there's a lot of work to do.

If we look at RDNA and the Work Group Processor design (dual CU), each dual CU shares instruction cache, scalar data cache, LDS, etc., so a dual CU that reuses information there doesn't need to go back to L2 or VRAM. That makes more CUs more efficient on embarrassingly parallel workloads, workloads like 20 million triangles that need to be sorted, then culled, and drawn to resolution. Each CU has 32 threads available, a WGP 64. In this case, having more CUs enables more work to be spread across threads: the PS5 has roughly 1152 threads available for processing versus 1664 on Series X that can be cycled through and processed. We don't know how long the instructions take to process, but with more dual CUs, and thus more local caches to hit before falling back to L2 or VRAM, more CUs are going to be beneficial for this type of engine, since there's a lot more you can do in parallel. So then it comes down to the bandwidth feeding them, where more bandwidth means more work can be done.
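Re-doing the thread-count arithmetic under that same 32-threads-per-CU assumption (the assumption is from this post, not a spec sheet):

```cpp
#include <cstdio>

int main() {
    // Assumption carried over from the post above: 32 threads per CU.
    const int threadsPerCU = 32;
    const int ps5CUs = 36;   // PS5 active CUs
    const int xsxCUs = 52;   // Series X active CUs

    printf("PS5:      %d threads\n", ps5CUs * threadsPerCU);  // 1152
    printf("Series X: %d threads\n", xsxCUs * threadsPerCU);  // 1664
    return 0;
}
```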

I don't think the SSD is the biggest bottleneck here. Think about it from this perspective: how upset would people be if a 2080 were only capable of 1440p30 with no ray tracing on? Because that's nearly what's happening here. It's purely the GPU that is holding performance back from hitting higher numbers, not the SSD. I think this should be obvious; most GPUs are really going to be tested here, because they can't rely on FF hardware performance to bail them out.

I dare say Radeon VII and Vega 64 may finally have their moment to redeem themselves. I hope so, for Google's sake.
 
@iroboto I don't think you can necessarily know where the bottleneck is. To visibility-test the geometry to know what to cull, you have to get it into RAM first. One system may be slower at getting the geometry into memory but cull it faster, and the other may be faster at getting the geometry into memory but cull it slower. It's two different bottlenecks for the same problem. All we know is that, for the same workload on both platforms, the PS5 would not be bottlenecked by IO and the Xbox would not be bottlenecked by compute performance.
 
Thoughts on limitations:

I think the setting of the demo is not accidental. A rocky mountain full of carved sculptures is the perfect place to show off ultra-high geometric density. It also happens to de-emphasize the limitations of the tech...

From all the rumors and whispers about Nanite, it seems like what they have is tech that encodes movie-quality meshes into some format that lets them stream those meshes in chunks (meshlet-like), then LOD and cull them in real time highly efficiently. They must have some sort of data structure and hierarchy to make that efficient.
So that all makes it sound to me like it relies on these meshes being STATIC. Not that the whole environment needs to be so, but individual objects do.

So that leaves out skinned characters, deformable terrain, physics-driven soft bodies, swaying vegetation, cloth...

We will surely need other novel approaches to process and render highly dense animated meshes if we don't want their appearance to clash with the super-detailed environments they'll be in.

I'm still waiting for a real-time Catmull-Clark-like subdivision scheme. Some have tried doing it with DX11 tessellation this gen, but it seems to have been a dead end there. Maybe primitive shaders will do the trick this time...
 

They do show deforming terrain in the demo. As the player is flying near the end, the towers and canyon walls are crumbling and falling. Now, that's likely some Alembic-cache type stuff, playing back a recorded animation rather than actual physics bodies.
 
They said a billion triangles per frame before culling down to 20 million. I don't know if they are all fetched into memory. I doubt it.
They're not. The data won't fit in RAM. The whole showcase (well 50% of it) was being able to stream data from storage as virtualized micropolygons.

So the question is how many triangles they are loading, what data rate that requires, and what seek performance is needed as well.

From the vid: "A billion triangles per frame, which Nanite crunches down losslessly to around 20 million drawn triangles." At 1440p, that's 4 million pixels. This suggests to me that every frame needs 20 million triangles loaded (not unrealistic) to be culled down to what's drawn on screen. That won't be 20 million unique triangles per frame, as the same triangles will mostly be reused. We can start hazarding guesses about the storage requirements*, but it's not zero, and it could well be as much as the PS5 delivers based on seek rates. So yeah, this demo could be limited by PS5's SSD performance just as much as by the GPU. I don't see how people can imagine it's not stressing the storage at all. We can't possibly know and need Epic to tell us.

* Random illustrative examples, ignoring seek times, which may affect the amounts: with slower seeks you'd need larger tiles of data, and you'd need to store more of those larger tiles.
20,000,000 x 12 bytes per triangle = 240,000,000 bytes/frame; at 30 fps that's 6.7 GB/s
1,000,000 x 6 bytes per triangle = 6,000,000 bytes/frame; at 30 fps that's 0.17 GB/s
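Spelling those two cases out (treating GB as GiB, which is how the 6.7 figure falls out):

```cpp
#include <cstdio>

double streamRateGiB(double trisPerFrame, double bytesPerTri, double fps) {
    return trisPerFrame * bytesPerTri * fps / (1024.0 * 1024.0 * 1024.0);
}

int main() {
    // Worst case from above: every drawn triangle loaded fresh each frame.
    printf("%.2f GiB/s\n", streamRateGiB(20e6, 12, 30)); // ~6.71
    // Gentler case: 1M new triangles per frame at 6 bytes each.
    printf("%.2f GiB/s\n", streamRateGiB(1e6, 6, 30));   // ~0.17
    return 0;
}
```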

But the end scene, with the fast traversal, clearly needs to stream data much faster than the early scenes. I expect some degree of dynamic LOD.
 
@iroboto I don't think you can necessarily know where the bottleneck is. To visibility-test the geometry to know what to cull, you have to get it into RAM first. One system may be slower at getting the geometry into memory but cull it faster, and the other may be faster at getting the geometry into memory but cull it slower. It's two different bottlenecks for the same problem. All we know is that, for the same workload on both platforms, the PS5 would not be bottlenecked by IO and the Xbox would not be bottlenecked by compute performance.
I also don't think they use 'cull' to mean 'not drawn', but rather 'selected from the data on storage'. There's a billion triangles in the scene, but they only need to load 20 million to include the triangles that need to be drawn. My guess is a high-level representation that selects the necessary tiles from storage; I can't see it working any other way with that much data, and that is the basis of virtualised assets, which is what the geometry is described as. If you're loading all the triangles and then culling them, it's not virtualised.
 
@iroboto I don't think you can necessarily know where the bottleneck is. To visibility-test the geometry to know what to cull, you have to get it into RAM first. One system may be slower at getting the geometry into memory but cull it faster, and the other may be faster at getting the geometry into memory but cull it slower. It's two different bottlenecks for the same problem. All we know is that, for the same workload on both platforms, the PS5 would not be bottlenecked by IO and the Xbox would not be bottlenecked by compute performance.

Yes, the boxes are architected differently. Developers will need some time to understand their nuances as they figure out near-real-time asset loading: e.g., whether it uses up more computing resources in some cases, whether some OS layer or drivers introduce unexpected overhead, whether the clocking scheme and profile work as advertised, whether there is a bad stall or bubble somewhere. On top of that, if they go for 3D audio, they may have to reserve CUs for the audio workload.

The UE5 developers went through that for both consoles. Even for them, they may need more iterations to figure it all out.

The real bottleneck is often on the human side, especially with this COVID-19 lockdown business.

We should at least wait for the upcoming UE5 tech presentation. Someone tweeted about it.
 
I'm guessing the blocks you'd read off disk would probably be cache-friendly sizes for the GPU, which is still fairly small. You're not going to load exactly the number of polygons you need, but you're going to load way less than the full model. You'll probably load GPU-cache-sized chunks and hope that for a lot of the screen there's spatial locality, so you can cover many pixels from the same chunk. Where that falls apart is in the distance: two pixels side by side may come from two totally different models if those models are far away. I'm wondering if at some point they actually switch to SDF representations or impostors. At some point it doesn't make sense to load geometry, like when you'd be reducing the entire model down to 5-10 polygons and selecting one to fit a pixel.
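A made-up example of that "stop loading geometry at some point" policy (the threshold and names are mine, nothing measured):

```cpp
// Hypothetical policy: below some projected size, don't stream geometry
// chunks at all; fall back to an impostor (or SDF) representation.
enum class Representation { GeometryChunks, Impostor };

Representation ChooseRepresentation(float projectedAreaPixels,
                                    int minUsefulTriangles = 16)
{
    // If the whole model covers fewer pixels than a handful of triangles
    // could even express, loading real geometry buys nothing.
    if (projectedAreaPixels < static_cast<float>(minUsefulTriangles))
        return Representation::Impostor;
    return Representation::GeometryChunks;
}
```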
 
"Press X to gameplay" :D

Demo looked really, really nice for the most part, but I know it's a linear, limited-interactivity, single-character demo.

When we do start seeing games, they're going to have to have things like traversable environments (for AIs as well as human players), cover areas, spawn points positioned for gameplay purposes, far, far less predictable stress points (many enemies + AI + multiple players + grenades and special weapons), plus... just, like, etc. etc.

I'm sure the engine will still be leading edge and produce amazing visuals, but when actual games land, the gameplay bits will often have to look rather more... game-like.

Not sure what you meant with this, when my response was just to your comment that you wouldn't be surprised if it was running on a PC. I don't expect "real" games to look like this for several years yet, and I'm not even sure about the coming console generation at all. As I have said before, I will keep my expectations in check.:smile2:
 
@patsu Series X has an audio DSP that can do convolution reverb, just probably not to the same extent as PS5. Their entire audio solution is built around pre-computing the impulse responses for audio playback, but the convolution will be done on the DSP. That means the GPU shouldn't have to be involved, up to some unknown number of sound sources.
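For anyone unfamiliar, this is the kind of maths the DSP is offloading; a naive sketch, since real implementations use partitioned FFT convolution rather than this direct form:

```cpp
#include <vector>

// Naive time-domain convolution of a dry signal with an impulse response.
// Real audio DSPs would use partitioned FFT convolution, but the work being
// offloaded from the CPU/GPU is essentially this.
std::vector<float> ConvolveReverb(const std::vector<float>& dry,
                                  const std::vector<float>& impulseResponse)
{
    if (dry.empty() || impulseResponse.empty()) return {};
    std::vector<float> wet(dry.size() + impulseResponse.size() - 1, 0.0f);
    for (size_t n = 0; n < dry.size(); ++n)
        for (size_t k = 0; k < impulseResponse.size(); ++k)
            wet[n + k] += dry[n] * impulseResponse[k];
    return wet;
}
```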
 
Not sure what you meant with this, when my response was just to your comment that you wouldn't be surprised if it was running on a PC. I don't expect "real" games to look like this for several years yet, and I'm not even sure about the coming console generation at all. As I have said before, I will keep my expectations in check.:smile2:

It just started with my first comment "Press X to gameplay", which is a little how the demo felt to me. Then my thoughts just kept rolling from there.

Not everything really applied to what you'd said, sorry it came off that way. Yes, cautiously optimistic, but expectations in check! :)
 
I also don't think they use 'cull' to mean 'not drawn', but rather 'selected from the data on storage'. There's a billion triangles in the scene, but they only need to load 20 million to include the triangles that need to be drawn. My guess is a high-level representation that selects the necessary tiles from storage; I can't see it working any other way with that much data, and that is the basis of virtualised assets, which is what the geometry is described as. If you're loading all the triangles and then culling them, it's not virtualised.

Yah, I think "culling" here means visibility testing against whatever chunk of data they're loading. Like a virtual texturing system has a map that associates pixels with texels in texture pages, there's probably something similar here: a map from screen pixels to polygons in a chunk of a mesh. But who knows; I think the data structure could be fairly complicated.
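Something like this is what I'm picturing, purely as a guess with made-up names, analogous to a virtual-texture page table:

```cpp
#include <cstdint>
#include <vector>

// Sketch of the indirection being guessed at: like a virtual-texture page
// table, but mapping screen pixels to geometry chunks ("pages").
struct GeometryPageRef {
    uint32_t chunkId;   // which meshlet-like chunk on disk / in cache
    uint16_t lodLevel;  // which level of detail of that chunk
};

struct VisibleGeometryMap {
    int width, height;
    std::vector<GeometryPageRef> perPixel; // width * height entries

    const GeometryPageRef& at(int x, int y) const {
        return perPixel[y * width + x];
    }
};
// Walking this map each frame tells the streamer which chunks must be
// resident; anything not referenced can be evicted or never loaded.
```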
 
They're not. The data won't fit in RAM. The whole showcase (well 50% of it) was being able to stream data from storage as virtualized micropolygons.

So the question is how many triangles they are loading, what data rate that requires, and what seek performance is needed as well.

From the vid: "A billion triangles per frame, which Nanite crunches down losslessly to around 20 million drawn triangles." At 1440p, that's 4 million pixels. This suggests to me that every frame needs 20 million triangles loaded (not unrealistic) to be culled down to what's drawn on screen. That won't be 20 million unique triangles per frame, as the same triangles will mostly be reused. We can start hazarding guesses about the storage requirements*, but it's not zero, and it could well be as much as the PS5 delivers based on seek rates. So yeah, this demo could be limited by PS5's SSD performance just as much as by the GPU. I don't see how people can imagine it's not stressing the storage at all. We can't possibly know and need Epic to tell us.

* Random illustrative examples, ignoring seek times, which may affect the amounts: with slower seeks you'd need larger tiles of data, and you'd need to store more of those larger tiles.
20,000,000 x 12 bytes per triangle = 240,000,000 bytes/frame; at 30 fps that's 6.7 GB/s
1,000,000 x 6 bytes per triangle = 6,000,000 bytes/frame; at 30 fps that's 0.17 GB/s

But the end scene, with the fast traversal, clearly needs to stream data much faster than the early scenes. I expect some degree of dynamic LOD.
You're not loading 20 million triangles at 30 fps from storage.
The majority of this is happening from memory. You're only streaming new data in and old data out. There is still a buffer period. The PS5 can at most have a theoretically 50% shorter buffer length than the XSX (very unlikely). That doesn't mean the buffer is gone.

The only way the SSD is the bottleneck on PS5 is if it requires the drive to run at a full 5.5 GB/s every single second without rest. Be reasonable. That's not how virtual texture streaming works at all.

There is a large buffer with respect to the performance of the drive, or we would be hitching everywhere.

TL;DR: IMO, and I think I make a fairly decent case here, the demo is compute bound. It will disproportionately favour wide GPUs over narrow ones. Compute shaders are no longer augmenting the FF pipeline; they are doing all the work. This benefits width over clock rate. Streaming speeds can be mitigated by buffer size, if a 2x peak throughput difference is truly that big an issue.
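To illustrate the buffer argument with made-up numbers (nothing here is measured from the demo):

```cpp
#include <cstdio>

int main() {
    // Illustrative assumptions only: a resident geometry cache of a few GB,
    // with camera motion replacing a small fraction of it each frame.
    const double residentCacheGB  = 4.0;   // geometry kept in RAM
    const double replacedPerFrame = 0.01;  // 1% of the cache churns per frame
    const double fps              = 30.0;

    const double ssdRateGBs = residentCacheGB * replacedPerFrame * fps;
    printf("Required streaming rate: %.2f GB/s\n", ssdRateGBs); // 1.20 GB/s
    // Only the churn hits the SSD; the other 99% of each frame's geometry
    // is already in memory, which is the "buffer" argument above.
    return 0;
}
```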
 
You're not loading 20 million triangles at 30 fps from storage.
The majority of this is happening from memory. You're only streaming new data in and old data out. There is still a buffer period. The PS5 can at most have a theoretically 50% shorter buffer length than the XSX. That doesn't mean the buffer is gone.

Yah, like virtual texturing has a tile cache, the virtual geometry system has a cache. That way, if the camera shifts slightly, you most likely already have most of the tiles you need. I don't think it's going to be just a 2D cache of tiles like with textures, but you want to prevent as much loading as possible, like you would in virtual texturing.
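A minimal sketch of that kind of cache, assuming a plain LRU over chunk IDs (not Epic's actual scheme):

```cpp
#include <cstdint>
#include <list>
#include <unordered_map>

// Minimal LRU cache of geometry chunks, analogous to a virtual-texture tile
// cache: a small camera move mostly re-hits chunks that are already resident.
class ChunkCache {
public:
    explicit ChunkCache(size_t capacity) : capacity_(capacity) {}

    // Returns true on a hit; on a miss the caller would stream the chunk in
    // and then call Insert().
    bool Touch(uint32_t chunkId) {
        auto it = index_.find(chunkId);
        if (it == index_.end()) return false;
        lru_.splice(lru_.begin(), lru_, it->second); // move to front
        return true;
    }

    void Insert(uint32_t chunkId) {
        if (Touch(chunkId)) return;                  // already resident
        if (lru_.size() == capacity_) {              // evict least recently used
            index_.erase(lru_.back());
            lru_.pop_back();
        }
        lru_.push_front(chunkId);
        index_[chunkId] = lru_.begin();
    }

private:
    size_t capacity_;
    std::list<uint32_t> lru_;
    std::unordered_map<uint32_t, std::list<uint32_t>::iterator> index_;
};
```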
 
I wonder how a completely static tree, with leaves made out of triangles instead of alpha cutouts, would look with this tech.
 