Unreal Engine 5, [UE5 Developer Availability 2022-04-05]

No idea, we'll have to wait for the tech paper I suppose. Note I'm not claiming there is a billion triangles on the screen; I'm saying the world probably consists of at least a billion tris at LOD0. E.g. perhaps that grass tuft is 2,000 tris; if there are tens of thousands of them in the world, that's a lot right there, and it rapidly adds up. The world in my personal game has ~30 million tris and it's nowhere near the level of what I've seen in R&C, not even in the same ballpark.
I didn't know we were tallying up triangles per game level when that's not what's being rendered in the camera frustum.
 
I realized some people were interpreting his intent wrongly, but it felt pretty obvious to me he meant "source assets geometry" and not actual rendered geometry per frame, since that was the only possible interpretation that made any sense and was not internally contradictory. It's also what the Unreal devs meant when they were saying models had XYZ polys during the UE5 demo. They were pointing to source assets. They made the same "sleight of hand" when demoing UE3 back in the day.
 
hehe
even with REYES, you're looking at a maximum of 3,686,400 drawn triangles on the screen @ 1440p. With native 4K, REYES gives 8,294,400 triangles.
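Quick sanity check of those figures (Python, just the pixel counts; assumes the idealised one-triangle-per-pixel REYES case, nothing engine-specific):

Code:
# One micropolygon/triangle per final pixel is the idealised REYES case.
def pixels(width, height):
    return width * height

print(pixels(2560, 1440))  # 3,686,400 -> the 1440p figure
print(pixels(3840, 2160))  # 8,294,400 -> the native 4K figure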

I know we don't like resolution counting anymore because its effect on image quality seems to be not as important as quality per pixel, but in an ideal world, native 4K is going to give significantly more detail than 1440p with REYES.

I think that's sort of the challenge going forward for video games. I just don't see assets being that large, or even shippable, for a full-length 8-12 hour game where the player can do some exploration, with most of these triangles being discarded as well.
 

Isn't it ironic that just as we go back to "cartridge-like" relative performance for loading data from storage, we also get back "cartridge-like" relative bottlenecks on game size and "cartridge-era-like" relative RAM limitations.
 
"That lets us devote all of our system memory to the stuff in front of you right now,"

So if you believe what's being said there then they're using the full 16GB to render the current viewport only.

I cannot believe it.
First, there are obvious things like: the game world needs some simulation - AI, physics, etc. - and none of this can work if you clip away what's behind you. Now with RT this goes into graphics as well - RT reflections would not work if they indeed only had the frustum content in memory.
Ofc. those things can do with lower LODs and are maybe out of the context of the quote you posted, but still - not clarifying this context makes the quote harder to take seriously.

Second, I still don't believe SSD and IO have such low latency that loading could keep up with fast-moving viewports. Loading and decompressing a big amount of assets within a single frame... is this indeed possible?
 
Pal, try to exercise some less literal interpretations of things when it's obvious people are using figures of speech or over-simplification to just make a point in a tweet.
 
That's what I meant - I did not intend to nitpick.
But it does not help to clarify anything really. Such twitter posts build up attention but no knowledge. We discuss for many pages, and after that it's still all guessing, sympathizing and assumptions. Modern times... :/
 
I think it’s a poor example to use the ‘as the player turns’ comment

It's an example of how the ability to keep more data in VRAM can be advantageous over being able to fill VRAM up quickly (but still much slower than the data already being there).

Now - let's random fast travel in a game to a new part of the map - or to another planet - or beam down from a ship to a planet, or in a massive open world have super zoom to extreme details miles away (etc). The speed will be directly limited by the data transfer speed.

All of these things represent potential advantages of having faster IO, no one's denying that. As I said, each solution (more VRAM + slower IO vs less VRAM + faster IO) has its pros and cons. The idea I'm pushing back on is that smaller VRAM + fast IO is universally better under all circumstances.

All of the above scenarios you mentioned could potentially be done (and done better at that) with more VRAM, albeit the developer would be limited in other ways. For example, the fast travel scenario - most games have fixed fast travel entry points. So if you can store 50% of your game content in VRAM at any given time, you can potentially pre-cache the entry points to all of the fast travel options while leaving the areas away from the entry points to be streamed in post travel. This would potentially allow for faster fast travel transitions than loading the data in from the SSD.

Beam down from ship to planet - assuming there's more than 1 planet in the game then you pre-cache the one (or several) that are closest to the ship. Again, the result is no transition screen required at all because you're not limited by data transfer speed since that data's already in VRAM.
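To make that pre-caching idea concrete, here's a rough sketch (Python; the destination names, sizes and likelihood numbers are made up for illustration - this isn't any engine's actual API):

Code:
# Hypothetical sketch: greedily keep the most likely fast-travel / beam-down
# destinations resident in whatever VRAM budget is left after the current area.
def pick_precache(destinations, budget_gb):
    """destinations: list of (name, size_gb, likelihood). Returns names to keep resident."""
    chosen, used = [], 0.0
    # Favour the destinations the player is most likely to travel to next.
    for name, size_gb, likelihood in sorted(destinations, key=lambda d: d[2], reverse=True):
        if used + size_gb <= budget_gb:
            chosen.append(name)
            used += size_gb
    return chosen

dests = [("hub_city", 3.0, 0.6), ("nearest_planet", 4.0, 0.3), ("far_outpost", 5.0, 0.1)]
print(pick_precache(dests, budget_gb=8.0))  # -> ['hub_city', 'nearest_planet']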

I'm not saying there aren't scenarios where the fast IO is simply better full stop. Of course there are. But equally there are scenarios where having more VRAM would be better full stop.

If Sony and MS hadn’t put all this effort in the IO systems then you’d have to have all games designed around slower data streaming and that expensive extra memory would take even longer to fill up, which will mean similar scenarios like we had last gen in game design.

But as I mentioned earlier, you wouldn't have to fill all the memory just to start the game. It may only require 4GB to actually get into the game, while on a 32GB system the remaining 28GB is pre-cached while playing. With a SATA SSD for example that would be only about 4 seconds of initial load time, with the rest of the data filled in over roughly another 30 seconds of play time. Yes the initial load would be a few seconds faster with the NVMe drive, but after that you have 32GB of RAM to play with, and if you don't actually need more than 1.1GB/s (with decompression) worth of streaming during gameplay (which I'll wager most games won't) then the only downside is slower initial loads, and maybe fast travels if you can't pre-cache them with the extra RAM.
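Back-of-envelope version of those numbers (Python; the ~1.1GB/s effective SATA rate with decompression is the assumption from above):

Code:
# Rough load times for the hypothetical 32GB + SATA SSD console.
effective_rate_gbps = 1.1   # ~550 MB/s SATA with ~2:1 decompression, assumed

initial_load_gb = 4.0       # enough to get into the game
remaining_gb = 28.0         # pre-cached in the background while playing

print(initial_load_gb / effective_rate_gbps)  # ~3.6 s initial load
print(remaining_gb / effective_rate_gbps)     # ~25 s to fill the rest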

Am I saying a 32GB PS5 with a SATA SSD would have been the better design choice? No. I'm simply saying it would provide a different mix of advantages and disadvantages vs the current design.
 
I realized some people were interpreting his intent wrongly, but it felt pretty obvious to me he meant "source assets geometry" and not actual rendered geometry per frame, since that was the only possible interpretation that made any sense and was not internally contradictory. It's also what the Unreal devs meant when they were saying models had XYZ polys during the UE5 demo. They were pointing to source assets. They made the same "sleight of hand" when demoing UE3 back in the day.
Yes I remember the whole Gears of War bullshots very clearly, with the final game looking nothing like them :LOL:.
hmmm I reread my posts again (whilst I do admit I'm often terrible at explaining things) but here it's pretty clear: the actual world has a shit ton of triangles, perhaps billions like I was saying. Since this is obviously far too much to throw at a GPU they have to reduce this with culling and LOD to something manageable.
I wasn't talking about the 'source assets geometry' that they use to create normal maps or whatnot. Sorry if I wasn't clear enough.
Here's a slide from Infamous Second Son (view with Firefox): they say they regularly render 11+ million tris (OK, a lot of these will be depth buffer stuff/lights etc), but they will also usually be rendering 90+% of the stuff at > LOD0, and you typically only see a fraction of the world at once. Thus I would not be surprised if this game also featured a world with > a billion tris @ LOD0.
[slide image: infamous.jpg]

R&C / UE5 seems to be doing the same thing, just streaming in the chunks when needed.

Not sure what they mean by 3+ million static instance verts per block?
 
The UE5 demo is a display of streaming baked data into a scene fast, to test SSD->VRAM speed.

Voxel cone tracing isn't new. The mere fact that you have to have voxels means it's an organized data structure where accuracy depends on the number of voxels for a reasonable approximation. A ray has infinite precision. They just aren't comparable. Also screenspace is what we are trying to get away from this gen. RT doesn't have the limitations of screenspace rendering.

This sounds like you're underestimating Lumen. If they don't lie, then it is not baked, so its relation to the SSD is only about the extra data they need to compute all the lighting in game.

Saying RT has perfect (or even just better) precision than the alternatives is arguable. RT can use the same geometry we use for the frame buffer, but it can also use lower-resolution geometry for optimization purposes. Or it can use simplified material models, or even a solid color per object like Metro does for GI AFAIK.
So, approximations are used in RT as well. And besides geometry, denoising often causes blur, lag, even screenspace-related disocclusion artefacts, washing a lot of geometric precision away in some cases. Just saying.

I wonder what makes you think Metro Enhanced's lighting is so much better than the UE5 demo's?
You mentioned Metro has detail for small objects. UE5 has this too, but using (pretty unstable) SS and (impressively high-res) shadow maps.
So, what's the difference in practice you expect? I could list a few, but if we look at the promises given about workflow, both have the same. And to the gamer the differences in the final results are hard to spot?
 
It's an example of how the ability to keep more data in VRAM can be advantageous over being able to fill VRAM up quickly (but still much slower than the data already being there).

All of these things represent potential advantages of having faster IO, no one's denying that. As I said, each solution (more VRAM + slower IO vs less VRAM + faster IO) has its pros and cons. The idea I'm pushing back on is that smaller VRAM + fast IO is universally better under all circumstances.

*snip*

Am I saying a 32GB PS5 with a SATA SSD would have been the better design choice? No. I'm simply saying it would provide a different mix of advantages and disadvantages vs the current design.

I'm not sure you needed to push back though. I think the concept that the SSD actually solves an important graphics bottleneck is still much less prevalent than the idea that more VRAM or bandwidth or CUs help to improve graphics.

As it stands, 16GB of data can be replaced in 2 seconds. But game worlds aren't necessarily built in visible blocks of 16GB, because that is not how it works or can work today either. But if you make a slow game with lots of corridors you can keep streaming in new data fast enough even with 50MB/s, a trick that was already learnt when we moved from cartridge to CD based games and has been well developed since. But it is a cumbersome design trick, much like baking lighting into textures, hand-placing lights, LOD transitions and so on.

If you can load 8GB/s, that means you can load 8/60 of that in a single 60fps frame, which is about 133MB. That may not seem like much, but it means 7-8 frames is already 1GB, and with that you can load a significant portion of your visible data at high detail.
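Spelled out (Python, decimal units to match the figures above):

Code:
# Per-frame streaming budget at 8 GB/s and 60 fps.
rate_mb_per_s = 8000.0   # 8 GB/s in decimal MB
fps = 60

per_frame_mb = rate_mb_per_s / fps
print(per_frame_mb)          # ~133 MB streamed per frame
print(1000 / per_frame_mb)   # ~7.5 frames to stream 1 GB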

Of course that's not the whole story. An open world has repeating data all over the place. It is far too large to keep in memory no matter how much memory you have, and if you are free to go in any direction it is hard to predict what to stream. Now consider whether higher-detail models for close-by viewing are still popping in because the renderer can't handle the higher-detail models yet, or because they can't be streamed in fast enough. While the former was the bottleneck in the past, with today's GPU power the latter is the bigger bottleneck now.

In addition, a big open world is not built from unique data every 2 seconds of travel. The world is littered with data that keeps reappearing - you don't need unique tree models or animals or moss or plants or houses or planks, bricks and so on ... whatever. You do need a database of objects, the locations where they can be placed statically or dynamically, and somewhere they can be retrieved from at super high speed.

That combined with the component prices and the general load times improvements means that I think they made the right decision.
 
I think the concept that the SSD actually solves an important graphics bottleneck is still much less prevalent than the idea that more VRAM or bandwidth or CUs help to improve graphics.

And I still think there's a difference between NVMe SSDs and actual VRAM (GDDR6, HBM etc). 800GB of GDDR6 obviously would be even more capable (but too expensive). Maybe what AMD has done in the past has a future (having a 1TB NVMe directly on the GPU besides the VRAM). But that still wouldn't match the latency, speed, bandwidth etc.
 
This sounds like you're underestimating Lumen. If they don't lie, then it is not baked, so its relation to the SSD is only about the extra data they need to compute all the lighting in game.

What makes Lumen any more accurate than any other GI solution? I can see in the demo that Lumen doesn't correctly handle occlusion while the object is in shadow. That's because the shader isn't computing the correct light propagation on the surface and normalizing that approximation with an accurate enough occlusion term. The other thing Lumen doesn't handle is area lights. I'm tired of the same old lighting setup with a hierarchy of cubes, spheres, etc. to approximate illumination from a pre-pass. I want a complete light loop for every light source (not just directional) that factors in its size as well as a decent inverse-square falloff, and I want my BRDFs to use importance sampling on the surface as well as rays shot from the light source and to the material, using a proper PDF and sampling function.
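To be concrete about what I mean by a "complete light loop", here's a minimal sketch (Python, Lambert diffuse and point lights only, no shadowing or importance sampling - purely illustrative, not Lumen's or anyone else's shading code):

Code:
import math

# Minimal direct-lighting loop: every light contributes, with a proper
# inverse-square falloff and an N.L Lambert term. Illustrative only.
def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def shade(position, normal, albedo, lights):
    out = [0.0, 0.0, 0.0]
    for light_pos, light_color, intensity in lights:
        to_light = tuple(lp - p for lp, p in zip(light_pos, position))
        dist_sq = sum(c * c for c in to_light)    # squared distance to the light
        wi = normalize(to_light)                  # direction towards the light
        n_dot_l = max(0.0, sum(n * w for n, w in zip(normal, wi)))
        falloff = intensity / dist_sq             # inverse-square falloff
        for i in range(3):
            out[i] += (albedo[i] / math.pi) * light_color[i] * falloff * n_dot_l
    return tuple(out)

# One white light 2 units above a grey, upward-facing surface point.
print(shade((0, 0, 0), (0, 0, 1), (0.5, 0.5, 0.5),
            [((0, 0, 2), (1.0, 1.0, 1.0), 10.0)]))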

This lighting is wrong. It's too uniform (flat), like all the other GI light-probe techniques.

[image: K8Gh1aJ.png]


Saying RT has perfect (or even just better) precision than the alternatives is arguable. RT can use the same geometry we use for the frame buffer, but it can also use lower-resolution geometry for optimization purposes. Or it can use simplified material models, or even a solid color per object like Metro does for GI AFAIK.

That's you limiting the approximation. A ray is inherently as accurate as you want. A voxel can be accurate but you'll have to scale it down to pixel size.
 
I'm not sure you needed to push back though. I think the concept that the SSD actually solves an important graphics bottleneck is still much less prevalent than the idea that more VRAM or bandwidth or CUs help to improve graphics.

As it stands, 16GB of data can be replaced in 2 seconds.

*snip*

That combined with the component prices and the general load times improvements means that I think they made the right decision.

First, there is not 16 GB of RAM used for games on PS5. If it is the same as on Xbox, there is 13.5 GB for the game and 2.5 GB reserved by the OS, and using Oodle Kraken plus texture compression the average decompressed rate is 10 to 11 GB/s, which means between 1.22 and 1.35 seconds to fully load the memory. Some of the memory is also reserved for the gameplay logic and simulation of the world.
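The arithmetic spelled out (Python; the 13.5 GB game split is the assumption that PS5 matches the Xbox figure):

Code:
# Time to fill the game-visible RAM at the quoted effective rates.
game_ram_gb = 13.5                       # assumed, if the split matches Xbox
for rate_gb_per_s in (10.0, 11.0):       # average effective rate with Kraken + texture compression
    print(game_ram_gb / rate_gb_per_s)   # 1.35 s and ~1.23 s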

We saw in the last Ratchet and Clank video that the loading from one level to another took about 1.4 seconds. The community manager said everything SSD-related will be slightly faster in the final game. This is OK; a bit above 1 second of waiting is not the end of the world. And it is, like you said, a cheaper and more clever option.

Another thing: it will be easier to do portals, where you can open a portal at any place in the game world and go anywhere else in the game world, like for example in a Doctor Strange game - and the character is rumored to be in the next Spider-Man.
 
Yes I was working with 8GB on purpose.

To add from the NVidia article:

“When used with Microsoft’s new DirectStorage for Windows API, RTX IO offloads dozens of CPU cores’ worth of work to your GeForce RTX GPU, improving frame rates, enabling near-instantaneous game loading, and opening the door to a new era of large, incredibly detailed open world games.
Object pop-in and stutter can be reduced, and high-quality textures can be streamed at incredible rates, so even if you’re speeding through a world, everything runs and looks great. In addition, with lossless compression, game download and install sizes can be reduced, allowing gamers to store more games on their SSD while also improving their performance.”
 
What makes Lumen any more accurate than any other GI solution?
I'd answer it supports infinite bounces, which is pretty new to fully dynamic methods.
I can see in the demo that Lumen doesn't correctly handle occlusion while the object is in shadow. That's because the shader isn't computing the correct light propagation on the surface and normalizing that approximation with an occlusion term.
You likely mean the missing interaction between the character and the wall. I think this is because the skinned character is ignored for GI. It only receives light (probably from a probe grid) but does not cause occlusion or bounces.
After the huge lag and missing sharp reflections this seems like the major limitation to me, but I assume they'll address this by using capsules, so characters will cause occlusion at least. Bounces they can get a bit from SS, and it's not that character reflections have a big impact on global light transport.
The other thing Lumen doesn't handle is area lights.
Pretty sure it does. I expect it works better the larger the area light is. Emissive materials and arbitrary light shapes should just work. Quality depends on the probe and occluder SDF grid resolutions.
I want a complete light loop for every light source (not just directional) that factors in its size as well as a decent inverse-square falloff, and I want my BRDFs to use importance sampling on the surface as well as rays shot from the light source and to the material, using a proper PDF and sampling function.
RT / importance sampling are not the only options to achieve this, but they are most efficient for high frequencies.
For global light transport lower frequencies are most important to get right, and here RT alternatives can be faster at the same quality. RT is not the silver bullet to solve all lighting problems just because it can do that.

This lighting is wrong. It's too uniform, like all the other GI light-probe techniques.
I was just wondering why you were not impressed. Personally I'm not that impressed either, but the result is much better than other dynamic GI tech from before. Enough for a generational visual leap, and also enough to eliminate manual work on fill-light setup or baking times.
Metro is better, yes, but the difference is visually small in most cases, and I don't see how DXR could deal with insane Nanite detail. As an artist I would be pretty happy with Lumen.

That's you limiting the approximation. A ray is inherently as accurate as you want. A voxel can be accurate but you'll have to scale it down to pixel size.
Sure, but for diffuse GI we can use an optimized representation of the geometry, because small-scale details don't matter much. It's fine to reduce detail here. (Currently, all shown dynamic GI solutions are laggy and very expensive.)
Also notice an SDF gives a much better approximation than binary voxels at equal grid resolution. The real limitation which matters much more is using a low-res global probe grid (I just assume they do).
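To illustrate the SDF-vs-binary-voxel point (Python, made-up numbers): at the same grid resolution a signed distance sample still tells you how far away the surface is, while a binary voxel only says occupied or not.

Code:
import math

# Same grid cell, two representations of a sphere of radius 1.3 at the origin.
p = (1.0, 1.0, 0.0)                          # query point, ~1.414 from the centre
dist_to_centre = math.sqrt(sum(c * c for c in p))

sdf_value = dist_to_centre - 1.3             # ~0.11: we still know how far outside we are
voxel_occupied = dist_to_centre <= 1.3       # False: all we know is "this cell is empty"

print(sdf_value, voxel_occupied)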

Detailed geometry matters for shadows and sharp reflections, and for the former they have a pixel-exact solution at RT precision (likely tiled dynamic-resolution shadow maps - impressive work which did not receive much attention in the media). This won't work for area-light shadows ofc., so there is some gap between the area-light support from GI and the point-light-restricted SM. One way to fill this gap would be stochastic SM, I think, but I'm not sure if it's worth disabling SM occlusion culling just to fill this gap. Actually they support blurring sharp shadows in screenspace to fake some penumbra, and there is other SS hackery to fake soft shadows / directional occlusion stuff as well.
Looks like good compromises to me. Again, happy as an artist - I would only complain about missing sharp reflections. They say they'll make Lumen faster (easy: just accept more lag), and they'll also try reflection tracing (worked fast with VCT approaches, but missing characters becomes a problem then maybe).
 
RT / importance sampling are not the only options to achieve this, but they are most efficient for high frequencies.
For global light transport lower frequencies are most important to get right, and here RT alternatives can be faster at the same quality.
That's true, but you will spend so much time hacking your scenes and custom-tailoring them to the dynamic behavior of your game. It's not the way to go for an engine that will be used by the masses like UE. Also, while you may get faster convergence to a solution of the lighting equation, it more than likely won't be a robust solution (i.e. applicable to any type of game with any kind of art direction).

RT is not the silver bullet to solve all lighting problems just because it can do that.

I disagree here. But I come from the film industry where I'm used to seeing things rendered "correctly". I'm naturally going to be hard on realtime systems even though I work with them. You may feel strongly against using accurate solutions, but I think if we are all honest here, it would be because the hardware just isn't there. I've seen the industry bark at RT back when the REYES algorithm and micropolygons were the thing. But over time, it quickly became apparent that we couldn't work around its shortcomings in any diverse lighting pipeline. The pain of trying to work around baking out brickmaps for subsurface skin and having shadow maps render first before the main render was a bear to deal with. I think as the hardware gets more powerful, game devs will realize the ease of use and stunning results of a full RT pipeline. It may take longer than this generation of hardware but it's coming.

Looks like good compromises to me. Again, happy as an artist - I would only complain about missing sharp reflections. They say they'll make Lumen faster (easy: just accept more lag), and they'll also try reflection tracing (worked fast with VCT approaches, but missing characters becomes a problem then maybe).

At least UE4.26 has a path tracer in the code anyway. If I were going to make a game today, I'd forego their Lumen for lighting and just use the RT lighting pipeline. Since it's clearly feasible going by 4A's implementation (Metro), it should be a big priority to get the same kind of speed in UE5 as well so that it is plausible for consoles.

Lastly, I have not received UE5 yet to make a determination on whether it rivals Metro or bests it. I just asked one of the producers at Epic if they are a go on the UE5 release. He told me any day now. Until I see it in my hands and can test it myself, I will assume it doesn't have the functionality to remedy the concerns I have with the UE5 trailer.
 
I'm not sure you needed to push back though. I think the concept that the SSD actually solves an important graphics bottleneck is still much less prevalent than the idea that more VRAM or bandwidth or CUs help to improve graphics.

*snip*

That combined with the component prices and the general load times improvements means that I think they made the right decision.
I'd love to know what big developers would rather have had if given the option:

Let's use Xbox Series X:

16GB RAM + fast NVME SSD (actual specs)

Or...

20GB RAM + SATA SSD

The 2nd option gives you unified memory bandwidth but I'm assuming you lose much of the benefits of Direct Storage and possibly SFS with the SATA SSD.
 
First, there is not 16 GB of RAM used for games on PS5. If it is the same as on Xbox, there is 13.5 GB for the game and 2.5 GB reserved by the OS, and using Oodle Kraken plus texture compression the average decompressed rate is 10 to 11 GB/s, which means between 1.22 and 1.35 seconds to fully load the memory. Some of the memory is also reserved for the gameplay logic and simulation of the world.

*snip*
I am not sure if it's public info - but I think the PS5 may have less RAM available than the XSX? Sony has not publicly given out the reservation numbers for threads and RAM yet.
 