Unreal Engine 5, [UE5 Developer Availability 2022-04-05]

Those quotes are saying, "we don't know." ;)

Does Nanite scale with compute power? Yes. Does it also scale with drive performance? Maybe. Nothing there or anywhere else I've seen suggests compute is the bottleneck and storage performance isn't, and the dialogue from Epic is choosing to emphasise storage. There are two possible reasons for that: either it is important, or they're basically marketing a falsehood for Sony to push PS5 interest where the tech doesn't actually benefit.

Given a range of possible ways of approaching this problem, either the GPU or the SSD could be the limiting factor. In fact you'd expect the solution to vary based on available resources, streaming more on faster drives and caching more on devices with larger RAM. There's no way I'd discount SSD impact, though, and point my finger at compute power here. That's jumping to conclusions.
awesome

Wouldn't it be a little of both? Storage performance does allow it to scale, but how much it depends on storage is also partly marketing.

Will we ever really know unless Epic puts this demo out on PC for everyone to test? Then you can have the ultimate judge of how important storage performance is. Get tons of main system RAM and just fill it up with Nanite data, compare that to just an SSD, and then compare that to a traditional hard drive. I don't know if we will ever get that. I remember the controversy around the original Unreal demos on PS4.


And this goes for both consoles. We all remember the initial wave of demos from games like Watch Dogs and The Witcher 3 that later got downgraded in terms of graphics.
 
So what you're saying here is you would leave open the possibility that if PS5 had 20 TF of compute power, it would still run at 1440p30 with the same SSD setup.
No. I'm saying if PS5 had a slower SSD, it might be running less detail than it is. It might even run less detail at a higher res and/or framerate. If it had the same SSD and a faster GPU, it'd be drawing higher res (maybe same detail) and higher framerate.

I'm saying the detail level may have been chosen by the drive performance, and then the output resolution and framerate based on that detail, because Epic wanted the primary message for their engine to be "Unlimited detail is a solved problem". They had a fabulously strong message by repeating that they are using source assets in original fidelity.
 
No. I'm saying if PS5 had a slower SSD, it might be running less detail than it is. It might even run less detail at a higher res and/or framerate. If it had the same SSD and a faster GPU, it'd be drawing higher res (maybe same detail) and higher framerate.

I'm saying the detail level may have been chosen by the drive performance, and then the output resolution and framerate based on that detail, because Epic wanted the primary message for their engine to be "Unlimited detail is a solved problem". They had a fabulously strong message by repeating that they are using source assets in original fidelity.
But that doesn't make any sense.
The Nanite engine will cull all sub-pixel triangles, and that is directly tied to resolution.
With a high enough resolution there would be no sub-pixel triangles any more; you would be seeing the assets at their original fidelity, which is exactly what detail means here.

That has nothing to do with SSD bandwidth once the required data is loaded into memory.

To be clear, the demo is running on their dynamic resolution system. It was not fixed. They state that the majority of the time the resolution is 1440p, not that they set it to 1440p. Their dynamic resolution system is based on load. And you can't cull triangles before the data arrives at the GPU, so the full section of the visible asset _must_ be resident in memory before culling.

So even when you're standing still, when the SSD is not streaming anything, the compute was only capable of delivering 1440p30. The SSD was not restricting the performance of the system to 1440p there. The only way the SSD can be a part of that discussion is if it is somehow always streaming more than what is contained in memory.
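A toy sketch of that point (illustrative C++, not Epic's code; the triangle counts and projected areas are made up): the culling decisions are made against data already resident in memory, and the amount of work that survives culling tracks the output resolution, not the drive.

```cpp
// Minimal sketch (not Nanite's actual algorithm): sub-pixel culling runs on
// triangles that are ALREADY resident in memory, and the surviving workload
// scales with render resolution, not with drive speed. All values are invented.
#include <cstdio>
#include <vector>
#include <random>

struct Triangle { float screenArea1440p; };   // projected area in pixels at 1440p

// Count triangles whose projected area stays above one pixel at a given
// resolution scale (area scales with the square of the linear scale).
static size_t survivingTriangles(const std::vector<Triangle>& resident, float linearScale)
{
    size_t kept = 0;
    for (const Triangle& t : resident)
        if (t.screenArea1440p * linearScale * linearScale >= 1.0f)
            ++kept;
    return kept;
}

int main()
{
    // Pretend 20 million triangles for the visible scene are resident in RAM.
    std::vector<Triangle> resident(20'000'000);
    std::mt19937 rng(42);
    std::uniform_real_distribution<float> area(0.1f, 4.0f);
    for (Triangle& t : resident) t.screenArea1440p = area(rng);

    // Same resident data, different output resolutions: the culled set (and
    // therefore the rasterization cost) changes even though the SSD streamed
    // nothing new in between.
    printf("kept at 1440p: %zu\n", survivingTriangles(resident, 1.0f));
    printf("kept at 4K   : %zu\n", survivingTriangles(resident, 1.5f));   // 2160/1440
    printf("kept at 1080p: %zu\n", survivingTriangles(resident, 0.75f));  // 1080/1440
}
```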
 
Let's not forget the original UE4 announcement in 2012, when real-time GI using a sparse voxel octree from Cyrill Crassin (who was an intern at NVIDIA) was supposed to be the next big thing since sliced bread and the only lighting system in the engine. It made for good hype and buzz at the time, just like what is happening today... but then got silently ditched before release in favour of Lightmass baking and, later, pre-computed light probes.

It even had a Siggraph talk:
https://cdn2.unrealengine.com/Resou...Behind_the_Elemental_Demo_16x9-1248544805.pdf

What we see in the UE5 demo may or may not be what we get 18 months down the road. PR is a hell of a thing.
 
There's the whole issue of determining what triangles to load in the first place. There are a billion triangles conceptually in the scene, which have to be represented somehow. That representation needs to be processed to determine which triangles need to be cached in memory, then those triangles are processed to determine which ones to draw, and then those triangles are drawn. You only appear to be talking about drawing the triangles, as if working out which ones to load and fetching them is trivial. If we pair a monster GPU with a poor storage solution, perhaps Nanite processes the scene in a nanosecond and determines what parts of the scene geometry are needed, but the storage can only deliver 20,000 triangles a frame, so instead of having one triangle per pixel, the scene is rendered as large polygons at 300 fps and 8K resolution.
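To illustrate that side of it, here's a hedged back-of-the-envelope sketch. All the figures are assumptions (bytes per streamed triangle, drive throughputs, a 30 fps target), and in practice most of the visible set persists in RAM between frames, so this only bounds how quickly brand-new detail can arrive as the view changes.

```cpp
// Rough sketch: a drive's throughput caps how much NEW geometry can arrive per
// frame, regardless of GPU power. Numbers are illustrative assumptions only.
#include <cstdio>
#include <algorithm>

int main()
{
    const double bytesPerTriangle = 16.0;   // assumed packed cost per streamed triangle
    const double fps              = 30.0;
    const double gpuTrisPerFrame  = 20e6;   // what the GPU side could cull/draw per frame

    // Nominal sequential throughputs: HDD, SATA SSD, XSX-class NVMe, PS5-class NVMe.
    const double driveGBps[] = { 0.1, 0.5, 2.4, 5.5 };

    for (double gbps : driveGBps)
    {
        double bytesPerFrame  = gbps * 1e9 / fps;
        double streamableTris = bytesPerFrame / bytesPerTriangle;
        double freshDetailCap = std::min(streamableTris, gpuTrisPerFrame);
        printf("%4.1f GB/s -> ~%7.2f M brand-new tris/frame (GPU could handle %.0f M)\n",
               gbps, freshDetailCap / 1e6, gpuTrisPerFrame / 1e6);
    }
}
```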

Are you saying that the only reason Epic have talked about the SSD performance requirement is that they are bigging up Sony's platform for marketing purposes, and it doesn't play a notable role in this demo?
 
Are you saying that the only reason Epic have talked about the SSD performance requirement is that they are bigging up Sony's platform for marketing purposes, and it doesn't play a notable role in this demo?
The primary bottleneck. Not a secondary bottleneck. I can reverse your argument then, move this SSD to PS4, and say it can output more triangles than XSX because XSX failed to stream in enough data, and that XSX will do fewer triangles but with more pixels per triangle?

Epic designed this engine to scale up and down: up to movie quality (The Mandalorian, for instance) and down to Android phones. The whole point is to see things in real time while doing your work on The Mandalorian, so they don't need to downsample their work anymore; they just let the engine do it. So you'd have to believe that they developed a solution that doesn't work for their intended target audience because the SSD technology doesn't exist.
 
I wonder if, in time, RT cores could potentially be used to accelerate traversal of the data structures in this implementation of GI, just as they do with BVH structures?
 
Let me put this another way, for everyone on the whole SSD bandwagon.
Why do you think, in today's video game development, prior to any UE5 discussion, artists are required to cull their models' polygon counts all the way down, while texture quality can stay high, or is a function of the available memory capacity?

Is it because the meshes are too large to fit in memory?
Or because our fixed-function pipeline chokes on small triangles, nuking all available performance?

Before you answer: the X1X runs Gears 5 at 60 fps at 4K with 4K textures, btw. So the only thing missing is the dense triangle meshes, right?
On a 100 MB/s drive, btw, with terrible random read performance.

If you think it's because the meshes are too large to fit in memory, and it's not a compute problem, you're in the SSD category.
If you think it's the fixed-function pipeline choking, you're in the compute category.

Also note textures are significantly larger than meshes.
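For a rough sense of scale behind that last point, an illustrative calculation (all sizes here are my own assumptions, not figures from Gears 5 or the UE5 demo) comparing a typical game-resolution mesh to a 4K texture set:

```cpp
// Rough footprint comparison behind "textures are significantly larger than
// meshes" in today's pipelines. Every size below is an assumption for
// illustration only.
#include <cstdio>

int main()
{
    // A typical in-game character/prop mesh: ~100k triangles, ~60k vertices.
    const double vertices = 60'000, triangles = 100'000;
    const double vertexBytes = 32;           // position + normal + tangent + UV, packed
    const double indexBytes  = 4;
    double meshMB = (vertices * vertexBytes + triangles * 3 * indexBytes) / 1e6;

    // One 4K texture, block-compressed at ~1 byte/texel, with a full mip chain (~1.33x).
    double tex4kMB = 4096.0 * 4096.0 * 1.0 * 1.33 / 1e6;

    printf("mesh            : ~%.1f MB\n", meshMB);           // ~3 MB
    printf("one 4K texture  : ~%.1f MB\n", tex4kMB);          // ~22 MB
    printf("4-texture set   : ~%.1f MB\n", 4 * tex4kMB);      // ~89 MB
}
```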
 
The primary bottleneck. Not a secondary bottleneck.
I don't understand the distinction between primary and secondary.

What we see on screen consists of framerate, resolution, and asset detail. Framerate and resolution are largely limited by processing power. Asset detail is largely limited by 'data architecture' (storage and buses) and processing power. A well balanced game/showcase will use all parts optimally and be equally bottlenecked by all of them - we haven't the processing power to go higher resolution, we haven't the bandwidth to go high framerate, we haven't the storage capacity to fit in more detail, and we haven't the streaming capacity to have more varied content.

Perhaps you've only been talking about framerate and resolution this whole time? I've been talking about the tech as a whole and the parts that are coming in to play to enable 'unlimited detail' displaying max-quality assets.
 
I don't understand the distinction between primary and secondary.

What we see on screen consists of framerate, resolution, and asset detail. Framerate and resolution are largely limited by processing power. Asset detail is largely limited by 'data architecture' (storage and buses) and processing power. A well balanced game/showcase will use all parts optimally and be equally bottlenecked by all of them - we haven't the processing power to go higher resolution, we haven't the bandwidth to go high framerate, we haven't the storage capacity to fit in more detail, and we haven't the streaming capacity to have more varied content.

Perhaps you've only been talking about framerate and resolution this whole time? I've been talking about the tech as a whole and the parts that are coming in to play to enable 'unlimited detail' displaying max-quality assets.
read my second post.
This should make more sense.
 
So you'd have to believe that they developed a solution that doesn't work for their intended target audience because the SSD technology doesn't exist.

Won't that technology exist by the end of this year? Off the top of my head, Sony's Cerny said that 7GB/s drives will be required for storage expansion, and some should be available by the end of the year.
 
I don't understand what you guys are arguing about. You are both right. You need compute power to be able to do your visibility testing and culling. It's also true that you need an SSD that's fast enough to read blocks of data for models and 8K textures very quickly. Each model had up to four 8K textures, by the description. There's a ton of texture data being moved. Is the Series X fast enough to stream in all that texture data? We have no idea, because they didn't run it on Series X and they didn't give any metrics. It's definitely possible that slower drives would have to live with 4K textures in place of 8K textures, etc. We don't know yet.

Maybe PS5 doesn't have to worry about mip selection because it's just fast enough to load the textures in. Maybe a slower SSD on a PC, or some other device, would have to live with 4K or 2K textures. Maybe Xbox will stream in a lower-resolution texture, then blend and swap to the high-resolution texture if it arrives a frame late (that is how they described Sampler Feedback Streaming). We don't know if the scenes on PS5 were pushing I/O in a way that the Series X SSD couldn't handle.
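A minimal sketch of that fallback idea (assumed behaviour for illustration only, not the actual Sampler Feedback Streaming implementation): sample the best mip that has already arrived, and swap up when the full-resolution mip lands.

```cpp
// Illustrative sketch of "show a lower mip until the high-res one arrives".
// Residency tracking and mip numbering here are assumptions, not any real API.
#include <cstdio>
#include <array>

struct TextureMips
{
    // residency flag per mip level: mip 0 = 8K, mip 1 = 4K, mip 2 = 2K, ...
    std::array<bool, 14> resident{};
};

// Return the finest (lowest-numbered) mip that is resident, at or below the
// level the shader actually wants for this frame.
static int bestAvailableMip(const TextureMips& tex, int wantedMip)
{
    for (int mip = wantedMip; mip < static_cast<int>(tex.resident.size()); ++mip)
        if (tex.resident[mip])
            return mip;
    return static_cast<int>(tex.resident.size()) - 1;  // coarsest tail mip as last resort
}

int main()
{
    TextureMips tex;
    tex.resident[2] = true;                      // only the 2K mip has streamed in so far
    printf("frame 0 samples mip %d\n", bestAvailableMip(tex, 0));  // falls back to 2
    tex.resident[0] = true;                      // the 8K mip arrives a frame late
    printf("frame 1 samples mip %d\n", bestAvailableMip(tex, 0));  // now uses 0
}
```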
 
Is it because the meshes are too large to fit in memory ?
Or because our fixed function pipeline chokes on small triangles therefore nuking all performance available.
Yes. But once you remove the choking on tiny triangles, you then have the problem of crazy amounts of data to fit in RAM. So what becomes the bottleneck then? Processing performance drawing all those triangles, or storage performance (including processing overhead to manage it all) juggling which triangles are in memory?

read my second post.
This should make more sense.
Nope. I don't understand what you mean by 'primary' and 'secondary' bottlenecks. The bottlenecks are defined by the workloads being asked of the system.

Okay, after lots of thinking I wonder if I've figured out your argument. ;)

We are told there are 20 million drawn triangles, so there are 20 million triangles, fetched from storage, present in RAM to draw.
We are told there is one triangle drawn for each pixel.
PS5 renders at 1440p, so that's about 3.7 million pixels (2560 × 1440 = 3,686,400).
That therefore means in reality, 3.7 million triangles are drawn, not 20 million.
Therefore, the majority are culled.
If the processing was faster, more of those 20 million triangles could be rendered to screen.
Ergo, the bottleneck is the processing power, not the storage, as the storage is capable of delivering far more data than the PS5 is capable of rendering.

Is that it?
 
Let me put this another way, for everyone on the whole SSD bandwagon.
Why do you think, in today's video game development, prior to any UE5 discussion, artists are required to cull their models' polygon counts all the way down, while texture quality can stay high, or is a function of the available memory capacity?

Is it because the meshes are too large to fit in memory?
Or because our fixed-function pipeline chokes on small triangles, nuking all available performance?

Before you answer: the X1X runs Gears 5 at 60 fps at 4K with 4K textures, btw. So the only thing missing is the dense triangle meshes, right?
On a 100 MB/s drive, btw, with terrible random read performance.

If you think it's because the meshes are too large to fit in memory, and it's not a compute problem, you're in the SSD category.
If you think it's the fixed-function pipeline choking, you're in the compute category.

Also note textures are significantly larger than meshes.

ZBrush models are very heavy...



In the end the dev knows best, and he said we could have more detail on PS4 with an SSD, probably textures.
 
Yes. But once you remove the choking on tiny triangles, you then have the problem of crazy amounts of data to fit in RAM. So what becomes the bottleneck then? Processing performance drawing all those triangles, or storage performance (including processing overhead to manage it all) juggling which triangles are in memory?

Nope. I don't understand what you mean by 'primary' and 'secondary' bottlenecks. The bottlenecks are defined by the workloads being asked of the system.

Okay, after lots of thinking I wonder if I've figured out your argument. ;)

We are told there are 20 million drawn triangles, so there are 20 million triangles, fetched from storage, present in RAM to draw.
We are told there is one triangle drawn for each pixel.
PS5 renders at 1440p, so that's about 3.7 million pixels (2560 × 1440 = 3,686,400).
That therefore means in reality, 3.7 million triangles are drawn, not 20 million.
Therefore, the majority are culled.
If the processing was faster, more of those 20 million triangles could be rendered to screen.
Ergo, the bottleneck is the processing power, not the storage, as the storage is capable of delivering far more data than the PS5 is capable of rendering.

Is that it?
Yes brother!
 
It might well be. But streaming isn't just about transfer speeds, it's about whole-drive data access speeds, and we don't know how well detail scales with drive performance (transfer rates and access times).
I've never yet met a programmer who when asked about whether I/O was fast enough, said "yes". This literally has never happened in the history of the universe.
 
I've never yet met a programmer who when asked about whether I/O was fast enough, said "yes". This literally has never happened in the history of the universe.

:LOL:

Some might see it as "just" a fast SSD. But because of the high/guaranteed QoS, it is more like a realtime computing platform. Such things can enable new classes of time critical applications (like a military radar tracking system -- with a realtime database). It is a very useful tool if your system needs to respond to ad hoc events in the wild, and generate output in real time even if overloaded.
 
I don't understand what you guys are arguing about. You are both right. You need compute power to be able to do your visibility testing and culling. It's also true that you need an SSD that's fast enough to read blocks of data for models and 8K textures very quickly. Each model had up to four 8K textures, by the description. There's a ton of texture data being moved. Is the Series X fast enough to stream in all that texture data? We have no idea, because they didn't run it on Series X and they didn't give any metrics. It's definitely possible that slower drives would have to live with 4K textures in place of 8K textures, etc. We don't know yet.

Maybe PS5 doesn't have to worry about mip selection because it's just fast enough to load the textures in. Maybe a slower SSD on a PC, or some other device, would have to live with 4K or 2K textures. Maybe Xbox will stream in a lower-resolution texture, then blend and swap to the high-resolution texture if it arrives a frame late (that is how they described Sampler Feedback Streaming). We don't know if the scenes on PS5 were pushing I/O in a way that the Series X SSD couldn't handle.

For me, I'm trying my best to separate what could be I/O and what could be compute.
When I think of compute, we have been doing 4K graphics on lesser hardware, so most people just assume it's an I/O problem.
See Xbox One X for instance. Most people are willing to trade resolution for more graphics detail.

But they misunderstand why developers don't treat 4K resolution as sacred, and it's because of the following reason:
To obtain 100% raster efficiency, one must have a triangle/pixel coverage of 1 triangle per 16 pixels.
This means, resolution be damned, high or low, when you move up to 4K or down, the ratio of triangles to pixels stays roughly the same. The only thing that changes is aliasing quality.
But because fixed-function hardware is still so fast, even with older, weaker hardware we are capable of doing this; all we need is bandwidth to feed the fixed-function units.
It would appear that 32 ROPs on X1X were just sufficient.

So you see, 6 TF isn't a lot of power; the power came from the fixed-function units. The 6 TF of compute was augmenting and adding complexity to the FF pipeline.
And when we look at the memory footprint, the textures are still 4K in size and make up the majority of the streaming bandwidth required and the capacity in memory, with normals, mips, etc. Textures take up a vastly larger footprint than vertex meshes.
When you match mesh detail to texture detail, the mesh size needs to inflate by nearly 16x.

So the question becomes: was streaming the issue for rendering unlimited-detail technology? Or the fact that we hadn't developed the technology to get away from fixed-function hardware?

The obvious reason we haven't had unlimited detail until today is not I/O. Even with 100 GB/s of SSD speed, you'd croak trying to work with sub-pixel and single-pixel triangles on any FF hardware pipeline. The problem is exponential.
That was the limitation, and by moving that limitation over to compute and away from fixed function, you are now looking largely at how much compute power you have to draw the screen.

Which means the I/O portion of it, actually worked within the realms of what we have already.
It was the rendering method.

So with 12 GB of memory, a slow 100 MB/s drive, 4K textures and a huge buffer period, they still managed a high-fidelity 4K game at 60 fps, virtually streamed.
The only thing missing was not the textures but the triangle meshes. And it's not because FF hardware isn't capable of producing a lot of triangles or culling them, but because it croaks exponentially worse once it drops below a specific triangle-per-pixel emission output.

Triangle meshes shouldn't inflate the I/O requirement by 500x.

But if Gears 5 moved to unlimited detail and added 4K meshes to match those 4K textures, removing the mips, LODs, normal maps etc., they should have more than enough capacity to hold those denser meshes.
But the compute power would no longer be nearly enough to do 4K. It would be significantly less, like 1080p to 1440p, or even worse than that.

Thus I believe the I/O from Sony is largely spent streaming in the 8K textures and 16K shadowmaps.

So 8K textures are 4x the size of 4K textures: roughly 67 million texels versus 16.8 million, which at one byte per texel is about 67 MB vs 16 MB without compression. And that's the whole texture, not just the portion streamed from the drive.

These SSD drives are capable of a shit ton more, and that's why they wanted to showcase that it could even handle movie-quality assets.
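For reference, the raw numbers behind that texture comparison (format and bytes-per-texel are assumptions here: plain RGBA8 versus a 1 byte/texel block-compressed format; nothing Epic has stated):

```cpp
// Uncompressed vs. block-compressed texture sizes at 4K/8K/16K.
// Bytes-per-texel figures are illustrative assumptions only.
#include <cstdio>

static void report(const char* name, double dim)
{
    double texels = dim * dim;
    printf("%s: %6.1f Mtexels, %7.1f MB uncompressed (RGBA8), %6.1f MB at 1 B/texel (BC-class)\n",
           name, texels / 1e6, texels * 4 / 1e6, texels / 1e6);
}

int main()
{
    report("4K  (4096^2) ", 4096);
    report("8K  (8192^2) ", 8192);
    report("16K (16384^2)", 16384);   // e.g. the shadow map resolution mentioned above
}
```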
 
Pretty sure the system is conceptually similar to virtual textures. (In the simplest terms; quite sure there are a lot of things we do not know.)

Find the objects or chunks needed for rendering and the detail level they need.
If the required detail level is not in memory, show a lower detail level until the data is available.

If objects are stored in something like virtualized geometry images, using mipmaps as LODs seems the proper thing to do. (It would also save a huge amount of memory and simplify streaming.)

I do not believe that they brute-force cull the whole object's data, or load tiny pieces of it as sample points race across its surface.

So my guess is: intelligently select the visible objects and detail levels, then rasterize.
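A toy sketch of that guess (purely illustrative, not Epic's algorithm): pick a geometry detail level the way you'd pick a texture mip, from the object's projected size on screen.

```cpp
// Illustrative only: choose a geometry LOD so triangle density roughly matches
// the projected screen area, treating LODs like mips. All numbers are made up.
#include <cstdio>
#include <cmath>
#include <algorithm>

// Assume each successive LOD holds 1/4 of the triangles of the previous one,
// and target roughly one triangle per pixel of projected area. lod 0 = source asset.
static int chooseLod(double fullDetailTriangles, double projectedAreaPixels, int lodCount)
{
    double neededTriangles = std::max(projectedAreaPixels, 1.0);
    double ratio = fullDetailTriangles / neededTriangles;
    int lod = ratio <= 1.0 ? 0 : static_cast<int>(std::ceil(std::log2(ratio) / 2.0));
    return std::clamp(lod, 0, lodCount - 1);
}

int main()
{
    const double sourceTriangles = 33'000'000;   // "movie quality" source asset
    printf("fills the screen (3.7 Mpx): LOD %d\n", chooseLod(sourceTriangles, 3.7e6, 12));
    printf("mid distance (100 kpx)    : LOD %d\n", chooseLod(sourceTriangles, 1e5, 12));
    printf("far away (1 kpx)          : LOD %d\n", chooseLod(sourceTriangles, 1e3, 12));
}
```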
 