Unreal Engine 5, [UE5 Developer Availability 2022-04-05]

If you use an existing game like Gears 5 to measure next-gen performance, you're limited to current-generation game mechanics and world design. You have to look at the new games to see what developers want to do; looking at old games, their mechanics and pace will only give you a limited idea.

The other part of the SSD story is the improvement in developers' workflow. The more consistently the SSD performs, the easier it is to develop for. I am curious why Sony settled on six priority levels. Some more details there should help us understand the use cases better.
 
@iroboto We don't know how much frame time Nanite is taking. Resolution could be scaling more because of the GI solution than the geometry; we really don't know. We also don't know how much data is being read from disk per frame during the most demanding scenes, like the flying scene, or some of the larger rooms with longer draw distances where more models fit within your view. I think it's obviously true that more compute means more performance for Nanite and Lumen. But there's still a stage where you have to get the data off the disk on the fly (not pre-loaded) within the current frame. How much data, how fast, etc.? We have no idea.
 
Yes brother!
Okay. I think the issue is what's meant by 'drawing' and 'culling', and some assumptions. Your theory assumes one triangle drawn per pixel, whereas we might have multiple, such as drawing triangles for shadows. Importantly, the wording used isn't consistent with itself, so we can't read too many details into it. They explicitly say "20 million drawn triangles" - does that mean actually drawn, or does that include undrawn, culled triangles? Why would we call a culled triangle 'drawn'? They then say one triangle per pixel, and that's clearly not '20 million drawn'. At that point, I'd say this isn't a technical presentation and there's no point trying to read between the lines to understand the engine. Theoretically, the SSD can serve up, at most, enough information for 4 million unique triangles per frame. Epic then picked a render target that could make the most of those 4 million triangles, 'drawing' 20 million triangles each frame.

Alternatively, a basic PC NVMe SSD can serve up enough data for 4K one-triangle-per-pixel graphics because the tiling is so very efficient - such that you only need a cache of a few GBs and for this drive to never miss a tile - and with a powerful enough GPU, you could render the same pixel-level fidelity at the highest resolutions and framerates.
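To put rough numbers on that per-frame ceiling, here's a back-of-envelope sketch. The bytes-per-triangle figure is a pure guess chosen to land near the 4 million figure; the real on-disk format is unknown:

```cpp
// Back-of-envelope only: reproduces the "~4M unique triangles per frame"
// figure under assumed numbers. The bytes-per-triangle value is a guess
// chosen to make the arithmetic land near the quoted figure.
#include <cstdio>

int main() {
    const double drive_bytes_per_sec = 5.5e9;  // PS5 raw SSD throughput
    const double fps                 = 30.0;   // assumed demo framerate
    const double bytes_per_triangle  = 45.0;   // HYPOTHETICAL packed size

    const double bytes_per_frame = drive_bytes_per_sec / fps;      // ~183 MB
    const double tris_per_frame  = bytes_per_frame / bytes_per_triangle;

    std::printf("%.0f MB/frame -> %.1f M unique triangles/frame\n",
                bytes_per_frame / 1e6, tris_per_frame / 1e6);
    return 0;
}
```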

I'm just going to repeat that there are far too many unknowns to determine what part SSDs have to play in this new tech, and how much of an enabler they are. Or aren't.
 
Yes.

This is pretty much where I'm at. I'm just trying to separate what is I/O and what is compute. I think you understand where I'm coming from. When we talk about streaming from I/O, we need to actually be discussing buffer sizes, or how many tiles out can be held in memory before the SSD can no longer keep up with filling them back in - with slower SSDs requiring a larger buffer and faster ones requiring a smaller one. Not resolution, not detail.
 
Yes, except the moment you can't load enough of a cache, you have to reduce the amount of detail. And by detail, I mean asset fidelity and variety, not rendered pixels. Conceptually, just like virtual textures require only a tiny footprint for perfect texture fidelity, virtual geometry might require only a relatively tiny footprint for meshes. And then with enough RAM, caching becomes quite doable without needing insanely fast storage. But if that footprint can't be reduced enough, and the available RAM is limited, a faster storage solution could make up the difference.
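A minimal sketch of that cache-vs-drive-speed relation, using the thread's own proportionality and entirely hypothetical numbers:

```cpp
// Toy model: the RAM cushion scales inversely with drive speed. All
// figures are hypothetical placeholders, not measured numbers.
#include <cstdio>

int main() {
    const double per_second_need_gb = 10.0; // HYPOTHETICAL data consumed per 1s of play
    const double ref_drive_gbps     = 5.5;  // a drive that gets by caching 1s ahead

    const double drives_gbps[] = {5.5, 2.4, 0.5}; // NVMe-fast, NVMe-slower, SATA-class
    for (double d : drives_gbps) {
        const double lookahead_s = ref_drive_gbps / d;               // slower drive -> fetch earlier
        const double cache_gb    = per_second_need_gb * lookahead_s; // -> bigger resident cushion
        std::printf("%3.1f GB/s drive -> %4.1f s lookahead -> %5.1f GB cache\n",
                    d, lookahead_s, cache_gb);
    }
    return 0;
}
```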

However, I'm willing to attribute genuine meaning to Epic's mention of the SSD, and I think it has played a part, rather than it just being kowtowing to Sony for business reasons.
 
Let's not forget the original UE4 announcement in 2012, when real-time GI using the sparse voxel octree technique from Cyril Crassin (who was an intern at Nvidia) was supposed to be the next big thing since sliced bread and the only lighting system in the engine. It made for good hype and buzz at the time, just like what is happening today... but then it got silently ditched before release in favor of Lightmass baking and, later, pre-computed light probes.

It even had a Siggraph talk:
https://cdn2.unrealengine.com/Resou...Behind_the_Elemental_Demo_16x9-1248544805.pdf

What we see in the UE5 demo may or may not be what we get 18 months down the road. PR is a hell of a thing.

The big difference here being that in 2012 they were clearly angling and building their tech for consoles more powerful than the ones we got. They clearly misfired on how readily the technologies they were using at the time could be adopted. Cliff Bleszinski going to the media and begging both console makers for a minimum of 2.4 TFLOPs of GPU power - it still rings in my ears.

Here, they have a lighting solution more advanced than the GI solution they couldn't get working on consoles or lower-end PCs back then, but this time they have actually worked with the console makers while creating the engine and the tech inside it.

Hence why they chose to reveal it with a playable PS5 demo emulating a fake vertical slice of a game running on the hardware, as opposed to a gussied-up PC trailer of a non-interactive cutscene and then trying to downscale an approximation onto the consoles, like they did with UE4 and "Elemental".
 
Yes, except the moment you can't load enough of a cache, you have to reduce the amount of detail. And by detail, I mean asset fidelity and variety, not rendered pixels. Conceptually, just like virtual textures require only a tiny footprint for perfect texture fidelity, virtual geometry might require only a relatively tiny footprint for meshes. And then with enough RAM, caching becomes quite doable without needing insanely fast storage. But if that footprint can't be reduced enough, and the available RAM is limited, a faster storage solution could make up the difference.

However, I'm willing to attribute genuine meaning to Epic's mention of the SSD, and I think it has played a part, rather than it just being kowtowing to Sony for business reasons.
Right, but let's be real here. It's only 5.5 GB/s.
Memory runs at 448 GB/s and 560 GB/s respectively. If you are just-in-time streaming textures, there is a solid limit there.

Or, say, in order to render we need 10 GB of space, leaving 2.5 GB for textures, but we need 7 GB of capacity for textures, so the drive has to be fast enough that the 2.5 GB is consumed and then replaced from the SSD several times over (all of it virtually streamed). I would say that the drive will not last long. These are unrealistic use cases that anyone would just lower model fidelity for, since they don't have the compute power to render them anywhere close to native anyway.
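Putting rough numbers on that cycling scenario (the pool and working-set sizes are the post's own hypotheticals):

```cpp
// Rough numbers for the scenario above: a 2.5 GB texture pool being
// cycled by the SSD. Figures are the post's hypotheticals, not data.
#include <cstdio>

int main() {
    const double drive_gbps = 5.5; // PS5 raw SSD throughput
    const double pool_gb    = 2.5; // resident texture pool from the example
    const double needed_gb  = 7.0; // texture working set from the example

    const double refills_per_sec = drive_gbps / pool_gb; // ~2.2 full swaps/s
    const double oversubscribed  = needed_gb / pool_gb;  // working set vs pool

    std::printf("pool refilled %.1fx per second; working set is %.1fx the pool\n",
                refills_per_sec, oversubscribed);
    return 0;
}
```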
 
So I have some thoughts that perhaps people much smarter than I am can answer.

1) The Switch has access to 3 storage pools. Would requiring a physical-only release for a game that also needs X amount of storage on both the internal memory and the SD card be possible? You wouldn't match the SSD speed of a PS5 or Xbox Series X, but it would still allow you to stream a lot of data quickly into the available space on the Switch. I believe the Switch has 8 gigs of RAM, but less is available.

2) Same with the PS5 / Xbox Series X / PC. I'm not sure if Sony will allow two SSDs in the console, but we do know that MS allows the internal plus an external. If both can be accessed at the same time, can that be used to stream even more data? Or on PC, if you have two PCIe 3.0 or even 4.0 NVMe drives, will they be able to stream from both drives to increase the fidelity they can stream in?

I am sure the majority of these won't be known until we see more of the engine, but it's worth asking.
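On the second question: nothing in principle stops an engine from issuing reads to two drives in parallel and summing their sequential throughput, RAID-0 style; whether the console I/O stacks allow it is unknown. A sketch with assumed drive speeds:

```cpp
// Hypothetical aggregate-streaming arithmetic: two independent drives
// read in parallel simply sum their sequential throughput (ignoring
// CPU, decompression, and I/O-stack overheads, which are the real
// open questions on console and PC).
#include <cstdio>

int main() {
    const double internal_gbps = 2.4; // ASSUMED internal drive (raw)
    const double external_gbps = 2.4; // ASSUMED expansion drive matching it
    const double fps           = 60.0;

    const double combined_gbps = internal_gbps + external_gbps;
    std::printf("combined: %.1f GB/s -> %.0f MB per 60fps frame\n",
                combined_gbps, combined_gbps * 1e3 / fps);
    return 0;
}
```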
 
Right, but let's be real here. It's only 5.5 GB/s.
Memory runs at 448 GB/s and 560 GB/s respectively.
I don't understand the relevance. 1 MB of data can saturate 500 GB/s of bandwidth if you draw enough particles; there's no correlation between data size and bus speed. If you need 4 GB of data to provide pixel-perfect density, and at most 1 GB of what needs to be cached changes per second, you're still going to need a fast storage solution to provide that, no matter what your RAM bus speed is. If it only changes 50 MB per second, you won't. But we've no idea what the data requirements are.
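A toy illustration of that 'no correlation' point - memory bandwidth is about how often resident data is touched, storage bandwidth is about how fast the resident set changes; all numbers invented:

```cpp
// Toy illustration (ignoring caches): memory bandwidth consumed depends
// on how often resident data is touched; storage bandwidth depends on
// how fast the resident set changes. The two are independent.
#include <cstdio>

int main() {
    const double table_bytes = 1e6;               // 1 MB resident particle table
    const double touches_ps  = 2e6 * 4 * 60.0;    // 2M particles x 4 reads x 60 fps
    const double mem_gbps    = touches_ps * 16 / 1e9; // ~16 B per touch

    // The table itself never changes, so the disk churn is 0 GB/s.
    std::printf("resident: %.0f KB, memory traffic: %.1f GB/s, disk churn: 0 GB/s\n",
                table_bytes / 1e3, mem_gbps);
    return 0;
}
```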
 
I'm just reiterating that the majority of your data should be in memory, not sitting on a disk; that's how you pull out effective bandwidth and computation. All of it should be there for processing, not coming off the drive.

I think it's just marketing. You need speed and a lot of it, but not 5.5 GB/s speed - at least not until you have the compute and capacity to do more.
 

Cerny's statement was to have the data needed for the next 1s of gameplay cached in RAM, i.e. streaming will be pretty highly valued if Cerny's vision comes through. I think it is a great goal to have, as that would allow unique content everywhere. It's a whole other topic whether that is reached and/or whether anyone has the budget to create that much unique content. A layman's way of thinking about it: open worlds could be as detailed as linear 'pipe runs', thanks to being able to stream very high-detail assets on demand.
 
I remember this happening:

With "the R word" London-boy meant REYES, of course. And then:

That was funny as hell.
Thanks for the memory lane - good times!
In the Road to PS5 presentation, Mark Cerny talked about how "fast and narrow" was "the tide that raises all boats". But with compute-based rasterization, wouldn't that leave fewer boats to raise? The hardware rasteriser, the RBs, the tessellation unit and, to some extent, the L1 (unlike the L0 and L2, it's described repeatedly as a "Graphics L1" in the RDNA whitepaper) would surely be of relatively lesser importance for this approach...?
There are a lot of variables there to get an image rendered optimally. If any one of them is the bottleneck, wouldn't you alleviate that by raising the clocks? For REYES, wouldn't you want your geometry engine to be as fast as possible, for example?
They explicitly say "20 million drawn triangles" - does that mean actually drawn, or does that include undrawn, culled triangles? Why would we call a culled triangle 'drawn'? They then say one triangle per pixel, and that's clearly not '20 million drawn'.
I don't think drawn triangles and drawn pixels mean the same thing in REYES. If you have 20 million triangles left after culling, you might have a bunch that are sub-pixel sized, with several covering a single pixel; after you reject the ones you can't differentiate anyway, you are left with your drawn number of pixels, which is a lower number.
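As a sanity check on those two figures, assuming the demo's widely reported ~1440p output (the resolution is an assumption here):

```cpp
// Sanity check: 20M post-cull triangles vs the pixels actually drawn
// at an assumed 2560x1440 output. Several sub-pixel triangles per
// pixel collapse into one shaded sample, so drawn pixels < triangles.
#include <cstdio>

int main() {
    const double tris   = 20e6;          // "20 million drawn triangles"
    const double pixels = 2560.0 * 1440; // ~3.7M, ASSUMED demo resolution

    std::printf("%.1f M pixels -> ~%.1f triangles per pixel on average\n",
                pixels / 1e6, tris / pixels);
    return 0;
}
```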
 
If any one of them is the bottleneck, wouldn't you alleviate that by raising the clocks? For REYES, wouldn't you want your geometry engine to be as fast as possible, for example?

If I'm following correctly ... for UE5, the geometry work is done in the CUs using software rendering, not using any of the fixed-function hardware, so going wider is the easier way to scale capacity.
 
Cerny's statement was to have the data needed for the next 1s of gameplay cached in RAM, i.e. streaming will be pretty highly valued if Cerny's vision comes through. I think it is a great goal to have, as that would allow unique content everywhere. It's a whole other topic whether that is reached and/or whether anyone has the budget to create that much unique content. A layman's way of thinking about it: open worlds could be as detailed as linear 'pipe runs', thanks to being able to stream very high-detail assets on demand.
That's fine, as long as it's all there and not being streamed within the frame, in realtime. That's what their SSD is capable of. If their target is 1s cached in RAM, others will need to do 2s. If you want to make the case of 'what if it doesn't fit', then they would have said that PC and Xbox cannot do it because there isn't enough VRAM available. But they said it would be fine with an NVMe drive. I don't know what else to say.

I'm not saying I don't get the need for fast I/O.

I'm just pushing back on the whole '5.5 GB/s is the enabler for all of this'.
It's not. They would have said so.

And if Epic was allowed to demo this on XSX or on a high-end PC, I'm willing to bet a donation that they would outperform PS5 in both resolution and detail.
 
2) Same with the PS5 / Xbox Series X / PC. I'm not sure if Sony will allow two SSDs in the console, but we do know that MS allows the internal plus an external. If both can be accessed at the same time, can that be used to stream even more data? Or on PC, if you have two PCIe 3.0 or even 4.0 NVMe drives, will they be able to stream from both drives to increase the fidelity they can stream in?

PC is not in the same league: a lot of PCs come with 32, 64, or 128 GB of RAM, plus another 6, 8, or 11 GB of GDDR, so developers can store the whole level in RAM. Why should they aggregate NVMe drives for streaming?

XSX and PS5 have only 16 GB shared between CPU and GPU, so both need to stream from storage to process assets; a PC can "stream" from main memory at warp speed while a single SSD feeds that main memory.
 
I'm just reiterating that the majority of your data should be in memory, not sitting on a disk; that's how you pull out effective bandwidth and computation.
Yes, the data has to be present when needed for processing. You can't stream mid-frame, as you say. The issue here is how much data you need to fetch for the following second of gameplay. Going with Epic's numbers, there's one billion triangles at any point. That's potentially one billion new triangles in one second if you change your view entirely within that second. That's many GBs of data to replace over that one second.

If their target is 1s cached in RAM, others will need to do 2s.
Yep. And if that one second is 10 GBs, other platforms will need 20 GBs of cache.

And if Epic was allowed to demo this on XSX or on a high-end PC, I'm willing to bet a donation that they would outperform PS5 in both resolution and detail.
Resolution, yes. Detail, not necessarily; it might be the case, but it's not a given. Using a hypothetical 10 GB cache, this theoretical XSX could only store half as much detail if its SSD is half the speed, as it has no additional RAM over PS5.
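To put hedged numbers on the 'one billion new triangles in one second' case from above - nobody outside Epic knows the per-triangle footprint, so a range of guesses:

```cpp
// How many GB a full view change could mean, for a range of guessed
// per-triangle sizes. None of these are known figures; the point is
// only that 'many GBs per second' is plausible.
#include <cstdio>

int main() {
    const double new_tris_per_s  = 1e9;          // worst case from the post
    const double bytes_per_tri[] = {4, 16, 45};  // HYPOTHETICAL packed sizes

    for (double b : bytes_per_tri)
        std::printf("%2.0f B/tri -> %4.0f GB to restream in that second\n",
                    b, new_tris_per_s * b / 1e9);
    return 0;
}
```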
 
If I'm following correctly ... For UE5 the geometry work is done in the CUs using software rendering and not using any of the fixed-function hardware, so wider is easier way to scale capacity.
I don't consider the geometry/primitive block to be fixed function - it is programmable, after all. Epic saying they are using CUs for Nanite is clear, but I find it hard to believe they would leave the geometry engine idle in an RDNA2 GPU.
 
PC is not in the same league: a lot of PCs come with 32, 64, or 128 GB of RAM, plus another 6, 8, or 11 GB of GDDR, so developers can store the whole level in RAM. Why should they aggregate NVMe drives for streaming?

XSX and PS5 have only 16 GB shared between CPU and GPU, so both need to stream from storage to process assets; a PC can "stream" from main memory at warp speed while a single SSD feeds that main memory.
Sadly, I don't know of any PC game that loads up all my main system RAM, and I only have 32 gigs. It'd be nice if PC devs started targeting all that RAM. I've said it before: I would go out and put 64 gigs in my machine if it meant much higher detail.
 
@Shifty Geezer @iroboto There are limits to how many MB per frame each drive can handle. The numbers are not particularly high when you break it down that way. That said, PS5 can probably handle 2x the data per frame. We're really going to have to see what their caching strategy is and how hard that demo was pushing the PS5 SSD. One thing we don't know is how Nanite handles things if an asset is loaded late. It would have to handle that situation, because it'll run on PCs with slower SSDs etc. We don't know how virtual geometry streaming and caching compares to virtual texture streaming and caching. There are just too many things we don't know.

They did say they were streaming in geometry as the camera moved, which means they may actually be streaming in some data based on the view frustum - something I doubted, but it sounds a lot more plausible with virtual geometry.
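Breaking a few raw drive speeds down to per-frame budgets, as suggested above (ceilings only; compression and access patterns will move these):

```cpp
// Per-frame streaming budgets for a few raw drive speeds. Real budgets
// depend on compression and access patterns, so treat these as ceilings.
#include <cstdio>

int main() {
    const double drives_gbps[] = {5.5, 2.4, 0.5}; // PS5, XSX, SATA-class (raw)
    const double framerates[]  = {30.0, 60.0};

    for (double d : drives_gbps)
        for (double f : framerates)
            std::printf("%.1f GB/s @ %.0f fps -> %5.1f MB/frame\n",
                        d, f, d * 1e3 / f);
    return 0;
}
```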
 
Resolution, yes. Detail, not necessarily; it might be the case, but it's not a given.

But Nanite makes detail (triangles/texels) linearly proportional to the number of pixels, so I think more compute power means more pixels on screen => more detail.
How much? Only the Epic guys know.

As long as XSX has a 2-3 TFLOPs advantage (12 vs 9-10 variable, I suppose), this could mean 20-30% more compute power => 20-30% more pixels processed. Am I wrong?
I hope they will use the intersection power of the CUs to speed up the tracing, at least the screen-space tracing of detail / mid-distance objects.
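The 20-30% figure checks out under its own linear-scaling assumption (which is a big assumption for any real renderer):

```cpp
// Checking the poster's own ratio: a 12 TF GPU vs a 9-10 TF GPU,
// assuming pixels processed scale linearly with compute (a big
// assumption for any real renderer).
#include <cstdio>

int main() {
    const double xsx_tf    = 12.0;
    const double ps5_tf_lo = 9.0, ps5_tf_hi = 10.0;

    std::printf("advantage: %.0f%% to %.0f%%\n",
                (xsx_tf / ps5_tf_hi - 1) * 100,   // 20%
                (xsx_tf / ps5_tf_lo - 1) * 100);  // 33%
    return 0;
}
```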
 