Unreal Engine 5, [UE5 Developer Availability 2022-04-05]

I don't consider the geometry/primitive block to be fixed-function - it is programmable, after all. Epic saying they are using CUs for Nanite is clear, but I find it hard to believe they would leave the geometry engine idle in an RDNA2 GPU.

They are doing exactly that. They only fall back to primitive shaders when those would be faster, probably when a triangle covers multiple pixels, which should be rare in this system. For one polygon per pixel, they say they have two software rasterizers that run in compute and massively outperform the GPU hardware.

I think PS5 has four (?) shader arrays, each with a primitive unit and a raster unit. The typical raster pipeline scales well with clock speed because it is a very narrow path. The UE5 solution ignores this hardware and does software rasterization, which means it scales with CU count, so it should scale with overall TFLOPS.
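To illustrate why a compute-path rasterizer scales with ALU count rather than with the narrow fixed-function path, here is a minimal edge-function rasterizer sketch. This is an illustration only, not Epic's code (which runs in GPU compute, not Python): every pixel test is an independent arithmetic evaluation, so throughput grows with however many lanes/CUs evaluate pixels in parallel.

```python
# Minimal software-rasterizer sketch (illustrative, not Epic's implementation):
# coverage is decided per pixel by three edge-function evaluations, which is
# pure ALU work - the kind that scales with CU count / TFLOPS.

def edge(ax, ay, bx, by, px, py):
    # Signed-area test: positive if (px, py) lies on the left of edge a->b.
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize(tri, width, height):
    (x0, y0), (x1, y1), (x2, y2) = tri
    covered = []
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            w0 = edge(x1, y1, x2, y2, px, py)
            w1 = edge(x2, y2, x0, y0, px, py)
            w2 = edge(x0, y0, x1, y1, px, py)
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                covered.append((x, y))
    return covered

# A small triangle covers only a handful of samples - the near-pixel-sized
# case Nanite's software path is tuned for.
print(len(rasterize([(0, 0), (4, 0), (0, 4)], 8, 8)))  # → 10
```

A real GPU implementation would also need fill rules for shared edges and depth handling; the point here is only that nothing fixed-function is involved.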
 
I'm just reiterating that the majority of your data should be in memory, not sitting on a disk; that's how you pull out effective bandwidth and computation. All of it should be there for processing, not coming off the drive.

I think it's just marketing. You need speed, and a lot of it, but not 5.5 GB/s speed - at least not until you have the compute and capacity to do more.
From PS1 to PS5, if you follow the RAM upgrades for each console, we would expect 64-128 GB of RAM in PS5. The SSD is clearly there to mitigate only providing 16 GB. It's an engineering compromise.
 
I don't consider the geometry/primitive block to be fixed-function - it is programmable, after all. Epic saying they are using CUs for Nanite is clear, but I find it hard to believe they would leave the geometry engine idle in an RDNA2 GPU.
Mesh shaders are being used on the hardware that supports them.
Compute shaders are being used where it's not supported.
Standard raster hardware is used in the rare moments it will run faster than compute.
 
I don't consider the geometry/primitive block to be fixed-function - it is programmable, after all. Epic saying they are using CUs for Nanite is clear, but I find it hard to believe they would leave the geometry engine idle in an RDNA2 GPU.

They go with whatever is faster, and Epic indicated they are using software rasterization because it's faster.
 
@Shifty Geezer @iroboto There are limits to how many MB/frame each drive can handle. The numbers are not particularly high when you break them down that way. That said, PS5 can probably handle 2x the data per frame. We're really going to have to see what their caching strategy is and how hard that demo was pushing the PS5 SSD. One thing we don't know is how Nanite handles an asset that is loaded late. It would have to handle that situation, because it'll run on PCs with slower SSDs etc. We also don't know how virtual geometry is streamed and cached versus virtual texture streaming and caching. There are just too many things we don't know.
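The per-frame break-down above is simple arithmetic. A quick sketch, using the published raw sequential rates (5.5 GB/s PS5, 2.4 GB/s XSX) and ignoring compression and seek patterns, which change the effective numbers considerably:

```python
# Back-of-envelope streaming budget per frame from a raw sequential rate.
# Real effective rates depend on compression and access patterns.

def mb_per_frame(gb_per_s, fps):
    # Decimal units: 1 GB = 1000 MB.
    return gb_per_s * 1000 / fps

print(round(mb_per_frame(5.5, 30), 1))  # PS5 raw at 30 fps → 183.3 MB/frame
print(round(mb_per_frame(2.4, 30), 1))  # XSX raw at 30 fps → 80.0 MB/frame
```

So even the fastest drive delivers well under 200 MB of fresh data per 30 fps frame, which is why the per-frame numbers "are not particularly high".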

They did say they were streaming in geometry as the camera moved, which means they may actually be streaming some data based on the view frustum - something I doubted, but it sounds a lot more plausible with virtual geometry.
I suspect they are also streaming in objects outside the view frustum, but still within a frustum of sorts - a wider one, just beyond what you can see - to remove any chance of pop-in.
 
From PS1 to PS5, if you follow the RAM upgrades for each console, we would expect 64-128 GB of RAM in PS5. The SSD is clearly there to mitigate only providing 16 GB. It's an engineering compromise.
What? I/O is the bottleneck. It solves the problem of taking forever to load items into memory when you don't have 64-128 GB of it.
And when you can stream faster, you need less VRAM capacity.

For rendering, bandwidth is the next bottleneck: with each generation, ALU throughput greatly outpaces the growth in bandwidth.
 
But Nanite makes detail (triangles/texels) linearly proportional to the number of pixels, so I think more compute power means more pixels on screen => more detail.
How much more? Only the Epic guys know.

As long as XSX has a 2-3 TFLOPS advantage (12 vs 9-10 variable, I suppose), this could mean 20-30% more compute power => 20-30% more pixels can be processed. Am I wrong?
I hope they will use the intersection hardware of the CUs to speed up the tracing, at least the screen-space tracing for detail / mid-distance objects.

Yes, that should be correct. Take the flying segment as an example of a place where you might be limited by the performance of the SSD. If they're pushing the limits of the PS5 SSD, how do you scale down to a slower drive? Do you fly half as fast? Are you more aggressive with dynamic LOD so you load less geometry? Do you load textures late and blend up from lower mips? Do you just use 4K textures instead of 8K textures? Are there other ways of scaling? That's assuming this segment is really pushing the PS5 SSD hard.
 
Yes, that should be correct. Take the flying segment as an example of a place where you might be limited by the performance of the SSD. If they're pushing the limits of the PS5 SSD, how do you scale down to a slower drive? Do you fly half as fast? Are you more aggressive with dynamic LOD so you load less geometry? Do you load textures late and blend up from lower mips? Do you just use 4K textures instead of 8K textures? Are there other ways of scaling? That's assuming this segment is really pushing the PS5 SSD hard.

Yes, this is where they push the PS5 SSD hard, but the sequence is so fast, with a lot of motion blur - did we even have time to spot fine geometry detail?
 
Yes, this is where they push the PS5 SSD hard, but the sequence is so fast, with a lot of motion blur - did we even have time to spot fine geometry detail?
There's only one model, with no LODs or normal maps. So you're not seeing the detail because they aren't computing it, not because the assets aren't there. The assets that need to be there are there, and they are scaled down quickly by the engine.
 
What? I/O is the bottleneck. It solves the problem of taking forever to load items into memory when you don't have 64-128 GB of it.
And when you can stream faster, you need less VRAM capacity.

For rendering, bandwidth is the next bottleneck: with each generation, ALU throughput greatly outpaces the growth in bandwidth.

I should've specifically quoted the bit my reply was relating to:

"I'm just reiterating that the majority of your data should be in memory, not sitting on a disk"

The SSD is mitigating this because we are not getting 64-128 GB of RAM. If we did, we wouldn't need a 5.5 GB/s SSD; even 1 GB/s would suffice.
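The tradeoff being argued here can be sketched numerically. The consumption figure below (~1 GB of fresh assets touched per second of gameplay) is purely hypothetical, chosen to show the shape of the argument: more resident RAM means more seconds of headroom before the drive has to deliver anything new, so a slower drive can keep up by prefetching.

```python
# Illustrative RAM-vs-SSD tradeoff (hypothetical consumption rate):
# seconds of gameplay that can be served from already-resident data.

def seconds_of_headroom(resident_gb, consumed_gb_per_s):
    return resident_gb / consumed_gb_per_s

# Assume gameplay touches ~1 GB of fresh assets per second:
print(seconds_of_headroom(10, 1.0))   # ~10 GB usable on a 16 GB console → 10.0 s
print(seconds_of_headroom(100, 1.0))  # ~100 GB resident with 128 GB RAM → 100.0 s
```

With 100 s of headroom, even a 1 GB/s drive refills faster than the game consumes; with 10 s, the drive's burst speed starts to matter, which is the engineering compromise being described.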
 
I should've specifically quoted the bit my reply was relating to:

"I'm just reiterating that the majority of your data should be in memory, not sitting on a disk"

The SSD is mitigating this because we are not getting 64-128 GB of RAM. If we did, we wouldn't need a 5.5 GB/s SSD; even 1 GB/s would suffice.

Yup - so, in an oversimplified way, the PS5 has more usable RAM.
 
There's only one model, with no LODs or normal maps. So you're not seeing the detail because they aren't computing it, not because the assets aren't there. The assets that need to be there are there, and they are scaled down quickly by the engine.

OK, but in this sequence you could use pre-downscaled, low-poly assets, because you don't need to show detail.
Epic says that where the tech isn't useful, they will fall back to more classic rasterized rendering. I think this is that case: no detail needed, a fast sequence with no stops and a lot of motion blur everywhere. And if the camera does stop, you don't need the fastest NVMe anymore.
 
That's fine, as long as it's all there and not being used within a frame / in real time. That's what their SSD is capable of. If their target is 1 s cached in RAM, others will need to do 2 s. If you want to make the case of "what if it doesn't fit", then they would have said that PC and Xbox cannot do it because there isn't enough VRAM available. But they said it would be fine with an NVMe drive. I don't know what else to say.

I'm not saying I don't get the need for fast I/O.

I'm just pushing back on the idea that 5.5 GB/s is the enabler for all of this.
It's not. They would have said so.

And if Epic were allowed to demo this on XSX or on a high-end PC, I'm willing to bet they would outperform PS5 in both resolution and detail.

I don't think anyone sensible expects the Unreal 5 engine to require an SSD speed of 5.5 GB/s, and as much has been said many times in this thread. The technology should be scalable, though a specific asset or use case could require 5.5 GB/s. Surely Epic has some smart LOD implementation to scale the streaming so it also works on much lower-end hardware. Unreal is inherently cross-platform, and I don't think they would hardcode anything in their engine to require PS5.

Maybe in Unreal 5 there will be texture & geometry streaming settings that are mostly dictated by SSD speed. And perhaps in some cases user RAM can be used as a cache, where the first use of an asset causes pop-in and then it comes from RAM super fast after that. Consoles have a pretty limited amount of RAM, but some crazy PC people might have 64 GB or more of main RAM.
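The "RAM as a second-level cache" idea floated above can be sketched as a plain LRU cache. This is a hypothetical illustration (the class and behavior are invented for the example, not anything Epic has described): the first touch of an asset pays the slow disk read and risks pop-in, repeats are served from RAM until evicted.

```python
from collections import OrderedDict

# Hypothetical LRU asset cache: first use hits the disk (pop-in risk),
# later uses come from RAM until the asset is evicted.
class AssetCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()
        self.disk_reads = 0

    def load(self, asset_id):
        if asset_id in self.cache:
            self.cache.move_to_end(asset_id)   # mark as most recently used
            return "ram"
        self.disk_reads += 1                   # slow path: read from drive
        self.cache[asset_id] = True
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict least recently used
        return "disk"

cache = AssetCache(capacity=2)
print([cache.load(a) for a in ["rock", "tree", "rock", "cliff", "tree"]])
# → ['disk', 'disk', 'ram', 'disk', 'disk']
```

With 64 GB of main RAM the capacity is large enough that evictions, and thus repeat pop-in, become rare, which is the point being made about high-end PCs.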
 
I should've specifically quoted the bit my reply was relating to:

"I'm just reiterating that the majority of your data should be in memory, not sitting on a disk"

The SSD is mitigating this because we are not getting 64-128 GB of RAM. If we did, we wouldn't need a 5.5 GB/s SSD; even 1 GB/s would suffice.
If the goal is to increase your effective VRAM capacity without adding more VRAM, improving your I/O is one method. The other is to reduce the overhead of what you need to store in memory.
Just because things are stored in memory doesn't mean they get used. That's the challenge of a real-time game versus a tech demo: in a tech demo, everything gets used; in a game, you're often loading a lot of stuff you may not need. If you can cut what you load down to exactly what you need, you can save significantly more capacity.

I/O speed is great too, but if you're doing things the old way - loading a 67 MB (8K) texture while only using 1 MB of it on screen - and the competing system is loading just that 1 MB of the 67 MB texture, raw I/O speed won't make up the difference in either capacity or how much you can send over the line. You've wasted bandwidth and capacity doing it that way.

If you want more capacity, the best way is to reduce the overhead in memory. Once you've won there, I/O speed is the next best thing.
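The 8K-texture example above is easy to quantify. The sizes here are assumptions for illustration: ~1 byte per texel as a stand-in for block compression (which lands near the thread's 67 MB figure for an 8K texture) and 128x128 tiles, a common virtual-texturing tile size, not a confirmed engine detail.

```python
# Whole-texture loading vs. loading only the sampled tiles
# (1 byte/texel assumed as a rough block-compressed rate).
TILE = 128
BYTES_PER_TEXEL = 1

def full_texture_mb(size):
    # Cost of residency for one full mip level of a size x size texture.
    return size * size * BYTES_PER_TEXEL / 2**20

def tiles_mb(n_tiles):
    # Cost of residency for only the tiles actually visible on screen.
    return n_tiles * TILE * TILE * BYTES_PER_TEXEL / 2**20

print(full_texture_mb(8192))  # whole 8K mip 0 → 64.0 MB
print(tiles_mb(16))           # 16 visible tiles → 0.25 MB
```

A ~250x reduction in resident footprint for a distant or partially visible surface is why reducing overhead beats raw I/O speed as the first lever.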
 
There's only one model, with no LODs or normal maps. So you're not seeing the detail because they aren't computing it, not because the assets aren't there. The assets that need to be there are there, and they are scaled down quickly by the engine.

There are other models with only LOD0.


This French indie dev downloaded Quixel LOD0 cinematic-quality assets, and a single one is more than 2 million triangles. They use some sort of REYES-like rendering system: static objects are composed of micropolygons, which is easy to see when they show the polygon view. Static objects are all like this; only vegetation and animated characters aren't using this technology yet. It seems it will be possible to use it for animated characters, but that will arrive later.

Interesting

 
There are other models with only LOD0.


This French indie dev downloaded Quixel LOD0 cinematic-quality assets, and a single one is more than 2 million triangles. They use some sort of REYES-like rendering system: static objects are composed of micropolygons, which is easy to see when they show the polygon view. Static objects are all like this; only vegetation and animated characters aren't using this technology yet. It seems it will be possible to use it for animated characters, but that will arrive later.
LOD0 is the full asset. Epic presented that you only need LOD0; the engine does the rest.
 
I don't think anyone sensible expects the Unreal 5 engine to require an SSD speed of 5.5 GB/s, and as much has been said many times in this thread. The technology should be scalable, though a specific asset or use case could require 5.5 GB/s. Surely Epic has some smart LOD implementation to scale the streaming so it also works on much lower-end hardware. Unreal is inherently cross-platform, and I don't think they would hardcode anything in their engine to require PS5.

Maybe in Unreal 5 there will be texture & geometry streaming settings that are mostly dictated by SSD speed. And perhaps in some cases the user's machine caches in RAM, where the first use of an asset causes pop-in and then it comes from RAM super fast after that. Consoles have a pretty limited amount of RAM, but some crazy PC people might have 64 GB or more of main RAM.

From what I've understood, fast streaming is used to load/drop/reload/re-drop assets - sometimes (or often) the same assets - to and from a 10-12 GB GPU pool, with a 4-6 GB CPU pool that can't be used as a cache for GPU loading. So the consoles have to rely on a fast NVMe drive or Velocity tech, but a high-end PC has 64-128 GB of main memory to store geometry assets, so I believe they will take a different approach there than the one used for consoles.
Of course the PC solution will be scaled down for PCs with 16-32 GB RAM + 6-11 GB VRAM, but why should Epic use the same console approach when the PC doesn't have the console's RAM bottleneck?
 
@chris1515 @iroboto You import the LOD0 model straight out of ZBrush, or out of the Quixel library, but I imagine the engine transforms that data into whatever its virtual geometry format is on import. That way the data is in a form where Nanite can selectively load pieces of it and dynamically adjust the LOD on the fly. And it sounds like it has all the tools to reduce that content down to make it suitable for mobile phones etc.
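The "transform on import so pieces can be loaded selectively" idea can be sketched as chopping a flat LOD0 triangle list into fixed-size clusters. This is purely hypothetical: Nanite's actual on-disk format and cluster size were not public at this point, and the 128-triangle cluster here is an assumption for the example.

```python
# Hypothetical import step: split a monolithic LOD0 triangle list into
# fixed-size clusters so a streamer can load and drop pieces independently.
CLUSTER_SIZE = 128  # assumed cluster size, not a confirmed engine value

def build_clusters(triangles):
    return [triangles[i:i + CLUSTER_SIZE]
            for i in range(0, len(triangles), CLUSTER_SIZE)]

mesh = list(range(1000))  # stand-in for 1000 imported triangles
clusters = build_clusters(mesh)
print(len(clusters), len(clusters[-1]))  # → 8 104
```

Once the asset exists as independent clusters, per-cluster LOD decisions and partial streaming fall out naturally, which matches the behavior being described.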
 
If you want more capacity the best way is to reduce the overhead in memory. If you can succeed in wins there, I/O speed is the next best thing afterward.
Great, so from what I can see, with:

- cache scrubbers
- coherency engines

we are minimising expensive cache flushes and have transparent access to virtual RAM - as much as 800 GB (not that it's all needed) - so we have access to data when we need it. A great engineering solution, not some marketing gimmick.
 
@chris1515 @iroboto You import the LOD0 model straight out of ZBrush, or out of the Quixel library, but I imagine the engine transforms that data into whatever its virtual geometry format is on import. That way the data is in a form where Nanite can selectively load pieces of it and dynamically adjust the LOD on the fly. And it sounds like it has all the tools to reduce that content down to make it suitable for mobile phones etc.

They are adjusting the LOD to decimate subpixel triangles, probably doing software quad merging. This is the same for all assets, but in the end it is very efficient: we never see missing detail like we do with normal maps, and it's great for the workflow.
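The "decimate until triangles are about a pixel" rule can be sketched as a screen-space-error LOD pick. All numbers and the halving-per-level model below are hypothetical illustrations, not Nanite's actual metric: assume each LOD level halves the triangle count (doubling edge length), and coarsen while projected edges stay at or under one pixel.

```python
import math

# Pick the coarsest LOD whose projected triangle edges are still <= 1 pixel
# (hypothetical model: each LOD level doubles world-space edge length).
def pick_lod(edge_world, distance, screen_height_px, fov_y_deg, max_lod=16):
    # Pixels per world unit at this distance for a vertical FOV.
    px_per_world = screen_height_px / (
        2 * distance * math.tan(math.radians(fov_y_deg) / 2))
    lod = 0
    while lod < max_lod and edge_world * 2 ** (lod + 1) * px_per_world <= 1.0:
        lod += 1  # coarsen while edges would remain subpixel
    return lod

# A densely tessellated asset (1 mm edges) far away picks a coarse LOD:
print(pick_lod(edge_world=0.001, distance=50, screen_height_px=1080,
               fov_y_deg=60))  # → 5
# The same asset up close stays at full detail:
print(pick_lod(edge_world=0.001, distance=1, screen_height_px=1080,
               fov_y_deg=60))  # → 0
```

The upshot matches the post: detail is spent where it resolves to pixels, so nothing visible is ever missing, yet subpixel work is cut away.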
 