Unreal Engine 5, [UE5 Developer Availability 2022-04-05]

Nobody said devs were lazy. You program for the hardware that's common...

System RAM is vastly faster in both transfer rates and latency than the PS5's SSD... So Cerny can be right that there ARE bottlenecks (because of course there are), but it can ALSO be true that games simply aren't being designed atm to take advantage of the fastest NVMe drives, and most certainly not RAM disks.
Games are not designed to take advantage of even average-speed NVMes; it's as if NVMes don't exist. They load no faster than on a SATA SSD.

Even PC exclusives like Star Citizen, which list Nvidia Titans and SSDs in their recommended specs, seem to ignore NVMes too... how strange. Or maybe it's not that they're ignoring them, but that they can't do anything about the bottlenecks.

Laziness or an inability to realize a benefit from current PC hardware are the only two possibilities that come to your mind? You don't think current-gen consoles and many PCs not having SSDs has had an effect on developers' choice not to push the envelope further on the use of streaming in their engines up to now?

There are games that an SSD speeds up substantially more than any other game on the market. So they are taking advantage of SSDs, and have optimized heavily for them, which is why they substantially exceed all other games in terms of loading. Yet even these games load no faster on NVMes than on SATA SSDs.
 
The stuff I've seen on mesh compression that's effective is also slow.

Here's a look at Google's Draco: https://medium.com/box-developer-bl...ssion-at-scale-draco-vs-open3dgc-c9618b7d64d8

A 5 million vertex statue, 800 MB source down to only 14 MB, but on a 2.8 GHz i7 MacBook took 12.7s to decode.
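Putting those quoted figures side by side (plain arithmetic, nothing assumed beyond the numbers above):

```python
# Back-of-envelope on the Draco figures quoted above.
source_mb = 800.0   # uncompressed 5M-vertex statue
draco_mb = 14.0     # Draco-compressed size
decode_s = 12.7     # decode time on the 2.8 GHz i7 MacBook

ratio = source_mb / draco_mb        # compression ratio, ~57:1
throughput = source_mb / decode_s   # effective decode rate, ~63 MB/s of source data

print(f"{ratio:.0f}:1, {throughput:.0f} MB/s")
```

So even a single statue takes over ten seconds to unpack on a CPU — that's the "effective but slow" problem in a nutshell.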

[chart from the article: Draco vs Open3DGC compression results]


GPU compute fanciness may well get much faster, but they're still using quantization. It'd be quite the feat if Epic has both high compression and fast random-access reads. I think it more likely the data is just huge and read quickly (hence the need for SSDs). Even drive-compression like Kraken might screw with the virtualised data and be unusable?
 
There are games that an SSD speeds up substantially more than any other game, and that have optimized heavily for it... Yet even these games load no faster on NVMes than on SATA SSDs.

Loading faster is not an indication that a game is "heavily optimized for SSDs". A developer could do nothing to optimize for SSD, and if I/O is their bottleneck they would still see a more significant speedup from an SSD than a game that is less I/O-bound.
 
Imagine a simple case where we divide the mesh into blocks. The top-left block would have index 0,0 and the bottom-right 256,256.
Yes, I was going to say something similar: most/all the vertices are going to be very close to each other. If you are storing x, y, z as floats then you're doing it wrong.
A 3D grid is good, though I suppose you could just use a direction & distance.
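A sketch of what block-relative quantization could look like (a toy layout of my own, not any engine's actual format): positions become small integers relative to a block's bounding box instead of three raw float32s.

```python
# Toy block-relative quantization: store each vertex as 16-bit integers
# inside its block's bounding box (6 bytes/vertex vs 12 for three float32s).
def quantize(pos, box_min, box_max, bits=16):
    scale = (1 << bits) - 1
    return tuple(round((p - lo) / (hi - lo) * scale)
                 for p, lo, hi in zip(pos, box_min, box_max))

def dequantize(q, box_min, box_max, bits=16):
    scale = (1 << bits) - 1
    return tuple(lo + qi / scale * (hi - lo)
                 for qi, lo, hi in zip(q, box_min, box_max))

# A vertex inside a unit-sized block becomes three small integers.
q = quantize((0.5, 0.25, 1.0), (0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
print(q)  # integers in [0, 65535]
```

The error is bounded by half a grid step, which over a small block with a 16-bit grid is far below visible precision — which is why "x, y, z as floats" is wasteful.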

Here's an old paper about it:
http://mcl.usc.edu/wp-content/uploa...-3D-triangular-mesh-compression-a-survey1.pdf
 
There's a reason why Sony states 8-9 GB/s.

And even then, it's most likely 5.5 GB/s sustained/minimum for the PS5, and 2.4 GB/s for the XSX.

That's my expectation, at least until a significant portion of the PC hardware ecosystem is DirectStorage-capable.

So main DDR4/5 RAM is then acting as an 'extremely fast SSD' in gaming PCs, besides the 12 to 16 GB VRAM GPUs.
As opposed to consoles, which only have the GDDR RAM.

You'd still have to load from the NVMe SSD to RAM though, maybe preloading? No doubt there will be some HW block somewhere for decompression at least, be it on the GPU, on the SSD, etc., if that's needed.
No idea what MS has planned.

At least, DDR RAM as an SSD seems very fast, 25 GB/s and much faster, with the latencies that it brings. I can imagine RAM doesn't get as hot either?
But like I said, it needs to be loaded from the SSD too. An SSD on the GPU could be interesting as well.

Would be nice if MS/Epic could share their findings, as in the technical details.
Consoles certainly push things in the right direction.
 
The stuff I've seen on mesh compression that's effective is also slow. [...] It'd be quite the feat if Epic has both high compression and fast random-access reads. I think it more likely the data is just huge and read quickly (hence the need for SSDs).

Completely agree on the generic case. Why I think Epic can do better is that they have different constraints and can tightly control how the data is packed instead of taking arbitrary input. Epic would probably store data in some kind of streamable blocks, and the geometry would be very dense instead of arbitrarily sparse. Streamable blocks could also be something pretty crazy like this:


http://hhoppe.com/gim.pdf
 
I'm assuming some tree type structure in memory and in storage. Texture representations couldn't use lossy compression or you'd start addressing the wrong tiles! Quite possibly they're using Maths way beyond my comprehension as well. We try to visualise these things in structures we are comfortable with, but the best maths represents them with Gobbledegook, like SDFs. At which point I just have to sit back and admire. ;)
 
The stuff I've seen on mesh compression that's effective is also slow. [...] Even drive-compression like Kraken might screw with the virtualised data and be unusable?

Comments I've read suggest Kraken compression might be similar to Draco but much faster. I assume it is usable for arbitrary data on PS5 and won't interfere with anything.

edit:
Kraken is very similar to LZMA in compression capability:
http://www.radgametools.com/images/oodle_typical_vbar.png

"LZMA made 2x smaller files than gzip on standard polymesh data streams - no joke, I tried a number of representative files and LZMA was just awesome. The main issue with LZMA is that it is slow to decompress compared to GZIP - quite a bit slower."
https://groups.google.com/forum/#!topic/alembic-discussion/r3GsgbI8-o0
(LZMA compared to gzip)

https://blog.umbra3d.com/blog/umbra-mesh-compression-overview
(Draco compared to gzip)
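Both comparisons are easy to play with using the Python stdlib — `lzma` (LZMA) vs `zlib` (the algorithm behind gzip) on mesh-like float data. The data here is synthetic, so the exact ratios are only illustrative:

```python
# Quick stdlib comparison in the spirit of the links above: LZMA vs zlib/gzip
# on a fake "polymesh stream" of smoothly varying vertex coordinates.
import lzma
import struct
import zlib

verts = [(x * 0.01, x * 0.02, x * 0.005) for x in range(20000)]
raw = b"".join(struct.pack("<fff", *v) for v in verts)  # 240,000 bytes

z = zlib.compress(raw, 9)            # gzip-class compression, level 9
xz = lzma.compress(raw, preset=9)    # LZMA, max preset

print(len(raw), len(z), len(xz))     # LZMA is typically noticeably smaller
```

On real asset data the alembic-discussion quote above reports roughly a 2x gap in LZMA's favour, at the cost of much slower decompression.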
 
Comments I've read suggest Kraken compression might be similar to Draco but much faster. I assume it is usable for arbitrary data on PS5 and won't interfere with anything. [...]

You could try getting the SDK from Oodle. Sony's should be the same algorithm, just implemented in HW. Would be interesting to hear from someone who has the SDK and is willing to spill the beans on how good/bad Kraken is for different things.

http://www.radgametools.com/oodlekraken.htm

There is some stuff here: http://cbloomrants.blogspot.com/2018/03/oodle-data-compression-integration-for.html

Here's another example on a large pak from Fortnite :
FortniteGame-WindowsClient.pak

uncompressed : 14,150,876,787 bytes
zlib 9 : 6,601,033,750 bytes
Oodle Leviathan : 5,461,767,960 bytes

The special thing about Oodle Leviathan is that it gets very high compression levels while still being super fast to decode. Kraken, Mermaid and Selkie are even faster and give you options for different platforms and performance targets.
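For reference, those pak sizes work out to roughly 2.14:1 for zlib and 2.59:1 for Leviathan:

```python
# Compression ratios implied by the Fortnite pak sizes quoted above.
uncompressed = 14_150_876_787
zlib9 = 6_601_033_750
leviathan = 5_461_767_960

print(f"zlib 9: {uncompressed / zlib9:.2f}:1")         # 2.14:1
print(f"Leviathan: {uncompressed / leviathan:.2f}:1")  # 2.59:1
```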
 
Kraken's performance on 3D data doesn't necessarily equate to what Epic can do, because their data needs to be tiled (what's the three-dimensional variant?!) in some capacity. I'm not sure it'll shed light in this particular thread. Its performance would definitely be suited to loading whole models in a classic engine.
 
Loading faster is not an indication that a game is "heavily optimized for SSDs". [...]

You could try getting the SDK from Oodle. Sony's should be the same algorithm, just implemented in HW. [...]

http://www.radgametools.com/oodlekraken.htm
We should see. If we assume it is similar to LZMA, then guesstimating from the comparison graphs it should manage something on the order of 3:1 compression. That would put effective bandwidth for geometry alone at around 16.5 GB/s. Though again, that's a guesstimate based on what the graphs show and needs to be confirmed.
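Spelled out, that guesstimate is just raw bandwidth times an assumed ratio (the 3:1 figure is the assumption here, not a published spec):

```python
raw_gbps = 5.5                    # PS5 sustained raw read, GB/s
assumed_ratio = 3.0               # assumed LZMA-like ratio on mesh data
print(raw_gbps * assumed_ratio)   # 16.5 GB/s effective
```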
 
Well, for the discussion we can assume that compression won't greatly improve over the hardware-supported methods on offer. So this begs the question: how much better will graphics get, and how much more varied? From the looks of things we have the detail covered, but the assets are going to be a bottleneck.

Everything that Uncharted taught (at least me) about clever combinations of assets, and which Dreams also shows very clearly, is going to hold true. So in a sense this demo's many repetitions of huge-poly statues highlights this too: we can render a ton now, but where we are no longer bottlenecked by rendering speed and RAM, we are more bottlenecked by space on disc. I expect the big games to be at least 200 GB. The net benefit will be a bit higher thanks to some deduplication. I suspect we will see some innovations, like much more "here is asset x, and here are the coordinates where to render it and with what deformations", much as I guess Dreams does.

I don’t know how the UE5 demo here plays into that or what the LOD0 stuff breaks down to. Does it work similarly?
 
I understand streaming data fast enough in a corridor game with full-detail assets. But how did they manage the view reaching from the canyon to the horizon? Aren't there just too many assets to fit in memory if there is literally no LOD?
I mean, is the gate seen at the horizon a fully detailed asset that is crunched/minced in real time for each frame? Or is it a lower-LOD asset up to a certain distance, then swapped to LOD0 when you get closer?
 
If next-gen ends up with repetitive objects in insane detail instead of a huge variety of assets in reasonable detail, I'll be very disappointed! I think random-access read speed could be the differentiator for streaming. For virtualised textures, you can have a separate texture for each of 1000 objects on screen so long as you can access each texture from storage quickly enough. It wouldn't require a lot of transfer (63 MB/s from my prior BOTE calculations) but would be impossible on an HDD due to seek overheads.
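One way a figure in that ballpark can fall out (my own assumptions, not necessarily the original BOTE): refresh a screen's worth of uncompressed texels per second, with headroom for mips and mispredicted tiles.

```python
# Rough streaming budget for virtual texturing (all assumptions mine).
width, height = 1920, 1080
bytes_per_texel = 4   # uncompressed RGBA8
overdraw = 8          # assumed headroom: mip chain, tile borders, misprediction

mb_per_s = width * height * bytes_per_texel * overdraw / 1e6
print(mb_per_s)  # ~66 MB/s, same ballpark as the ~63 MB/s figure quoted
```

Tiny by bandwidth standards, but it's thousands of small scattered reads per second, which is exactly what HDD seek times can't deliver.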
 
I understand streaming data fast enough in a corridor game with full-detail assets. But how did they manage the view reaching from the canyon to the horizon? [...]
The key point here is the virtualisation of the geometry. This is the whole reason behind Nanite, what the showcase was really about, and how Epic is waving in front of professional creators this idea of no more geometry bottlenecks.

To render a mesh, not all the triangles are loaded. Half the triangles are on the other side of the object, for starters. Then if you have an object in front of the statue, only the unoccluded triangles need be present to draw.

It just occurred to me that the data isn't 3D but 2D, a surface. If you UV-unwrap it, you get a texture with each vertex a part of that texture. If you were to store the position of the vertex in that 'texture', you could use virtual textures as a way to access the data, maybe. It's odd thinking like that which gave us GPGPU and then compute and, quite frankly, drives the evolution of hardware and software in parallel.

Edit: That's probably what Manux's pdf is about. ;)
Edit: Yup!
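A toy illustration of that geometry-image idea (purely illustrative, not UE5's actual scheme): once vertex positions live in a 2D array, fetching geometry becomes the same operation as fetching virtual-texture tiles.

```python
# Toy "geometry image": vertex positions stored as the pixels of a 2D array,
# so texture-style tiling/streaming machinery applies to geometry too.
SIZE, TILE = 256, 32

# gim[v][u] = (x, y, z); a simple procedural surface stands in for a real mesh
gim = [[(u / SIZE, v / SIZE, 0.0) for u in range(SIZE)] for v in range(SIZE)]

def fetch_tile(tu, tv):
    """Read back one TILE x TILE block of positions, as a VT system would."""
    return [row[tu * TILE:(tu + 1) * TILE]
            for row in gim[tv * TILE:(tv + 1) * TILE]]

tile = fetch_tile(3, 2)
print(len(tile), len(tile[0]))  # 32 32
```

Only the tiles covering visible surface ever need to be resident, which is exactly the virtual-texturing trick applied to vertices.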
 
This needs more attention. This image shows what's happening very clearly...

[attached image: upload_2020-5-18_18-16-42.png]

The moment the data is arranged this way, we can see how virtualised textures would also apply conceptually to the geometry in a 2D array, along with how compression can change from having to crunch 3D data. You don't need to load the whole texture to show the model, but only the pieces of it that are viewable, which is the same problem as picking which texture tiles with virtual texturing.

Very clever stuff.
 
Geometry images were part of my plan; the seamless UVs would have allowed displacement mapping on any surface, which was not yet resolved in GIM research but was addressed with quadrangulation, which is difficult.

After seeing the UE5 demo, I realize I was on the wrong track. What they have is 'simpler' and just works. Pretty sure they don't use GIM, and virtual texturing, storage and compression are different topics.
 
Comments I've read suggest Kraken compression might be similar to Draco but much faster. [...]
http://www.radgametools.com/images/oodle_typical_vbar.png
What I take from there is that Kraken's decompression speed is way faster than zlib's. I don't know how many cores are used in those tests, but I suppose it's a very parallel-friendly task, and that's why Sony claims 9 Zen 2 cores' worth of performance for their Kraken decompressor.
 
The big difference with the REYES pipeline is that you create micropolygons 1:1 with pixels. This is a 2D array, much like a texture.

I mentioned a bit earlier:

If texture is to texel, what is geometry? It's a micropolygon, and in a texturised format, you can call this a 'microcell' for want of a better word.

You can then manipulate and compress data like you would a texture.

And shade geometry like you would with textures using geometry textures.

My understanding is that the special normal map Epic mentioned is how your object gets shaded with geometry textures.
 