Unreal Engine 5, [UE5 Developer Availability 2022-04-05]

That text, however, focuses mainly on hardware compression support and not necessarily on reduced latency, although that may of course come eventually.

I wouldn’t be surprised if we start seeing SSD connectors on GPUs again in the near future, though, as that could solve most of this for PC.








Just as GPUDirect RDMA (Remote Direct Memory Access) improved bandwidth and latency when moving data directly between a network interface card (NIC) and GPU memory, a new technology called GPUDirect Storage enables a direct data path between local or remote storage, like NVMe or NVMe over Fabric (NVMe-oF), and GPU memory.
The bandwidth from SysMem, from many local drives and from many NICs can be combined to achieve an upper bandwidth limit of nearly 200 GB/s in a DGX-2.
We demonstrate that direct memory access from storage to GPU relieves the CPU I/O bottleneck and enables increased I/O bandwidth and capacity. Further, we provide initial performance metrics presented at GTC19 in San Jose, based on the RAPIDS project’s GPU-accelerated CSV reader housed within the cuDF library. Lastly, we will provide suggestions on key applications that can make use of faster and increased bandwidth, lower latency, and increased capacity between storage and GPUs.



https://devblogs.nvidia.com/gpudirect-storage/
 
Not exactly, even 6 months ago there were solutions that are way faster than PS5. Gigabyte launched an SSD that is capable of reaching 15GB/s, it's called AORUS Gen4 AIC SSD 8 TB.
Broadly speaking. The point is PS5's SSD is fast and Sweeney made a note of it. It's also worth noting that PS5's solution includes optimised OS access and intrinsic compression, so its performance goes beyond the basic 5 GB/s rate and may or may not be faster than a high-end PC SSD. That's not a meaningful contribution to this discussion though; the fact that technically there may be faster solutions on workstation-class hardware (which is inevitable in any comparison; 'PC' will always have a superior solution in some $10,000+ professional workstation configuration) is neither here nor there.
 
All those drives have advanced thermal solutions with big fans and copper heatsinks. I wonder how Sony could cool its SSD in a small, warm box.

I'd assume that with 12 Channels, the chips themselves will be running a good bit slower than they otherwise would.
 
I really doubt it's as simple as many of us are suggesting.
From my perspective it is simple, no matter how it works exactly. But this is no disrespect towards Epic; it is more an admission of probable failure on my side, so the opposite.
Just to be clear, calling things simple depends on context, and simple solutions are mostly good solutions.
 
What laptop was it? Is that confirmed?
Not only what laptop, but what settings.

If patsu is still reading this thread, (I don't recall if he mentioned), did you manage to hear if he mentioned the resolution in the laptop version? Some sites are saying it was 1440p, others are saying he never said.
Yes, this is a real-life problem.
If games require installation on the SSD, next-gen consoles need to fit at least 8-10 games on the drive; with 800-1000 GB drives, that means around 100 GB (already compressed) each.
Some games that don't even use LOD0 solutions, such as COD, already need 200 GB of storage, so this will become standard as time passes.
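
The arithmetic above can be checked with a quick sketch. The function name is made up for illustration, and it assumes the whole nominal capacity is usable, which is optimistic since the OS reserves part of the drive.

```python
def games_that_fit(drive_gb: float, game_gb: float) -> int:
    """Whole number of installs of a given size that fit on a drive.

    Assumes the full nominal capacity is available for games.
    """
    return int(drive_gb // game_gb)


# Drive sizes and install sizes taken from the post above.
for drive_gb in (800, 1000):
    for game_gb in (100, 200):
        count = games_that_fit(drive_gb, game_gb)
        print(f"{drive_gb} GB drive, {game_gb} GB per game: {count} games")
```

At 200 GB per game the same drives hold only 4-5 titles, which is the concern being raised.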

Having hundreds of GB of geometry assets doesn't help, for sure.
Maybe specialized compression plus LOD1-2 can help.

COD probably has a lot of redundancy and might have 4K FMVs.

Downloadable Uncharted 4 NPCs are about 100 MB; some say the UE5 statue after compression is 200 MB. If true, that is in the same ballpark.
Not exactly, even 6 months ago there were solutions that are way faster than PS5. Gigabyte launched an SSD that is capable of reaching 15GB/s, it's called AORUS Gen4 AIC SSD 8 TB.
https://www.amazon.com/Gigabyte-Performance-Advanced-Solution-GP-ASACNE6800TTTDA/dp/B081BSF14V

Other less exotic solutions include the Viper VP4100, Corsair MP600, and AORUS NVMe Gen 4, all of which already provide 5GB/s speeds.
The PS5 can reach a peak of 22 GB/s with compression, not to mention that the PS5's I/O hardware allows faster access.

Depending on the DDR:

DDR4-2133: 17 GB/s raw
DDR4-2400: 19.2 GB/s raw
DDR4-2666: 21.3 GB/s raw
DDR4-3200: 25.6 GB/s raw

The PS5 SSD has a transfer rate similar to DDR2-667: 5.3 GB/s

Of course, DDR has a lot less latency.
The PS5 can reach a peak of 22 GB/s with compression, not to mention that the PS5's I/O hardware allows faster access.

Games installed on a RAM disk load no faster than from SATA SSDs. The PS5 is substantially faster than SATA at loading a game, thus it is faster than even a RAM disk solution. This is probably due to all the overhead bottlenecks Cerny mentioned, which the PS5's I/O hardware solves.

The games could work faster with RAM in theory, but instead of installing to RAM they'd need to be designed to work entirely from RAM, not loading from an install.
 
Depending on the DDR:

DDR4-2133: 17 GB/s raw
DDR4-2400: 19.2 GB/s raw
DDR4-2666: 21.3 GB/s raw
DDR4-3200: 25.6 GB/s raw

The PS5 SSD has a transfer rate similar to DDR2-667: 5.3 GB/s

Of course, DDR has a lot less latency.

Limited by PCIe transfer rates though. So 16 GB/s max on gen3 and 32 GB/s on gen4 (x16 links). Although there'll obviously be other traffic using that bus too.
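
For reference, the PCIe ceilings can be derived from the per-lane signalling rate. This is a minimal sketch of theoretical one-direction bandwidth only: it accounts for the 128b/130b line encoding used by PCIe 3.0 and 4.0 but ignores packet and protocol overhead, so real transfers will be somewhat lower. The names are illustrative.

```python
# Per-lane raw signalling rate in GT/s.
GT_PER_S = {"gen3": 8.0, "gen4": 16.0}

# PCIe 3.0 and 4.0 both use 128b/130b encoding (128 payload bits per 130 sent).
ENCODING_EFFICIENCY = 128 / 130


def pcie_gb_per_s(gen: str, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    return GT_PER_S[gen] * ENCODING_EFFICIENCY * lanes / 8  # 8 bits per byte


for gen, lanes in (("gen3", 16), ("gen4", 16), ("gen4", 4)):
    print(f"PCIe {gen} x{lanes}: {pcie_gb_per_s(gen, lanes):.2f} GB/s")
```

The x4 case is included because that is the link width an M.2 NVMe drive typically uses, so the drive's own link is a tighter limit than the GPU's x16 slot.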

I'm really looking forward to the DirectStorage deep dives. It should answer a lot of currently unanswered questions.
 
System RAM to VRAM is faster than SSD to Unified RAM, and it has less latency.

Right, but it needs to get into RAM in the first place.

Standard PC SSD - RAM - VRAM

Will that be faster and have less latency than SSD - VRAM?

And besides, aren't there decompression steps in there too?
 
Because current gen games aren't designed to load and stream faster.
There are games that seem to be designed to take advantage of SSDs, like Star Citizen. But practically all games, even those designed for SSDs, do not load faster on NVMe vs SATA SSD vs RAM disk. Perhaps ALL developers are lazy and don't know NVMe drives exist, so they don't optimize for them. Or perhaps all the bottlenecks Cerny mentioned are actually impeding any faster loading than SATA SSD, even on a RAM disk. That's the nature of bottlenecks.
 
Have a larger batch of system RAM reserved. System RAM to VRAM is faster than SSD to Unified RAM, and it has less latency.
PC games requiring 16 GB of RAM would be a very expected next generation difference I think.

That's my expectation, at least until a significant portion of the PC hardware ecosystem is DirectStorage-capable.

Depending on the DDR:

DDR4-2133: 17 GB/s raw
DDR4-2400: 19.2 GB/s raw
DDR4-2666: 21.3 GB/s raw
DDR4-3200: 25.6 GB/s raw

The PS5 SSD has a transfer rate similar to DDR2-667: 5.3 GB/s

Of course, DDR has a lot less latency.

And that's just from a single memory channel. Gaming PCs would be dual channel.
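
The figures quoted above follow from the transfer rate times the 64-bit (8-byte) channel width. A minimal sketch, with illustrative names, showing both single- and dual-channel peak numbers:

```python
BUS_BYTES = 8  # one DDR channel is 64 bits wide


def ddr_gb_per_s(mega_transfers: int, channels: int = 1) -> float:
    """Peak theoretical bandwidth in GB/s for a given MT/s rating."""
    return mega_transfers * BUS_BYTES * channels / 1000


for name, rate in (("DDR2-667", 667), ("DDR4-2133", 2133), ("DDR4-2400", 2400),
                   ("DDR4-2666", 2666), ("DDR4-3200", 3200)):
    single = ddr_gb_per_s(rate)
    dual = ddr_gb_per_s(rate, channels=2)
    print(f"{name}: {single:.1f} GB/s single channel, {dual:.1f} GB/s dual channel")
```

These are peak rates; sustained bandwidth in practice is lower.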
 
So out of curiosity, I was trying to estimate the amount of memory needed for the geometry of the statue they had in the demo. They said that the statue had 33 million triangles. Maybe someone knows exactly how the geometry is getting stored, but here is my estimate of the amount of data anyway:

For a mesh consisting of triangles of a closed surface, I think the ratio of the number of tris to the number of nodes is about 1:1. Hence, 33 million triangles should equate to roughly 33 million nodes that need to be stored.

For a mesh node, you need three numbers in 3D, i.e. the x, y, z coordinates. So in total, we have about 99 million numbers to store for the statue.

I am not sure if one can get away with storing the mesh in single precision (4 bytes) or if double precision is needed (8 bytes).

So you get roughly 400 MB (SP) or 800 MB (DP) for the mesh geometry alone of this statue... right?
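
Here is the same estimate as a small sketch, with one extra case added for comparison. For a closed triangle mesh, Euler's formula (V - E + F = 2 - 2g, with every edge shared by two triangles so 2E = 3F) gives V = F/2 + 2 - 2g, so the vertex count would be closer to half the triangle count than to 1:1. The sketch counts raw positions only; index data, normals and UVs are not included, and how the engine actually stores the geometry is not known here.

```python
TRIANGLES = 33_000_000  # figure quoted for the statue
COMPONENTS = 3          # x, y, z per vertex


def position_megabytes(vertices: int, bytes_per_component: int) -> float:
    """Size of raw vertex positions in MB (1 MB = 10**6 bytes)."""
    return vertices * COMPONENTS * bytes_per_component / 1e6


def closed_mesh_vertices(triangles: int, genus: int = 0) -> int:
    """Vertex count of a closed triangle mesh from Euler's formula.

    V - E + F = 2 - 2g and 2E = 3F give V = F/2 + 2 - 2g.
    """
    return triangles // 2 + 2 - 2 * genus


estimates = {
    "1:1 vertices to triangles (the post's assumption)": TRIANGLES,
    "Euler's formula, closed genus-0 mesh": closed_mesh_vertices(TRIANGLES),
}
for label, vertices in estimates.items():
    for name, size in (("single", 4), ("double", 8)):
        mb = position_megabytes(vertices, size)
        print(f"{label}: {name} precision = {mb:.0f} MB")
```

Under the 1:1 assumption this reproduces the 396 MB and 792 MB figures; with the Euler-based vertex count the raw positions come to about half that.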
 
There are games that seem to be designed to take advantage of SSDs, like Star Citizen. But practically all games, even those designed for SSDs, do not load faster on NVMe vs SATA SSD vs RAM disk. Perhaps ALL developers are lazy and don't know NVMe drives exist, so they don't optimize for them. Or perhaps all the bottlenecks Cerny mentioned are actually impeding any faster loading than SATA SSD, even on a RAM disk. That's the nature of bottlenecks.
Nobody said devs were lazy. You program for the hardware that's common...

System RAM is vastly faster in both transfer rates and latency than PS5's SSD... So Cerny can be right that there ARE bottlenecks (because of course there are) but it can ALSO be right that games simply aren't being designed atm to take advantage of the fastest NVMe drives and most certainly not RAMdisks.
 
I would imagine compression would do wonders for such a dense triangle soup. Organize the data nicely (needed for streaming anyway) and the adjacent vertex values would be very similar and lend themselves really well to further compression. It doesn't even need a very complicated compression mechanism to take advantage of the close proximity of adjacent vertices. Perhaps that's one part where the software rasterizer comes in handy, as they can do decompression in their rasterizer? The cost of decompression gets amortized over fewer memory accesses?
 
There are games that seem to be designed to take advantage of SSDs, like Star Citizen. But practically all games, even those designed for SSDs, do not load faster on NVMe vs SATA SSD vs RAM disk. Perhaps ALL developers are lazy and don't know NVMe drives exist, so they don't optimize for them. Or perhaps all the bottlenecks Cerny mentioned are actually impeding any faster loading.

Laziness or an inability to realize a benefit from current PC hardware are the only two possibilities that come to your mind? You don't think current-gen consoles and many PCs not having SSDs has had an effect on developers' choice not to push the envelope further on the use of streaming in their engines up to now?
 
I would imagine compression would do wonders for such a dense triangle soup. Organize the data nicely (needed for streaming anyway) and the adjacent vertex values would be very similar and lend themselves really well to further compression. It doesn't even need a very complicated compression mechanism to take advantage of the close proximity of adjacent vertices. Perhaps that's one part where the software rasterizer comes in handy, as they can do decompression in their rasterizer? The cost of decompression gets amortized over fewer memory accesses?
I'm not totally convinced by that. There are no patterns in the values so compression options are limited. As I understand it, mesh compression is lossy with slight quantization of values to produce compressible data.
 
I'm not totally convinced by that. There are no patterns in the values so compression options are limited. As I understand it, mesh compression is lossy with slight quantization of values to produce compressible data.

Imagine a simple case where we divide the mesh into blocks. The top-left block would have index 0,0 and the bottom-right would be 255,255. This would naively give us 65536 blocks (ignore z for simplicity). We would also make seeks work on 1x1 blocks to allow streaming blocks as needed. Instantly, the integer part of the vertices in each block is the same, saving space. Inside each block we only need to store the decimal part. Now inside the block we could also sort the data so we find some compression on the decimal part. We know the mesh is super dense, so the difference between adjacent vertices should be very small.

As the blocks are pretty small and the geometry is super dense, we could make more assumptions on how many bits are needed to store the decimal part. If the data is sorted appropriately, then even simple run-length encoding would do wonders. I don't think math like assuming 32 bits or even 16 bits per vertex multiplied by the vertex count would apply. It's just way too naive considering how many ways there are to compress the data when we know we have ridiculously dense geometry and we do want to make it streamable, i.e. chunk it into blocks. Or do something more advanced.
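
A toy sketch of that block idea, to make it concrete. Everything here is an assumption for illustration: 2D points only (z ignored, as above), non-negative coordinates, 8 bits for the fractional part, and simple delta coding on the sorted x fractions instead of run-length encoding. It is not a claim about what Unreal actually does, and the quantization step makes it lossy.

```python
from collections import defaultdict

FRAC_BITS = 8            # assumed precision of the fractional part inside a 1x1 block
SCALE = 1 << FRAC_BITS   # 256 quantization steps per block axis


def encode(points):
    """Group points by integer block index and delta-code the quantized fractions."""
    blocks = defaultdict(list)
    for x, y in points:
        bx, by = int(x), int(y)  # block index, stored once per block
        fx = min(int((x - bx) * SCALE), SCALE - 1)
        fy = min(int((y - by) * SCALE), SCALE - 1)
        blocks[(bx, by)].append((fx, fy))

    encoded = {}
    for key, fracs in blocks.items():
        fracs.sort()  # sorted by fx, so the deltas are small and non-negative
        prev_fx = 0
        deltas = []
        for fx, fy in fracs:
            deltas.append((fx - prev_fx, fy))
            prev_fx = fx
        encoded[key] = deltas
    return encoded


def decode(encoded):
    """Rebuild (quantized) points from the per-block deltas."""
    points = []
    for (bx, by), deltas in encoded.items():
        fx = 0
        for dx, fy in deltas:
            fx += dx
            points.append((bx + fx / SCALE, by + fy / SCALE))
    return points


pts = [(0.5, 0.25), (0.75, 0.5), (255.125, 255.875)]
print(encode(pts))
print(sorted(decode(encode(pts))))
```

Each vertex ends up as two small integers plus a share of one block index, and in dense geometry the deltas cluster near zero, which is what a later entropy or run-length stage would exploit.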

I have seen some papers which store geometry as textures and unwrap it from the texture. It could be that Unreal is using something way different from just regular vertices.
 