Unreal Engine 5, [UE5 Developer Availability 2022-04-05]

Except that’s not how it works in the new system: in most cases the buffer doesn’t just get smaller, it goes away!

For instance, a texture is streamed straight from the SSD into the GPU’s own cache pipelines, bypassing RAM altogether, so it uses neither space nor bandwidth there. Not even the CPU is used in any significant way, as the data goes from the SSD to a hardware decompressor and then straight to the GPU.

And raytraced reflections also rarely work the way you think, as there is rarely enough raytracing power to get a high-res, infinite-depth reflection of the actual game world you would see if you turned around.

I actually wonder if it is not cheaper to just calculate the correct coordinates to mirror and transpose the meshes to the mirror location and render them as if they were actual geometry ...

Probably not. Latency from the SSD is something like 1,000+ times worse than from RAM. The buffer can be used to effectively hide that latency by aggressively pulling data into RAM before it is needed. Bypassing RAM because the data that is needed immediately is on the SSD makes sense, since it eliminates the latency of first placing it in RAM and only then getting it to the GPU. But the ideal situation is to treat those direct calls to the SSD as worst-case scenarios and find a way to minimize their use.
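
Purely as an illustration (hypothetical names, not any real console or engine API), the policy being described looks roughly like this: RAM acts as a prefetch buffer that hides SSD latency, and a blocking read straight from the SSD is the worst-case fallback.
[code]
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical asset-streaming sketch; names and structure are invented for the example.
struct AssetBlob { std::vector<uint8_t> bytes; };

class Streamer {
public:
    // Prefetch: called ahead of time for assets predicted to be needed soon,
    // e.g. geometry just outside the current view. This is the RAM "buffer".
    void prefetch(uint64_t assetId) {
        if (!ram.count(assetId))
            ram[assetId] = readFromSsd(assetId);   // would be asynchronous in a real engine
    }

    // Acquire: called when the renderer actually needs the asset this frame.
    const AssetBlob& acquire(uint64_t assetId) {
        auto it = ram.find(assetId);
        if (it != ram.end())
            return it->second;                     // fast path: already resident in RAM
        // Worst case: a blocking read straight from the SSD, paying the full latency now.
        return ram[assetId] = readFromSsd(assetId);
    }

private:
    AssetBlob readFromSsd(uint64_t) { return AssetBlob{std::vector<uint8_t>(1024)}; } // stand-in for real I/O
    std::unordered_map<uint64_t, AssetBlob> ram;   // the buffer under discussion
};
[/code]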
 
Probably not. Latency from the SSD is something like 1,000+ times worse than from RAM. The buffer can be used to effectively hide that latency by aggressively pulling data into RAM before it is needed. Bypassing RAM because the data that is needed immediately is on the SSD makes sense, since it eliminates the latency of first placing it in RAM and only then getting it to the GPU. But the ideal situation is to treat those direct calls to the SSD as worst-case scenarios and find a way to minimize their use.

Probably yes, look at what Ratchet & Clank is doing:
“We unload the things literally behind you from a camera perspective. If you spun the camera around, we could load them before you see that. That lets us devote all of our system memory to the stuff in front of you right now, that you need to experience in that moment.”
https://gamingbolt.com/ratchet-and-...content-and-quality-in-every-corner-insomniac

Then you can also consider UE5, which is doing something very similar. The future is almost here: SSDs will finally become useful in gaming as we get the low-level IO APIs we need, fast SSDs, good compression, and games like Ratchet & Clank, Miles Morales and hopefully also Horizon Zero Dawn this year. Those are fun games and show a little of what the hardware can do. Late-generation games can be much better at using the available hardware. I would wait a few more years before passing any judgment on how games evolve now that the limit is storage space, not storage bandwidth.
 
Probably yes, look at what Ratchet & Clank is doing:

https://gamingbolt.com/ratchet-and-...content-and-quality-in-every-corner-insomniac

Then you can also consider UE5, which is doing something very similar. The future is almost here: SSDs will finally become useful in gaming as we get the low-level IO APIs we need, fast SSDs, good compression, and games like Ratchet & Clank, Miles Morales and hopefully also Horizon Zero Dawn this year. Those are fun games and show a little of what the hardware can do. Late-generation games can be much better at using the available hardware. I would wait a few more years before passing any judgment on how games evolve now that the limit is storage space, not storage bandwidth.

“We unload the things literally behind you from a camera perspective. If you spun the camera around, we could load them before you see that. That lets us devote all of our system memory to the stuff in front of you right now, that you need to experience in that moment.”

That doesn't hint at avoiding RAM as a texture buffer.
 
You don’t need the textures, but you do need the geometry for detecting collisions and so on. And that’s a lot of data too.

RAM latency is so low that even a big relative difference to the SSD is still small in absolute terms. The latency from the SSD on PS5 is still well below 1 ms, which is well below the 16.67 ms of a single frame at 60 fps.
 
You don’t need the textures, but you do need the geometry for detecting collisions and so on. And that’s a lot of data too.

RAM latency is so low that even a big relative difference to the SSD is still small in absolute terms. The latency from the SSD on PS5 is still well below 1 ms, which is well below the 16.67 ms of a single frame at 60 fps.
Nanoseconds, I hope, for the SSD. Latency in milliseconds is spinning-platter territory, IIRC.
 
For instance, a texture is streamed straight from the SSD into the GPU’s own cache pipelines, bypassing RAM altogether, so it uses neither space nor bandwidth there.

I think you are wrong here. The SSD is fast, but not that fast.
 
I actually wonder if it is not cheaper to just calculate the correct coordinates to mirror and transpose the meshes to the mirror location and render them as if they were actual geometry ...

I think what you are describing is planar reflections. And it is indeed how mirror or water reflections are done in some games that choose to devote a lot of resources to those effects. But it is still limited to a single reflection plane; otherwise you need to re-transform the world geometry for every different reflection plane. It's not viable for scenes with arbitrarily many planes of reflection, which in practice means any scene that does not consist of 100% non-reflective materials and ONE lone perfectly flat mirror.
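
To make the "mirror and transform the geometry" idea concrete: for a flat mirror it really is just one extra matrix applied before the usual view transform, but that matrix is tied to one specific plane. A minimal sketch of that matrix (standard math, hypothetical helper name, not any particular engine's API):
[code]
#include <array>

// Reflection about the plane dot(n, p) + d = 0, with n normalized.
// Row-major 4x4, applied to world-space positions before view/projection.
// Remember to flip the triangle winding (cull mode) in the mirrored pass.
using Mat4 = std::array<float, 16>;

Mat4 planarReflection(float nx, float ny, float nz, float d) {
    return {
        1 - 2*nx*nx,   -2*nx*ny,     -2*nx*nz,     -2*nx*d,
         -2*ny*nx,    1 - 2*ny*ny,   -2*ny*nz,     -2*ny*d,
         -2*nz*nx,     -2*nz*ny,    1 - 2*nz*nz,   -2*nz*d,
          0,             0,            0,            1
    };
}
// One plane  -> one extra geometry pass with this matrix.
// N distinct planes -> N extra passes, which is why it does not scale to
// scenes full of curved or arbitrarily oriented reflective surfaces.
[/code]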
 
Except that’s not how it works in the new system: in most cases the buffer doesn’t just get smaller, it goes away!

For instance, a texture is streamed straight from the SSD into the GPU’s own cache pipelines, bypassing RAM altogether, so it uses neither space nor bandwidth there. Not even the CPU is used in any significant way, as the data goes from the SSD to a hardware decompressor and then straight to the GPU.

And raytraced reflections also rarely work the way you think, as there is rarely enough raytracing power to get a high-res, infinite-depth reflection of the actual game world you would see if you turned around.

I actually wonder if it is not cheaper to just calculate the correct coordinates to mirror and transpose the meshes to the mirror location and render them as if they were actual geometry ...
You really don't want that, as textures etc. are read multiple times, so you want to buffer them for a few frames at least. The buffer does not go away; it just gets much smaller and contains only things that are in the scene.
Don't forget, textures must be filtered, shaded, and used for lighting, shadow and reflection calculations. You don't want to wait for part of a texture while the shader is already running. Latency might be 1 ms (which is still a lot), but that only means you get an answer, not that all the data has been transferred.

That all the data behind the camera can be pushed out of memory (careful: RT reflections work against that) only means that you can turn the camera only as fast as the streaming speed allows. You still want to have cached some of the geometry and textures that lie on the way to the other camera position (as much as the streaming system allows). This is not that different from what has been done for years, just more extreme now. Before, more or less the base assets of the whole level were loaded into memory, and in packages for bandwidth optimization. That way, many things landed in memory that were never needed. And that is what is really new here: only stuff that is actually needed is loaded into memory (and, a bit more extremely, tiled on Xbox if it works as promised).
If no buffer were needed, R&C wouldn't need any loading time/animation while going through "world portals". The PS5 SSD is fast, but not so fast that we can completely get rid of the buffer.

Btw, caches on the GPU are really small. How do you get a 4K texture that sits compressed on the SSD into the cache without touching main memory? Also, the cache is normally only an image of parts of main memory (synced back and forth), so loading directly into the cache does not work. The GPU can only request data from the SSD a short time (e.g. a few frames) before it is needed. This data is then copied (and decompressed) into main memory, where the GPU can work with it.
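
As a back-of-envelope illustration of that last point (all numbers here are assumptions chosen for the example, not measured console specs), requesting data even a few frames ahead already leaves room for a few hundred megabytes to land, decompressed, in main memory before the GPU touches it:
[code]
// Rough numbers only, to illustrate "request a few frames before it is needed".
constexpr double framesAhead     = 3.0;           // how early the request is issued (assumed)
constexpr double frameSeconds    = 1.0 / 60.0;    // 60 fps -> ~16.7 ms per frame
constexpr double requestLatency  = 0.0005;        // ~0.5 ms until data starts arriving (assumed)
constexpr double decompressedGBs = 8.0;           // effective rate after decompression (assumed)

// Time between issuing the request and the GPU first touching the data:
constexpr double windowSeconds = framesAhead * frameSeconds;                       // ~50 ms
// Data that can land, decompressed, in main memory within that window:
constexpr double availableGB   = (windowSeconds - requestLatency) * decompressedGBs; // ~0.4 GB
[/code]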
 
Right, I may have misremembered or misunderstood some parts from here:

https://playstationvr.hateblo.jp/entry/2020/03/30/181003

So we've implemented a gentler way of doing things where the coherency engines inform the GPU of the overwritten address ranges and custom scrubbers in several dozen GPU caches do pinpoint evictions of just those address ranges.

The best thing is as a game developer when you read from the SSD you don't need to know any of this you don't even need to know that your data is compressed.

You just indicate what data you'd like to read from your original uncompressed file and where you'd like to put it and the whole process of loading it happens invisibly to you and at very high speed.

So yeah, textures will be streamed to memory, and while they are being used by the GPU the cache scrubbers flag any cached portions that have been overwritten and are out of date.

What he says about the streaming speed: 4 GB of compressed data in half a second. In most console games, that is about the time a turn takes. But the more important part is that the game can be 100+ times more aggressive in evicting GPU input data or LOD levels from RAM.
 
I think what you are describing is planar reflections. And it is indeed how mirror or water reflections are done in some games that choose to devote a lot of resources to those effects. But it is still limited to a single reflection plane; otherwise you need to re-transform the world geometry for every different reflection plane. It's not viable for scenes with arbitrarily many planes of reflection, which in practice means any scene that does not consist of 100% non-reflective materials and ONE lone perfectly flat mirror.

Good point, but on the other hand, if I look at Dreams and many other engines, transformed versions of the same geometry can be vastly more efficient to render. And I doubt the current generation of consoles has the power to do good reflections for multiple bounces; they basically don’t even get you a sharp reflection in a single bounce. They will improve, and possibly it’s too much work to get right, but I do wonder about it.
 
What he says about the streaming speed: 4 GB of compressed data in half a second. In most console games, that is about the time a turn takes.

Save for some fast-paced shooters, I think in most games it takes well over 1 second to make a 180° turn with the right stick, especially 3rd-person adventures like Ratchet & Clank or God of War.

1.5 seconds at a typical 8 GB/s streaming rate means the IO can push at least some 11 GB of decompressed data into RAM during that 180° turn, which is probably well over the amount of data that needs to be replaced, as character and weapon models, audio data, the engine etc. don't need replacement most of the time.

With datasets optimized for Oodle Texture we should be looking at over 12 GB/s of effective transfer rate, so devs can be even more aggressive about how fast they change the contents of RAM, and perhaps allow faster PoV turns.
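
Spelling that arithmetic out, taking the figures in the post as given rather than as measured values:
[code]
// Sanity check of the numbers above, using the post's own figures as inputs.
constexpr double turnSeconds     = 1.5;    // rough time for a 180° turn on the stick
constexpr double baselineRateGBs = 8.0;    // typical effective (decompressed) streaming rate
constexpr double oodleRateGBs    = 12.0;   // with Oodle Texture-optimized datasets

constexpr double perTurnGB       = turnSeconds * baselineRateGBs;  // 12 GB per turn
constexpr double perTurnOodleGB  = turnSeconds * oodleRateGBs;     // 18 GB per turn
[/code]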
 
Good point, but on the other hand, if I look at Dreams and many other engines, transformed versions of the same geometry can be vastly more efficient to render. And I doubt the current generation of consoles has the power to do good reflections for multiple bounces; they basically don’t even get you a sharp reflection in a single bounce. They will improve, and possibly it’s too much work to get right, but I do wonder about it.

Multiple bounces have nothing to do with sharp reflections. Multiple bounces are about reflections within reflections.
We are talking about arbitrary reflection angles and positions, and the whole point of RT is that it is infinitely more optimal than what you are suggesting; games like Watch Dogs and Spider-Man on the new-gen consoles already show it is possible.

You have severely misinterpreted how rendering tech works about every time you have tried to say something about it. I would recommend that you research more before asserting things that are above your head.
 
With an HDD and no virtualisation of geometry, this would be impossible to do. No available GPU can render this scene exactly like this with traditional methods, not even a 3090. It would mean tons of memory and a GPU running at very low utilisation because of the huge number of polygons. And without continuous LOD like in Nanite, you need multiple LODs of each asset.

[Slide from "Optimizing the Graphics Pipeline with Compute" (DICE, GDC 2016)]

1 polygon per pixel means only 6.25% efficiency of the GPU shading; this is coming from a DICE presentation, "Optimizing the Graphics Pipeline with Compute".
I want to clarify something from a few pages back. Those efficiency numbers are for the rasterizer, not shading. With 1 pixel/triangle the shading efficiency is 25%.
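
A quick illustration of where the 25% comes from (the usual 2x2-quad shading behaviour; exact figures vary by GPU and workload):
[code]
// Why ~1 triangle per pixel hurts shading efficiency: pixel shaders are
// dispatched in 2x2 quads, so a triangle covering a single pixel still
// occupies all four lanes of its quad.
constexpr int    coveredPixels     = 1;
constexpr int    lanesPerQuad      = 2 * 2;
constexpr double shadingEfficiency = double(coveredPixels) / lanesPerQuad;  // 0.25 -> 25%
// The 6.25% figure on the slide is a rasterizer-throughput number,
// a separate bottleneck from the quad-shading one above.
[/code]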
 
“We unload the things literally behind you from a camera perspective. If you spun the camera around, we could load them before you see that. That lets us devote all of our system memory to the stuff in front of you right now, that you need to experience in that moment.”

That doesn't hint at avoiding RAM as a texture buffer.

Correct, what they are saying is that
  • Things that are out of sight of the player are unloaded from memory
  • When the camera turns, the SSD is fast enough that they can load the needed data into memory before the GPU needs it.
Rendering involves multiple reads into and out of memory for any given frame. That's basically what he means when he says, "That lets us devote all of our system memory to the stuff in front of you right now..."

Basically the fast SSD acts as a memory footprint multiplier. Instead of the GPU holding both the scene that is being rendered and the data that might be needed if the player's view changes, it now only holds the data required for rendering the visible scene, and new data can be loaded as it is about to come into view because the IO stack is now fast enough.

Since the view can only be changed at a relatively fixed rate, there's plenty of time to load in any new data into memory as the camera turns. Which, again, allows them to potentially use all of the system memory for rendering what is immediately visible to the player.

In short, because of the fast IO subsystem, significantly more memory can be dedicated to rendering the scene as opposed to consoles in the past which had to share that memory with assets that might or might not be used at some point in the future depending on where the player looks.

While it's possible some things could be read directly into GPU register space (or caches), it doesn't really make sense to do that when rendering the scene involves multiple reads and writes into and out of main memory (not to be confused with main memory on a PC, this is analogous to GPU memory on PC) from the GPU.

Regards,
SB
 
And raytraced reflections also rarely work the way you think, as there is rarely enough raytracing power to get a high-res, infinite-depth reflection of the actual game world you would see if you turned around.

I actually wonder if it is not cheaper to just calculate the correct coordinates to mirror and transpose the meshes to the mirror location and render them as if they were actual geometry ...
Adding some different examples to milk's explanations to show why we cannot generally transform geometry efficiently to get mirror reflections...

1. A planar mirror, or a planar water surface: it works, just mirror and transform the geometry. All surfaces sharing the mirror plane can use it. Multiple mirrors, each having a different plane, require transforming and rasterizing the scene for each. (The recent Hitman game does this for multiple planes of windows.)

2. A mirror ball: this no longer works with a linear transformation. We would need to transform and render the scene for each pixel of the ball, and from each render we would fetch only one texture sample to display. Of course raytracing is more efficient then.
But we could still do a nonlinear transformation of the scene so that it looks like a fisheye projection. This works only if the tessellation is fine enough (straight lines become curves, so we need enough segments to approximate the curve), but it is faster than RT.

3. A concave mirroring object, e.g. a torus: we see the same scene object reflected multiple times on the torus. At this point it's no longer possible to process the scene just once to get all the reflections. All we can do is rasterize a cube map and accept the resulting errors, e.g. missing self-reflections of the torus. But if we have many such complex mirror objects in the scene, we need to render many dynamic cubemaps, and if the objects are large we also need many cubemaps per object and have to blend them, including the complexity of figuring out where to place those cubemaps.

So there is no way around raytracing, which at some point becomes the most efficient method to do this.
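
The underlying reason in one formula: the reflected direction depends on the surface normal at the shaded point. On a flat mirror every point shares one normal, so a single mirrored render of the scene covers all reflected rays; on a ball or torus the normal (and therefore the reflected direction) changes per pixel, so no single transform of the scene can reproduce them all, and you end up tracing per-pixel rays or approximating with cube maps. A minimal sketch of that formula (plain math, hypothetical helper names):
[code]
struct Vec3 { float x, y, z; };

Vec3  sub(Vec3 a, Vec3 b)  { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3  mul(Vec3 v, float s) { return {v.x * s, v.y * s, v.z * s}; }
float dot(Vec3 a, Vec3 b)  { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Mirror reflection of an incoming direction d about surface normal n (both unit length):
//   r = d - 2 * dot(d, n) * n
Vec3 reflectDir(Vec3 d, Vec3 n) { return sub(d, mul(n, 2.0f * dot(d, n))); }
[/code]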
 
great posts clarifying raytracing guys! Was gonna try to explain the planar reflection downsides but you all beat me to it much better than I could.

Correct, what they are saying is that
  • Things that are out of sight of the player are unloaded from memory
  • When the camera turns, the SSD is fast enough that they can load the needed data into memory before the GPU needs it.

...
Since the view can only be changed at a relatively fixed rate, there's plenty of time to load in any new data into memory as the camera turns. Which, again, allows them to potentially use all of the system memory for rendering what is immediately visible to the player.

Right. Plus, anybody can look at those Ratchet scenes and see that the majority of the assets in any scene are the same even if you turn 180 degrees. Surely it's streaming a significant amount of data in and out as needed, or else the devs wouldn't have said that, but it's hard to imagine them needing 10 new GB from frame to frame.

Good point, but on the other hand, if I look at Dreams and many other engines, transformed versions of the same geometry can be vastly more efficient to render. And I doubt the current generation of consoles has the power to do good reflections for multiple bounces; they basically don’t even get you a sharp reflection in a single bounce. They will improve, and possibly it’s too much work to get right, but I do wonder about it.

Dreams is not rendering geometry. They're SDFs being rendered in a novel way -- totally different problem to optimize.

Also, terminology-wise, the scalable value that can give your reflections sharpness is not bounces, as other posters said, it's samples -- most modern RT games use much fewer than one sample per pixel, whereas a film renderer might use dozens.
 
great posts clarifying raytracing guys! Was gonna try to explain the planar reflection downsides but you all beat me to it much better than I could.

Right. Plus, anybody can look at those Ratchet scenes and see that the majority of the assets in any scene are the same even if you turn 180 degrees. Surely it's streaming a significant amount of data in and out as needed, or else the devs wouldn't have said that, but it's hard to imagine them needing 10 new GB from frame to frame.

Dreams is not rendering geometry. They're SDFs being rendered in a novel way -- totally different problem to optimize.

The heaviest operation for the SSD in Ratchet & Clank has to be when warping from one dimension to another. It's a full unload-current-level, load-next-level operation. Streaming based on camera movement should be a much lighter operation than changing levels. If the warp happens in something like 1 s, that would likely require the SSD and decompression to work hard.

For Dreams it would be tremendously interesting to be able to stream content from the SSD and in essence provide "infinite RAM" for developing levels.
 
Multiple bounces have nothing to do with sharp reflections. Multiple bounces are about reflections within reflections.
We are talking about arbitrary reflection angles and positions, and the whole point of RT is that it is infinitely more optimal than what you are suggesting; games like Watch Dogs and Spider-Man on the new-gen consoles already show it is possible.

You have severely misinterpreted how rendering tech works about every time you have tried to say something about it. I would recommend that you research more before asserting things that are above your head.

With all due respect, for sharp reflections you need enough samples and a pretty large tracing depth. Kindly point me to all those reflections, especially in console games, that have a decent sample count to produce sharp reflections and mirror more than a few objects nearby (like Watch Dogs) or do not severely reduce detail on objects further away (like Spider-Man). From there, multiple bounces are even more of a stretch if a typical game can't even get a single bounced reflection with enough depth and resolution. And I am basing this, among other things, on all the DF videos on the subject, which I've seen, including the explanation of raytracing. And of course I know how raytracing works, simply from occasionally playing around with raytracing software since the first ones were released on PC, when it took ages just to render one dumb picture with a few balls.

It is totally and extremely obvious to me that raytracing is the superior solution and that there are situations where only raytracing may be realistically feasible. But it is also almost painfully clear that the rendering power currently available is so limited that many people turn off raytraced reflections because they cost much more in framerate than they deliver, so I wonder what kinds of optimizations and alternatives exist that can help out, and when.

So when I look at Spider-Man, for instance, and see the sorry excuse for what remains of the tree in Central Park in the reflection of a building, I am just wondering whether some basic raytraced data that tells you where the tree should be placed wouldn't be seriously enhanced if you just drew the actual tree in the reflective surface, using a few rays' worth of data to determine where it should have been. And I am perfectly happy to hear why that is not feasible, but I think more than anything it is a problem that quickly grows in complexity, also in terms of graphics engine design. But I bet that for most actual flat-mirror-on-the-wall situations on a console, just drawing the actual geometry with an extra camera transform could be cheaper for now, to get convincing quality. There are also more diffuse surfaces that require much less precision, for which I imagine raytracing quickly becomes the cheaper option.

As for Dreams, that is a combination of SDF and more traditional rendering, depending on the situation, as far as I am aware. But I am also pretty sure that in traditional rendering it is much cheaper to use this technique, just as it was when the tech first started being used in VFX rendering ages ago.

So thank you to JoeJ, who took the time to actually explain the limitations of the mirror-the-geometry approach for curved surfaces and such; that makes sense, that the transformations get too complex or expensive.

What I am sure of is that eventually raytracing on pure geometry (eventually without textures) is the best solution for practically everything: just multi-bounce the light off everything sufficiently often, on materials with the proper information on color, diffuseness, etc., and you solve lighting (and automatically shadows as well), reflections, and so on. The same goes even for sound.

But the current generation of consoles is way off that point, so I'm willing to bet that for large flat surfaces like the buildings in Watch Dogs and Spider-Man, using a vector transformation like the one discussed here could be worth it.


Especially if you then use the raytracing for better lighting and shadows instead.
 