Next gen lighting technologies - voxelised, traced, and everything else *spawn*

WCCFTech - Scorn Developer Interview
May 29, 2020

Q: Given that Scorn is powered by Unreal Engine technology, we cannot fail to ask you some thoughts on the stunning new UE5 demo. Also, do you plan to upgrade to UE5 in order to take advantage of those new features (Nanite and Lumen) for Scorn?

That demo looked very impressive. Even more so on the development side, if all that was said is true without some major caveats. It looks like all three platforms will be able to use the engine quite well.
...
Even if Epic for some reason wanted to create an engine only for that system, I doubt that they would design it to be primarily focused on the system's slowest part (compared to other parts in that system). Theoretically, if I had to choose, I would rather take an average-speed SSD (even slower than the one in the Series X) and have more memory. Now, since that kind of system would obviously be too expensive, these SSDs with custom I/O solutions are the best option.

UE5 isn't even available yet, and even if it were available today, moving the game now to what seems like a completely different pipeline would result in a failure of great proportions.

Q: Scorn does not use ray tracing, at least according to the official tags listed on the Xbox website. Furthermore, the UE5 demo showcased an impressive Lumen real-time GI solution that does not rely on ray tracing. Does that mean ray tracing will remain an optional technique rather than a baseline one for next-generation games as some had assumed?

As you saw with that demo there are other ways to get that GI equivalent. Developers have been using Ray Tracing to create static GI for years. Real-time Ray Tracing is certainly a breakthrough. It will be a much more useful tool for the developers in the future than a mind-blowingly obvious feature for players to notice. You looked at the tag to see if it was there. Through the years developers have developed many different techniques to fake aspects of what Ray Tracing can accomplish, from reflections to shadows and AO.

These 'fakes' have some limitations. Presentation is more static, and effects at certain angles break the illusion, but for the most part it looks pretty good. Sometimes when new technology becomes available, some developers start overusing it just to show it off, without thinking about the context in which it's getting used. That is why you are starting to see games that have rooms with all reflective surfaces or inappropriate lighting conditions just to show off the technology. Technology should be in service of what you are trying to accomplish, not the other way around. So yes, Real-time Ray Tracing will undoubtedly be a complete solution in the future, but in the near future developers will use it on a case-by-case basis.

Q: The trailer was labeled as 'in-engine footage representative of expected Xbox Series X visual quality', which usually means it was running on PC. If so, can you share the specs used?

Now, this is a tricky question, as for some reason a lot of people feel that it should be quite easy to get 4K 60FPS on PC even with this graphical fidelity, and that really isn't the case. For the showcase we used a 2080 Ti and a Ryzen processor just because there was no reason not to use it, but a 2070 Super with a mixture of settings is adequate to run the game at 4K 60FPS.



https://wccftech.com/scorn-intervie...-system-trailer-was-running-on-an-rtx-2080ti/
 
I wonder if that preference is fuelled by legacy thinking though? The moment one thinks of a problem to solve now, one thinks of using data in RAM simply because fast storage isn't an option. Similar to thinking of a problem in terms of a single thread instead of multiple threads when we move over to multicore, which was forced onto devs. As discussed before, all data pools are cache between storage and CPU registers. As a pool gets faster, the need for interims decreases, so we can ask the same question of any tier. Would a dev prefer more L2 cache and less DRAM? Okay, the sizes and deltas are very different in that case, but as the storage moves closer to RAM in terms of delivery, RAM can be looked at less like working storage and more like a cache for the SSD data, at which point the whole mindset for game design might shift.

Another big move this way is a move away from Object Oriented development to Data Oriented. Thinking in terms of data and stream-processing everything, streaming the game data becomes a part of the intrinsic design philosophy.
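A minimal sketch of that shift, with made-up types rather than anything engine-specific: the data-oriented layout keeps each field in one contiguous array, which is both cache/SIMD friendly to process and trivial to stream from disk as a single sequential read.

```cpp
#include <cstddef>
#include <vector>

// Object-oriented habit: one self-contained object at a time.
struct Particle {
    float px, py, pz;   // position
    float vx, vy, vz;   // velocity
};

// Data-oriented layout: one contiguous array per field. The whole pool can be
// read from storage with a single sequential request and processed as a stream.
struct ParticlePool {
    std::vector<float> px, py, pz;
    std::vector<float> vx, vy, vz;
};

void update_pool(ParticlePool& p, float dt)
{
    const std::size_t n = p.px.size();
    for (std::size_t i = 0; i < n; ++i) {  // cache-friendly, trivially SIMD-able
        p.px[i] += p.vx[i] * dt;
        p.py[i] += p.vy[i] * dt;
        p.pz[i] += p.vz[i] * dt;
    }
}
```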

Store all these developer opinions away now, and we'll compare them to developer opinions at the end of the generation. ;) Maybe next-gen hardware predictions will start with 16 GBs RAM again, only faster, and 100 GB/s SSDs. ;)
 
NVDIMMs have potential for even better bandwidth. For the moment it's tech marketed and priced at enterprise, but consoles are a big enough market they can knock the margins down.

Of course completely custom extremely wide bus flash directly on an interposer with the CPU/GPU could do even better, but that's a significant investment.
 
Demanding more RAM is IMHO going in the wrong direction. That request is built upon the premise that you could pre-load all assets if you just had enough RAM, when artists now consider it viable to dump 100 GB or more worth of assets on the customer's hard drive.

Streaming assets has to be the way to go, but increasing bandwidth to storage can't be the solution either. Slow storage is going to stick with us for at least another 2-3 years before the HDD-based systems currently in use ultimately phase out, and even longer until early SSD adopters on SATA II links replace their former enthusiast systems. It will be half a decade before you can consider NVMe (or equivalent tech) to be the de facto baseline standard.

So streaming concepts have to be devised with low available bandwidth in mind. What springs to mind is shifting asset decompression from the CPU to the GPU; lossless compression, and especially compression beyond the block-wise, transparent texture compression formats we have gotten so used to. Even with slow storage, you can afford to prefetch a couple hundred MB of compressed data on the CPU if that means being able to quickly deliver a multitude of that in decompressed assets.
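A rough CPU-side sketch of that flow, using hypothetical placeholders (GpuBuffer, read_compressed_chunk, dispatch_gpu_decompress) rather than any real API: prefetch the compressed bytes, stage them, and let a compute job expand them in VRAM so the CPU never touches the decompressed data.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical handle types standing in for whatever graphics API is in use.
struct GpuBuffer  { std::size_t size = 0; };
struct ComputeJob { int id = 0; };

// Placeholder: read a compressed chunk from the SSD into system memory.
std::vector<std::uint8_t> read_compressed_chunk(std::size_t /*offset*/, std::size_t bytes)
{
    return std::vector<std::uint8_t>(bytes);
}

// Placeholder: copy compressed bytes into a GPU-visible staging buffer.
GpuBuffer upload_to_gpu(const std::vector<std::uint8_t>& data) { return GpuBuffer{ data.size() }; }

// Placeholder: kick a compute shader doing block-parallel LZ-style decompression
// straight into the resident asset pool.
ComputeJob dispatch_gpu_decompress(const GpuBuffer& /*src*/, GpuBuffer& /*dst*/) { return ComputeJob{ 1 }; }

int main()
{
    // Prefetch a couple hundred MB of *compressed* data on the CPU side...
    const std::size_t kPrefetchBytes = 256ull * 1024 * 1024;
    GpuBuffer staging = upload_to_gpu(read_compressed_chunk(0, kPrefetchBytes));

    // ...and let the GPU expand it to a multitude of that in VRAM.
    GpuBuffer assetPool{ 1024ull * 1024 * 1024 };
    ComputeJob job = dispatch_gpu_decompress(staging, assetPool);
    (void)job; // real code would fence on the job before drawing with those assets
}
```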

About 3 years ago (https://arxiv.org/pdf/1606.00519.pdf), academic research had reached the point where LZ77-like decompression speed on the GPU exceeded PCIe 3.0 x16 bandwidth. But somehow nothing made it from academic research into production. Research by other parties is still ongoing (http://www.bncss.org/index.php/bncss/article/view/143), showing promising results, closing in on 100 GB/s decompression speed for highly compressible resources.

Yet the tools provided by our IHVs still only revolve around plain old lossy texture compression.

It's probably time to rethink the residency of resources, and to treat uncompressed resources even in GPU memory as transient only. It may even be worth considering treating a portion of VRAM as a prefetch cache for compressed assets, with the goal of ideally being able to decompress missing assets on the fly within the same frame.

With a complex culling chain, there should be plenty of chances to record which assets still need to be decompressed before proceeding with generating the final draws, e.g. as a side product of generating/evaluating the Hi-Z buffer.
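As a rough illustration of that side product (the ResidencySet bookkeeping and asset IDs here are made up, not from any engine), the culling pass can simply append every visible draw whose asset is still only present in compressed form to a request list that the decompression pass consumes before draw generation.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical bookkeeping: which assets already sit decompressed in VRAM.
struct ResidencySet {
    std::unordered_set<std::uint32_t> resident;
    bool is_resident(std::uint32_t assetId) const { return resident.count(assetId) != 0; }
};

struct DrawCandidate {
    std::uint32_t assetId;
    bool visible; // result of the Hi-Z / frustum culling chain
};

// Side product of the culling pass: every visible draw whose asset exists only in
// compressed form gets recorded for the decompression pass.
std::vector<std::uint32_t> collect_decompression_requests(
    const std::vector<DrawCandidate>& candidates, const ResidencySet& residency)
{
    std::vector<std::uint32_t> requests;
    for (const DrawCandidate& c : candidates) {
        if (c.visible && !residency.is_resident(c.assetId))
            requests.push_back(c.assetId);
    }
    return requests; // fed to the GPU decompressor before final draw generation
}
```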
 
hmm.. perhaps I'm wrong, but you're still going to allocate a fixed size within RAM for streaming though right? Meaning you can only carve up 16 GB in so many ways. A faster streaming SSD is great, but there's got to be a pool limit unless you want to be rendering off your slowest bandwidth.

So if you have 16 GB of memory, 2.5 GB of it is reserved. You're going to set aside, say, 5 GB for textures, with roughly 8.5 GB remaining for render work. You're going to be limited by the 5 GB pool you set aside; even if you stream faster and faster, you're still going to be held by that 5 GB allocation. So let's assume your VT system has pools for MIP 0-13 textures, each pool approximately 400 MB in size, holding their appropriate number of tiles as the tile sizes go up. No matter how fast you stream, if you put more on screen than your pool size, something will be held back from entering the next mip level until something exits. I.e. something in the MIP 4 pool can't move into the MIP 3 pool if the MIP 3 pool is full. So despite how fast your streaming is, something still needs to exit, and that's a memory footprint issue.
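A toy sketch of that pool bookkeeping, with made-up types and an LRU eviction policy chosen purely for illustration: no matter how fast the SSD delivers tiles, a tile can only be promoted into a mip pool once that pool frees a slot. The ~400 MB figure above would map to capacityTiles for whatever tile size the VT system uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// One fixed-size pool per mip level, e.g. ~400 MB worth of tiles each.
struct MipPool {
    std::size_t capacityTiles = 0;
    std::deque<std::uint32_t> residentTiles; // front = least recently used
};

// Try to bring a tile into the target mip pool. However fast the SSD streams,
// the promotion only happens if the pool has (or can free) a slot.
bool promote_tile(MipPool& pool, std::uint32_t tileId)
{
    if (pool.residentTiles.size() >= pool.capacityTiles) {
        if (pool.residentTiles.empty())
            return false;               // zero-capacity pool: nothing can enter
        pool.residentTiles.pop_front(); // evict the least recently used tile
    }
    pool.residentTiles.push_back(tileId);
    return true;
}
```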

We can't rely on the SSD being able to stream textures and consume them every single frame. Your bottleneck will become the speed of the SSD.

And then there are still issues with edge cases like transparent textures.
 
We can't rely on the SSD being able to stream textures and consume them every single frame. Your bottleneck will become the speed of the SSD.

And then there are still issues with edge cases like transparent textures.

Both bandwidth and latency. Regardless, this thread is about lighting. Lighting tends to be quite dynamic by nature, and that means the first bound you're going to hit is almost certainly compute throughput in some way; the SSD won't help much at all.
 
hmm.. perhaps I'm wrong, but you're still going to allocate a fixed size within RAM for streaming though right?
Agreed. With storage size being a greater problem than streaming speed, I see options to achieve really good compression ratios only by utilizing repetition, e.g. UE5 with its instances of rocks, or something more fine-grained like what's seen in DAG-compressed shadow maps.
But this means we need to keep those instances in memory. Streaming them only on demand does not work, because there is always at least one instance (or data fragment of a dictionary) visible and required.
Though, I don't see a problem with 16 GB. Seems enough.
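A toy example of that repetition-based saving, with made-up structs rather than UE5's actual representation: the heavy geometry lives once in a library that has to stay resident, while the thousands of placements are just an index plus a transform and cost almost nothing.

```cpp
#include <cstdint>
#include <vector>

struct Vertex { float px, py, pz, nx, ny, nz, u, v; };

// The expensive, unique data is stored exactly once...
struct Mesh { std::vector<Vertex> vertices; std::vector<std::uint32_t> indices; };

// ...while each placed rock is just a reference plus a transform.
struct Instance { std::uint32_t meshIndex; float transform[16]; };

struct Scene {
    std::vector<Mesh>     meshLibrary; // a handful of unique rocks, always resident
    std::vector<Instance> instances;   // thousands of cheap placements
};
```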
 
Unigine said:
We have invented Panorama Space Dynamic Global Illumination (PSDGI) technology for UNIGINE 2 real-time 3D engine. Available starting from the 2.17 SDK release, including free Community edition.

Key features:

* Great visual quality
* Reasonable performance
* Works with GPUs of various vendors
* Takes emission materials into account
* Good solution for open worlds

We have added a new Raymarching mode for the Environment Probe light source. This is a far more advanced high-quality dynamic GI technique than any other solution used in UNIGINE before. This technology combined with the new spatial temporal denoiser is a huge step to a full-scale GI solution.

 
Gives the impression that it's a spherical sampling around the player that's projected, so like screen space but the environment behind the player also. There's quite a lot of temporal-accumulation latency though as lighting updates over time.
 
Gives the impression that it's a spherical sampling around the player that's projected, so like screen space but the environment behind the player also. There's quite a lot of temporal-accumulation latency though as lighting updates over time.

Yeah, in that case that's very much Mafia's solution, which I find very competent, and I wish I had gotten to see it pushed further, though RT made it obsolete.


MafiaIII_Slide-18.PNG
 
Is a neural radiance cache essentially surfel GI where a small MLP stores the hemisphere instead of a small texture map?
 
Is a neural radiance cache essentially surfel GI where a small MLP stores the hemisphere instead of a small texture map?
NRC trains from a low-res, full-length traced scene. The result (= the cache) is a tiny neural network model.

The workflow is to first run a path tracer at lower-than-target resolution to write training radiance for the neural network. We refer to this pass as the update pass. This is followed by a second, full-resolution query path-tracer pass, which creates the query points where we want to read predicted radiance. Next, the neural network predicts radiance, which is read at the queried points during a resolve pass. To generate the training data, the NRC library internally propagates the predicted data backwards along the training path, such that each vertex of that training path gets an estimate of reflected light; this is used to train and optimize the network so it makes accurate predictions.

NRC training view:
1training.jpg

NRC query view:
2query.jpg

SHaRC algorithm integration doesn't require substantial modifications to the existing path tracer code. The core algorithm consists of two passes: the first pass uses sparse tracing to fill the world-space radiance cache using the existing path tracer code, and the second pass samples the cached data on ray hits to speed up tracing.

SHaRC cache view:
3sharc.jpg
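Conceptually, and this is just a made-up C++ sketch of the idea rather than the actual SHaRC API, the world-space cache can be pictured as a hash grid keyed by quantized hit position: the sparse pass accumulates path-traced radiance into cells, and the full-resolution pass can terminate a path early with a cell's average.

```cpp
#include <cmath>
#include <cstdint>
#include <unordered_map>

struct Vec3 { float x, y, z; };

// Quantize a world-space hit position into a coarse voxel key.
std::uint64_t cache_key(const Vec3& p, float cellSize)
{
    auto q = [cellSize](float v) {
        return static_cast<std::uint64_t>(
            static_cast<std::int64_t>(std::floor(v / cellSize)) & 0x1FFFFF);
    };
    return (q(p.x) << 42) | (q(p.y) << 21) | q(p.z);
}

struct CacheEntry { Vec3 radianceSum{0, 0, 0}; std::uint32_t samples = 0; };

struct RadianceCache {
    std::unordered_map<std::uint64_t, CacheEntry> cells;
    float cellSize = 0.5f;

    // Pass 1 (sparse tracing): accumulate path-traced radiance into the cell.
    void record(const Vec3& hitPos, const Vec3& radiance)
    {
        CacheEntry& e = cells[cache_key(hitPos, cellSize)];
        e.radianceSum.x += radiance.x;
        e.radianceSum.y += radiance.y;
        e.radianceSum.z += radiance.z;
        ++e.samples;
    }

    // Pass 2 (full-resolution tracing): terminate the path early with the cached average.
    bool lookup(const Vec3& hitPos, Vec3& out) const
    {
        auto it = cells.find(cache_key(hitPos, cellSize));
        if (it == cells.end() || it->second.samples == 0) return false;
        const float inv = 1.0f / static_cast<float>(it->second.samples);
        out = { it->second.radianceSum.x * inv,
                it->second.radianceSum.y * inv,
                it->second.radianceSum.z * inv };
        return true;
    }
};
```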
 
A bit odd. Cached radiance transfer ain't exactly new, and a major point of these new GI techniques is that you don't have to bake anything. Look, instant feedback and dynamic gameplay! Getting rid of all that so you can add "neural" as a bullet point seems silly.

Gives the impression that it's a spherical sampling around the player that's projected, so like screen space but the environment behind the player also. There's quite a lot of temporal-accumulation latency though as lighting updates over time.

They just parent the environment map to the player in the video, so it's not necessary to have it as such. Environment probes aren't the worst for really distant geometry if you have a separate mid-range solution; UE5 still uses an environment probe of the skybox instead of volume tracing it. Maybe they'll integrate a separate mid-range solution like AMD's Brixelizer GI, then combine it with screen space, and hey, you've got an end-to-end dynamic GI solution.
 