Next gen lighting technologies - voxelised, traced, and everything else *spawn*

Async to what? Reflection is sandwiched between G-Buffer generation and tiled shading. Async to SSDO, which is even more expensive than tiled shading?
Async should help GCN considerably once Crytek ports this version of CE to D3D12 and/or VK. But I doubt that the same will be true for RDNA, and thus it's unlikely that RDNA's comparative weakness here will be solved with modern APIs.

What should be pointed out here though are two points:

A) The demo may not be optimized for AMD h/w for now, especially considering that it's not really targeting RT presently. Crytek may have used NV h/w only while developing this.

B) RDNA's D3D11 driver may be rather poor considering that it's been less than half a year since its launch, and AMD generally isn't that good in D3D11. RDNA's GPU compute results were all over the place back when it launched, and I don't know if this has changed since then. This could affect the results here too.
 
Why low? It's full rate on both.
Not really:

[Image: AMD Radeon RX 5700 XT, AIDA64 GPGPU benchmark results]
 
The demo may not be optimized for AMD h/w for now, especially considering that it's not really targeting RT presently. Crytek may have used NV h/w only while developing this.
When Crytek first showed Neon Noir, it was running on a Vega 56.
 
Async to what?
If there really is no task available, they could add one frame of latency and do the entire RT workload async while the next frame is rasterized.
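To illustrate the scheduling idea only (a minimal CPU-thread sketch, not CryEngine code; the function names are hypothetical placeholders): reflections for a frame get kicked off asynchronously and are consumed one frame later, while the current frame is rasterized with the previous result.

Code:
#include <future>
#include <cstdio>

struct ReflectionBuffer { int frame = -1; };

ReflectionBuffer TraceReflections(int frame)   // would live on an async compute queue
{
    std::printf("tracing reflections for frame %d\n", frame);
    return { frame };
}

void RasterizeFrame(int frame, const ReflectionBuffer& reflections)
{
    // Composites reflections produced from the *previous* frame's scene data.
    std::printf("rasterizing frame %d with reflections from frame %d\n",
                frame, reflections.frame);
}

int main()
{
    ReflectionBuffer previous{};                       // nothing available yet for frame 0
    for (int frame = 0; frame < 4; ++frame)
    {
        // Kick off RT for this frame "async"; it overlaps the raster work below.
        auto rtJob = std::async(std::launch::async, TraceReflections, frame);

        RasterizeFrame(frame, previous);               // uses last frame's result -> one frame of lag

        previous = rtJob.get();                        // ready for the next frame
    }
}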
Not really:
Likely the graph shows a load of just 32-bit multiplications? That's a special case: only 24-bit multiplies are fast, while adds and bit math are not affected and run at full rate. Interesting to see this restriction is still there with RDNA.
In practice it should not matter much for the integer math necessary to index nodes (or any other memory).
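For what it's worth, here is the kind of index math typically involved, as a plain C++ sketch (hypothetical node layout, not Crytek's): it is all adds and shifts, or small multiplies that fit the fast 24-bit path, so the 32-bit-multiply rate barely matters.

Code:
#include <cstdint>
#include <cstdio>

// Hypothetical 32-byte BVH node layout, children stored contiguously.
struct Node { uint32_t firstChild; uint32_t count; float bounds[6]; };

// Index math used while traversing: adds and shifts only, no full 32-bit multiply needed.
inline uint32_t childIndex(const Node& n, uint32_t i) { return n.firstChild + i; }
inline uint32_t nodeByteOffset(uint32_t index)        { return index << 5; }  // index * 32 (sizeof(Node))

int main()
{
    Node n{ 100, 2, {} };
    std::printf("child 1 at index %u, byte offset %u\n",
                childIndex(n, 1), nodeByteOffset(childIndex(n, 1)));
}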
 
Async should help GCN considerably once Crytek ports this version of CE to D3D12 and/or VK.

D3D12 and Vulkan have been supported for a long while:
https://github.com/CRYTEK/CRYENGINE/tree/release/Code/CryEngine/RenderDll/XRenderD3D9/DX12
https://github.com/CRYTEK/CRYENGINE/tree/release/Code/CryEngine/RenderDll/XRenderD3D9/Vulkan

Additionally, my statement wasn't out of context; I question the possibility, in the specific CryEngine setup/pipeline in the current engine, of leveraging async compute for the ray-traced reflections, because they are sandwiched between hard/unmovable dependencies.
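To spell out that dependency argument in pseudo-frame-graph form (my own illustration, not CryEngine's actual pass setup): each pass consumes the previous pass's output, so there is nothing independent left for the reflection trace to overlap with inside the same frame.

Code:
#include <cstdio>

// Simplified pass chain as described above; names are illustrative only.
struct Pass { const char* name; const char* reads; const char* writes; };

int main()
{
    const Pass frame[] = {
        { "G-Buffer",               "scene geometry",    "gbuffer"           },
        { "Ray-traced reflections", "gbuffer",           "reflection buffer" },  // can't start earlier...
        { "Tiled shading",          "reflection buffer", "lit scene"         },  // ...can't be deferred later
    };
    for (const Pass& p : frame)
        std::printf("%-22s reads [%s] -> writes [%s]\n", p.name, p.reads, p.writes);
}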

CryEngine is perfectly able to schedule asynchronous compute:
https://github.com/CRYTEK/CRYENGINE...erD3D9/GraphicsPipeline/TiledShading.cpp#L221
https://github.com/CRYTEK/CRYENGINE...3D9/GraphicsPipeline/ComputeSkinning.cpp#L451
https://github.com/CRYTEK/CRYENGINE...rD3D9/GraphicsPipeline/VolumetricFog.cpp#L633

If there really is no task available, they could add one frame of latency and do the entire RT workload async while the next frame is rasterized.

I've dealt with SSR-lag of one frame before. Not a pleasant thing to deal with visually.
 
D3D12 and Vulkan have been supported for a long while:
https://github.com/CRYTEK/CRYENGINE/tree/release/Code/CryEngine/RenderDll/XRenderD3D9/DX12
https://github.com/CRYTEK/CRYENGINE/tree/release/Code/CryEngine/RenderDll/XRenderD3D9/Vulkan

While CryEngine has builds that support D3D12 and Vulkan, as far as I know the build Neon Noir is using doesn't, and the "stable build" doesn't either.
 
Digital Foundry has analyzed the Crytek solution, major sticking points:

-The scene is a closed street with very few moving objects. Updating the scene representation every frame would be very expensive; a news post by a developer of the open-source Ogre3D game engine detailing voxel cone tracing seems to support this: "The voxelization process isn’t cheap. If we were to try it every frame, it could run anywhere between 0.5-10 fps depending on scene complexity, voxel resolution and GPU performance."

-No reflections of reflections

-Low-poly versions of models are used in reflections

-Only mirror-like reflections are ray traced, no rough surfaces

-Reflections are rendered at a quarter of screen resolution on ultra, and 1/7 or 1/8 on other settings

-LOD for triangle ray traced reflections is fairly noticeable

 
a news post by a developer of the open-source Ogre3D game engine detailing voxel cone tracing seems to support this: "The voxelization process isn’t cheap.
Ogre uses a compute shader here, while Crytek does it on the CPU, so 'async'. I do not know if they voxelize objects into the same grid every frame, or if they precompute a 'brick of voxels' per object and transform the ray into the brick instead. If it's the latter, voxelization costs would be low.

Looking at Gustafsson's blog (Teardown game), he even seems to use a combination of both: http://blog.tuxedolabs.com/2018/10/17/from-screen-space-to-voxel-space.html (we see voxel brick objects, but shadow acne indicating they are all voxelized into the same space to trace shadows).
This can be very fast by injecting each brick voxel into a global grid. I did the same with surfels when I worked on a lighting volume; it's basically one atomic op per brick voxel, if done on the GPU.
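A rough sketch of that injection step (my own illustration, not anyone's shipping code): a packed occupancy grid where writing one brick voxel is a single atomic OR, which maps directly to an InterlockedOr in a compute shader.

Code:
#include <atomic>
#include <cstdint>
#include <vector>
#include <cstdio>

// Hypothetical global occupancy grid: 1 bit per voxel, packed into 32-bit words.
constexpr int GRID = 128;                                   // 128^3 voxels
std::vector<std::atomic<uint32_t>> gGrid(GRID * GRID * GRID / 32);

// Inject one voxel of an object's precomputed brick at its world-space cell.
// On the GPU this would be a single atomic OR per brick voxel.
void injectVoxel(int x, int y, int z)
{
    const uint32_t index = uint32_t((z * GRID + y) * GRID + x);
    gGrid[index >> 5].fetch_or(1u << (index & 31), std::memory_order_relaxed);
}

int main()
{
    // Example: splat a tiny 2x2x2 brick placed at world cell (10, 20, 30).
    for (int z = 0; z < 2; ++z)
        for (int y = 0; y < 2; ++y)
            for (int x = 0; x < 2; ++x)
                injectVoxel(10 + x, 20 + y, 30 + z);
    std::printf("grid words allocated: %zu\n", gGrid.size());
}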

That is just to say we cannot conclude voxelization would be a bottleneck here as long as we do not know the details, even if the typical GPU voxelization using rasterization, as in the early VCT papers, is known to be prohibitively slow.
 
Digital Foundry has analyzed the Crytek solution, major sticking points:

The flat mirror reflections have the same type of artifacts I'd associate with render-to-texture reflections from 15+ years ago. Visible artifacts in every possible way -- spatially, temporally, and geometrically. Basically, in all the ways that ray tracing is supposed to do reflections properly, it's not. By the time ray tracing's "holy grail" is actually achievable, they're going to have to come up with a different marketing term for it and try to convince consumers all over again how great it is.
 
The flat mirror reflections have the same type of artifacts I'd associate with render-to-texture reflections from 15+ years ago.

Well, to play devil's advocate: even if these reflections have the same visual artifacts as classic 1998-style render-to-texture planar reflections, they are still much less limited in terms of how many different reflection planes a scene can present simultaneously. Crytek's demo, despite all its other limitations, at least has scenes with half a dozen different surfaces at different positions and angles, each reflecting the rest of the environment properly. While that would be possible by re-rendering the scene multiple times with the traditional technique, it would be prohibitively expensive.
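For context, the classic render-to-texture approach mirrors the camera about each reflective plane and re-renders the scene once per plane, so cost grows linearly with plane count; half a dozen planes means half a dozen extra scene passes. A minimal sketch of the reflection matrix involved (my own illustration):

Code:
#include <cstdio>

// Column-major 4x4 matrix, column-vector convention (m[col][row]).
struct Mat4 { float m[4][4]; };
struct Vec3 { float x, y, z; };

// Reflection about the plane n.x*x + n.y*y + n.z*z + d = 0 (n must be unit length).
// Classic planar reflections reflect the camera with this matrix and
// re-render the whole scene once per reflective plane.
Mat4 makeReflection(Vec3 n, float d)
{
    Mat4 r = {};
    r.m[0][0] = 1.0f - 2.0f * n.x * n.x;  r.m[1][0] = -2.0f * n.x * n.y;        r.m[2][0] = -2.0f * n.x * n.z;        r.m[3][0] = -2.0f * d * n.x;
    r.m[0][1] = -2.0f * n.y * n.x;        r.m[1][1] = 1.0f - 2.0f * n.y * n.y;  r.m[2][1] = -2.0f * n.y * n.z;        r.m[3][1] = -2.0f * d * n.y;
    r.m[0][2] = -2.0f * n.z * n.x;        r.m[1][2] = -2.0f * n.z * n.y;        r.m[2][2] = 1.0f - 2.0f * n.z * n.z;  r.m[3][2] = -2.0f * d * n.z;
    r.m[3][3] = 1.0f;
    return r;
}

int main()
{
    // Example: a mirror lying in the y = 0 plane.
    Mat4 r = makeReflection({ 0.0f, 1.0f, 0.0f }, 0.0f);
    std::printf("reflected y scale: %.1f\n", r.m[1][1]);  // prints -1.0
}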
 
In the early days of RTX, there was some consideration that perhaps compute would be enough. In the subsequent developments, that appears very much not to be the case. The BVH intersect hardware of RTX adds little silicon cost for proportionally far higher gains in RT performance. I think at this point it's a given that as much RT acceleration as possible is the most sensible move for the consoles to support the Future Tech of game rendering, which will advance significantly given dependable, widespread RTRT hardware. "RT hardware" at a minimum should be what RTX provides, and hopefully more, in either a more versatile solution and/or a better accelerated solution (like PowerVR's BVH construction), so that better than RTX 2060 ray performance is possible.
 
The demo is a nice showcase of why we need hardware based RT rendering for now. Hence why even consoles are going to get it.
Sigh. Even when there is actual proof that RT would have been approached without a helping hand from the hardware vendors, many people just ignore this.
So what exactly brings you to this conclusion, even with BFV, the first RTX game, showing such similar results?
It is a minor difference in performance, right? I say this difference is minor because both approaches use the same algorithms with the same time complexity.
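To make the "same algorithm" point concrete: the inner loop is the same whether it runs as a compute shader or inside fixed-function RT cores, i.e. pop a node, test its box, then descend or intersect triangles. A bare-bones sketch (my own, with the box and triangle tests stubbed out):

Code:
#include <cstdint>
#include <vector>

struct Ray  { float origin[3], invDir[3]; };
struct Node { float bmin[3], bmax[3]; uint32_t left, right; bool leaf; };

// Stubs standing in for the slab test and the triangle intersection.
bool hitBox(const Ray&, const Node&)       { return true;  }
bool hitTriangles(const Ray&, const Node&) { return false; }

// The traversal loop itself is identical in compute and in hardware:
// pop a node, test the box, descend into children or test leaf triangles.
bool traverse(const std::vector<Node>& bvh, const Ray& ray)
{
    uint32_t stack[64];
    int sp = 0;
    stack[sp++] = 0;                       // root
    while (sp > 0)
    {
        const Node& n = bvh[stack[--sp]];
        if (!hitBox(ray, n)) continue;
        if (n.leaf) { if (hitTriangles(ray, n)) return true; continue; }
        stack[sp++] = n.left;              // push children
        stack[sp++] = n.right;
    }
    return false;
}

int main()
{
    std::vector<Node> bvh = { { {0,0,0}, {1,1,1}, 0, 0, true } };   // single leaf node
    Ray r = { {0,0,-1}, {0,0,1} };
    return traverse(bvh, r) ? 0 : 1;
}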

If we take this list of 'disadvantages':

-No reflections of reflections

-Low-poly versions of models are used in reflections

-Only mirror-like reflections are ray traced, no rough surfaces

-Reflections are rendered at a quarter of screen resolution on ultra, and 1/7 or 1/8 on other settings

-LOD for triangle ray traced reflections is fairly noticeable

Let us adapt it to BFV:

-Some objects missing completely from reflections

-Only materials with roughness under a certain threshold are traced, the rest is cube mapped

-Reflections are only raytraced for 1/3 of pixels (don't remember exact numbers)

-NO LOD AT ALL!!! Because the hardware can't do it; it is impossible. To deal with it, they just clip objects at a distance without any smooth transition.

Of course we can discuss this comparison of features now, but that's not the point, and we did enough of that already.

The point is: Using RTX, we get RT faster, a few years earlier than without it, but the downside is: Restrictions.
Take the last point, LOD, as an example. LOD is the only option to make performance independent of scene complexity. If we lack it, scene complexity is bounded by hardware and there is no way to adapt dynamically. That's the main limitation of current RTX, IMO.
To solve this, a traversal shader has been proposed by Intel. So while traversing the scene, one can decide to jump to a lower-LOD representation. The transition can be hidden stochastically by switching LOD at random per pixel and letting TAA sort it out. (Notice this is not possible with rasterization, and this may be the first time we see continuous LOD in games, so a very big thing.)
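A rough sketch of that stochastic transition idea (my own illustration, not Intel's proposal verbatim): each pixel hashes its coordinates and the frame index into a pseudo-random value and picks between the two candidate LODs with a probability given by the blend factor; TAA then averages those per-pixel choices into a smooth cross-fade.

Code:
#include <cstdint>
#include <cstdio>

// Small integer hash -> [0,1) pseudo-random value per pixel per frame.
float hashToFloat(uint32_t x, uint32_t y, uint32_t frame)
{
    uint32_t h = x * 73856093u ^ y * 19349663u ^ frame * 83492791u;
    h ^= h >> 13; h *= 0x5bd1e995u; h ^= h >> 15;
    return (h & 0xFFFFFFu) / 16777216.0f;
}

// During traversal (or shading), choose between two LOD levels stochastically.
// 'blend' goes 0 -> 1 as the object moves toward the coarser LOD.
int pickLod(int fineLod, int coarseLod, float blend, uint32_t px, uint32_t py, uint32_t frame)
{
    return (hashToFloat(px, py, frame) < blend) ? coarseLod : fineLod;
}

int main()
{
    // Halfway through a transition: roughly half the pixels land on each LOD,
    // and temporal accumulation (TAA) resolves this into a smooth fade.
    int coarse = 0;
    for (uint32_t p = 0; p < 1000; ++p)
        coarse += (pickLod(0, 1, 0.5f, p, 0, 0) == 1);
    std::printf("pixels using the coarse LOD: %d / 1000\n", coarse);
}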

The problem: the first RTX GPUs have no support for a traversal shader because traversal is fixed function. How to deal with it? Leave first-gen GPUs behind? Develop multiple codepaths? (The latter won't work. If we do this, we cannot have a game that fully utilizes the new GPUs! The compromise has to be towards the old gen. Period.)

Would we have this problem if there were no RTX but just compute RT? Seriously, no, never.

So what's the win? A handful of games with their own performance problems, adding less to image quality than we might have hoped, expensive GPUs for a niche, and a huge marketing machinery just to form your quoted opinion? Maybe it's more than that, but maybe not so much.

What we have for sure is: now we have to wait, hope and beg the GPU vendors to extend possibilities and to lift restrictions over time with future hardware, so that in five years, when all those first-gen GPUs are gone, we can use some of those features.
This ranges from exposing the BVH format so we could generate dynamic geometry (which would be possible right now, but vendor specific), up to breaking features like traversal shaders.
We also have to hope some big guys do not try to boycott such new features because they lack support, maybe even for their upcoming generation.

Believe me, this is frustrating and it is hindering progress, in my opinion and experience.

Now, Crytek's work is just good for some time while people upgrade to RT GPUs. Sad but true, but don't think it is inferior, because it is not.
It is superior, because it already has LOD built in by supporting voxel mipmaps, and it can do cone tracing as well, which hardware RT can't and never will.
But it is proof that hardware RT would NOT have been necessary. Please think about that. And be sure Crytek's work could be improved upon, if it made any sense to do so.
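For readers unfamiliar with why voxel mipmaps give LOD essentially for free: a cone trace steps along the cone and samples a coarser mip as the cone footprint grows, so distant (or wide/rough) lookups automatically read prefiltered, lower-detail data. A minimal sketch of that loop (my own illustration; the volume sampler is a stub):

Code:
#include <cmath>
#include <cstdio>

// Stand-in for a prefiltered voxel volume: sampleMip(position, mipLevel) would return
// the occlusion/radiance stored at that mip. Stubbed here so the sketch is self-contained.
float sampleMip(const float pos[3], float mip) { (void)pos; return 0.05f * (mip + 1.0f); }

// March a cone through the voxel mip chain: as the cone widens, sample coarser mips.
float coneTraceOcclusion(const float origin[3], const float dir[3],
                         float tanHalfAngle, float maxDist, float voxelSize)
{
    float occlusion = 0.0f;
    float t = voxelSize;                                 // start one voxel out
    while (t < maxDist && occlusion < 1.0f)
    {
        const float radius = t * tanHalfAngle;           // cone footprint at distance t
        const float mip = std::log2(std::fmax(radius / voxelSize, 1.0f));  // coarser mip as it widens
        const float pos[3] = { origin[0] + dir[0] * t,
                               origin[1] + dir[1] * t,
                               origin[2] + dir[2] * t };
        occlusion += (1.0f - occlusion) * sampleMip(pos, mip);   // front-to-back accumulation
        t += radius + voxelSize;                         // step proportional to footprint
    }
    return occlusion;
}

int main()
{
    const float o[3] = { 0, 0, 0 }, d[3] = { 0, 0, 1 };
    std::printf("occlusion along cone: %.2f\n", coneTraceOcclusion(o, d, 0.3f, 10.0f, 0.1f));
}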
 
BFV isn't a great reference point, as it was a first title with RTX added as an afterthought. Look at the pure specs of how many rays compute-based versus RTX-based solutions can deliver (and what devs will one day be able to do with those beyond glossy reflections).

Control is a better comparison as Remedy included the best non-RT alternatives. Obviously we don't have a comparison with Crytek's latest offerings, but it is showing far better results from RTX such as one ray per pixel reflections.
 
To solve this, a traversal shader has been proposed by Intel. So while traversing the scene, one can decide to jump to a lower-LOD representation. The transition can be hidden stochastically by switching LOD at random per pixel and letting TAA sort it out. (Notice this is not possible with rasterization, and this may be the first time we see continuous LOD in games, so a very big thing.)

That's actually how a lot of deferred-rendered games handle their LOD transitions. Many have been doing it since last gen even, before TAA was even a thing... They just accepted the noise.
 
But it is proof that hardware RT would NOT have been necessary.

So, Sony and MS, AMD, and Nvidia are just stupid? What's the problem anyway? Consoles are going to have hardware RT, it's confirmed; I assume it will be at least equal to RTX, otherwise why would they bother.
 