Next gen lighting technologies - voxelised, traced, and everything else *spawn*

Discussion in 'Rendering Technology and APIs' started by Scott_Arm, Aug 21, 2018.

  1. pixeljetstream

    Newcomer

    Joined:
    Dec 7, 2013
    Messages:
    30
    Likes Received:
    60
    Personally I am more on the rasterization side of things, but I think having an alternative to 3D textures for spatial lookups is great, and I see DXR as just the beginning of something more generic after a bit of evolution.
     
  2. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    323
    Likes Received:
    365
    I do not doubt their expertise. They know much more about RT than I do, and about hardware in any case.

    Although I see serious issues with bringing the performance to the street, I guess we will see RT cores take off with the following GPUs.
    But as it is now, the hardware seems far from optimal, with a questionable need for RT cores at all, and TitanV == 2080 Ti in BFV really proves this, doesn't it? Due to the lack of competition, though, their success IS guaranteed anyway.

    And to make adoption easy and hinder competition, they protect their 10 years of experience with an API that is more like OpenGL 1.0 than DX12, and they stamp questionable fixed-function hardware into silicon that can only be underutilized as long as we are hybrid.
    I'm afraid we will stay hybrid much longer than necessary.

    But the research is now entirely in their hands, locking out all the other experts who have 10 years of experience themselves. They become mere consumers, with further invention prevented.
    A more open and flexible approach would have been better, and also possible, as we see now. I would LOVE a TitanV kind of GPU without RT cores and with fine-grained scheduling exposed to GPGPU as well - it would have been f###ing perfect!!!
    It's 2019 - why do you think we need to go through DX8, 9, 10... again, just because it's about rays?

    I do not get why everybody just follows without any doubt or critique. Adopting and moving on seems the only option, but I'm just not happy with that.
     
  3. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    39,710
    Likes Received:
    9,764
    Location:
    Under my bridge
    Titan V is bigger and more expensive.

    Because they're more willing to accept the compromises needed to get realtime raytracing into affordable hardware. ;) I haven't followed the numbers in this thread closely enough to know how RTX compares to non-RTX, other than that it accelerates aspects by a significant margin, but the lack of clear consensus shows it's not at all obvious, as you imply, that everyone should be aware of the disadvantage RT cores bring. The data would have to say "RT cores don't accelerate anything, so what's the point?" for your argument to make sense. Your last numbers show a 2.4x speed increase with RT cores. That's significant.

    At this point though, it's all wild speculation. Until someone (AMD) brings non-RT-core raytracing to the GPU to compare, it's only theoretical improvements one could make in software versus theoretical performance from a fixed-function unit and a couple of GPU comparisons. It's far from conclusive data.
     
    OCASM, pharma and DavidGraham like this.
  4. troyan

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    119
    Likes Received:
    179
    Turing supports every feature of Volta: https://devblogs.nvidia.com/cuda-turing-new-gpu-compute-possibilities/
     
  5. pixeljetstream

    Newcomer

    Joined:
    Dec 7, 2013
    Messages:
    30
    Likes Received:
    60
    @JoeJ I fail to see how research using just compute is blocked. But I also disagree with you that a 2x or more speedup is not worth putting into hardware.

    As for API design, here I think it's totally valid to criticize the exact abstraction chosen. But you have to start somewhere.
    If you start too low, it's not ideal to make changes later due to backwards compatibility; too high can leave some performance/multi-use on the table. So this will always be tough. However, IMO going from high to low is a lot easier. But I agree that over time it shouldn't stay as high-level as it is. The experience from now, and the feedback from software and the other hardware vendors, will help. It's hard; it needs multiple iterations. Even rasterization isn't "done", and even there we don't expose low-level rasterization details, despite twelve DX versions ;)

    Developers are already using the API differently; we have extensions for texture-space shading etc.

    Now, yes, it costs a shitload of money to make all that happen, so being first, and a lot of the internals being protected by NDAs and IP frameworks like Khronos etc., is natural. But that is business as usual.
     
  6. OlegSH

    Regular Newcomer

    Joined:
    Jan 10, 2010
    Messages:
    348
    Likes Received:
    219

    Funny story: these were added to Turing and are listed under Uniform Datapath Instructions.
    But I am pretty sure uniform instructions were added to save power and vector register space, because the integer SIMDs are already decoupled in Turing; hence the same math can be done via the more general-purpose integer SIMD units concurrently with FP ops.
    DXR should be quite stressful on resource handling, so both the vector and uniform integer pipelines should be useful, and they can likely execute concurrently with some overlap.

    Why do you keep repeating this?
    DX8, 9, 10 are different pipelines.
    DXR has the DX12 binding model, and it's the first time GPGPU controls work creation, data expansion, work scheduling/distribution/termination, etc. for fixed-function units.
    In rasterisation, fixed-function hardware controls all these aspects (with the partial exception of Mesh Shaders); pixel and other shaders are fed by fixed-function blocks and are slaves to the raster graphics pipeline (they can't manage data expansion, work scheduling/distribution/termination, etc.).
    In DXR, you can write your own intersection shaders and probably even overlap these with the standard accelerated intersection tests. That's actually much more flexible than the current raster pipeline, where the whole pipeline is strictly serialized and controlled by fixed-function hardware.
     
    #666 OlegSH, Jan 2, 2019
    Last edited: Jan 2, 2019
  7. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,581
    Likes Received:
    2,134
    The jury is still out on this one; we need to see controlled testing first, preferably with IQ validation as well.
     
    pharma and iroboto like this.
  8. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    323
    Likes Received:
    365
    In my particular case I already have a BVH of surfels in place, which works fine for GI and rough reflections (for the latter I store 4x4 or 8x8 env maps for each texel of a lightmap and update them in real time, similar to Many LoDs; SH etc. would work as well).
    For GI this is all fine: fully dynamic large scenes, infinite bounces, area lights from emitting surfaces, and area shadows, all for free. I calculate visibility with ray tracing in compute efficiently, and there is no need for RTX here. The algorithm has a time complexity of pretty much O(n), which is remarkable for GI.
    But surfels have a spatial maximum of detail (targeting 10 cm for the current generation), and the env maps have limited angular precision because of their low resolution.

    So I want some classical RT as well to add high-frequency details, mainly sharp reflections, and maybe some hard shadows. (Shadow maps, or just living with the soft GI shadows, are options too.)
    I'm optimistic at this point that a very big step towards photorealism is possible with current hardware. RTX can do well all the things I cannot, and the other way around.
    In the future one might want to add some local high-frequency GI; eventually my stuff could still serve as a fallback after the 2nd or 3rd hit within a path tracer, and after that my work would no longer be needed.

    The natural way to add sharp reflections now would be to use classical RT on triangles close to the ray origin, fall back to my surfels at some distance (where we could solve the LOD problem as well), and fall back to the env map at large distance.

    As it is now, to do so I have to maintain two BVHs, and the fallback to surfels cannot share my custom data in LDS across multiple rays within a DXR custom intersection shader.
    It is thus possible that a compute-only solution could handle the tracing more efficiently, even in comparison to RTX (accepting the limitation of 'no custom shaders for triangle hits', which makes sense in my case of already available low-res shaded textures).
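    The distance-banded fallback described above can be sketched as follows. This is only an illustration; the function name and the cutoff distances are invented for the example:

```python
def pick_representation(hit_distance_m: float,
                        near_cutoff_m: float = 2.0,
                        far_cutoff_m: float = 30.0) -> str:
    """Choose which scene representation a ray should intersect,
    based on distance from the ray origin. Cutoffs are illustrative only."""
    if hit_distance_m < near_cutoff_m:
        return "triangles"  # classical RT: full geometric detail, sharp hits
    if hit_distance_m < far_cutoff_m:
        return "surfels"    # surfel BVH: coarser, but solves LOD as well
    return "envmap"         # per-texel env map: cheapest, lowest angular precision
```

    The point of the complaint is that with two BVHs (DXR triangles plus a custom surfel structure), a single traversal cannot serve all three bands, and the surfel path cannot reuse LDS data across rays inside a DXR custom intersection shader.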

    But I likely won't try this, because competing with a hardware solution sounds just dumb. So (after swallowing my anger :) ) I will just use RTX as you intend, and I will discontinue my 'research' in this direction.
    In contrast, if there were no RT cores I had to utilize, experimenting with all this would still make sense, and I might end up with a solution that is more efficient.

    Now, seeing RT cores struggle a bit to deliver, I'm frustrated, and the whole RTX thing appears wrong to me again... it's an emotional issue.
    I feel guilty and sorry for the noise. But because game developers make smaller steps while making games, I also feel the need to bring up such topics right now, assuming I am indeed ahead with GI. (Not sure yet: my need for a seamless global parametrization is a big downside; making an automated tool is hard, and only after that can I test my stuff on real game geometry.)

    So I hope the above example makes sense to you. Getting heard is all I want.



    But now to another point, and I think this one is more important, not only for me:

    We finally need a way for GPUs to generate their own work! Please bring this to game APIs. I'm fine with vendor extensions.
    Stacking up large brute-force workloads, as is common in games, does not work for everything.
    My command buffers are full of indirect dispatches and barriers that do zero work at runtime, and the cost adds up.
    Async compute can compensate, but the sync between multiple queues is too expensive to make it a win for small planning workloads. (On GCN; I never tried it on any NV hardware.)
    Launching compute shaders from compute shaders is what we need.

    So whatever you have, please expose it. VK device-generated command buffers are nice, but there are no barriers, so it's useless for me.
    Seeing one additional shader stage introduced every year, but no progress here, is really frustrating. (Or did I miss something? I did no GPU work during the last year.)


    I repeat myself because I never know if people understand what I mean. I'm new here and don't know anyone or what they do, and I'm no hardware expert or enthusiast either.
    What I want is this functionality exposed to compute shaders in Vulkan; I would switch to DX12 as well to get it. But CUDA is no option, and neither is OpenCL 2.0.
    RTX shows there really is fine-grained work scheduling under the hood, but I need it to be available in compute.
     
  9. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    39,710
    Likes Received:
    9,764
    Location:
    Under my bridge
    It's going to be hard to be heard if you aren't showcasing actual results. ;) If you had demonstrations showing what you're doing, what you want, how current compute copes with it, and how future developments could handle it, the discussion would be more than theoretical feet-dragging.
     
    vipa899 and Ike Turner like this.
  10. pixeljetstream

    Newcomer

    Joined:
    Dec 7, 2013
    Messages:
    30
    Likes Received:
    60
    Thanks. Things are clearer now.
    Not sure what this is for, but assuming it's mostly a personal project, do "both" and experiment like crazy. You can record your custom hits, shade later in compute, and compare.

    "It depends" is universally true, so it's unlikely you'll get a "do this" reply. The benefit of it being new is that there is no right or wrong atm ;) just people experimenting, gaining experience, and repeating.
    Also, the work you have invested so far, even if slower, is not wasted; you learn a lot by doing it all manually...

    If that level of graphics is your passion, go for it; if the goal of the project is something else and the raytracing aspect is just a part, I would go with whatever is the least amount of work and focus on the other stuff more important to the project.

    And don't pressure yourself too much about rights and wrongs. These other developer teams are pretty big, with lots of veterans, and yet even for them it takes time to gain experience.

    BTW, I've implemented most of the device-generated commands extension, so I am happy you like that route. I guess you would prefer something like GL NV_command_list with indirect tokens for barriers?
    But do you really need a lot of runtime-generated barrier points?
     
    pharma, OCASM and JoeJ like this.
  11. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    323
    Likes Received:
    365
    But it would be a lot harder to sell after it has been shown in public: someone might figure out how it works, and the value decreases ;)
    AFAIK this even makes it harder to get a patent, if whoever is (hopefully) interested wants one.

    However, just imagine a low-poly Quake level with moving boxes and metal balls, with lightmaps updating in real time... not so impressive anyway.
    I don't even have a render framework; I need to start from scratch on this, after finishing the damn geometry-processing work...

    So I'm just average Joe, not John Carmack :)
     
  12. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    39,710
    Likes Received:
    9,764
    Location:
    Under my bridge
    Understandable, but then you have to accept that discussion will be limited. "I wish nVidia didn't do it this way, but I can't talk about it" ain't the subject of great conversation.

    Although a YT video of just the realtime visuals, without any explanations, shouldn't give anyone any clues as to how it's done.
     
    vipa899, pharma and eloyc like this.
  13. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    323
    Likes Received:
    365
    Yes, sounds good!
    I need the barriers mostly for cases like this:
    process one level of a tree,
    barrier,
    process the next level (which depends on parent data),
    barrier, and continue with levels deeper down.

    Mostly the processing does zero work, so generating the commands on the GPU would already reduce the count a lot, but I definitely need them (I guess still about 30-50 per frame).
    Another, simpler option would be to skip over commands (including the barriers) in the command lists on the GPU side. Not so advanced, but it would already fix that case.
    OpenCL 2.0 is more flexible here, but with the above I would already be happy. CL smells too high-level to be hardware friendly anyway.
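    The level-by-level pattern above can be sketched like this; `dispatch` and `barrier` are stand-ins for an indirect compute dispatch and an execution/memory barrier, and the whole thing is only an illustration of why empty levels waste recorded commands:

```python
def process_tree_levels(level_node_counts, dispatch, barrier):
    """Process a tree top-down, one level per pass. Each level reads its
    parents' results, so a barrier is required between levels. Returns
    the number of dispatches that actually had work."""
    useful = 0
    for count in level_node_counts:
        if count > 0:
            dispatch(count)  # GPU-side generation could skip empty levels...
            useful += 1
        barrier()            # ...but the ordering barrier is still needed
    return useful
```

    With CPU-recorded command buffers, every dispatch and barrier must be recorded up front, even for levels whose indirect count turns out to be zero at runtime; that recording and barrier overhead is the cost being described.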

    I don't know if GL NV_command_list supports addressing multiple queues. If that is easily possible, it would be nice too, but it's not so much of a requirement.


    Many Thanks! :) :) :)
     
  14. OCASM

    Regular Newcomer

    Joined:
    Nov 12, 2016
    Messages:
    804
    Likes Received:
    782
    The Dreams approach certainly has benefits, but performance isn't one of them. You say huge worlds, but so far we've only seen small scenes (AFAIK). As you say, it's noisy and doesn't support the standard PBR workflow. It works for what it's trying to accomplish, but I wouldn't recommend it for the majority of games.

    I do agree on the point of softness vs. sharpness. Too sharp looks fake. In addition to avoiding ultra-high resolutions, games should support light diffusion effects as well.
     
  15. milk

    Veteran Regular

    Joined:
    Jun 6, 2012
    Messages:
    2,654
    Likes Received:
    2,130
    Joe Jarmack?
     
    JoeJ, vipa899, Scott_Arm and 2 others like this.
  16. Dictator

    Newcomer

    Joined:
    Feb 11, 2011
    Messages:
    84
    Likes Received:
    170
    Just want to say that the idea of the Titan V being the same, performance-wise, as an RTX 2080 Ti in BFV is a bit problematic right from the start. It goes against all the information we have about the performance advantage the RT core provides for ray-triangle intersection; it seems like sensationalist reporting that did not look into the specifics before claiming the contrary.

    I could imagine the following, without having tested it since I do not have a Titan V: the fallback layer for Volta is programmatically and visually different, so a direct comparison of its performance is troubled. When talking with DICE render devs in August, they mentioned how much faster the game was with RTX, and that they had been used to developing at much lower framerates and worse quality on the Titan V (optimisations to the BVH structure and more excessive snapping were still present in the Gamescom build at the time as a hold-over from developing on Volta).

    Also, we know from a number of presentations now that the ray/triangle intersection is just one part of the expense of using RT, and it differs from scene to scene and effect to effect. Surely we can all imagine a scenario where the greatest performance bottleneck in a given scene is not the ray-triangle intersection bit of RT but the surface shading and denoising, shifting importance away from the RT core's contribution to performance.

    I won't be testing this, sadly, but I think GamersNexus might! I hope they also look at the visuals on the Titan V in comparison to RTX; it should be neat to see if they really are the same.

    EDIT - should I have posted this in the BFV RT thread? :D
     
    DavidGraham, pharma and iroboto like this.
  17. Samwell

    Newcomer

    Joined:
    Dec 23, 2011
    Messages:
    109
    Likes Received:
    123
    I have a question about the Titan V vs. RTX thing in BFV, and RT in general. Does the number of rays have a big impact on shading requirements with raytracing, or is it mostly independent? Shading costs grow because rays hit objects outside screen space, but more rays don't hit that many more objects outside of screen space, so they don't add much shading cost?

    In BFV, RT runs at a maximum of 0.4 rays per pixel; in the user's test cases it might even be at 0.1 rays per pixel if there aren't many reflections. So it's not really surprising the RT cores aren't helping much. If we look at Remedy's Control numbers, it's 5 ms on the Titan V vs. 1 ms on RTX at 1 ray per pixel. A crude simplification: 0.5 ms on the Titan V vs. 0.1 ms on RTX at 0.1 rays per pixel. That would lead to numbers where the impact of the RT cores is negligible. But if the shading costs don't change much, they might increase the number of rays a lot for RTX.
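    Written out, the back-of-envelope scaling looks like this. The 5 ms / 1 ms figures are the quoted Control numbers; linear scaling of trace cost with ray count is the simplifying assumption being made:

```python
def trace_cost_ms(rays_per_pixel: float, cost_at_1_rpp_ms: float) -> float:
    """Assume traversal/intersection cost scales linearly with ray count."""
    return rays_per_pixel * cost_at_1_rpp_ms

TITAN_V_MS, RTX_MS = 5.0, 1.0  # quoted Control trace cost at 1 ray per pixel

# Absolute advantage of the RT cores at different ray budgets:
gap_at_1_rpp   = trace_cost_ms(1.0, TITAN_V_MS) - trace_cost_ms(1.0, RTX_MS)  # 4.0 ms
gap_at_0_1_rpp = trace_cost_ms(0.1, TITAN_V_MS) - trace_cost_ms(0.1, RTX_MS)  # 0.4 ms
```

    At a BFV-like 0.1 rays per pixel, a 0.4 ms gap is easy to lose behind shading and denoising cost, which is the point of the argument.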
     
    pharma likes this.
  18. Dictator

    Newcomer

    Joined:
    Feb 11, 2011
    Messages:
    84
    Likes Received:
    170
    A nice way to look at it, I think, and a good point. It would also mean that pumping up the resolution increases the ray count and thus changes the importance of the RT core. Maybe there are similarities in performance at lower resolution/lower ray count.
     
  19. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    13,104
    Likes Received:
    3,409
    https://www.3dcenter.org/news/raytr...uft-mit-guten-frameraten-auch-auf-der-titan-v

    Looks to me like the Titan RTX beats the Titan V quite easily in the reflection-heavy maps (Rotterdam). It's about 40 or 50% faster, though these benchmarks are pretty simple.

    I guess that goes back to the question of what BFV is actually doing: how much of the % difference is from shading performance? It would be nice to see both cards with RTX off to get a baseline performance difference.

    Spec-wise, the Titan RTX does not have any major performance advantages for general gaming except a huge amount of memory, which I'm not sure BFV could even take advantage of. TFLOP/s and bandwidth are roughly equal.

    Edit: In a roundabout way, this may actually explain what BFV is doing. If a map were primarily casting screen-space rays, you would expect performance to be roughly equal. The performance divergence suggests that there is a significant burden of DXR rays on screen in the Rotterdam map, even after the patch.
     
    #679 Scott_Arm, Jan 2, 2019
    Last edited: Jan 2, 2019
    pharma, vipa899, iroboto and 2 others like this.
  20. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,581
    Likes Received:
    2,134
    These are still user-generated tests, and they only use High DXR; I believe at Ultra DXR, and at resolutions above 1080p, the difference would be higher.

    Notice the Titan V in question is running overclocked to 2.0 GHz, which gives it 20 TFLOPs; a Titan RTX at 2.0 GHz gets you 18.4 TFLOPs.
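    Those figures follow from the FP32 lane counts (5120 CUDA cores on the Titan V, 4608 on the Titan RTX) at 2 FLOPs per core per clock for a fused multiply-add; a quick sanity check:

```python
def peak_fp32_tflops(cuda_cores: int, clock_ghz: float) -> float:
    """Peak FP32 rate: 2 FLOPs (one fused multiply-add) per core per clock."""
    return cuda_cores * 2 * clock_ghz / 1000.0

titan_v_tflops   = peak_fp32_tflops(5120, 2.0)  # ~20.5 TFLOPs
titan_rtx_tflops = peak_fp32_tflops(4608, 2.0)  # ~18.4 TFLOPs
```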
     
    #680 DavidGraham, Jan 2, 2019
    Last edited: Jan 2, 2019
    pharma and BRiT like this.
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.