When was the last time the fixed-function rasterizer blocks saw any notable improvements?
I was genuinely curious because it seems like not much has been done to improve that aspect.

There have been a few improvements over the years aside from the massive increase in throughput - increased granularity (currently 16 pixels), conservative raster, VRS etc.
Nanite doesn’t use the HW rasterizer, which was my point. I don’t get the point of your question though.
That's basically what i'm doing, but in practice your basis function always lacks support for sharp reflections and hard shadows. I see two options:

Sure there's alternatives! Throw a high enough order basis function at virtual point lights and you've got your alternative, and you've got a full material representation that follows any brdf and goes over the entire roughness range.
Yeah, if we again imagine replacing triangles with pixel-sized points, that would become a problem.

Hardware raytracing is really only useful if you have a standard triangle-only representation.
Yes. There is no good and simple solution for everything. Complexity will only increase, and many options have to be explored. Thus flexibility > performance.

But it is hard to get around severe performance limitations, thus the need for some sort of hybrid pipeline.
To be clear: Now that we have HW RT, i want reordering in HW too. It's the only option to help with RT performance. Not only for tracing, but also hit point shading and custom intersection shaders. So we will get it for sure at some time.

It doesn't necessarily close the door if this happens, but it would certainly put some roadblocks in place on PC.
You can do it in compute or other shaders by yourself; in fact, reordering by material ID has already been implemented in Battlefield 1, UE4 and UE5 for hit point shading, so you can even test it.

Now that we have HW RT, i want reordering in HW too. It's the only option to help with RT performance. Not only for tracing, but also hit point shading and custom intersection shaders. So we will get it for sure at some time.
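For readers who haven't seen it, the "reordering by material ID" mentioned above boils down to sorting or binning hit records before shading so that work for the same material runs together. This is only a minimal CPU-side sketch of the idea; the struct and names are made up for illustration and are not the BF1 or UE implementation:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical hit record written by a ray-generation / traversal pass.
struct HitRecord {
    uint32_t materialId;   // which material/shader the hit needs
    uint32_t primitiveId;  // triangle or cluster index
    float    t;            // hit distance along the ray
    uint32_t pixel;        // which pixel the ray belongs to
};

// Reorder hits so that all hits sharing a material are contiguous.
// On a GPU this would be a compute-shader sort or a binning pass;
// shading then runs one (mostly) coherent batch per material.
void SortHitsByMaterial(std::vector<HitRecord>& hits) {
    std::sort(hits.begin(), hits.end(),
              [](const HitRecord& a, const HitRecord& b) {
                  return a.materialId < b.materialId;
              });
}
```

The point of contention in the posts that follow is where this happens: sorting launched rays or hit points like this is already possible in user code, whereas reordering inside the traversal loop itself is not.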
Might have been nice if you'd done a Google search or perhaps just checked earlier in this thread before suggesting people are making things up.
https://forum.beyond3d.com/posts/2186917/
What i mean with reordering is doing it in the traversal loop, not only for launching rays or hit points.

You can do it in compute or other shaders by yourself
Because sorting and binning have so many applications, i do think it might be worth having HW just for that. It's little ALU and much BW, but so is raytracing.

The sorting itself seems to be rather cheap on SMs, so I am not sure whether dedicated HW or instructions will benefit a lot or justify the area cost.
Maybe I’m not following, but AMD already imposes no limits on how things are done. Game engines are free to implement pure compute based pipelines as evidenced by Nanite and Lumen.
Investments in hardware accelerated paths won’t close the door on improvements to general compute. We’ve had continuous improvements on both fronts since GPUs have existed.
Insightful post. In general I would always champion the less performant but more general programmable solution. Amazing things happen when software finds a way to use hardware creatively in a way that the original hardware designers never intended.<snip>
I would not say the rasterizer has any limitations.

The rasterizer has been relatively unchanged (minor improvements in speed and some flexibility) and has limited what is done in 3D rendering.
I have not seen much innovation in RT games. There is only adoption of known practices with obvious realtime optimizations and compromises.

The question becomes: will NV's hardware acceleration implementation lead to something similar to the rasterization of triangles, which led to relatively little experimentation outside of what could be rasterized? Or will it be the catalyst for increased innovation and experimentation WRT RT beyond what the NV hardware is capable of accelerating?
Although i'm loud about requesting flexibility, i do not have expectations about 'experimentation' as you say. I mean, raytracing is just that - intersecting rays with triangles. The common saying that 'rays transport light physically accurately, like in the real world' already reads more into it than is there, and it's bullshit too. It's just a way to test for visibility, and usually we use it to solve the visibility term in the rendering equation. The other interesting bits like shading, integration, etc. have nothing to do with RT.

Perhaps there's a better way to do things. But if you are locked into doing it a certain way because of hardware support, then no experimentation is likely to be done to find that. And that experimentation might have led to better or more efficient ways to do it in hardware. Perhaps the original design choices weren't the best. However, because it was early and the most performant, the possibility is there that the entire industry would get locked into it.
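To make the "visibility term" mentioned above concrete: one common way to write the rendering equation as an integral over scene surfaces makes the binary visibility factor explicit, and casting a ray between the two points is simply one way to evaluate that factor:

```latex
L_o(x,\omega_o) = L_e(x,\omega_o)
  + \int_{A} f_r(x,\, x\!\to\!y,\, \omega_o)\, L_o(y,\, y\!\to\!x)\,
    G(x,y)\, V(x,y)\, \mathrm{d}A_y
```

Here V(x,y) is 1 if x and y are mutually visible and 0 otherwise, and G(x,y) is the purely geometric term. The BRDF f_r, the integration strategy and the shading are all independent of how V is evaluated, which is the point being made above.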
I know one example:

IMO, specific hardware acceleration works best if it enables something that wasn't possible before (like hardware support for vertex and pixel shaders) which then leads to implementation in more generalized and flexible form (universal shaders which lead to general compute shaders). Perhaps at some point experimentation is basically exhausted and there is no longer a need for much experimentation and it's beneficial to do specific hardware support for something again...
Yeah, if we again imagine replacing triangles with pixel-sized points, that would become a problem.
Though, RT is expensive, so constructing (possibly lower-resolution) geometry just for RT may well be justified.
Personally i have that surfel hierarchy, and something like that could be used for point rendering. I also use it for raytracing, where each surfel represents a disc, which is a bit less data than a triangle and also avoids the problem of addressing 3 vertices scattered in memory (see the intersection sketch below).
So do i think such surfels would do better for 'insane detail raytracing' than triangles?
Maybe. But discs can't guarantee a closed surface. Rays could go through small holes, so it's imperfect.
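Since the disc representation came up: a ray-disc test really is tiny, which is part of the appeal. A minimal sketch, with illustrative types and layout rather than anyone's actual engine code:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vec3  sub(Vec3 a, Vec3 b) { return {a.x-b.x, a.y-b.y, a.z-b.z}; }
static Vec3  add(Vec3 a, Vec3 b) { return {a.x+b.x, a.y+b.y, a.z+b.z}; }
static Vec3  mul(Vec3 a, float s) { return {a.x*s, a.y*s, a.z*s}; }

// A surfel as a disc: one position, one normal, one radius --
// less data than three vertices scattered in memory.
struct Surfel { Vec3 center; Vec3 normal; float radius; };

// Returns the hit distance t, or a negative value on miss.
float IntersectDisc(Vec3 rayOrigin, Vec3 rayDir, const Surfel& s) {
    float denom = dot(rayDir, s.normal);
    if (std::fabs(denom) < 1e-6f) return -1.0f;          // ray parallel to disc plane
    float t = dot(sub(s.center, rayOrigin), s.normal) / denom;
    if (t < 0.0f) return -1.0f;                          // plane is behind the ray
    Vec3 p = add(rayOrigin, mul(rayDir, t));
    Vec3 d = sub(p, s.center);
    if (dot(d, d) > s.radius * s.radius) return -1.0f;   // hit point outside the disc
    return t;
}
```

The flip side is exactly the caveat above: a set of discs is not watertight, so a ray that slips between neighbouring surfels simply reports a miss.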
Exactly, it's a great example of where a general "software" (in this case compute) solution is superior to fixed hardware support.
But if you are locked into doing it a certain way because of hardware support, then no experimentation is likely to be done to find that.
IMO, specific hardware acceleration works best if it enables something that wasn't possible before (like hardware support for vertex and pixel shaders) which then leads to implementation in more generalized and flexible form (universal shaders which lead to general compute shaders).
If you want to replace surface triangles with SDF, you also need a 3D texture for material (at least for UVs). So two volumes. That's much more RAM. And you constantly need to search for the surface. SDF is hard to compress (maybe impossible), because doing something like an octree with a gradient per cell breaks the smooth signal, and the resulting small discontinuities break local maxima search or sphere tracing methods, i guess.

I mean, that's the thing though. Dreams works. Sebbi's unlimited detail SDF tracer works, for that virtualized level of detail they already work. Replacing triangles all together works. It's why I find SDFs so fascinating. They work for everything apparently, except very very thin representations; that does need to be worked on. There's long been work showing they're much faster for physics than triangles, and now they're showing they're faster than triangles for indirect illumination, as Lumen shows faster performance in software mode than in RT for complex cases that you'd actually find in games, and that's on a 3090 where hardware RT should be fastest compared to compute.
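For anyone following along, sphere tracing, the usual way to "search for the surface" of an SDF, is just the loop below. It relies on the field never over-reporting the distance to the surface, which is exactly what lossy per-cell compression can violate. A generic sketch, not Dreams' or sebbi's tracer:

```cpp
#include <functional>

// Signed distance to the surface at point p (negative inside the shape).
using SDF = std::function<float(float px, float py, float pz)>;

// March a ray through an SDF. Returns the hit distance, or -1 on miss.
// The step size is the reported distance itself, so any artifact that
// makes the field overestimate distance can step straight through the surface.
float SphereTrace(const SDF& sdf,
                  float ox, float oy, float oz,     // ray origin
                  float dx, float dy, float dz,     // normalized ray direction
                  float tMax = 100.0f, float eps = 1e-3f) {
    float t = 0.0f;
    for (int i = 0; i < 256 && t < tMax; ++i) {
        float d = sdf(ox + dx * t, oy + dy * t, oz + dz * t);
        if (d < eps) return t;   // close enough: surface hit
        t += d;                  // safe step: nothing is closer than d
    }
    return -1.0f;                // no hit within the iteration/distance budget
}
```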
I'd really like to know how collision detection between 2 SDF volumes works. Never came across an algorithm. Could be interesting...

But the real thing I'm thinking about is, it's all the same math. Underlying indirect illumination and direct and physics and animation etc., it's all just querying and moving surfaces. The idea of exploding complexity is based on the idea that different representations are ideal for different tasks, but I don't see that at all. You're doing the same fundamental thing, all the time, for everything. Collision of bodies is no different from collision of light rays, or sound, or AI visibility.
I share these detail amplification ideas and will try something here. My goal is to move 'compression using instances' from a per-object level to a texture synthesis level. Requires blending of volumetric texture blocks, where SDF is attractive. Instead of duplicating whole rocks, we could just duplicate a certain crack over rocky surfaces. LF repetition vs. HF repetition. Surely interesting but very imperfect. Can not do everything, so just another addition to ever increasing complexity.

Thus, SDFs with UV maps for applying materials, and maybe you can raymarch a 2D signed distance field along the normal for amplification via an SDF texture. This can result in what is essentially tessellation in a data efficient manner. And it is both how artists like to work, is incredibly flexible and compressible since you're querying the same material across a project rather than trying to compress everything as an individual instance like Nanite does, and is still the exact same basic data structure, so it can be queried by anything you want. I'm not even sure you need surfels; what's the difference, after all, between a surfel and a volumetrically mipmapped implicit surface? They're the same thing as far as I can tell.
Unfortunately. We already have too much complexity, e.g. thinking of games using thousands of shaders.

I'm not sure complexity needs to explode, and I'm not sure it's even efficient to do so, is what I'm ultimately getting at.
The information above also gives us an insight into cluster size (384 vertices, i.e. 128 triangles), a suspicious multiple of 32 and 64 that is generally chosen to efficiently fill the wavefronts on a GPU. So 3333 clusters are rendered using the hardware, and the dispatch then takes care of the rest of the Nanite geometry. Each group is 128 threads, so my assumption is that each thread processes a triangle (as each cluster is 128 triangles). A whopping ~5 million triangles! These numbers tell us over 90% of the geometry is software rasterized. For shadows the same process is followed, except at the end only depth is output.
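To make the arithmetic behind "over 90%" explicit, using only the numbers quoted in the paragraph above (128 triangles per cluster, 3333 hardware clusters, ~5 million software-rasterized triangles):

```cpp
#include <cstdio>

int main() {
    const int trianglesPerCluster = 128;       // 384 vertices => 128 triangles
    const int hwClusters          = 3333;      // clusters sent to the HW rasterizer
    const double swTriangles      = 5.0e6;     // ~5 million SW-rasterized triangles

    double hwTriangles = double(hwClusters) * trianglesPerCluster;   // ~427k
    double swShare     = swTriangles / (swTriangles + hwTriangles);

    std::printf("HW triangles: ~%.0f\n", hwTriangles);
    std::printf("SW-rasterized share: %.1f%%\n", swShare * 100.0);   // > 90%
    return 0;
}
```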
One of Nanite’s star features is the visibility buffer. It is a R32G32_UINT texture that contains triangle and depth information for each pixel. At this point no material information is present, so the first 32-bit integer is data necessary to access the properties later.
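The write-up doesn't spell out the bit layout, so the split below (a packed cluster + triangle index in one 32-bit word, depth in the other) is only an illustration of the idea, not Nanite's actual format:

```cpp
#include <cstdint>

// One R32G32_UINT texel of a visibility buffer (illustrative layout).
struct VisBufferTexel {
    uint32_t idPayload;   // everything needed to find the triangle's properties later
    uint32_t depth;       // depth, so the closest write can win
};

// Illustrative packing: 25 bits of cluster index, 7 bits of triangle-in-cluster
// (128 triangles per cluster fit in 7 bits). Real engines choose their own split.
uint32_t PackId(uint32_t clusterIndex, uint32_t triangleIndex) {
    return (clusterIndex << 7) | (triangleIndex & 0x7Fu);
}

void UnpackId(uint32_t payload, uint32_t& clusterIndex, uint32_t& triangleIndex) {
    clusterIndex  = payload >> 7;
    triangleIndex = payload & 0x7Fu;
}
```

Note that, as the text says, nothing material-related is stored here; the payload only identifies the triangle so its properties can be fetched in a later pass.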
The material classification pass runs a compute shader that analyzes the fullscreen visibility buffer. This is very important for the next pass. The output of the process is a 20×12 (= 240) pixels R32G32_UINT texture called Material Range that encodes the range of materials present in the 64×64 region represented by each tile.
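The tile math implied by those numbers: 20 × 12 tiles of 64 × 64 pixels covers a roughly 1280 × 720 target (the vertical count rounds up). How the "range of materials" is encoded isn't stated here, so the min/max pair below is an assumption; the gist is one small texel per 64 × 64 tile:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Per-tile output: the lowest and highest material ID seen in that 64x64 region.
struct MaterialRange { uint32_t minMat = 0xFFFFFFFFu; uint32_t maxMat = 0; };

// Classify a visibility-buffer-sized array of material IDs into 64x64 tiles.
// width/height are the render target size; the tile grid rounds up (e.g. 1280x720 -> 20x12).
std::vector<MaterialRange> ClassifyMaterials(const std::vector<uint32_t>& materialIds,
                                             uint32_t width, uint32_t height) {
    const uint32_t tilesX = (width  + 63) / 64;
    const uint32_t tilesY = (height + 63) / 64;
    std::vector<MaterialRange> ranges(tilesX * tilesY);

    for (uint32_t y = 0; y < height; ++y) {
        for (uint32_t x = 0; x < width; ++x) {
            uint32_t mat = materialIds[y * width + x];
            MaterialRange& r = ranges[(y / 64) * tilesX + (x / 64)];
            r.minMat = std::min(r.minMat, mat);
            r.maxMat = std::max(r.maxMat, mat);
        }
    }
    return ranges;  // 240 entries for 1280x720, one per R32G32_UINT texel
}
```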
We have what looks like a drawcall per material ID, and every drawcall is a fullscreen quad chopped up into 240 squares rendered across the screen. One fullscreen drawcall per material? Have they gone mad? Not quite. We mentioned before that the material range texture was 240 pixels, so every quad of this fullscreen drawcall has a corresponding texel. The quad vertices sample this texture and check whether the tile is relevant to them, i.e. whether any pixel in the tile has the material they are going to render. If not, the x coordinate will be set to NaN and the whole quad discarded.
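The discard trick in code form: a CPU-side sketch of the per-vertex test described above. The real thing is a vertex shader sampling the material range texture, and the min/max check assumes the same illustrative encoding as the classification sketch; the NaN works because any primitive with a NaN position gets clipped away, discarding the whole quad:

```cpp
#include <cstdint>
#include <limits>

struct MaterialRange { uint32_t minMat; uint32_t maxMat; };

// Emulates the test each vertex of one tile quad performs during one material pass.
// Returns the x coordinate to output: unchanged if the tile may contain the
// material being rendered, NaN otherwise so the quad is discarded.
float CullQuadX(float x, uint32_t materialBeingRendered, const MaterialRange& tile) {
    bool tileHasMaterial = materialBeingRendered >= tile.minMat &&
                           materialBeingRendered <= tile.maxMat;
    return tileHasMaterial ? x
                           : std::numeric_limits<float>::quiet_NaN();
}
```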
You could have just searched, you know. Simply pasting the link into the search box and pressing enter would have yielded a few results. The earliest post with that link in this thread is from May 30...

I found this write up to be illuminating. Apologies if it’s a repost.
http://www.elopezr.com/a-macro-view-of-nanite/
My video should be going live today at 17:00, and at one point I mention a visual artefact with Nanite that I have seen no one else post about before - I do wonder if it is a feature of how Nanite functions or if it is just an issue in the current EA version of UE5. I would be curious to hear what people think could be the cause! Essentially there is some shuffling of Nanite when the camera changes position, not anything like tessellation boiling or a discrete LOD shift, but more as if the world pieces shuffle into place as the camera comes to a rest. Unfortunately that is the best way I can describe it - it needs to be seen in video form really.