Gaussian splatting

snarfbot

I have been seeing a lot of YouTube videos about this popping up in my feed. How would a game using this tech provide for animated or interactive assets, or even dynamic lighting?
 
We would need to add a surface normal to each splat so we can calculate lighting, and we would need to remove the 'baked lighting' the current approach captures from reality.
Likely we would work on a conversion tool first, to convert traditional textured triangle models to splats, to make this easier.
This would be very interesting, because currently it's hard to tell how much of the impressive image quality comes from capturing reality vs. the rendering method.
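To make the per-splat normal idea above concrete, here is a minimal sketch of what a relightable splat record and a single-light shading function could look like. The struct layout and names are made up for illustration; the actual paper stores baked view-dependent radiance (as SH) and has no normal or albedo.

```cpp
#include <algorithm>

// Hypothetical per-splat data: the paper stores position, anisotropic
// covariance (rotation + scale), opacity and SH color, but no normal.
// Adding a normal and a de-lit albedo lets us evaluate lighting instead
// of baking it in.
struct Splat
{
    float position[3];
    float normal[3];   // new: unit surface normal for relighting
    float albedo[3];   // new: de-lit base color instead of baked radiance
    float opacity;
};

// Simple Lambert term for one directional light (lightDir points from the
// surface toward the light). A real renderer would accumulate several
// lights, shadows, GI, etc.
inline void shadeSplat(const Splat& s, const float lightDir[3],
                       const float lightColor[3], float outRadiance[3])
{
    float ndotl = s.normal[0] * lightDir[0] +
                  s.normal[1] * lightDir[1] +
                  s.normal[2] * lightDir[2];
    ndotl = std::max(ndotl, 0.0f);
    for (int i = 0; i < 3; ++i)
        outRadiance[i] = s.albedo[i] * lightColor[i] * ndotl;
}
```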

Surely we would also need a hierarchy of splats so we can reduce detail. The paper does not have this, IIRC; instead they just optimize the splat distribution to capture all the details present in the input photos.
So both tooling and rendering would become more complex for us.
After that we can cache our lighting per splat, so we don't need to relight every splat every frame. We get the advantages of texture-space shading even though we don't use textures anymore.
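A rough sketch of that per-splat lighting cache, reusing the hypothetical Splat/shadeSplat from the sketch above: relight only a slice of the splats each frame and read the cached result otherwise, which is essentially texture-space shading without textures.

```cpp
#include <vector>
#include <cstddef>

// Hypothetical per-splat radiance cache, updated round-robin so only a
// fraction of the splats is relit each frame. 'Splat' and 'shadeSplat'
// are the illustrative types from the previous sketch.
struct LightingCache
{
    std::vector<float> cachedRadiance; // 3 floats per splat
    size_t cursor = 0;

    explicit LightingCache(size_t splatCount)
        : cachedRadiance(splatCount * 3, 0.0f) {}

    void update(const std::vector<Splat>& splats,
                const float lightDir[3], const float lightColor[3],
                size_t splatsPerFrame)
    {
        for (size_t i = 0; i < splatsPerFrame && !splats.empty(); ++i)
        {
            size_t idx = cursor % splats.size();
            // Overwrite the cached radiance for this splat; all other
            // splats keep last frame's result.
            shadeSplat(splats[idx], lightDir, lightColor,
                       &cachedRadiance[idx * 3]);
            ++cursor;
        }
    }
};
```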

But we may need to optimize the rendering method. Currently the bottleneck is a global depth sort of all splats per frame. I would try to sort larger clusters of splats instead, and maybe do some weighted OIT hack instead of strict back-to-front rendering per cluster. I'm not sure about that, so that's the very first thing I would work on.
But if it works, we can then do transparency and we have no aliasing. No more need for TAA or upscaling, and we get a stable, high-quality image.
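For the weighted OIT hack, something like McGuire and Bavoil's weighted blended OIT could replace the exact sort inside a cluster. A minimal CPU-side sketch of the accumulate/resolve math; the weight function is just one illustrative choice, not taken from any specific implementation:

```cpp
#include <algorithm>

// Weighted blended OIT accumulator for the fragments of one pixel.
// 'depth' is view-space depth; input colors are not premultiplied here,
// the addFragment call multiplies by alpha itself.
struct OITAccum
{
    float colorSum[4] = {0, 0, 0, 0}; // weighted (rgb * alpha), weighted alpha
    float revealage   = 1.0f;         // product of (1 - alpha)

    void addFragment(const float rgb[3], float alpha, float depth)
    {
        // Depth-based weight: nearer fragments count more.
        float w = alpha * std::max(0.01f, 1.0f / (1.0f + depth * depth));
        for (int i = 0; i < 3; ++i) colorSum[i] += rgb[i] * alpha * w;
        colorSum[3] += alpha * w;
        revealage   *= (1.0f - alpha);
    }

    void resolve(const float background[3], float out[3]) const
    {
        float denom = std::max(colorSum[3], 1e-5f);
        for (int i = 0; i < 3; ++i)
            out[i] = (colorSum[i] / denom) * (1.0f - revealage)
                   + background[i] * revealage;
    }
};
```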

Really promising and interesting. But it's still an extremely expensive form of splatting. Let's compare with the cheapest alternative I have in mind, accepting quantization:
Instead of using big ellipsoid-shaped splats (which already require a 3x3 matrix, or a quaternion plus 3 floats for non-uniform scale), we can use pixel-sized splats and a Z-buffer to find one winning splat per pixel.
After we have rasterized all splats, we can do analytical AA as a post process by looking at the splats from a 3x3 image space kernel in world space. I have tried that and it works pretty well. No need for sorting or OIT.
But the problem is popping along depth, because only one splat can win a pixel, so we need TAA to smooth that out. We also haven't done anything to tackle the transparency problem.
This should be much faster and may be good enough. Combining it with traditional rendering is also a bit easier. But overall it's a similar concept to the paper. We still need a hierarchy for LOD as well.
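A sketch of the visibility part of that cheap variant, assuming projection has already happened; struct names and layout are invented for illustration, and the 3x3 analytical AA pass would run afterwards as a post-process.

```cpp
#include <vector>
#include <limits>

// Point-sized splats resolved with a plain Z-buffer: exactly one splat
// wins each pixel, so no sorting or OIT is needed.
struct PointSplat
{
    int   x, y;     // projected pixel coordinates
    float depth;    // view-space depth
    float rgb[3];   // (cached) shaded color
};

struct Framebuffer
{
    int width, height;
    std::vector<float> depth;   // z-buffer
    std::vector<float> color;   // 3 floats per pixel

    Framebuffer(int w, int h)
        : width(w), height(h),
          depth(size_t(w) * h, std::numeric_limits<float>::max()),
          color(size_t(w) * h * 3, 0.0f) {}

    void splat(const PointSplat& p)
    {
        if (p.x < 0 || p.y < 0 || p.x >= width || p.y >= height) return;
        size_t idx = size_t(p.y) * width + p.x;
        if (p.depth < depth[idx])   // closest splat wins the pixel
        {
            depth[idx] = p.depth;
            for (int i = 0; i < 3; ++i) color[idx * 3 + i] = p.rgb[i];
        }
    }
};
// A post-process would then walk a 3x3 neighborhood per pixel and blend
// the winning splats analytically in world space for the AA result.
```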

But of course our rendering method is now quite revolutionary. We have no use for ROPs or the HW texture filter, and attempting to utilize RT cores would not make much sense either.
To use RT, likely we would use a lower detail triangle representation of the scene as well. But we need triangle representations for physics anyway, so maybe that's not much extra work.
To me it feels very tempting to try out. But once GPUs get better at rendering small triangles, and I expect this to happen, a lot of the motivation to work on this fades away.

Animation would be trivial, I think. I don't see any new problems or opportunities here.

EDIT: I forgot to mention another potential advantage of the transparent Gaussian splats: we could do proper DOF and motion blur by resizing and stretching the splats.
If they become larger and cover more pixels the rendering becomes slower, but still.
So we have transparency, DOF and MB, world-space lighting and caching, and anti-aliasing. All the stuff we never got to work properly before. Really interesting to us, despite the revolution it would require.
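For the DOF part, the splat's screen-space radius could simply be inflated by the circle of confusion at its depth. A hedged sketch of the standard thin-lens formula (not anything from the paper; parameter names are mine):

```cpp
#include <cmath>

// Thin-lens circle of confusion, converted to a pixel radius so it can be
// added to a splat's projected size. All inputs in meters except
// imageHeight (pixels).
inline float cocRadiusPixels(float depth, float focusDist, float focalLength,
                             float apertureDiameter, float sensorHeight,
                             int imageHeight)
{
    // CoC diameter on the sensor for an object at 'depth' when focused at
    // 'focusDist'.
    float cocMeters = apertureDiameter * focalLength *
                      std::fabs(depth - focusDist) /
                      (depth * (focusDist - focalLength));
    // Convert sensor-space diameter to a pixel radius.
    return 0.5f * cocMeters * float(imageHeight) / sensorHeight;
}

// Usage idea: splatScreenRadius += cocRadiusPixels(splatDepth, ...);
// Motion blur would instead stretch the splat along its screen-space
// velocity over the exposure time.
```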
 
It's point splatting, more or less, and has been researched for a long time; which version might be best is complicated, as JoeJ's super long post points out.

Ideally you'd store this in a 2D texture and it would represent a surface; you really don't want to pay the memory/bandwidth cost here when you can use dimensional collapse and GPU compressed formats (BC/ASTC) to remove unnecessary cost.

The interesting thing is that you can then embed these textures in, say, a tetrahedron - basically a texture in a volume. You can then raster (even if it's compute raster) OOBBs, then tet meshes, then splat/"surf" (however you get the data structure onscreen) for the primary view. With tet meshes you can stretch and warp the tets using standard skinning methods, and the embedded textures stretch and warp accordingly, so you've just solved skinned meshing for non-triangle-based primary visibility.

The same goes with raytracing for secondary visibility: we have fast box tests etc. now, so you construct your (imperfect as it is) BVH, enclose the tets in it, box-test through the BVH, triangle-hit into a tet, then splat/whatever from there. This should skip the biggest slowdown in RT (other than highly divergent shading), which is testing a ton of triangles, because all you have is very low-triangle tets.
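To illustrate the tet embedding idea: if each embedded sample stores fixed barycentric coordinates within its tetrahedron, it follows the tet automatically when the four tet vertices are skinned. A minimal sketch with invented types:

```cpp
#include <array>

// Tiny vector type for the sketch.
struct Vec3 { float x, y, z; };

inline Vec3 operator*(const Vec3& v, float s) { return {v.x * s, v.y * s, v.z * s}; }
inline Vec3 operator+(const Vec3& a, const Vec3& b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }

// Deformed position of an embedded sample, given the tet's four skinned
// vertices and the sample's precomputed (rest-pose) barycentric weights.
// Skinning only has to touch the four tet vertices; every sample embedded
// in the tet warps along for free.
inline Vec3 warpEmbeddedPoint(const std::array<Vec3, 4>& skinnedTetVerts,
                              const std::array<float, 4>& barycentric)
{
    return skinnedTetVerts[0] * barycentric[0] +
           skinnedTetVerts[1] * barycentric[1] +
           skinnedTetVerts[2] * barycentric[2] +
           skinnedTetVerts[3] * barycentric[3];
}
```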
 
This blog post was linked in Jendrik Illner's weekly compendium this week:

Gaussian Splatting is pretty cool!

SIGGRAPH 2023 just had a paper “3D Gaussian Splatting for Real-Time Radiance Field Rendering” by Kerbl, Kopanas, Leimkühler, Drettakis, and it looks pretty cool! Check out their website, source code repository, data sets and so on (I should note that it is really, really good to see full source and full data sets being released. Way to go!).

I’ve decided to try to implement the realtime visualization part (i.e. the one that takes already-produced gaussian splat “model” file) in Unity. As well as maybe play around with looking at whether the data sizes could be made smaller (maybe use some of the learnings from float compression series too?).

[...]
 
A game would have to ditch RTX to use this, so it's never going to happen even if it makes sense :p

Some papers I'm reminded of which they didn't cite:
Approximations for the distribution of microflake normals
Deep Appearance Prefiltering
 
I think binging all of Lower Decks must be affecting my brain. When I read "Gaussian splatting" I thought it was some kind of kinky alien porn. 😶
 
Can someone translate Gaussian splatting into idiot so I can understand it?
(embedded YouTube video)
 
Thanks, maybe Google Earth can use this, as some of it is a bit ropey, especially the trees.
 

Question: can you export 3D models with this to be used in a game?
To my knowledge, not really - nothing that you would not already be able to do with the source imagery (that is, by running it through Metashape or another photogrammetry workflow for a textured mesh, without a 'de-lighting' effort, or to generate an intermediate dense point cloud).

It generates camera poses and a sparse point cloud with COLMAP prior to the training step.

For anything non-Lambertian, photogrammetry tends to fail pretty badly - so that is one nice feature of this technique. It also seems to give a more aesthetically pleasing result with sub-optimal input (e.g. I tested with digitized Video 8 frame sequences from the 1990s, where by luck, there was camera motion that allowed some structure from motion - I wish I could have told my childhood self to capture with this in mind...)

Again, I am a historian - so I don't have much of a command of how it works. However, I do wonder if there would be ways of encoding the view dependent properties in texture maps.

I remember an interesting technique, the Unstructured Light Field Renderer, that worked from Metashape camera poses and a generated geometry proxy and was fascinating for this - a sparsely sampled kind of light field or lumigraph (though still working with some underlying mesh data). The very old PTM approach was also interesting in that respect (though I think, insofar as I understand it, that was just an early type of material capture, as is done now for most high-fidelity game engines).
 
Whether there is anything gained by using the splats somehow as an intermediate - I'm not sure.

When using Instant NGP (NeRF), I found the mesh export from the NeRF much less faithful than just running the same images through Metashape (it would probably be the same with Meshroom and Reality Capture).
 
The above example was about 300 JPEGs captured in maybe 10 minutes with a smartphone, and resampled to, I think, 900p (the training step is very VRAM-intensive - and 12GB is about half of what the published work had available for training - I'm using a 3060).

It is a better result than you would get with photogrammetry in that sense, even putting aside the lighting environment capture / reconstruction element.

I guess my thought on game engine export is - if you have that underlying image data used for reconstruction, presumably there might be ways of estimating / approximating material properties and lighting environment without the Gaussian Splatting portion.
 