Polygons, voxels, SDFs... what will our geometry be made of in the future?

BTW, up to a point. You can't separate them out. A highly variable-latency instruction for a shader ... feels wrong.

PS. oops, I guess you could just detect leaf nodes in the shader and limit triangle count during build. So it doesn't really matter.
I am not sure I follow you here, but intersecting a triangle or an AABB should not be any worse, latency-wise, than issuing a texture sampler instruction.
AFAIR RDNA2 sends the ray and AABB/triangle data to the HW intersectors, so it's not like the latter have to fetch anything from memory anyway (unlike when we are sampling a texture).
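For what it's worth, here is a minimal sketch of that split in plain C++ with an invented interface (hw_intersect_node and the node layout are made up, not AMD's actual ISA): the stack and the traversal loop stay in the shader, and the fixed-function unit only gets the ray plus the already-fetched node data to test.

```cpp
#include <cstdint>
#include <vector>

struct Ray    { float o[3]; float d[3]; float t_max; };
struct Node   { bool leaf; uint32_t child[4]; };  // child[] doubles as triangle ids in a leaf
struct HitSet { uint32_t hit_mask; float t[4]; };

// Stub standing in for the HW intersector: it is handed the ray *and* the
// already-fetched node data, tests up to 4 boxes (or triangles) and returns
// hit distances -- it never has to fetch anything from memory itself.
HitSet hw_intersect_node(const Ray&, const Node&) { return HitSet{0u, {0, 0, 0, 0}}; }

float trace(const Ray& ray, const std::vector<Node>& bvh)
{
    float closest = ray.t_max;
    uint32_t stack[64];
    int sp = 0;
    stack[sp++] = 0;                                  // root
    while (sp > 0) {                                  // traversal loop stays in the shader
        const Node& n = bvh[stack[--sp]];
        HitSet h = hw_intersect_node(ray, n);         // only the test is fixed-function
        for (int i = 0; i < 4; ++i) {
            if (!(h.hit_mask & (1u << i)) || h.t[i] >= closest) continue;
            if (n.leaf)       closest = h.t[i];         // accept triangle hit
            else if (sp < 64) stack[sp++] = n.child[i]; // push interior child
        }
    }
    return closest;
}
```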

The main problems with this approach are that traversal doesn't like SIMD/SIMT and that there is a fair amount of state that needs to be moved around between the shader core and the intersector.
When you keep traversal in one place (as in a dedicated unit), there is not as much state being moved around and you can go MIMD on traversal.
It's not the only way to do it, though. I believe the latest RT material from IMG advocates a different approach, where IIRC traversal is constantly re-converged and SIMD-ified, so that it might not require MIMD HW to be efficient.
OTOH that re-ordering might need to shuffle more state around...
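A rough CPU-side sketch of my reading of that re-convergence idea (not IMG's actual scheme): between traversal rounds the active rays are re-binned by the node they want to visit next, so a SIMD batch mostly tests the same node; the per-ray state that has to ride along with the re-ordering is exactly the cost mentioned above.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct RayState {
    uint32_t ray_id;
    uint32_t next_node;  // node this ray wants to test in the next round
    // The per-ray hit record and (short)stack would live here too -- exactly
    // the state that the constant re-ordering has to shuffle around.
};

// One traversal round: re-converge first, then the tests themselves can run
// coherently in SIMD-sized batches.
void traversal_round(std::vector<RayState>& active)
{
    std::sort(active.begin(), active.end(),
              [](const RayState& a, const RayState& b) { return a.next_node < b.next_node; });
    // A real implementation would now walk 'active' in wave-sized groups,
    // test each group against (mostly) a single node, update next_node and
    // retire finished rays; omitted here.
}
```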
 
Texture sampling ideally has identical latencies across threads. Testing an AABB versus 1 triangle versus 4 triangles will fundamentally never have the same latency, but I didn't really think that through, since you don't have to use it like that: the shader knows when it hits a leaf node, and the BVH builder can keep triangle counts equal.
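To make that concrete, a small builder sketch (invented names, median split instead of SAH) that caps and pads every leaf to the same primitive count, which is what keeps the leaf-test latency uniform across lanes:

```cpp
#include <cstdint>
#include <vector>

constexpr size_t kLeafTris = 4;  // every leaf tests exactly 4 triangles

struct BuildNode {
    bool leaf = false;
    std::vector<uint32_t> tris;  // exactly kLeafTris entries when leaf
    int left = -1, right = -1;
};

// Recursively split the range [begin, end) of triangle ids until it fits in
// one leaf; short leaves are padded with a degenerate triangle id so the HW
// always tests the same number of primitives per leaf.
int build(std::vector<BuildNode>& nodes, const std::vector<uint32_t>& tri_ids,
          size_t begin, size_t end, uint32_t degenerate_tri)
{
    const int idx = (int)nodes.size();
    nodes.push_back(BuildNode{});
    if (end - begin <= kLeafTris) {
        nodes[idx].leaf = true;
        nodes[idx].tris.assign(tri_ids.begin() + begin, tri_ids.begin() + end);
        nodes[idx].tris.resize(kLeafTris, degenerate_tri);  // pad to the fixed count
        return idx;
    }
    const size_t mid = begin + (end - begin) / 2;  // a real builder would split by SAH
    const int l = build(nodes, tri_ids, begin, mid, degenerate_tri);
    const int r = build(nodes, tri_ids, mid, end, degenerate_tri);
    nodes[idx].left = l;
    nodes[idx].right = r;
    return idx;
}
```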

As I said in the other thread, I wonder if it makes sense to go (limited) breadth-first and just pull in a new ray from a queue every time a leaf node is struck, so the entire SIMD can keep traversing non-leaf BVH nodes, only testing a full workgroup of leaf-node rays when one is available.
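Something like this CPU-side mock of the idea (made-up structure, not a real GPU scheduler): lanes that reach a leaf park the ray in a leaf queue and immediately grab a fresh ray, so the SIMD keeps stepping through interior nodes, and leaf tests are only issued once a full wave's worth has accumulated.

```cpp
#include <cstdint>
#include <deque>
#include <vector>

constexpr int kWaveSize = 32;

struct RayState { uint32_t ray_id; uint32_t node; int depth; };

// Placeholder traversal step: pretend every ray reaches a leaf after 8 interior
// steps. A real version would pop the ray's (short)stack and report whether the
// next node to test is a leaf.
bool step_is_leaf(RayState& r) { return ++r.depth >= 8; }

// Placeholder for intersecting a full wave of leaf-node rays in one go.
void test_leaf_wave(const std::vector<RayState>&) {}

void traverse(std::deque<RayState>& input)
{
    std::vector<RayState> wave;        // rays currently stepping through interior nodes
    std::vector<RayState> leaf_queue;  // rays parked at a leaf, waiting for a full wave
    while (!input.empty() || !wave.empty()) {
        // Refill the wave with fresh rays so no lane sits idle on a leaf.
        while ((int)wave.size() < kWaveSize && !input.empty()) {
            wave.push_back(input.front());
            input.pop_front();
        }
        // One lock-step interior-traversal step for the whole wave.
        for (size_t i = 0; i < wave.size(); ) {
            if (step_is_leaf(wave[i])) {                 // reached a leaf:
                leaf_queue.push_back(wave[i]);           // ...park it,
                wave[i] = wave.back(); wave.pop_back();  // ...and free the lane
            } else {
                ++i;
            }
        }
        // Only spend SIMD time on leaf tests once a whole wave of them exists.
        while (leaf_queue.size() >= (size_t)kWaveSize) {
            std::vector<RayState> batch(leaf_queue.end() - kWaveSize, leaf_queue.end());
            leaf_queue.resize(leaf_queue.size() - kWaveSize);
            test_leaf_wave(batch);
            // A real version would re-enqueue rays whose stacks aren't empty yet.
        }
    }
    if (!leaf_queue.empty()) test_leaf_wave(leaf_queue);  // drain the tail
}

int main()
{
    std::deque<RayState> rays;
    for (uint32_t i = 0; i < 100; ++i) rays.push_back({i, 0u, 0});
    traverse(rays);
}
```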
 
Hmm...
since divergence is the major problem for RDNA2, I wonder if you could approximate cone tracing through the BVH. Screen-space cone tracing is a thing, so the same trick of using screen space to get past the initial dense geometry near the surface could work. Once the cone is launched into the BVH scene, AMD has already tracked relative cone width along its rays, and while they tried BVH occlusion and got severe overdarkening, there have been ideas around storing pre-filtered data in the BVH. Track the overall filtered opacity of all children in each BVH node, the same way Epic does for foliage through its SDF.

Not sure how you'd get radiance; filtered volumetric surface radiance in a given direction is a longstanding problem, and it's easy to get light leaks. If you stored it in a basis function, you'd have to update and mip the basis function with every lighting change.
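For the occlusion half at least, the prefiltered-opacity idea is simple enough to sketch (all names here are invented, and the radiance half stays unsolved as noted): each node stores a filtered opacity for everything under it, and the cone stops descending once a node is smaller than its footprint, accumulating transmittance instead of intersecting real geometry.

```cpp
#include <vector>

struct Cone { float origin[3]; float dir[3]; float tan_half_angle; };

struct Node {
    float center[3];
    float radius;            // bounding-sphere radius of the node
    float filtered_opacity;  // prefiltered opacity of everything underneath
    int   left  = -1;        // -1 => leaf
    int   right = -1;
};

// Cone footprint (width) at distance t from the apex.
inline float cone_width(const Cone& c, float t) { return 2.0f * t * c.tan_half_angle; }

// Returns transmittance along the cone (1 = unoccluded, 0 = fully blocked).
float cone_occlusion(const Cone& c, const std::vector<Node>& bvh, int idx, float trans)
{
    if (idx < 0 || trans < 1e-3f) return trans;
    const Node& n = bvh[idx];

    // Distance from the cone apex to the node centre, measured along the axis.
    float t = 0.0f;
    for (int i = 0; i < 3; ++i) t += (n.center[i] - c.origin[i]) * c.dir[i];
    if (t + n.radius <= 0.0f) return trans;  // node is entirely behind the cone
    // (A real version would also reject nodes lying outside the cone's footprint.)

    if (n.left < 0 || n.radius <= 0.5f * cone_width(c, t)) {
        // Leaf, or node already smaller than the footprint: use its prefiltered
        // opacity instead of descending -- same spirit as Epic's SDF foliage trick.
        return trans * (1.0f - n.filtered_opacity);
    }
    trans = cone_occlusion(c, bvh, n.left,  trans);
    return  cone_occlusion(c, bvh, n.right, trans);
}
```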
 
Device-side scheduling isn't exposed either, although every modern GPU has it (call it device-side enqueue, dynamic parallelism, or whatever).
How can it be that we get RT and mesh shaders overnight, but still no proper way for the GPU to generate its own work without a helping hand from the CPU?
I was reading
and I was kind of amazed at the speed they could re-queue from the CPU ... it's not fast enough, but it's so fast I wonder if there isn't a third option apart from doing it right with device-side enqueuing or doing it wrong with persistent threads. Could you do it wrong by just spamming kernels which kill themselves if there isn't enough work queued for them? :)
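A toy CPU mock of that "do it wrong" option, just to show the control flow (no real GPU API here): the host keeps launching fixed-size dispatches blindly, and each worker grabs work from an atomic counter and kills itself immediately if the queue is already drained.

```cpp
#include <algorithm>
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

std::atomic<int> g_work_head{0};  // next unclaimed work item
std::atomic<int> g_work_tail{0};  // one past the last queued item

void process(int /*item*/) { /* real shading / traversal work would go here */ }

// One "kernel launch" of `threads` workers. Workers that find nothing queued
// exit immediately -- the cheap (and wasteful) stand-in for device-side enqueue.
void spam_dispatch(int threads)
{
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([] {
            for (;;) {
                const int item = g_work_head.fetch_add(1);
                if (item >= g_work_tail.load()) return;  // nothing left: die right away
                process(item);
            }
        });
    for (auto& th : pool) th.join();
}

int main()
{
    g_work_tail = 100;                          // pretend a previous pass queued 100 items
    for (int launch = 0; launch < 8; ++launch)  // the host just keeps spamming dispatches
        spam_dispatch(16);
    std::printf("consumed %d items\n", std::min(g_work_head.load(), g_work_tail.load()));
}
```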

Megakernels tie up the worst-case register count, even when only a small part of the megakernel needs it, which in retrospect is terrible for hiding latency.
 
Somewhat lengthy interview with two of the Atomontage guys.


TLDR; Atomontage "is no longer primarily aimed at the gaming market". Not a surprise, really, since the tell-tale signs have been there for quite a while now.

So are you actually selling a game engine like Unreal?

Daniel Tabar: That was our first idea when we started the company. We were building a traditional game engine based on voxel graphics. That could itself be an interesting product, but we were soon approached by some really big and important companies to test our technology on their projects. This accelerated something that was only in our long-term plan - streaming voxels to the user from our servers. We realised how powerful a tool this could be and decided to pivot. We also introduced our product to John Carmack, the gaming legend behind the Quake and Doom franchises, who warned us about having game studios as primary customers. They are very budget sensitive and rarely risk using new technologies, because they risk quite a lot with every new game. Selling them a whole new way of creating 3D graphics wouldn’t be impossible, but it would certainly be difficult.

Branislav Síleš: We saw how much more powerful it would be to build a cloud platform where everything is available as a service. When you upload your 3D-projects to the cloud, their size and complexity suddenly stops being important. People view them via a link and enjoy the experience in mere seconds. There is no need for special devices, apps and downloading large amounts of data. We knew we had to take this route.
What is your vision now?

D: Our goal is to make sharing and building interactive 3D-worlds easy for both individuals and enterprises. A great example of this is Roblox, a native cloud-based game creation platform that was recently valued at more than Unity and Epic Games combined. Half of all American teenagers have a Roblox account, but it still uses traditional polygon graphics. Our vision could be summed up as Roblox + Minecraft + a higher resolution, with the secret ingredient being our technology for highly efficient compression without content loss.
B: It's like YouTube, where you could upload a video that’s an hour long, but once you upload it to the cloud, nobody cares how big the file is or what format it was in, everyone can see it right away. That's the experience we're offering for viewing 3D content and soon for interacting with 3D content. No installations, no downloads. Extremely detailed 3D objects available in seconds, even on a mobile phone. That, in a nutshell, is our product.
 
The quality of Atomontage's streamed geometry is dire, though. Video-streamed Nanite looks a trillion times better. Video-streaming the voxelisation would probably also be better than streaming voxel data for local rendering.
 
You lose the built-in occlusion culling inherent in ray tracing, so it's not really the way forward, but the real way forward is held back by current hardware and APIs, so it's nice for now.
 
I think I'm coming over to the dark side ... finding a good analytical approximation to prefiltering is just too damn hard; it's time to throw your hands up and just use MLPs. There are a couple of recent papers touching on this, amusingly one from NVIDIA. It suits their tensor cores, but for actual rendering RTX would only be useful for empty-space skipping, and that would require very tight coupling with the shaders for it to be a win.

Deep Appearance Prefiltering was from research supported by Facebook; Instant Neural Graphics Primitives with a Multiresolution Hash Encoding is from NVIDIA.

PS. Since no one has done the taxonomy yet, I'll just name this approach noxel-based rendering. Meta's approach is a really nice archetype, though the use of a uniform beam approximation seems to limit its use for GI; a cone would seem more efficient when you need only an extremely coarse approximation (still with the coverage mask). Maybe the neural model could even serve for anisotropic filtering that way.
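A tiny sketch of why the cone seems cheaper for GI (the log2 LOD mapping below is just the usual mip-selection convention, nothing from the paper): the cone's footprint, and therefore the prefilter level you would query the model at, grows with distance, while a uniform beam keeps querying the same fine level however far it has travelled.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

// Prefilter level a cone would ask for at distance t: the footprint widens
// with distance, so far-away queries hit coarser (cheaper) levels.
float cone_lod(float t, float tan_half_angle, float base_voxel_size)
{
    const float footprint = 2.0f * t * tan_half_angle;
    return std::max(0.0f, std::log2(footprint / base_voxel_size));
}

// A uniform beam keeps the same width, so it queries the same fine level
// no matter how far it has travelled.
float beam_lod(float beam_width, float base_voxel_size)
{
    return std::max(0.0f, std::log2(beam_width / base_voxel_size));
}

int main()
{
    const float ts[] = {1.0f, 10.0f, 100.0f};
    for (float t : ts)
        std::printf("t=%6.1f  cone LOD=%4.1f  beam LOD=%4.1f\n",
                    t, cone_lod(t, 0.05f, 0.01f), beam_lod(0.02f, 0.01f));
}
```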
 
Nanite and ray tracing are a nail in the coffin for fixed-function rasterization, but HVVR was already a sign of the future long before that, yet it was mostly ignored.

With sample reuse, especially for VR, hybrid rendering has become a giant roadblock. The inability to render truly sparsely (not just at lower shading resolution, but with partial screen updates) stands in the way of both efficient stereoscopic rendering and frameless rendering for high refresh rates ... and has caused the ugly hack of frame interpolation.

With sparse rendering there would be no reason for interpolation: you could either compute silhouettes and render an extra layer with likely disocclusions, or just render disocclusions on the fly during asynchronous timewarp (to use the VR name). Either way, the remaining artifacts of extrapolation would likely be irrelevant.

With sparse rendering, computing stereoscopic views with sample reuse between the eyes is also easy.

Compute is all we need ... and ray tracing hardware should be used, I guess, since it's there. Fixed-function rasterization, however, is long past its use-by date.
 
On the one hand, impressive data manipulation, although we could do with memory and storage access figures. On the other hand, it sure is ugly and I wouldn't want to play a game that looks like that. I wouldn't want to give up the improvements in game pretties achieved with UE5 etc. to have insanely large worlds, which you could never visit all of anyway, full of ugly blocks. Strikes me as tech solving a problem we don't really have.
 