Predict: Next gen console tech (10th generation edition) [2028+]

They aren't enabling RT shadows with virtualized geometry, and this 'feature' you keep talking about isn't documented at all yet, as it is still in development/experimental ...
They are enabling it, they just announced that in their presentation, and yes it's still under development, I never said otherwise.
 
Well how can they 'enable' a feature that's still under development or experimental then ?
By checking the tick box in UE5? There are a number of "experimental" functions in UE5 that can be enabled; they just aren't recommended for production yet
 
Well how can they 'enable' a feature that's still under development or experimental then ?
So now we are going to argue semantics? You know Epic is developing this, yet you ignored this fact during your arguments about possible future techs and concentrated only on the present. That's no way to conduct a fruitful, truthful discussion.
 
So now we are going to argue semantics? You know Epic is developing this, yet you ignored this fact during your arguments about possible future techs and concentrated only on the present. That's no way to conduct a fruitful, truthful discussion.
If it's undocumented and they aren't officially shipping/supporting the feature then it may as well "not exist", just like their prototype SVOGI implementation. Until then it will be viewed as such ...

It also sounds like the feature is going to be totally locked behind the hardware ray tracing settings, and given the severe performance implications, games are not likely to use it anytime soon, whenever it materializes ...
 
The locality problem is down to the pointer indirection of traversing the different nodes in the hierarchical data structure
Pointers are not a problem as long as the same cached nodes are traversed by bundles of rays, so it all depends on coherence, as I said before. With traditional rasterization, you may need even more bandwidth, given how overdraw is pretty much unavoidable. And be assured that accessing random screen locations with something like screen space GI will thrash your caches for the very same reasons it does with RT GI. Check out the Unigine Superposition benchmark, for example.
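
To make the "bundles of rays" point concrete, here's a minimal sketch (the binning heuristic, key layout and cell size are my own assumptions, not any particular implementation): sort rays by a coarse origin-cell/direction-octant key so rays processed back to back tend to walk the same BVH nodes and hit warm cache lines.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

struct Ray { float ox, oy, oz; float dx, dy, dz; };

// Coarse coherence key: quantized origin cell + dominant direction octant.
// Rays sharing a key enter the BVH near the same subtree, so the nodes they
// touch are likely still resident in cache from the previous ray.
static uint64_t CoherenceKey(const Ray& r, float cellSize) {
    auto cell = [&](float v) {
        return uint64_t(int64_t(std::floor(v / cellSize)) & 0xFFFF);
    };
    uint64_t octant = (r.dx < 0.f ? 1u : 0u) | (r.dy < 0.f ? 2u : 0u) | (r.dz < 0.f ? 4u : 0u);
    return (octant << 48) | (cell(r.ox) << 32) | (cell(r.oy) << 16) | cell(r.oz);
}

// Reorder rays so traversal processes coherent bundles back to back.
void SortForCoherence(std::vector<Ray>& rays, float cellSize = 4.0f) {
    std::sort(rays.begin(), rays.end(), [cellSize](const Ray& a, const Ray& b) {
        return CoherenceKey(a, cellSize) < CoherenceKey(b, cellSize);
    });
}
```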

and any 'recursive' methods you hint at for ray tracing can't be used to support PBR materials, since they assume that materials with diffuse properties don't exist ...
I suppose by "recursive" you meant Whitted-style RT? That's not what I had in mind. For PBR materials, you can use the same tricks with RT as with rasterization - render dynamic cube maps with RT, or sample the cubemaps with RT, treat all reflections as perfect mirror ones like planar reflections do, etc., but with less overdraw and without the low-utilization issues of tiny triangles.

arbitrary camera viewpoints (differing x/y/z coords), which are needed for multiple planar reflections, so it's a flawed design from the outset ...
This one was introduced in Turing. Google multi-view rendering, but it has also had zero adoption in games as far as I can tell.
 
Pointers are not a problem as long as the same cached nodes are traversed by bundles of rays, so it all depends on coherence, as I said before. With traditional rasterization, you may need even more bandwidth, given how overdraw is pretty much unavoidable. And be assured that accessing random screen locations with something like screen space GI will thrash your caches for the very same reasons it does with RT GI. Check out the Unigine Superposition benchmark, for example.
The pointers themselves are not the problem but it's the associated multiple levels of indirection that thrash the cache hierarchy. It's like an indirect texture atlas in virtual texturing, where virtual page table mappings are translated to their physical page mappings (2 texture fetches), but only with ray tracing it's much worse where you're often fetching a non-trivial arbitrary number of nodes in a hierarchical data structure with variable depth ...
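
To put that difference in structure, a hedged sketch with generic data structures (this is neither UE's virtual texturing layout nor any real driver BVH): the virtual-texture path is always exactly two dependent fetches, while the BVH walk chases a data-dependent, variable-length chain of them.

```cpp
#include <cstdint>
#include <stack>
#include <utility>
#include <vector>

// --- Virtual texturing: a fixed two-level indirection ---
// page table entry -> physical page texel: always exactly two dependent fetches.
uint32_t SampleVirtualTexture(const std::vector<uint32_t>& pageTable,
                              const std::vector<uint32_t>& physicalPages,
                              uint32_t virtualPage) {
    uint32_t physicalPage = pageTable[virtualPage]; // fetch #1
    return physicalPages[physicalPage];             // fetch #2
}

// --- BVH traversal: a variable, data-dependent chain of fetches ---
struct Aabb    { float lo[3], hi[3]; };
struct BvhNode { Aabb box; int left = -1, right = -1; int triIndex = -1; };
struct Ray     { float o[3], invD[3]; };

static bool HitsBox(const Aabb& b, const Ray& r) {
    float tmin = 0.f, tmax = 1e30f;
    for (int a = 0; a < 3; ++a) {
        float t0 = (b.lo[a] - r.o[a]) * r.invD[a];
        float t1 = (b.hi[a] - r.o[a]) * r.invD[a];
        if (t0 > t1) std::swap(t0, t1);
        tmin = t0 > tmin ? t0 : tmin;
        tmax = t1 < tmax ? t1 : tmax;
    }
    return tmin <= tmax;
}

// Counts how many nodes a single ray touches. Unlike the texture lookup above,
// this number is unknown until the traversal has actually run: it depends on
// the tree depth and on how many subtrees the ray happens to overlap.
int CountNodeFetches(const std::vector<BvhNode>& nodes, const Ray& ray) {
    int fetches = 0;
    std::stack<int> todo;
    todo.push(0); // root
    while (!todo.empty()) {
        int i = todo.top(); todo.pop();
        const BvhNode& node = nodes[i];
        ++fetches;                          // one dependent memory access per node
        if (!HitsBox(node.box, ray)) continue;
        if (node.triIndex >= 0) continue;   // leaf: a triangle fetch would follow here
        todo.push(node.left);
        todo.push(node.right);
    }
    return fetches;
}
```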

Graphics literature invented "deferred rendering" specifically to address overdraw costs, and I don't advocate using secondary views to render GI either, as I believe there are other, better-suited data structures (particularly SDFs) for this purpose ...
I suppose by "recursive" you meant Whitted-style RT? That's not what I had in mind. For PBR materials, you can use the same tricks with RT as with rasterization - render dynamic cube maps with RT, or sample the cubemaps with RT, treat all reflections as perfect mirror ones like planar reflections do, etc., but with less overdraw and without the low-utilization issues of tiny triangles.
But why render dynamic cube maps at all with RT when it's easier to just render the scene itself directly with RT, since you would only have one data structure to deal with instead of juggling two (including the dynamic cube map) ?

Using RT to sample the pre-baked static 'cubemap' for offscreen geometry makes sense, since your only data structure is a static BVH (no update costs) that ALWAYS consists of 12 triangles (the cubemap in this instance), but this technique clearly can't be used to support dynamic lighting or geometry, since statically defined data structures can't represent information in real-time domains, and it only works if you keep the reflectors within those same 'bounds' (fixed position) ...
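
For reference, tracing that 12-triangle 'cubemap box' boils down to the classic parallax-corrected probe lookup; a minimal sketch (box extents and probe position are placeholders), which also shows exactly why the reflector has to stay inside the captured bounds:

```cpp
#include <algorithm>

struct Vec3 { float x, y, z; };
static Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

// Parallax-corrected cubemap direction: intersect the reflection ray with the
// probe's fixed box (the same thing tracing the 12-triangle box would compute)
// and look up the cubemap towards that hit point. Only valid while the shaded
// surface stays inside the box the cubemap was captured for.
Vec3 BoxProjectedDir(Vec3 worldPos, Vec3 reflDir, Vec3 boxMin, Vec3 boxMax, Vec3 probePos) {
    auto safeInv = [](float d) { return d != 0.f ? 1.f / d : 1e30f; };
    float tx = ((reflDir.x > 0.f ? boxMax.x : boxMin.x) - worldPos.x) * safeInv(reflDir.x);
    float ty = ((reflDir.y > 0.f ? boxMax.y : boxMin.y) - worldPos.y) * safeInv(reflDir.y);
    float tz = ((reflDir.z > 0.f ? boxMax.z : boxMin.z) - worldPos.z) * safeInv(reflDir.z);
    float t = std::min(std::min(tx, ty), tz);  // exit distance along the ray
    Vec3 hit = worldPos + reflDir * t;         // point on the box
    return hit - probePos;                     // direction to fetch from the cubemap
}
```
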
This one was introduced in Turing. Google multi-view rendering, but it has also had zero adoption in games as far as I can tell.
That's great that Turing supports rendering up to 4 views (3 planar reflector views with 1 view taken up by the main camera) with arbitrary positions, but it would be just as helpful to support sparse rasterization and depth buffers too, so we can apply early-out optimizations for sections of our secondary views, like not rasterizing geometry that's outside of our arbitrarily bounded or partially occluded (by the main view) planar reflectors to our irregularly populated depth buffer ...
 
but it's the associated multiple levels of indirection that thrash the cache hierarchy
Again, caches work with frequently used data, not indirections or pointers, so whether something thrashes caches or not depends on how frequently the cached data is reused. For coherent BVH accesses, it should be reused just as frequently as frame buffer accesses. So it's the data access patterns that matter. Random access patterns cause cache thrashing, not the pointers or indirections.

but only with ray tracing it's much worse where you're often fetching a non-trivial arbitrary number of nodes in a hierarchical data structure with variable depth ...
Why do you think it should be better with random sampling of screen locations with screen space effects or SDFs? Without BVH or any other hierarchical acceleration structure, it can only get worse.

But why render dynamic cube maps at all with RT when it's easier to just render the scene itself directly with RT
For the same coherency as in rasterization, you can densely trace mirror rays for the cube's sides and then integrate the resulting cube map for PBR, which gives you both the coherent memory accesses for reflections you're looking for and the PBR compatibility that would otherwise require stochastic tracing.
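
A hedged outline of that two-phase idea (the trace callback and the face-direction helper are illustrative placeholders, not a real API): phase one fills a dynamic cube map with dense, coherent mirror rays; phase two would GGX-prefilter that cube map per roughness so PBR reflections become texture fetches instead of per-pixel stochastic traces.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3  { float x, y, z; };
struct Color { float r, g, b; };

// Map a cube face texel to a lookup direction (one common face convention).
static Vec3 CubeTexelDirection(int face, int x, int y, int size) {
    float u = 2.f * (x + 0.5f) / size - 1.f;
    float v = 2.f * (y + 0.5f) / size - 1.f;
    Vec3 d;
    switch (face) {
        case 0:  d = { 1.f,  -v,  -u}; break; // +X
        case 1:  d = {-1.f,  -v,   u}; break; // -X
        case 2:  d = {  u,  1.f,   v}; break; // +Y
        case 3:  d = {  u, -1.f,  -v}; break; // -Y
        case 4:  d = {  u,   -v, 1.f}; break; // +Z
        default: d = { -u,   -v,-1.f}; break; // -Z
    }
    float len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    return {d.x / len, d.y / len, d.z / len};
}

using CubeFace = std::vector<Color>;
using MirrorFn = Color (*)(Vec3 origin, Vec3 dir); // your mirror-only trace against the scene BVH

// Phase 1: fill the cube map with coherent mirror rays (shared origin, smoothly
// varying directions per face), instead of stochastic per-pixel rays.
std::array<CubeFace, 6> RenderCubeMapWithRT(Vec3 probePos, int size, MirrorFn traceMirror) {
    std::array<CubeFace, 6> faces;
    for (int f = 0; f < 6; ++f) {
        faces[f].resize(std::size_t(size) * size);
        for (int y = 0; y < size; ++y)
            for (int x = 0; x < size; ++x)
                faces[f][std::size_t(y) * size + x] =
                    traceMirror(probePos, CubeTexelDirection(f, x, y, size));
    }
    return faces;
}

// Phase 2 (not shown): prefilter the faces per roughness level and shade PBR
// reflections by sampling the filtered cube map, as with any conventional probe.
```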

with arbitrary positions, but it would be just as helpful to support sparse rasterization and depth buffers too, so we can apply early-out optimizations for sections of our secondary views, like not rasterizing geometry that's outside of our arbitrarily bounded or partially occluded (by the main view) planar reflectors to our irregularly populated depth buffer ...
Sounds like too complex a thing to build a specific hardware path for (which you were against previously for RT), while Quake II RTX, Minecraft RTX and Portal RTX have shown that you can easily create many recursive portals with mirror reflections using RT.
 
Again, caches work with frequently used data, not indirections or pointers, so whether something thrashes caches or not depends on how frequently the cached data is reused. For coherent BVH accesses, it should be reused just as frequently as frame buffer accesses. So it's the data access patterns that matter. Random access patterns cause cache thrashing, not the pointers or indirections.
How do you realistically cache a data structure that potentially contains millions of triangles for secondary rays? A game with last-generation assets like CP2077 can potentially take just under 20(!) memory indirections from TLAS to BLAS just to get to the real geometry ...
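
For what it's worth, here is the rough arithmetic behind that kind of figure (the branching factors and scene sizes are assumptions for illustration, not CP2077's actual numbers): worst-case dependent fetches ≈ TLAS depth + instance record + BLAS depth + triangle data, with depth ≈ ceil(log_b(N)) for a b-wide BVH.

```cpp
#include <cmath>
#include <cstdio>

// Worst-case dependent fetches for one ray through a b-wide two-level BVH:
// TLAS nodes + instance transform + BLAS nodes + triangle data.
int WorstCaseIndirections(int instances, int trianglesPerBlas, int branching) {
    int tlasDepth = int(std::ceil(std::log(double(instances)) / std::log(double(branching))));
    int blasDepth = int(std::ceil(std::log(double(trianglesPerBlas)) / std::log(double(branching))));
    return tlasDepth + 1 /* instance transform */ + blasDepth + 1 /* triangle fetch */;
}

int main() {
    // e.g. ~100k instances and ~1M-triangle meshes: a 4-wide BVH lands around 20
    // dependent fetches, an 8-wide BVH in the mid-teens.
    std::printf("4-wide: %d\n", WorstCaseIndirections(100'000, 1'000'000, 4));
    std::printf("8-wide: %d\n", WorstCaseIndirections(100'000, 1'000'000, 8));
}
```
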
Why do you think it should be better with random sampling of screen locations with screen space effects or SDFs? Without BVH or any other hierarchical acceleration structure, it can only get worse.
Since geometry information is already encoded after G-buffer generation, there are exactly zero memory indirections to read this information!
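
A minimal sketch of that contrast, assuming a flat G-buffer layout (the struct is illustrative): a screen-space sample is a direct, index-computed read with no pointer chasing before the data.

```cpp
#include <cstddef>
#include <vector>

struct GBufferTexel { float depth; float nx, ny, nz; float albedoR, albedoG, albedoB; };

// A screen-space GI/SSR sample: the address is computed directly from the
// pixel coordinate, so there are zero dependent fetches before the data.
GBufferTexel SampleGBuffer(const std::vector<GBufferTexel>& gbuffer,
                           int width, int x, int y) {
    return gbuffer[std::size_t(y) * width + x]; // one read, no indirection chain
}
```
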
For the same coherency as in rasterization, you can densely trace mirror rays for the cube's sides and then integrate the resulting cube map for PBR, which gives you both the coherent memory accesses for reflections you're looking for and the PBR compatibility that would otherwise require stochastic tracing.
How does that save you any work when you still have to maintain/traverse a full-featured data structure that is both dynamic and represents actual scene geometry ? Ray tracing against pre-baked/static cubemaps (purely a static world-space lookup table at that point) means you don't have to maintain dynamic data structures or traverse their complex scene geometry representation ...
Sounds like too complex a thing to build a specific hardware path for (which you were against previously for RT), while Quake II RTX, Minecraft RTX and Portal RTX have shown that you can easily create many recursive portals with mirror reflections using RT.
I'm willing to take my chances and straight up just double/triple raster/ROP resources if the architects can't apply any clever HW optimizations. That's probably still going to take up less die area/logic than adding in tons of special RT acceleration state, on-die caches, and AI HW ...
 
I'm willing to take my chances and straight up just double/triple raster/ROP resources if the architects can't apply any clever HW optimizations. That's probably still going to take up less die area/logic than adding in tons of special RT acceleration state, on-die caches, and AI HW ...

Even if that were true it would deliver worse results. This whole debate is kinda pointless if we’re targeting different quality levels with each proposed solution. The question is what’s the best solution for a fixed quality target.
 
Even if that were true it would deliver worse results. This whole debate is kinda pointless if we’re targeting different quality levels with each proposed solution. The question is what’s the best solution for a fixed quality target.
Real-time rendering was never intended to solve every problem or to approach offline-quality rendering, so we should take the liberty of crafting specialized solutions for individual obstacles. There's nothing invalid about our specific solutions not being trivially extendable or tailored for a unified model ...

Given that a viable design for multiple planar reflections has now been presented, I don't suppose my alternative suggestion will be noticeably slower, if at all, in comparison to ray tracing, and we can go one step further in terms of optimization and apply the principle of spatial adjacency to rasterized secondary views to do software variable rate shading for them. If we need more speed-ups or savings then we can consider combining the technique with SSR, view-space interpolation (frame generation just for reflections), or we could experiment with object space shading too ? (I wonder how hard it'd be for RT to integrate some of these concepts too)
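
To sketch what I mean by spatial adjacency driving software VRS for the reflection views (tile size and thresholds are arbitrary placeholders, not a tuned heuristic): classify each tile of the rasterized reflection view by how much its depth/normals vary, and shade the flat tiles at a coarser rate.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-pixel data of the rasterized reflection (secondary) view.
struct ReflViewPixel { float depth; float nx, ny, nz; };

// One shading rate per 8x8 tile: 1 = full rate, 4 = one sample per 4x4 block.
std::vector<uint8_t> BuildShadingRates(const std::vector<ReflViewPixel>& view,
                                       int width, int height, int tile = 8) {
    const int tilesX = (width + tile - 1) / tile;
    const int tilesY = (height + tile - 1) / tile;
    std::vector<uint8_t> rates(std::size_t(tilesX) * tilesY, 1);
    for (int ty = 0; ty < tilesY; ++ty)
        for (int tx = 0; tx < tilesX; ++tx) {
            // Measure how much depth and normal orientation vary inside the tile.
            float minD = 1e30f, maxD = -1e30f, minNx = 1.f, maxNx = -1.f;
            for (int y = ty * tile; y < std::min((ty + 1) * tile, height); ++y)
                for (int x = tx * tile; x < std::min((tx + 1) * tile, width); ++x) {
                    const ReflViewPixel& p = view[std::size_t(y) * width + x];
                    minD = std::min(minD, p.depth); maxD = std::max(maxD, p.depth);
                    minNx = std::min(minNx, p.nx);  maxNx = std::max(maxNx, p.nx);
                }
            // Spatially uniform tiles (flat, similarly oriented geometry) get shaded
            // coarsely and upsampled; detailed tiles keep the full shading rate.
            const bool flat = (maxD - minD) < 0.01f && (maxNx - minNx) < 0.05f;
            rates[std::size_t(ty) * tilesX + tx] = flat ? 4 : 1;
        }
    return rates;
}
```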

There'd be more freedom in terms of optimization strategies/options available outside of ray tracing purely besides the choice of data structures ...
 
Real-time rendering was never intended to solve every problem or to approach offline-quality rendering, so we should take the liberty of crafting specialized solutions for individual obstacles. There's nothing invalid about our specific solutions not being trivially extendable or tailored for a unified model ...

Ok let’s assume we’re aiming lower than offline rendering quality. What limitations do you find reasonable when evaluating different approaches? With shadow maps for example even the most advanced techniques are limited to a handful of shadow casting lights. Same for reflections, what limits are reasonable in the next console generation for the number of reflective planar surfaces visible to the player at once?

I don't suppose my alternative suggestion will be noticeably slower, if at all, in comparison to ray tracing, and we can go one step further in terms of optimization and apply the principle of spatial adjacency to rasterized secondary views to do software variable rate shading for them.

I would take that bet since rasterization of multiple planar surfaces is inherently not very scalable. With a sufficiently complex use case it would perform very poorly. Just as shadow mapping with lots of lights is still untenable today after many years of shadow map R&D.
 
Unreal Engine 5 (arguably the poster child for raster tech) doesn't really give solid performance compared to hardware ray tracing; on the contrary, it performs worse than the RT solutions while also presenting worse image quality (worse shadows, worse global illumination, and worse reflections).

This fact alone prompted Epic to invest more into hardware Lumen to keep up the pace with the other hardware ray tracing solutions.
 
It’s not just raster. The structure of the Lumen surface cache also has some fundamental limitations in terms of resolution and compatibility. Wouldn’t be surprised if it transitioned to something more dynamic like Frostbite’s surfels or Nvidia’s SHARC. I can’t recall why Epic went with the pre-generated surface cards in the first place.

Even if Lumen switches to hardware RT for visibility sampling those underlying issues with the cache structure will still remain.
 
Ok let’s assume we’re aiming lower than offline rendering quality. What limitations do you find reasonable when evaluating different approaches? With shadow maps for example even the most advanced techniques are limited to a handful of shadow casting lights. Same for reflections, what limits are reasonable in the next console generation for the number of reflective planar surfaces visible to the player at once?
I'm not sure why your impression of VSM is that it can only handle a 'handful' of shadow casting lights when it has optimizations like caching, sparsity, and advanced techniques like one-pass projection. Fortnite itself can handle rendering a dozen local lights (6 virtual shadow textures are allocated for each!) after culling with VSM well enough ...
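
To illustrate the kind of bookkeeping I mean (a simplified sketch, not UE5's actual VSM code; the culling test and page counts are placeholders): cull the local lights first, give each survivor six sparsely paged virtual shadow faces, and only map the pages that on-screen receivers touch, reusing cached pages where nothing moved.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct LocalLight { float px, py, pz, radius; };

// One virtual shadow texture per cube face; pages are mapped sparsely, only
// where visible receivers actually sample that face, and can be cached across
// frames when neither the light nor the geometry it sees has moved.
struct VirtualShadowFace {
    std::vector<uint8_t> pageMapped;   // one flag per virtual page
    bool cachedFromLastFrame = false;
};

struct LightShadowSlot {
    std::array<VirtualShadowFace, 6> faces; // 6 faces per local (point) light
};

// Placeholder visibility test; a real one would test the light's sphere
// against the camera frustum and its screen-space contribution.
static bool LightAffectsView(const LocalLight&) { return true; }

std::vector<LightShadowSlot> AllocateLocalLightShadows(
        const std::vector<LocalLight>& lights, std::size_t pagesPerFace) {
    std::vector<LightShadowSlot> slots;
    for (const LocalLight& l : lights) {
        if (!LightAffectsView(l)) continue;       // culled lights cost nothing
        LightShadowSlot slot;
        for (VirtualShadowFace& f : slot.faces)
            f.pageMapped.assign(pagesPerFace, 0); // pages start unmapped; mark on demand
        slots.push_back(std::move(slot));
    }
    return slots;
}
```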

@Bold With specifically no HW enhancements in mind, I can see next generation platforms being able to handle potentially up to 3-4 planar reflectors ...
I would take that bet since rasterization of multiple planar surfaces is inherently not very scalable. With a sufficiently complex use case it would perform very poorly. Just as shadow mapping with lots of lights is still untenable today after many years of shadow map R&D.
With SRAM scaling hitting a ceiling, the ever-ballooning costs of newer process technologies, and the advent of smaller reticle limits, all while a major outstanding issue (slow acceleration structure updates) still remains in even the most advanced RT implementations, it is likely that next-generation consoles will run out of hardware improvements before they even reach RT for the majority of AAA games ...

Seeing what a performance disaster the recent RT graphical update was for a 16-month-old game, and unless most AAA studios are content with building RT onto last-generation technology like that (which I seriously doubt), consoles are virtually poised to take a different direction if they want to avoid stagnation. 'Standardized' HW isn't living up to the console vendors' expectation of becoming 'cheaper' over time anymore, so they have absolutely NOTHING to lose by applying several exotic HW feature modifications going forward ...
 
I'm not sure why your impression of VSM is that it can only handle a 'handful' of shadow casting lights when it has optimizations like caching, sparsity, and advanced techniques like one-pass projection. Fortnite itself can handle rendering a dozen local lights (6 virtual shadow textures are allocated for each!) after culling with VSM well enough ...

@Bold With specifically no HW enhancements in mind, I can see next generation platforms being able to handle potentially up to 3-4 planar reflectors ...

With SRAM scaling hitting a ceiling, the ever-ballooning costs of newer process technologies, and the advent of smaller reticle limits, all while a major outstanding issue (slow acceleration structure updates) still remains in even the most advanced RT implementations, it is likely that next-generation consoles will run out of hardware improvements before they even reach RT for the majority of AAA games ...

Seeing what a performance disaster the recent RT graphical update was for a 16-month-old game, and unless most AAA studios are content with building RT onto last-generation technology like that (which I seriously doubt), consoles are virtually poised to take a different direction if they want to avoid stagnation. 'Standardized' HW isn't living up to the console vendors' expectation of becoming 'cheaper' over time anymore, so they have absolutely NOTHING to lose by applying several exotic HW feature modifications going forward ...
Does hardware accelerated ray tracing count as an exotic HW feature modification? Or should console makers go with some other "exotic" tech?
 
Does hardware accelerated ray tracing count as an exotic HW feature modification? Or should console makers go with some other "exotic" tech?
RT HW implementations already have a standardized API to interface with. Console vendors should do whatever it takes to not raise the price of their new systems by more than $100 USD when the market is teetering on the edge of what they consider an acceptable price point. If that involves making some last-ditch attempt to integrate some unorthodox HW features then so be it, because rising system manufacturing costs aren't sustainable for them ...

Consoles are just going to corner themselves if they truly want to go the full way of being "dumbed down PCs". Ultimately we may need to face the reality that having PC and console architectures 'converge' was the wrong direction since one of them will exhaust hardware improvements with respect to cost much earlier ...
Which game?
Atomic Heart ...
 