Let's discuss the ways future console architectures could potentially improve in efficiency.

Milk, I feel like anything less than what you described would be a crushing disappointment to me. All those technologies sound great, especially broader application of non-traditional rendering paths like Nanite.

There is no reason only UE5 should have a monopoly on that tech, and no reason HW vendors themselves shouldn't have ways of vastly speeding up those processes outside of software (which, to be honest, is already pretty fast and impressive based on what we are seeing; it saves a lot compared to normal methods even in its current form).
 
A more considerable increase in SSD top speed for short bursts, to be used exclusively for game switching or for level loads. Sustained speed can increase moderately from where it is with the current new-gen.

HW rasterization will probably be completely overhauled. There is no way all previous assumptions aren't being reconsidered by HW vendors after Nanite went with a purely compute-based SW rasterizer and it was faster (most of the time).
I don't think there are any 'major' leaps possible with current SSDs that would be game-changing from what exists now. We see constant linear improvements because the technology itself doesn't present opportunities for anything else. Though something like ReRAM could be really interesting.

As for completely changing the geometry hardware because of virtualized geometry (Nanite), it's an interesting thought. But Nanite itself is already relatively cheap, and the geometry engines in current GPUs do not take up a ton of die space, so I don't think there really needs to be any huge overhaul. The paradigm change on the software side alone should do most of the work. Plus, you don't want to badly hurt performance in older games.
 
If Nanite running on the compute cores is less power efficient than a redesigned geometry engine would be, then with today's power limitations the redesign would be worth it.
 
There’s a lot of stuff competing for compute now, so it may soon become a bottleneck: Lumen, Nanite, RT, post-processing, upscaling. A hardware micro-poly rasterizer could be useful for both Nanite and mesh shading and may be worth the silicon. It would be interesting to have the option at least.

In all of the Nsight traces I’ve seen, the hardware rasterizer has never been the bottleneck. Even in geometry-heavy shadow or G-buffer passes, the ROPs or bandwidth are usually the bottleneck. If it’s cheap enough, it may be advantageous to support two raster modes in hardware. Mode 1 could be like existing rasterizers, stamping out lots of pixels on medium-to-large triangles, one triangle at a time. Mode 2 would be similar to the Nanite rasterizer, stamping out smaller pixel tiles for multiple small triangles in parallel.
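To make that split concrete, here's a rough CPU-side sketch of the kind of binning I mean. The struct names and the size threshold are made up, and Nanite does something similar in spirit per cluster rather than per triangle, so treat this purely as an illustration:

Code:
#include <cmath>
#include <vector>

// Hypothetical illustration only: bin triangles between a classic "wide"
// rasterizer (good at big triangles) and a micro-poly tile rasterizer
// (good at pixel-sized triangles). Names and the threshold are invented.
struct Tri { float x0, y0, x1, y1, x2, y2; };   // screen-space vertices

static float ScreenArea(const Tri& t) {
    // Half the absolute cross product = triangle area in pixels.
    return 0.5f * std::fabs((t.x1 - t.x0) * (t.y2 - t.y0) -
                            (t.x2 - t.x0) * (t.y1 - t.y0));
}

void BinTriangles(const std::vector<Tri>& tris,
                  std::vector<Tri>& wideRaster,    // mode 1: big triangles
                  std::vector<Tri>& microRaster) { // mode 2: tiny triangles
    const float kMicroAreaThreshold = 32.0f;       // pixels^2, assumed value
    for (const Tri& t : tris) {
        (ScreenArea(t) < kMicroAreaThreshold ? microRaster : wideRaster)
            .push_back(t);
    }
}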
 
Is it never a bottleneck because developers purposely limit geometry though? We haven't seen a meaningful geometry increase in 10 years outside of UE5.
 
Maybe. The rasterizer itself is most likely not the bottleneck even with more detailed geometry though. Triangle setup would probably choke first.
 
Maybe adding more rasterizers was good enough for current triangle counts. It’s one per shader engine for AMD and I think the PS5 has 4 of them.
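Rough napkin math on why that might be enough. All of these numbers are assumptions on my part, and real setup/scan-out rates are more complicated than one triangle per clock, but the headroom is striking either way:

Code:
#include <cstdio>

// Back-of-envelope only; assumes 1 triangle per rasterizer per clock,
// which is a simplification of real triangle setup rates.
int main() {
    const double rasterizers    = 4;        // e.g. one per shader engine
    const double clockGHz       = 2.23;     // PS5-class GPU clock
    const double peakTrisPerSec = rasterizers * clockGHz * 1e9;

    const double fps            = 60;
    const double trisPerFrame   = 2e6;      // assumed visible-triangle budget
    const double demandPerSec   = fps * trisPerFrame;

    std::printf("peak setup rate : %.1f Gtris/s\n", peakTrisPerSec / 1e9);
    std::printf("frame demand    : %.2f Gtris/s\n", demandPerSec / 1e9);
    std::printf("headroom factor : %.0fx\n", peakTrisPerSec / demandPerSec);
    return 0;
}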
 
What would be good is a single unit that can handle micro geometry like Nanite and also be used to update geometry inside a BVH structure at the same time.
 
We have had multiple rasterizer and geometry units working in parallel since Fermi; AD102 for example has 12 rasterizers and 72 geometry units.
Yeah, I remember that. I probably should have phrased my question better: the rasterizer and geometry unit counts are the lowest of any unit type in a GPU, lower than texture units, raster output units, FP32, INT32, ray tracing cores, tensor cores, etc. Why didn't we increase their count even further and allow more of them to process an even greater number of primitives?
 
I don't think there are any 'major' leaps possible with current SSDs that would be game-changing from what exists now. We see constant linear improvements because the technology itself doesn't present opportunities for anything else. Though something like ReRAM could be really interesting.

Yeah, that's my point. I think when it comes to storage there won't be major leaps, other than what the natural evolution of the tech affords in the future. But it would be clever to aim for chips that can hit higher burst speeds than they can sustain consistently, so those short bursts can be used for sporadic app switching and full level loads.
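Quick napkin math on why the burst would matter, with throughput and level-size numbers that are entirely made up for illustration:

Code:
#include <cstdio>

// Illustrative only; all figures are assumptions, not real parts.
int main() {
    const double levelGB       = 12.0;  // data read for a full level load
    const double sustainedGBps = 7.0;   // what the drive holds indefinitely
    const double burstGBps     = 12.0;  // short thermally-limited burst

    std::printf("sustained load : %.1f s\n", levelGB / sustainedGBps);
    std::printf("burst load     : %.1f s\n", levelGB / burstGBps);
    return 0;
}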

As for completely changing the geometry hardware because of virtualized geometry (Nanite), it's an interesting thought. But Nanite itself is already relatively cheap, and the geometry engines in current GPUs do not take up a ton of die space, so I don't think there really needs to be any huge overhaul. The paradigm change on the software side alone should do most of the work. Plus, you don't want to badly hurt performance in older games.

I think the geometry hardware has already been largely generalized and opened up on AMD's side through "next-gen geometry", now adopted by Nvidia and the APIs through mesh shaders. When I said rasterization has to be rethought, I am strictly thinking of how triangles are drawn after the geometry is already processed: how many fragments and pixels are generated, how they are grouped, in what order and format they are sent to shader units, what fill conventions are followed, how MSAA is handled, subpixel placement, etc. Epic has shown empirically that there is room to rethink the performance trade-offs in that area with their SW micro-polygon rasterizer.
And again, rasterization is only one area I mentioned. I suspect there are many other conventional parts of the render pipeline handled in fixed-function form that might benefit from more generalization and programmability. Even if traditional algorithms might get a little slower, the whole point is that most devs are NOT using the traditional algorithms anyway.
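For anyone unfamiliar with what a SW micro-poly rasterizer boils down to, here's a stripped-down CPU sketch of the general idea. This is not Epic's actual code: it skips proper fill rules, subpixel precision and tiling, and the packing format is just an assumption (24-bit reverse-Z depth in the high bits so a single atomic max acts as the depth test):

Code:
#include <algorithm>
#include <atomic>
#include <cmath>
#include <cstdint>
#include <vector>

// CPU reference of the general idea: walk the tiny bounding box, test edge
// functions, and write depth + triangle ID packed into a 64-bit visibility
// buffer with an atomic max so the nearest triangle wins.
struct Vtx { float x, y, z; };   // screen-space position, z in [0,1], bigger = nearer

void RasterizeMicroTri(std::vector<std::atomic<uint64_t>>& visBuf,
                       int width, int height,
                       const Vtx& a, const Vtx& b, const Vtx& c,
                       uint32_t triId) {
    const int x0 = std::max(0, (int)std::floor(std::min({a.x, b.x, c.x})));
    const int y0 = std::max(0, (int)std::floor(std::min({a.y, b.y, c.y})));
    const int x1 = std::min(width  - 1, (int)std::ceil(std::max({a.x, b.x, c.x})));
    const int y1 = std::min(height - 1, (int)std::ceil(std::max({a.y, b.y, c.y})));

    auto edge = [](const Vtx& p, const Vtx& q, float x, float y) {
        return (q.x - p.x) * (y - p.y) - (q.y - p.y) * (x - p.x);
    };
    const float area = edge(a, b, c.x, c.y);
    if (area <= 0.0f) return;                         // backfacing / degenerate

    for (int y = y0; y <= y1; ++y)
        for (int x = x0; x <= x1; ++x) {
            const float px = x + 0.5f, py = y + 0.5f;
            const float w0 = edge(b, c, px, py);      // barycentric weights
            const float w1 = edge(c, a, px, py);
            const float w2 = edge(a, b, px, py);
            if (w0 < 0 || w1 < 0 || w2 < 0) continue; // outside the triangle
            const float z = (w0 * a.z + w1 * b.z + w2 * c.z) / area;
            // 24-bit depth in the high bits, 32-bit triangle ID in the low bits.
            const uint64_t packed =
                ((uint64_t)(uint32_t)(z * 16777215.0f) << 32) | triId;
            std::atomic<uint64_t>& dst = visBuf[y * width + x];
            uint64_t cur = dst.load(std::memory_order_relaxed);
            while (packed > cur &&
                   !dst.compare_exchange_weak(cur, packed, std::memory_order_relaxed)) {}
        }
}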

Texture units were built to sample texels, filter them, and feed the result to a fragment shader that would do some blending with the triangle's Gouraud-shaded colours and output that to a frame buffer. Modern devs are researching ways of SW-rasterizing an ID buffer that is later re-evaluated in a compute shader, which samples the texture values directly, filters them in SW, and writes a new G-buffer to be shaded in a later pass by another fully SW compute shader. High-end devs and engines do their own tiling of the screen in this pass and apply all sorts of SW optimizations no GPU ever thought of considering. Mipmapping has to be biased to account for TAA, invalidating old-school assumptions about how that worked as well. SSAA is not used, but hacks are employed to do similar things in slightly different ways, like checkerboard rendering. Variable rate shading was created to try to catch up with those trends, and yet devs such as Infinity Ward have already developed purely compute-based software solutions that are faster and achieve better results for their purposes.

Eventually, GPUs will indeed just be parallel compute chips. I don't think performance will be sufficient for that by PS6/XBStupidName times, but all the fixed-function stuff could benefit from at least becoming as generalized, reprogrammable and circumventable as possible. A lot of modern techniques spend a lot of effort trying to work around fixed-function aspects. The things that were meant to make developers' lives easier are often making them harder. Just open that shit up.
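And just to show what "filters them in SW" means in practice, here's a sketch of a resolve-style pass fetching texels and blending them itself instead of asking the texture unit. The RGBA8 layout and the single-channel blend are simplifications I've assumed for illustration:

Code:
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Manual bilinear filtering, the kind of thing a visibility-buffer resolve
// pass does in a compute shader instead of using fixed-function filtering.
struct Texture { int w, h; std::vector<uint32_t> rgba; };  // assumed RGBA8 layout

static uint32_t BilinearRed(const Texture& tex, float u, float v) {
    const float x = u * tex.w - 0.5f, y = v * tex.h - 0.5f;
    const int x0 = (int)std::floor(x), y0 = (int)std::floor(y);
    const float fx = x - x0, fy = y - y0;
    auto texel = [&](int tx, int ty) {
        tx = std::min(std::max(tx, 0), tex.w - 1);          // clamp addressing
        ty = std::min(std::max(ty, 0), tex.h - 1);
        return tex.rgba[ty * tex.w + tx];
    };
    auto r = [](uint32_t c) { return (float)(c & 0xFF); };  // red channel only
    const float red =
        r(texel(x0,     y0    )) * (1 - fx) * (1 - fy) +
        r(texel(x0 + 1, y0    )) * fx       * (1 - fy) +
        r(texel(x0,     y0 + 1)) * (1 - fx) * fy +
        r(texel(x0 + 1, y0 + 1)) * fx       * fy;
    return (uint32_t)(red + 0.5f);  // the other channels would be blended the same way
}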
 
Why didn't we increase their count even further and allow more of them to process an even greater number of primitives?

This is an instantaneous snapshot of hardware usage on a 3090 in 3DMark's mesh shader test. The numbers fluctuate during the frame, but aside from the cache, all of the other units hover around 30% usage. SM occupancy when running mesh shaders was at only 16% and is bottlenecked by something called ISBE, which is apparently an on-chip buffer used for geometry processing. It was 98% full and seems to be the thing holding back all of the other hardware.

In this example adding more rasterizers won't help much unless you add more triangle clipping/culling hardware (VPCs/TPUs) and bigger on-chip buffers to support more mesh shader warps in flight. There's probably a whole host of other bottlenecks in the raster pipeline that would need to be beefed up too.

On an interesting side note, vertex attribute fetch (VAF) and primitive distribution (PD) are at 0%, as expected, since they're only used in the classic vertex shader pipeline. Developers manage those memory loads explicitly when using mesh shaders.

[Attachment: 3dmark-mesh-3090.png]
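Some napkin math on why that on-chip buffer fills up so fast. These are numbers I've assumed for the sake of the exercise, not the real ISBE capacity or NVIDIA's actual attribute layout:

Code:
#include <cstdio>

// Rough sizing exercise: mesh shader outputs live in on-chip memory until
// the rasterizer drains them, so per-meshlet output size times meshlets in
// flight gives the buffer pressure that can cap occupancy.
int main() {
    const int vertsPerMeshlet   = 64;
    const int bytesPerVertex    = 16 + 32;   // float4 position + 32B of attributes
    const int primsPerMeshlet   = 126;
    const int bytesPerPrimitive = 4;         // packed index triplet
    const int meshletsInFlight  = 32;        // assumed concurrent meshlet warps

    const int perMeshlet = vertsPerMeshlet * bytesPerVertex
                         + primsPerMeshlet * bytesPerPrimitive;
    std::printf("per-meshlet output : %d bytes\n", perMeshlet);
    std::printf("total in flight    : %d KB\n",
                perMeshlet * meshletsInFlight / 1024);
    return 0;
}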
 
Could something over 60 fps become more common on consoles via frame generation on Pro or next-gen hardware? Could FSR 3 bring something like that to future machines, or is some other type of frame generation preferred for consoles?

Is frame generation one of the ways to go?

Async timewarp reprojection, foveated rendering... how does that sound for non-VR use on consoles?
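For context, the core trick behind all of these is roughly the same: reuse a rendered frame plus motion information to make a new one. A toy sketch of that idea (real DLSS 3 / FSR 3 / timewarp implementations also handle disocclusions, depth and UI, none of which is done here, and the function name is just my own):

Code:
#include <cstdint>
#include <vector>

// Toy illustration of reprojection-style frame generation: push each pixel
// of the last rendered frame along half of its motion vector to synthesize
// an in-between image.
struct MotionVec { float dx, dy; };   // pixels moved since the previous frame

void ExtrapolateHalfFrame(const std::vector<uint32_t>& prevFrame,
                          const std::vector<MotionVec>& motion,
                          std::vector<uint32_t>& outFrame,
                          int width, int height) {
    // Start from the previous frame so holes keep a plausible colour.
    outFrame = prevFrame;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            const MotionVec mv = motion[y * width + x];
            const int nx = x + (int)(0.5f * mv.dx);   // advance half a frame
            const int ny = y + (int)(0.5f * mv.dy);
            if (nx >= 0 && nx < width && ny >= 0 && ny < height)
                outFrame[ny * width + nx] = prevFrame[y * width + x];
        }
}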
 
FPS is as high as a developer wants it to be. There is nothing stopping developers from releasing higher than 60 FPS content on consoles on current hardware.

However, as nice as that would be, because it leads to better-playing games, it's hard to sell someone who has never played your game on high FPS, since they need to own the game in order to feel how nice it is.

So, AAA developers instead rely mostly on static screenshots to sell their games. Thus motion resolution often suffers and we have low FPS on consoles.

PC gets away with high FPS for 2 main reasons.

Firstly, configurable graphics settings. This means a developer can still aim to maximize the graphics in the game, which would lead to single-digit FPS on some graphics cards (budget, midrange, etc.), but the user can always opt for lower settings and higher framerates.

Secondly, PC games are generally limited by console-centric development pipelines. So high-end PC hardware is typically not used well, and maxing out quality settings that are scaled up from console (versus scaled down from a planned highest graphics setting) doesn't typically lead to a meaningful increase in graphics fidelity. So sometimes developers choose not to bother, and the highest quality setting simply ends up running at high FPS on higher-end hardware.

Until more console developers offer more graphical choice options, consoles will always be more limited. If developers gave more graphical option choices on console they could still have the same high quality screenshot 30 FPS modes while also supporting 60 FPS and higher than 60 FPS modes in every single title that gets released.

It's developers opting for certain default graphical options in their games on console that limits the FPS attainable.

Regards,
SB
 
Could something over 60 fps become more common on consoles via frame generation on Pro or next-gen hardware? Could FSR 3 bring something like that to future machines, or is some other type of frame generation preferred for consoles?

Is frame generation one of the ways to go?

Async timewarp reprojection, foveated rendering... how does that sound for non-VR use on consoles?
They would be at RDNA 6 by then.
AMD may always stay a generation behind Nvidia in terms of performance (not much they can do about that), but RDNA 6 will be orders of magnitude more performant than RDNA 3 if we assume performance roughly doubles each generation. They will have frame generation and possibly more by then.
 
foveated rendering... How does that sound for non VR use on consoles?
It'd work the same, but how do you do the eye tracking without a headset? TBH that could be a game changer, if a console could include a TV-side camera capable of eye tracking at high enough quality to enable ETFR. I don't know if the accuracy is there for room-scale tracking though.
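Roughly what the console would do with that gaze point, as a sketch. The tile size, radii and shading-rate values are made up; a real implementation would feed something like this map into the GPU's variable rate shading:

Code:
#include <cmath>
#include <vector>

// Sketch of eye-tracked foveated rendering on a flat screen: build a coarse
// shading-rate map (1 = full rate, 2 = half, 4 = quarter) from distance to
// the gaze position, measured in screen tiles.
std::vector<int> BuildShadingRateMap(int tilesX, int tilesY,
                                     float gazeTileX, float gazeTileY) {
    std::vector<int> rate(tilesX * tilesY, 4);             // default: coarse
    for (int ty = 0; ty < tilesY; ++ty)
        for (int tx = 0; tx < tilesX; ++tx) {
            const float d = std::hypot(tx - gazeTileX, ty - gazeTileY);
            if (d < 6.0f)       rate[ty * tilesX + tx] = 1; // fovea: full rate
            else if (d < 14.0f) rate[ty * tilesX + tx] = 2; // mid periphery
        }
    return rate;
}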
 
How about this?

 