And the demo is incredibly well threaded, so it should still be rendering with those extra cores and putting them to use.
So is there a CPU limitation in there?
There could be, but what I'm imagining (pure speculation, no warranties) is that computing the hair takes ~X number of cores for ~Y number of cycles. Until the hair is computed, no other rendering work can start -- you need the hair to render shadowmaps, you need the hair to render depth, and you need the hair to render the scene.
On a reasonably high end GPU, which has approximately X cores in the first place, the hair fully saturates the gpu, takes ~3ms, and then the rest of the render starts and takes ~4ms -- perfect for about a 120fps budget. On a super high end gpu, maybe it has ~X*3 cores. Rather than seeing a 3x speedup, two thirds of the gpu sits idle during hair compute. In a perfect world maybe you would find extra compute work to do here (you could just add a lot more hair strands to fill the gpu up all the way and maybe satisfy some 4090 owners' egos, but there would be no visual improvement, so it would be wasted resources), but there just isn't any extra work worth doing. (Generally the kind of compute work you would use to fill out that space is optimizations that make other parts of the render run faster, but the other parts are already preposterously fast here, so that's unlikely to help.)
The result is that the super high end gpu also takes ~3ms to do hair. Then it renders the rest of the scene a bit faster, say ~3ms instead of ~4ms (that part does benefit from the extra cores, just not linearly) -- so there's only a ~20% improvement, despite a much larger theoretical gap in specs between the two gpus.
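If it helps to see the arithmetic, here's a tiny back-of-envelope sketch of that reasoning -- the 3ms/4ms/3ms figures are just the same rough guesses from above, not measurements from the demo:

```python
# Back-of-envelope model of the speculation above: the hair pass is a hard
# dependency that takes ~3ms no matter how many cores the gpu has, because it
# already saturates the smaller gpu and the extra cores just sit idle.
def frame_time_ms(hair_ms, rest_ms):
    # Hair must finish before shadowmaps / depth / the main scene can start,
    # so the two chunks add up instead of overlapping.
    return hair_ms + rest_ms

high_end       = frame_time_ms(hair_ms=3.0, rest_ms=4.0)  # ~7 ms, ~143 fps
super_high_end = frame_time_ms(hair_ms=3.0, rest_ms=3.0)  # ~6 ms, ~167 fps

print(f"{high_end / super_high_end - 1:.0%} faster")  # ~17%, nowhere near 3x
```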
(I agree with you, of course, that the game is incredibly well threaded and optimized. I think complaining that it doesn't scale up to ~1000 fps on a suitably fast gpu is silly; I'm just trying to dig into why that might be.)