On a reasonably high-end GPU, which has approximately X cores in the first place, the hair fully saturates the GPU and takes ~3 ms; then the rest of the render starts and takes ~4 ms -- right on budget for about 120 fps. On a super high-end GPU with maybe ~X*3 cores, rather than seeing a 3x speedup, two thirds of the GPU sits idle during the hair compute.
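A toy model of that argument (all numbers hypothetical, assuming the hair pass occupies a fixed number of cores while the rest of the frame scales ideally with core count):

```python
def frame_time_ms(cores, base_cores=10_000, hair_ms=3.0, rest_ms=4.0):
    # Hair is "narrow": its runtime doesn't shrink on a wider GPU.
    # The rest of the frame is assumed to scale ideally with cores.
    return hair_ms + rest_ms * base_cores / cores

small = frame_time_ms(10_000)  # 3 + 4 = 7.0 ms
big = frame_time_ms(30_000)    # 3 + 4/3 ~= 4.33 ms
print(f"speedup: {small / big:.2f}x")  # ~1.62x, nowhere near 3x
```

With 3x the cores you only get ~1.6x the frame rate, because the fixed ~3 ms hair pass dominates as everything else shrinks.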
I think you're on to something. Hair strands seem to add the same ~2 ms on a 3090 and a 4090, and it doesn't take much longer on a 4070 Ti. Pretty compelling evidence that the hair workload isn't scaling with clocks or with SM count. Would be interesting to see RDNA numbers.
Edit: ran a few more tests and I'm convinced hair is a very narrow workload that isn't filling the GPU. At all resolutions, GPU clocks are 100-200 MHz higher with hair enabled, which points to fewer SMs being active, allowing clocks to ramp. At higher resolutions some work seems to run in parallel with the hair rendering, as the net cost of enabling hair is lower: ~2.5 ms at 1600p compared to ~3.4 ms at 720p. So there is some async compute at play.
1080p hair off: 5.8 ms
1080p hair on: 8.8 ms
720p hair off: 3.9 ms
720p hair on: 7.3 ms
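The net cost of enabling hair falls out of those numbers directly; a quick sketch of the arithmetic (using the timings above):

```python
# Measured frame times from the tests above, in milliseconds.
timings_ms = {
    ("1080p", "off"): 5.8, ("1080p", "on"): 8.8,
    ("720p", "off"): 3.9, ("720p", "on"): 7.3,
}

for res in ("1080p", "720p"):
    # Net cost of the hair pass = frame time with hair - without.
    net = timings_ms[(res, "on")] - timings_ms[(res, "off")]
    print(f"{res}: hair adds {net:.1f} ms")
# 1080p: hair adds 3.0 ms
# 720p: hair adds 3.4 ms
```

The net cost being lower at the higher resolution is what suggests overlap: with more non-hair work in flight, more of it can run concurrently with the narrow hair pass.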
On a side note, CPU scaling is extremely impressive. Even at very high framerates (>250 fps), all 16 CPU threads are active and the load is very evenly distributed. Well done, RE Engine!