Well, the 7600 would also have worse occupancy due to its reduced register file capacity. But let's return to the 7900 XTX versus 4090 comparison. In that comparison, both were tested at 4K with a 20 ms rendering time per frame, which is 50 FPS. If the 4090 had been faster by a typical margin of 25%, it would have achieved 62.5 FPS instead of 50. Moreover, even if the top 3 shaders were 2x faster on the 7900 XTX, it would have reached only roughly 54 FPS, leaving the 4090 still about 15% faster.
The 7600 likely has much higher L2 bandwidth than the 4060 Ti, based on the earlier analysis.
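For anyone who wants to check the frame-rate numbers in the 7900 XTX versus 4090 comparison, here is a rough sketch of the arithmetic. The 20 ms frame time, the 25% typical lead and the 2x shader speedup are the figures from the post; the 3 ms combined cost of the top 3 shaders is my assumption, picked because it reproduces the roughly 54 FPS figure, so treat it as illustrative rather than measured.

```python
# Frame-rate arithmetic for the 7900 XTX vs 4090 comparison.
BASE_FRAME_MS = 20.0       # from the post: 20 ms per frame at 4K
TYPICAL_4090_LEAD = 1.25   # from the post: a typical 25% advantage
SHADER_SPEEDUP = 2.0       # from the post: top 3 shaders run 2x faster
TOP3_SHADER_MS = 3.0       # ASSUMPTION: combined cost of the top 3 shaders


def fps_from_frame_time(frame_ms: float) -> float:
    """Convert a frame time in milliseconds to frames per second."""
    return 1000.0 / frame_ms


base_fps = fps_from_frame_time(BASE_FRAME_MS)

# What the 4090 would reach if it held its typical lead.
fps_4090_typical = base_fps * TYPICAL_4090_LEAD

# Time saved on the 7900 XTX if the top 3 shaders ran 2x faster.
saved_ms = TOP3_SHADER_MS * (1.0 - 1.0 / SHADER_SPEEDUP)
fps_xtx_faster_shaders = fps_from_frame_time(BASE_FRAME_MS - saved_ms)

# Lead the 4090 would still have over the sped-up 7900 XTX.
remaining_lead = fps_4090_typical / fps_xtx_faster_shaders - 1.0

print(f"baseline: {base_fps:.1f} FPS")
print(f"4090 at typical lead: {fps_4090_typical:.1f} FPS")
print(f"7900 XTX with faster shaders: {fps_xtx_faster_shaders:.1f} FPS")
print(f"remaining 4090 lead: {remaining_lead:.1%}")
```

Changing `TOP3_SHADER_MS` shows how sensitive the conclusion is to that assumption: the three shaders would need to account for a much larger share of the frame before a 2x speedup closed the gap.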
Their conclusion regarding the superior occupancy seems quite far-fetched given how few tests they ran. Shaders are rarely bound by register file capacity, which is why AMD opted for reduced capacity in its more cost-sensitive mainstream GPUs. For shorter shaders, the number of threads that can be scheduled can matter more than register file size, and GPUs capable of scheduling more threads per SM can be faster in such shaders. So better shader performance doesn't come down solely to higher register file capacity; scheduling strategy and scheduling capacity play significant roles too.
What I find quite misleading is their choice to compare the number of threads in flight per partition, because the SM has twice the number of partitions. For a more direct and less dramatic comparison, both the number of threads in flight and the register file capacity should be doubled to get per-SM figures. The difference between 7.4 and 10 threads in flight doesn't sound as dramatic as 3.7 versus 10, right?
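To put numbers on how much the framing changes the picture, here is a small sketch of the normalization. The 3.7 and 10 figures and the 2x partition ratio come from the post; the variable names are mine and purely illustrative.

```python
# Threads-in-flight comparison, per partition vs per SM.
PER_PARTITION_THREADS = 3.7   # from the post: the per-partition figure
OTHER_SIDE_THREADS = 10.0     # from the post: the figure it is compared to
PARTITION_RATIO = 2           # from the post: the SM has twice the partitions

# Scale the per-partition figure up to the SM level.
per_sm_threads = PER_PARTITION_THREADS * PARTITION_RATIO

# How large the gap looks under each framing.
gap_per_partition = OTHER_SIDE_THREADS / PER_PARTITION_THREADS
gap_per_sm = OTHER_SIDE_THREADS / per_sm_threads

print(f"per-partition framing: {gap_per_partition:.2f}x")
print(f"per-SM framing: {gap_per_sm:.2f}x")
```

The gap shrinks from roughly 2.7x to roughly 1.35x once both sides are counted at the same level.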