DavidGraham
Veteran
A former NVIDIA and AMD engineer (he authored TXAA) has shared some interesting insights on the performance of big GPUs. He believes a big GPU will scale worse than a smaller one in current-generation games, for several reasons. I will list them here to start a discussion.
1- The front-end (command processor) of the GPU is a serial machine, so a bigger, lower-clocked GPU is more likely to be bottlenecked there than a smaller, higher-clocked one. Triangle rasterization has a similar issue.
2- The DX12 API itself is a limitation on big GPUs.
DX12 on PC established a bad API baseline. Drivers disabled ability to pipeline with split barriers by default, and queues are CPU scheduled with high latency ... most games are loaded with non-pipelined workload {drain,idle,fill} regions (gets progressively worse on larger GPUs)
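To make the {drain,idle,fill} point concrete, here is a rough toy model in Python. It is only a sketch under assumptions that are mine, not the engineer's: compute time divides evenly across execution units, every pass boundary costs a fixed bubble, and pipelining hides all but one of those bubbles. The function names and all numbers are made up for illustration.

```python
def frame_time_ms(work_unit_ms, units, passes, bubble_ms, pipelined):
    """Toy frame time: compute shrinks with unit count, barrier bubbles do not."""
    compute = work_unit_ms / units
    # Non-pipelined: every pass boundary pays a full drain/idle/fill bubble.
    # Pipelined (assumption): bubbles overlap with neighbouring work, one remains.
    bubbles = bubble_ms * (1 if pipelined else passes)
    return compute + bubbles


def scaling_efficiency(small_units, big_units, **model):
    """Achieved speedup divided by the ideal speedup (big_units / small_units)."""
    small = frame_time_ms(units=small_units, **model)
    big = frame_time_ms(units=big_units, **model)
    return (small / big) / (big_units / small_units)


# Hypothetical workload: 480 unit-ms of work, 20 passes, 0.05 ms bubble each.
model = dict(work_unit_ms=480.0, passes=20, bubble_ms=0.05)
serial = scaling_efficiency(40, 80, pipelined=False, **model)
overlapped = scaling_efficiency(40, 80, pipelined=True, **model)
print(f"non-pipelined: {serial:.3f}, pipelined: {overlapped:.3f}")
```

In this model the bubbles are a fixed cost, so doubling the units gives less than double the performance, and the loss grows with the number of non-pipelined passes. If the bubble itself grows with machine size, as the quote suggests, the gap would be wider still.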
3- Very high fps has higher overhead.
The higher the fps, the higher the costs of frame boundary idle (including an app context switch from game to driver-feature or compositor, etc) goes up, and it gets worse on larger machines (more drain/fill time) ... without also pipelining frames PC is in trouble for 480Hz displays.
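A quick back-of-the-envelope way to see why frame boundary idle hurts more at high fps, sketched in Python. The fixed per-frame idle cost of 0.2 ms is a purely hypothetical figure of mine, not a measurement, and the function name is just for illustration.

```python
def boundary_idle_fraction(fps, boundary_idle_ms):
    """Share of each frame lost to a fixed idle period at the frame boundary."""
    frame_ms = 1000.0 / fps  # time budget per frame at this frame rate
    return boundary_idle_ms / frame_ms


# Hypothetical fixed cost of 0.2 ms per frame boundary.
for fps in (60, 120, 240, 480):
    lost = boundary_idle_fraction(fps, boundary_idle_ms=0.2)
    print(f"{fps:>3} fps: {lost:.1%} of the frame spent idle")
```

The idle time is the same at every frame rate, but the frame budget shrinks, so the share lost grows linearly with fps. A larger machine with longer drain/fill time would correspond to a larger `boundary_idle_ms` here.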
4- Inefficient game coding. Some games run many dependent passes with serialization, so it takes only one long pass to do serious damage to the "total realized performance due to the size of the rest of the machine going idle". Larger GPUs also suffer more when caches are invalidated. When upscaling from a lower resolution, games do a poor job of managing triangle LODs, as often "the amount of cluster culling isn't reducing as resolution drops (think of shadow passes, etc)=poor scaling".
5- Big GPUs with high power consumption often show more volatile clocks than smaller GPUs, and they also need more rapid power state changes.
So big chip gets idle more often, then power state changes more often, this has more latency before it's at peak perf for work = slower.
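The power state argument can be sketched with a very crude Python model. Everything in it is my assumption for illustration: after each idle gap the chip needs `ramp_ms` to return to peak clocks, and during that ramp it runs at a constant fraction of peak throughput. Real clock behaviour is far more complicated, and the numbers are invented.

```python
def burst_time_ms(work_ms, ramp_ms, ramp_speed):
    """Wall time for a burst that would take work_ms at peak clocks,
    when it starts right after an idle gap (toy model)."""
    done_in_ramp = ramp_speed * ramp_ms  # work completed while still ramping
    if work_ms <= done_in_ramp:
        return work_ms / ramp_speed      # burst ends before peak is reached
    return ramp_ms + (work_ms - done_in_ramp)


def slowdown(work_ms, ramp_ms=0.1, ramp_speed=0.5):
    """Wall time relative to running the whole burst at peak clocks."""
    return burst_time_ms(work_ms, ramp_ms, ramp_speed) / work_ms


# A bigger chip finishes each burst sooner, so its bursts are shorter
# and the ramp penalty is a larger share of each one.
print(f"1.0 ms burst: {slowdown(1.0):.2f}x")
print(f"0.2 ms burst: {slowdown(0.2):.2f}x")
```

Under these assumptions, the shorter the bursts between idle gaps, the more of the total time is spent below peak clocks, which matches the direction of the argument above.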