This nitpick in the AMD Vega thread should be allowed:
"In the end, P100 has much larger register files (total) than any other GPU."
P100 has more registers in total than any other nVidia GPU. Fiji, and very likely Vega10, both have more registers in total (16MB of vector registers, 16.8MB including the scalar ones).
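For anyone who wants to check the totals, here is a rough back-of-the-envelope sketch. The per-unit figures are assumptions taken from publicly documented specs, not from this thread: 64 CUs on Fiji, 4 SIMDs per CU, 64 KiB of vector registers per SIMD, 800 scalar registers of 4 bytes per SIMD (the GCN3 figure), and 56 SMs with 256 KiB of registers each on P100. The function and constant names are made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB

def gcn_register_bytes(cus, simds_per_cu, vgpr_bytes_per_simd, sgprs_per_simd):
    """Return (vector_bytes, scalar_bytes) for a GCN-style chip."""
    simds = cus * simds_per_cu
    vector = simds * vgpr_bytes_per_simd
    scalar = simds * sgprs_per_simd * 4  # each scalar register is 32 bits wide
    return vector, scalar

# Assumed Fiji configuration (not stated in the post itself).
fiji_vector, fiji_scalar = gcn_register_bytes(
    cus=64, simds_per_cu=4, vgpr_bytes_per_simd=64 * KIB, sgprs_per_simd=800
)
fiji_total = fiji_vector + fiji_scalar

# Assumed P100 configuration: 56 SMs with a 256 KiB register file each.
p100_total = 56 * 256 * KIB

print(f"Fiji vector registers: {fiji_vector / MIB:.1f} MiB")
print(f"Fiji vector + scalar:  {fiji_total / MIB:.1f} MiB")
print(f"P100 registers:        {p100_total / MIB:.1f} MiB")
```

Under these assumptions the Fiji numbers come out at 16 MiB for the vector registers and roughly 16.8 MiB with the scalar ones, matching the figures quoted above, while P100 lands at 14 MiB.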
Traditionally, AMD builds GPUs with relatively large register files, so improving the energy efficiency of register file accesses could help there too. The question is how efficient or wasteful AMD's current register file design is (or whether the lower-hanging fruit is somewhere else), and how one could improve it without giving up the general simplicity of how it works. GCN appears to have been carefully designed from the beginning to reach this simplicity, and different aspects are intertwined to make it work.