We're not talking perf/W efficiency here. We're talking architectural efficiency.
What's the difference? Are you using "architectural efficiency" to mean perf/mm², leaving TDP aside?
If so, has anyone done a Vega 10 vs GP102 comparison at iso clocks? Downclock a Titan X to, say, 1400MHz, do the same with a Vega 64, and see how they compare?
Last time I saw something like that, Polaris 10 actually came very close to a GP104 at iso clocks for core and memory.
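An iso-clock comparison like that boils down to normalizing measured performance by core clock. A minimal sketch of that normalization, where all the frame rates and the 1400MHz lock are hypothetical placeholders, not benchmark results:

```python
# Hypothetical sketch of an iso-clock comparison: normalize measured frame
# rates by core clock to estimate per-clock ("architectural") throughput.
# All numbers below are made-up placeholders, not real benchmark data.

def perf_per_clock(fps, core_mhz):
    """Frames per second per MHz of core clock."""
    return fps / core_mhz

# Placeholder inputs: both cards locked to the same 1400 MHz core clock.
titan_x_fps = 100.0   # hypothetical measurement
vega64_fps = 90.0     # hypothetical measurement

ratio = perf_per_clock(vega64_fps, 1400) / perf_per_clock(titan_x_fps, 1400)
print(f"Vega 64 reaches {ratio:.0%} of Titan X per-clock performance")
```

With both cards locked to the same clock the normalization cancels out, but the same helper also works when the two cards can't hold exactly the same frequency.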
Do you have any reason at all to believe that the compute-specific extra features of GP100 have a negative impact on its graphics performance? Is there a negative to having larger register files? To larger caches? To having NVLink?
They at least have an impact on the clocks GP100 can achieve at a given TDP compared to GP102. According to NVIDIA's own whitepapers, GP100's peak FP32 throughput is 10.6 TFLOPs (56 SMs @ 1480MHz) with a 300W TDP, whereas GP102 gets about 20% more at 250W. This obviously has an impact on its graphics performance.
So the answer to your question is yes: GP100's 1/2-rate FP64 + 2×FP16 + larger caches + NVLink etc. do in fact have a negative impact on gaming performance.
They're not responsible for decreasing IPC; they're responsible for decreasing clocks at iso TDP.
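The whitepaper figures follow from the usual peak-throughput formula (SMs × FP32 lanes per SM × 2 FLOPs per FMA × clock). A quick sketch, where the GP100 numbers match the text but the GP102 configuration (30 SMs of 128 lanes, i.e. the full Titan Xp die) and its 1582MHz boost clock are my assumptions:

```python
# Back-of-the-envelope peak FP32 throughput, whitepaper style:
# TFLOPs = SMs * FP32 lanes per SM * 2 (FMA) * clock.
# GP100 figures are from the text; the GP102 SM count and boost clock
# (full Titan Xp die at its official boost) are assumptions.

def peak_fp32_tflops(sms, lanes_per_sm, clock_mhz):
    return sms * lanes_per_sm * 2 * clock_mhz * 1e6 / 1e12

gp100 = peak_fp32_tflops(56, 64, 1480)    # GP100: 56 SMs x 64 FP32 lanes
gp102 = peak_fp32_tflops(30, 128, 1582)   # assumed full GP102 at 1582 MHz

print(f"GP100: {gp100:.1f} TFLOPs")
print(f"GP102: {gp102:.1f} TFLOPs ({gp102 / gp100 - 1:+.0%} vs GP100)")
```

At the official boost clock this comes out around +15% for GP102; the ~20% figure in the post presumably reflects the higher clocks GP102 cards sustain in practice.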
Because that's really what this is about: people claiming that Vega's gaming performance is lackluster because it's focusing on compute.
There's a number of reasons why Vega isn't reaching the same gaming performance as GP102 at iso TDP:
1 - GlobalFoundries' 14LPP is substantially less efficient than TSMC's 16FF+ (from the posts of experts in this forum, there's at least a 20% difference in power consumption at iso clocks).
2 - As Raja confirmed two weeks ago, some of the features aren't implemented in the driver yet (his statement implies they will be, and so do @Rys ' statements so far). Perhaps this discussion will be different when DSBR gets enabled even in automatic mode, since it'll affect both geometry performance and effective bandwidth.
3 - Also as mentioned by Raja in the same tweet, the Infinity Fabric being used in Vega 10 wasn't optimized for consumer GPUs, and that also seems to be holding the GPU back (maybe by limiting clocks at iso TDP). Why use IF in Vega 10 at all? Perhaps because iterating IF in Vega 10 was an important stepping stone toward optimizing the implementation for Navi, or even Vega 11 and Raven Ridge. Perhaps HBCC was implemented around IF from the start. Perhaps Vega SSG doesn't have a dedicated PCIe controller for its SSDs and IF is being used to implement one in Vega.
4 - Compute-oriented features like 2×FP16, larger caches and HBCC prevent Vega 10 from achieving higher clocks at iso TDP, just like what happens with GP100.
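Point 1 can be made concrete with a rough estimate of what a ~20% power gap costs in clocks. A hedged sketch, using the common P ∝ f³ rule of thumb (valid only when voltage scales roughly with frequency; the exponent is a modeling assumption, not a measured number):

```python
# Rough sketch: if 14LPP burns ~20% more power than 16FF+ at iso clocks,
# estimate the clock deficit at iso TDP assuming P scales with f^3
# (a common rule of thumb when voltage tracks frequency; the exponent
# is an assumption, not a measurement).

power_penalty = 1.20  # ~20% more power at iso clocks (forum estimate)
clock_deficit = 1 - (1 / power_penalty) ** (1 / 3)
print(f"~{clock_deficit:.0%} lower clocks at the same TDP")
```

Under that assumption a 20% power penalty translates into only a ~6% clock deficit at iso TDP, so the process gap alone doesn't explain the whole difference; that's why points 2 to 4 matter too.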