Reading some reviews, the 1070 is close to but slightly behind the Titan X in games, more so at high resolution.
Remarkably, it has a complete GPC disabled, meaning 3 triangles and 48 pixels per clock (vs. 6 and 96 for GM200).
I wonder whether NV truly disabled a GPC in the 1070 or just disabled 5 SMs. The reason is that at lower resolutions (say 1080p) the GPU should be bottlenecked more by triangle throughput than by the pixel shader engine. This is one of the primary reasons we saw the 980 Ti/Titan X outperform the Fury X @ 1080p and even 1440p but begin to lose out at 4K. Given the suggested 33% triangle functional-unit deficit for the 1070 vs. the 1080, and also factoring in the ~3% boost clock deficit, this should produce a performance gap as large as ~37%. Yet when we look at performance in games, the gap more often aligns with (i.e. falls within the bounds of) the shader performance differential (25% units × 3% clocks = 28.75% aggregate) than with the suggested triangle performance differential. In fact, out of the 12 games + 1 synthetic benchmark tested @ 1080p, only 1 exceeds the expected gap due to shader performance (The Division, 31.7%). The average performance difference is 19.7%, only about half of what it could be if each 1070 had a disabled GPC.
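Just to make the arithmetic explicit, here is a small Python sketch using the figures quoted above; the 33%, 25% and ~3% ratios are taken straight from this post, not recomputed from spec sheets:

```python
# Per-unit ratios used in the argument above (1080 relative to 1070)
tri_ratio    = 4 / 3    # 4 GPCs vs 3 GPCs -> the "33%" triangle-unit figure
shader_ratio = 1.25     # the "25% unit" shader figure quoted above
clock_ratio  = 1.03     # ~3% boost clock advantage for the 1080

triangle_gap = tri_ratio * clock_ratio - 1     # ~0.373 -> "as large as ~37%"
shader_gap   = shader_ratio * clock_ratio - 1  # 0.2875 -> "28.75% aggregate"

print(f"triangle-limited ceiling: {triangle_gap:.1%}")   # ~37.3%
print(f"shader-limited ceiling:   {shader_gap:.2%}")     # 28.75%
```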
Compare the 980 Ti, with its 6 tri/clock rate, supposedly double the 1070's per-clock rate: it should come away winning at least some benchmarks against the 1070 @ 1080p, yet it wins precisely none (the Titan X wins exactly 1 benchmark, again The Division, by all of 1 fps over the 1070). Factoring in the clock speed differential, we would expect the following triangle throughput differential:
GM200 (980 Ti): 1075 MHz × 6 tri/clock = 6.45 Gtri/s
GP104 (1070): 1683 MHz × 3 tri/clock = 5.05 Gtri/s
difference: 27.7%
This difference should be observable at the lower resolution of 1080p but, again, it is never once observed! In fact, the Titan X, with a per-clock triangle rate identical to the 980 Ti's, does win one benchmark vs. the 1070, so this cannot be explained by the triangle rate since we observe different performance results. However, if the 1070 does not in fact have a disabled GPC and has the full 4 tri/clock rate expected of GP104, then the picture changes entirely and it would have a triangle throughput advantage over GM200 rather than a deficit. In that case we would see the following triangle throughput differential:
GM200 (980 Ti): 1075 MHz × 6 tri/clock = 6.45 Gtri/s
GP104 (1070): 1683 MHz × 4 tri/clock = 6.73 Gtri/s
difference: 4.3%
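The same numbers in a short Python sketch, so either scenario (3 or 4 GPCs enabled on the 1070) can be plugged in; the clocks are the boost clocks used above, and any small difference from the figures above is just rounding:

```python
def tri_rate_gtris(clock_mhz, tris_per_clock):
    """Peak geometry rate in billions of triangles per second."""
    return clock_mhz * 1e6 * tris_per_clock / 1e9

gm200      = tri_rate_gtris(1075, 6)  # 980 Ti, 6 GPCs       -> 6.45 Gtri/s
gp104_3gpc = tri_rate_gtris(1683, 3)  # 1070 with 3 GPCs     -> ~5.05 Gtri/s
gp104_4gpc = tri_rate_gtris(1683, 4)  # 1070 with all 4 GPCs -> ~6.73 Gtri/s

print(f"980 Ti advantage over a 3-GPC 1070: {gm200 / gp104_3gpc - 1:.1%}")  # ~27.7%
print(f"4-GPC 1070 advantage over 980 Ti:   {gp104_4gpc / gm200 - 1:.1%}")  # ~4.4%
```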
Now, all that being said, I realize that triangle throughput is not the only bottleneck within a given frame rendered @ 1080p, but it should be the predominant one. I hope someone with both a 1080 and a 1070 can run the good old B3D suite and compare triangle throughput. If I'm wrong, I'm wrong. I just want to know what's going on.
Data sourced from the Guru3D review, all tests @ 1080p high/highest settings:
Rise of the Tomb Raider DX12
1080: 131fps
1070: 108fps
difference: 21.2%
Hitman DX12
1080: 107fps
1070: 86fps
difference: 24.4%
Doom OpenGL
1080: 176fps
1070: 142fps
difference: 23.9%
FarCry Primal DX11
1080: 110fps
1070: 94fps
difference: 17%
Anno 2205 DX11
1080: 120fps
1070: 100fps
difference: 20%
Fallout 4 DX11
1080: 134fps
1070: 129fps
difference: 3.8%
GTA V DX11
1080: 159fps
1070: 152fps
difference: 4.6%
The Division DX11
1080: 108fps
1070: 82fps
difference: 31.7%
Thief DX11
1080: 125fps
1070: 104fps
difference: 20.2%
The Witcher III DX11
1080: 105fps
1070: 83fps
difference: 26.5%
Battlefield Hardline DX11
1080: 124fps
1070: 101fps
difference: 22.8%
Alien Isolation DX11
1080: 186fps
1070: 157fps
difference: 18.5%
3dmark 11 X Score
1080: 10085
1070: 8290
difference: 21.7%
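And, for anyone who wants to re-run the arithmetic, a quick Python sketch that recomputes each gap as (1080 − 1070) / 1070 and the average from the numbers above:

```python
# (name, GTX 1080 result, GTX 1070 result) from the Guru3D numbers above
results = [
    ("Rise of the Tomb Raider DX12", 131, 108),
    ("Hitman DX12",                  107,  86),
    ("Doom OpenGL",                  176, 142),
    ("FarCry Primal DX11",           110,  94),
    ("Anno 2205 DX11",               120, 100),
    ("Fallout 4 DX11",               134, 129),
    ("GTA V DX11",                   159, 152),
    ("The Division DX11",            108,  82),
    ("Thief DX11",                   125, 104),
    ("The Witcher III DX11",         105,  83),
    ("Battlefield Hardline DX11",    124, 101),
    ("Alien Isolation DX11",         186, 157),
    ("3dmark 11 X Score",          10085, 8290),
]

gaps = [(gtx1080 - gtx1070) / gtx1070 for _, gtx1080, gtx1070 in results]
for (name, _, _), gap in zip(results, gaps):
    print(f"{name:30s} {gap:.1%}")
print(f"average gap: {sum(gaps) / len(gaps):.1%}")   # ~19.7%
```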