I figure some people here would have good ideas on this. One reason I suspect for the difference in power consumption is the difference in scheduling. I usually link people to this article:
https://www.anandtech.com/show/5699/nvidia-geforce-gtx-680-review/3
The end result is an interesting one, if only because by conventional standards it’s going in reverse. With GK104 NVIDIA is going back to static scheduling. Traditionally, processors have started with static scheduling and then moved to hardware scheduling as both software and hardware complexity has increased. Hardware instruction scheduling allows the processor to schedule instructions in the most efficient manner in real time as conditions permit, as opposed to strictly following the order of the code itself regardless of the code’s efficiency. This in turn improves the performance of the processor.
However based on their own internal research and simulations, in their search for efficiency NVIDIA found that hardware scheduling was consuming a fair bit of power and area for few benefits. In particular, since Kepler’s math pipeline has a fixed latency, hardware scheduling of the instruction inside of a warp was redundant since the compiler already knew the latency of each math instruction it issued. So NVIDIA has replaced Fermi’s complex scheduler with a far simpler scheduler that still uses scoreboarding and other methods for inter-warp scheduling, but moves the scheduling of instructions in a warp into NVIDIA’s compiler. In essence it’s a return to static scheduling.
Ultimately it remains to be seen just what the impact of this move will be. Hardware scheduling makes all the sense in the world for complex compute applications, which is a big reason why Fermi had hardware scheduling in the first place, and for that matter why AMD moved to hardware scheduling with GCN. At the same time however when it comes to graphics workloads even complex shader programs are simple relative to complex compute applications, so it’s not at all clear that this will have a significant impact on graphics performance, and indeed if it did have a significant impact on graphics performance we can’t imagine NVIDIA would go this way.
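To make the quoted point a bit more concrete, here's a rough toy model I put together (my own sketch, not NVIDIA's actual ISA, compiler, or scheduler; the opcodes, latency numbers, and registers are made up for illustration). It shows why fixed math-pipeline latencies make intra-warp hardware scheduling redundant: the compiler can precompute exactly the stalls a runtime scoreboard would otherwise discover on its own.

```python
# Toy model (illustrative only): static vs. hardware scheduling of a
# dependent instruction stream when every op has a fixed, known latency.
# Latency numbers and opcodes below are hypothetical, not Kepler's real ones.

FIXED_LATENCY = {"FMA": 9, "MUL": 9, "ADD": 9, "MOV": 4}  # made-up cycle counts

# A tiny dependent instruction stream: (opcode, dest register, source registers)
program = [
    ("MUL", "r1", ("r0", "r0")),
    ("ADD", "r2", ("r1", "r0")),   # depends on r1 -> must wait for the MUL
    ("MOV", "r3", ("r2",)),        # depends on r2 -> must wait for the ADD
]

def compile_static_schedule(prog):
    """'Compiler' pass: since every op's latency is known and fixed, compute
    how many cycles to stall before issuing each instruction. The hardware
    then just counts cycles; no per-register scoreboard is needed."""
    ready = {}          # register -> cycle at which its value becomes available
    cycle = 0
    schedule = []
    for op, dst, srcs in prog:
        # issue no earlier than when all source registers are ready
        issue = max([cycle] + [ready.get(s, 0) for s in srcs])
        schedule.append((issue - cycle, op, dst, srcs))  # (stall cycles, instr)
        ready[dst] = issue + FIXED_LATENCY[op]
        cycle = issue + 1   # at most one instruction issued per cycle
    return schedule

def run_with_scoreboard(prog):
    """'Hardware' scheduling: a scoreboard checks source-operand readiness
    every cycle at runtime and stalls until operands are available."""
    ready = {}
    cycle = 0
    for op, dst, srcs in prog:
        while any(ready.get(s, 0) > cycle for s in srcs):
            cycle += 1                      # dynamic stall decided in hardware
        ready[dst] = cycle + FIXED_LATENCY[op]
        cycle += 1
    return cycle

if __name__ == "__main__":
    for stall, op, dst, srcs in compile_static_schedule(program):
        print(f"stall {stall:2d} cycles, then issue {op} {dst} <- {srcs}")
    print("scoreboard version finishes issuing at cycle", run_with_scoreboard(program))
```

Both versions end up issuing the instructions on the same cycles, which is the whole argument as I understand it: when latencies never vary, the runtime scoreboard can't find a better intra-warp schedule than the one the compiler already baked in, so the area and power it costs buy nothing for those instructions.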
I assume later Nvidia architectures continued and improved on this. Things weren't as great with Kepler, so I left it out of the title, but it started there. As long as just this one difference remains, it's very unlikely the consumption numbers will match at the same or similar manufacturing node, right? Are there other factors?