extrajudicial
Banned
Has anybody figured out why Volantis shows as 2.5 GHz and not 2.3 on GB? It looks like TK1 can match Enhanced Cyclone in single thread, but it's not even close in per-core IPC. Still, impressive for 28nm.
Performance per watt is what actually matters in mobile, not performance per MHz. And higher MHz doesn't mean lower power efficiency (see Maxwell vs. Kepler and Denver vs. Cortex cores as an example).
As I have mentioned elsewhere, Denver eschews the power hungry OoO logic for a totally different approach where code is optimized via software and then executed in-order. The idea is to improve efficiency by optimizing code once and use many times rather than using more power hungry logic to optimize each and every time.
…
And higher MHz doesn't mean lower power efficiency (see Maxwell vs. Kepler and Denver vs. Cortex cores as an example).
The CPU perf per watt of both devices is completely unknown at this time. Ideally one would need to use a CPU-intensive application and isolate CPU power consumption by measuring at the voltage rails, or measure the difference between idle and sustained power consumption with a CPU-intensive application. To accurately measure power consumption of either CPU or GPU, the platform power consumption needs to be isolated and accounted for. And in this particular comparison, there is also a difference in fabrication process node.
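The idle-versus-load subtraction described above can be sketched roughly as follows. This is only an illustration, not a measurement procedure for either device: the function names and every number are hypothetical placeholders, and it assumes the rest of the platform (display, radios, DRAM) draws about the same at idle and under load.

```python
def cpu_power_estimate(idle_watts, load_watts):
    """Estimate CPU power as sustained-load power minus idle platform power."""
    if load_watts < idle_watts:
        raise ValueError("load power should not be below idle power")
    return load_watts - idle_watts


def perf_per_watt(score, idle_watts, load_watts):
    """Benchmark score divided by the estimated CPU-only power."""
    cpu_watts = cpu_power_estimate(idle_watts, load_watts)
    if cpu_watts == 0:
        raise ValueError("no measurable CPU power delta")
    return score / cpu_watts


# Hypothetical placeholder numbers, not measurements of any real device.
print(perf_per_watt(score=1800, idle_watts=2.0, load_watts=5.0))  # 600.0
```

Measuring at the voltage rails would replace the subtraction with a direct reading, which avoids the constant-platform-power assumption.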
It is, but 20nm is also not a strong improvement in power efficiency. It will take a transition to FinFETs to get where many traditionally expect a node transition to get to.
There is still some improvement, and the increased density can allow for more transistors to spend in the pursuit of power savings, so where the tipping point may be could depend on more detailed performance profiling and power testing.
Should they all transition to a similar FinFET node, the comparison's error bars should be much smaller.
From what I understand the benefits of Maxwell are mostly due to power gating and clockspeed management. Under heavy load the difference between Kepler and Maxwell efficiency is negligible.
Which has what to do with the topic at hand exactly? Yes, PowerVR is a TBDR and I've been following them since I was a teenager. As with all approaches there are both advantages and disadvantages. I also understand that mobile GPUs such as PowerVR Series6XT render frames much more efficiently than desktop (read: Nvidia) GPUs. Who is to say whether Nvidia's current lead in desktop GPUs would even translate?
Judging from battery size and battery life, it looks like Cyclone is way ahead in perf/watt also. The iPad has a much smaller battery, yet Apple advertises 10 hours to the Nexus' 9.
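The battery arithmetic behind that comparison looks something like this. It is a rough sketch with made-up capacities (the function name and the Wh figures are hypothetical, not the real specs of either tablet), and it only gives average draw for the whole platform, screen included, not for the CPU alone.

```python
def average_platform_watts(battery_wh, runtime_hours):
    """Average whole-device power draw implied by capacity and advertised runtime."""
    if runtime_hours <= 0:
        raise ValueError("runtime must be positive")
    return battery_wh / runtime_hours


# Hypothetical capacities; only the 10 h and 9 h runtimes come from the post above.
smaller_battery = average_platform_watts(battery_wh=25.0, runtime_hours=10.0)
larger_battery = average_platform_watts(battery_wh=27.0, runtime_hours=9.0)
print(smaller_battery, larger_battery)  # 2.5 3.0
```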
I am very skeptical about this Nexus. Every Tegra has benched well, then gone on to be a Dog of a chip, including the 32-bit K1.
I'm skeptical about Denver perf/watt too. But why do you think the 32-bit K1 is a "Dog of a chip"?
I don't mean it's not very fast on paper, but there are tons of bugs and every person I've seen use the 32-bit K1 returned the thing within hours. Lots of incompatibility, random crashes, etc. It may have had nothing to do with K1 and been just all the other hardware and software...
But I doubt it. Look at the sales figures: customer response has been negligible.
From the CUDA thread, it's clear that they added some kind of register reuse cache that reduced register fetches from the register banks. Banks are pretty large, so that should result in quite an optimization. And if the register reuse cache is much closer to the ALUs, they will lose less power moving the operands around as well. And then there's reduced HW scheduling and the reduced crossbar not allowing operands to execute everywhere.
The whole SM architecture has changed significantly, and the changes intuitively seem to benefit perf/W.
You don't get this kind of improvement with just clock gating (as if that's a new thing) and clock speed management (whatever that means.)
Tons of bugs, such as..?
Where are the sales figures for K1 devices?