NVIDIA Tegra Architecture

extrajudicial · Oct 23, 2014

Has anybody figured out why Volantis shows as 2.5 GHz and not 2.3 GHz on GB? It looks like TK1 can match Enhanced Cyclone in single-thread performance, but it's not even close in per-core IPC. Still, impressive for 28nm.

Lazy8s · Oct 23, 2014

Then, of course, there's the question of how the differing approaches to CPU architecture affect the performance in a range of real world workloads.

It'll be interesting to see if nVidia's approach with Denver leads to a new way forward for other design teams, too.

ams · Oct 23, 2014

Performance per watt is what actually matters in mobile, not performance per MHz. And higher MHz doesn't mean lower power efficiency (see Maxwell vs. Kepler and Denver vs. Cortex cores as an example).

As I have mentioned elsewhere, Denver eschews the power-hungry OoO logic for a totally different approach where code is optimized via software and then executed in-order. The idea is to improve efficiency by optimizing code once and using it many times, rather than using more power-hungry logic to optimize each and every time.
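To make the "optimize once, use many times" trade-off concrete, here is a minimal sketch of the break-even point. Every energy figure in it is a made-up placeholder, not a measurement of Denver or of any OoO core, and the function names are hypothetical.

```python
def energy_hw_reorder(runs, e_ooo_per_run):
    """Total energy when hardware re-optimizes the code on every run."""
    return runs * e_ooo_per_run


def energy_optimize_once(runs, e_optimize, e_inorder_per_run):
    """Total energy when code is optimized once in software, then run in-order."""
    return e_optimize + runs * e_inorder_per_run


def break_even_runs(e_optimize, e_ooo_per_run, e_inorder_per_run):
    """Number of runs after which the one-time optimization has paid for itself.

    Returns None if the in-order run is not cheaper per run, in which case
    the up-front cost is never recovered.
    """
    saving_per_run = e_ooo_per_run - e_inorder_per_run
    if saving_per_run <= 0:
        return None
    return e_optimize / saving_per_run


# Placeholder values in arbitrary energy units (assumptions, not data).
E_OPTIMIZE = 100.0   # one-time software optimization cost
E_OOO = 1.5          # per-run cost with OoO hardware
E_INORDER = 1.0      # per-run cost of pre-optimized in-order code

print(break_even_runs(E_OPTIMIZE, E_OOO, E_INORDER))
```

With these placeholders the approach only wins for code that runs a couple of hundred times or more, which is why it depends on hot code being reused a lot.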

extrajudicial · Oct 23, 2014

ams said:
Performance per watt is what actually matters in mobile, not performance per MHz. And higher MHz doesn't mean lower power efficiency (see Maxwell vs. Kepler and Denver vs. Cortex cores as an example).

As I have mentioned elsewhere, Denver eschews the power-hungry OoO logic for a totally different approach where code is optimized via software and then executed in-order. The idea is to improve efficiency by optimizing code once and using it many times, rather than using more power-hungry logic to optimize each and every time.

Judging from battery size and battery life, it looks like Cyclone is way ahead in perf/watt also. The iPad has a much smaller battery, yet Apple advertises 10hrs to the Nexus' 9.

I am very skeptical about this Nexus, every Tegra has benched well, then gone on to be a Dog of a chip. Including 32bit K1.

extrajudicial · Oct 23, 2014

I'd like to correct my previous post. The Nexus has a smaller battery: 6700 mAh vs. 7340 mAh.

It's hard to say which is better per watt. The iPad has a larger screen and longer battery life, but the Nexus has a smaller battery.
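A rough sketch of the arithmetic, using the capacities above and the advertised 10 hr and 9 hr runtimes mentioned earlier. mAh is charge, not energy, so the wattage line assumes both packs share the same nominal voltage; the 3.8 V value is my assumption, not a spec from either vendor.

```python
NOMINAL_VOLTAGE = 3.8  # volts; assumed identical for both devices


def average_draw_ma(capacity_mah, runtime_hours):
    """Average current draw implied by draining the battery over the runtime."""
    return capacity_mah / runtime_hours


# (capacity in mAh, advertised runtime in hours) as quoted in this thread
devices = {
    "Nexus 9": (6700, 9),
    "iPad": (7340, 10),
}

for name, (capacity_mah, hours) in devices.items():
    draw_ma = average_draw_ma(capacity_mah, hours)
    watts = draw_ma / 1000 * NOMINAL_VOLTAGE
    print(f"{name}: {draw_ma:.0f} mA average, roughly {watts:.2f} W")
```

That works out to about 744 mA for the Nexus and 734 mA for the iPad, so whole-device average draw is within about 1.5% on advertised figures. It says nothing about the CPU alone.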

ams · Oct 23, 2014

The CPU perf. per watt of both devices is completely unknown at this time. Ideally one would need to use a CPU-intensive application and isolate CPU power consumption by measuring at the voltage rails, or measure the difference between idle and sustained power consumption with a CPU-intensive application. To accurately measure power consumption of either CPU or GPU, the platform power consumption needs to be isolated and accounted for. And in this particular comparison, there is also a difference in fabrication process node.
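The idle-versus-sustained method could be sketched like this. The benchmark score and wattages are hypothetical placeholders, and subtracting idle power only approximates the CPU's share, since display, memory and other platform power may also change under load.

```python
def cpu_power_estimate(sustained_watts, idle_watts):
    """Approximate CPU power as the rise in platform power under a CPU-bound load."""
    delta = sustained_watts - idle_watts
    if delta <= 0:
        raise ValueError("sustained power must exceed idle power")
    return delta


def perf_per_watt(score, sustained_watts, idle_watts):
    """Benchmark score divided by the estimated CPU power."""
    return score / cpu_power_estimate(sustained_watts, idle_watts)


# Hypothetical readings, not measurements of any real device.
print(perf_per_watt(score=3000, sustained_watts=5.0, idle_watts=2.0))
```

Measuring at the voltage rails would replace the subtraction with a direct CPU reading, which is the more accurate of the two options.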

JohnH · Oct 23, 2014

ams said:
…
And higher MHz doesn't mean lower power efficiency (see Maxwell vs. Kepler and Denver vs. Cortex cores as an example).

You can't use different architectural generations as examples of higher clock != lower power efficiency; it's a completely meaningless comparison. It just shows that newer architectures may be designed to be more power efficient.

extrajudicial · Oct 23, 2014

ams said:
The CPU perf. per watt of both devices is completely unknown at this time. Ideally one would need to use a CPU-intensive application and isolate CPU power consumption by measuring at the voltage rails, or measure the difference between idle and sustained power consumption with a CPU-intensive application. To accurately measure power consumption of either CPU or GPU, the platform power consumption needs to be isolated and accounted for. And in this particular comparison, there is also a difference in fabrication process node.

True, but they both offer roughly the same battery size and we can infer something about their power efficiency, even if it's only in the context of that OS or form factor.

The Tegra is at a node disadvantage, but should we judge its performance as somehow more impressive based on what it might do?

ninelven · Oct 23, 2014

Indeed, isn't A8X 20nm?

I don't see that it matters much anyway, Nvidia and Apple aren't really competing against each other.

3dilettante · Oct 23, 2014

It is, but 20nm is also not a strong improvement in power efficiency. It will take a transition to FinFETs to deliver the gains many traditionally expect from a node transition.
There is still some improvement, and the increased density can allow for more transistors to spend in the pursuit of power savings, so where the tipping point lies could depend on more detailed performance profiling and power testing.
Should they all transition to a similar FinFET node, the comparison's error bars should be much smaller.

Ailuros · Oct 24, 2014

3dilettante said:
It is, but 20nm is also not a strong improvement in power efficiency. It will take a transition to FinFETs to deliver the gains many traditionally expect from a node transition.
There is still some improvement, and the increased density can allow for more transistors to spend in the pursuit of power savings, so where the tipping point lies could depend on more detailed performance profiling and power testing.
Should they all transition to a similar FinFET node, the comparison's error bars should be much smaller.

20SoC, according to insider indications, isn't exactly what you'd expect it to be. Yes, it comes with improvements of course, but to put it into a more realistic Tegra Logan vs. Erista picture: if the latter's GPU is, for example, 80% (freely invented figure) more efficient, do you think it would be fair if I said that the majority of that percentage comes from architectural refinements (Kepler-->>Maxwell) and only a modest percentage from the 28HPm to 20SoC transition?

It's always nice to have a smaller and more advanced manufacturing process but if there aren't any leaps in architectural refinements it'll stay in the hw refresh realm.

extrajudicial · Oct 24, 2014

From what I understand, the benefits of Maxwell are mostly due to power gating and clock speed management. Under heavy load the difference between Kepler and Maxwell efficiency is negligible.

I also understand that mobile GPUs such as PowerVR 6XT render frames much more efficiently than desktop (read: Nvidia) GPUs. Who is to say whether Nvidia's current lead in desktop GPUs would even translate?

Ailuros · Oct 24, 2014

extrajudicial said:
From what I understand, the benefits of Maxwell are mostly due to power gating and clock speed management. Under heavy load the difference between Kepler and Maxwell efficiency is negligible.

A GTX980 is almost 60% faster than a GTX770 at comparable real-time power consumption, while the GM204 is on the same process as GK104 with about 33% more die area. The same goes for GM107 vs. GK107. You don't get that kind of difference just with power gating and frequency tricks. If you don't see any of these increases, you're most likely comparing apples to oranges, i.e. making the same mistake as many do when comparing GM204 to GK110. The first is a performance chip, the latter a high-end chip.

This doesn't come for free either; GM204 has roughly 48% more transistors than GK104, among them a portion spent on the former's higher compliance.
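Put as plain ratios, using only the approximate figures above (60% faster at comparable power, 33% more die area, 48% more transistors), a quick sketch looks like this. Treating "comparable" power as exactly equal is a simplifying assumption.

```python
# Approximate GM204 vs. GK104 ratios taken from the figures in this post.
PERF_RATIO = 1.60        # GTX980 vs. GTX770 performance
POWER_RATIO = 1.00       # "comparable" power, assumed equal here
AREA_RATIO = 1.33        # die area
TRANSISTOR_RATIO = 1.48  # transistor count


def relative_gain(perf_ratio, cost_ratio):
    """Performance gain normalized by the growth in some cost (power, area, transistors)."""
    return perf_ratio / cost_ratio


print(f"perf per watt:       {relative_gain(PERF_RATIO, POWER_RATIO):.2f}x")
print(f"perf per die area:   {relative_gain(PERF_RATIO, AREA_RATIO):.2f}x")
print(f"perf per transistor: {relative_gain(PERF_RATIO, TRANSISTOR_RATIO):.2f}x")
```

So on these numbers the gain is about 1.6x per watt, about 1.20x per unit of die area and about 1.08x per transistor: the efficiency jump is real, but part of it was bought with a bigger chip.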

I also understand that mobile GPUs such as PowerVR 6XT render frames much more efficiently than desktop (read: Nvidia) GPUs. Who is to say whether Nvidia's current lead in desktop GPUs would even translate?

Which has what to do with the topic at hand exactly? Yes, the PowerVR is a TBDR and I've been following them since I was a teenager. As with all approaches, there are both advantages and disadvantages.

That still doesn't change one bit that the ULP Maxwell GPU in upcoming Erista will raise the efficiency bar significantly as will most future architectures in that market.

silent_guy · Oct 24, 2014

extrajudicial said:
From what I understand, the benefits of Maxwell are mostly due to power gating and clock speed management.

From the CUDA thread, it's clear that they added some kind of register reuse cache that reduced register fetches from the register banks. Banks are pretty large, so that should result in quite an optimization. And if the register reuse cache is much closer to the ALUs, they will lose less power moving the operands around as well. And then there's reduced HW scheduling and the reduced crossbar not allowing operands to execute everywhere.

The whole SM architecture has changed significantly, and the changes intuitively seem to benefit perf/W.

You don't get this kind of improvement with just clock gating (as if that's a new thing) and clock speed management (whatever that means.)

RecessionCone · Oct 24, 2014

extrajudicial said:
Judging from battery size and battery life, it looks like Cyclone is way ahead in perf/watt also. The iPad has a much smaller battery, yet Apple advertises 10hrs to the Nexus' 9.

I am very skeptical about this Nexus, every Tegra has benched well, then gone on to be a Dog of a chip. Including 32bit K1.

I'm skeptical about Denver perf/watt too. But why do you think 32bit K1 is a "Dog of a chip"?

extrajudicial · Oct 24, 2014

RecessionCone said:
I'm skeptical about Denver perf/watt too. But why do you think 32bit K1 is a "Dog of a chip"?

I don't mean it's not very fast on paper, but there are tons of bugs and every person I've seen use the 32-bit K1 returned the thing within hours. Lots of incompatibility, random crashes, etc. It may have had nothing to do with K1 and was just all the other hardware and software...

But I doubt it. Look at the sales figures, customer response has been negligible.

Florin · Oct 24, 2014

extrajudicial said:
I don't mean it's not very fast on paper, but there are tons of bugs and every person I've seen use the 32-bit K1 returned the thing within hours. Lots of incompatibility, random crashes, etc. It may have had nothing to do with K1 and was just all the other hardware and software...

But I doubt it. Look at the sales figures, customer response has been negligible.

Tons of bugs, such as..?

Where are the sales figures for K1 devices?

extrajudicial · Oct 24, 2014

silent_guy said:
From the CUDA thread, it's clear that they added some kind of register reuse cache that reduced register fetches from the register banks. Banks are pretty large, so that should result in quite an optimization. And if the register reuse cache is much closer to the ALUs, they will lose less power moving the operands around as well. And then there's reduced HW scheduling and the reduced crossbar not allowing operands to execute everywhere.

The whole SM architecture has changed significantly, and the changes intuitively seem to benefit perf/W.

You don't get this kind of improvement with just clock gating (as if that's a new thing) and clock speed management (whatever that means.)

It's not just "clock gating" and how do you explain the fact that Kepler and Maxwell have the EXACT same power consumption on compute loads?

Your explanation is that "the architecture is different" ... Ok!

extrajudicial · Oct 24, 2014

Florin said:
Tons of bugs, such as..?

Where are the sales figures for K1 devices?

I'm on mobile and am having a hard time linking all the reviews. The consensus is that yes it's very fast but very buggy. Lots of crashing and updates.

I'd love to see the sales figures, I'll look for them in the next hr.

extrajudicial · Oct 24, 2014

http://www.anandtech.com/show/6147/nvidia-q2-fy13-earnings-report-104b-revenue-tegra-sales-recover
