NVIDIA Tegra Architecture

By the way did it just fly by my head or did they postpone Project Denver by another process generation?

That part is unclear. At CES 2013, NVIDIA joked that Project Denver would be available in two years, but said it would actually arrive well before then. At GTC 2013, NVIDIA said that Project Denver may show up first in Tegra 6 "Parker", but again it's unclear.
 
That part is unclear. At CES 2013, NVIDIA joked that Project Denver would be available in two years, but said it would actually arrive well before then. At GTC 2013, NVIDIA said that Project Denver may show up first in Tegra 6 "Parker", but again it's unclear.

It shouldn't be too hard to find past indications that Denver would be integrated into Maxwell and make its first appearance at 20nm. Now guess what happened in the meantime.
 
The only thing we know is that Tegra-powered devices will not have a Project Denver CPU until Tegra 6 "Parker" (which happens to be the same time that Tegra-powered devices will have a Maxwell-derived GPU).
 
The only thing we know is that Tegra-powered devices will not have a Project Denver CPU until Tegra 6 "Parker" (which happens to be the same time that Tegra-powered devices will have a Maxwell-derived GPU).

That's another milestone to clear, since the next obvious question is what to integrate into Logan in order to get a decent CPU performance increase.
 
What did yields look like in early 2012 for 28LP, and why should any 20nm variant be any better in early 2014? Has any 20nm capacity been secured already, and if so, by whom?

28LP in early 2012 looked good enough for Qualcomm to launch with it. You wouldn't argue that S4 should have been 40nm, would you?

I don't know anything about Maxwell but Logan at 28nm just doesn't make sense. Getting two major uarchs out of GPUs on a given process is pretty standard, and if AMD doesn't move aggressively to 20nm they'll have less incentive to. But the stakes are pretty different for Tegra. Tegra 4 is already not really looking great on the power consumption front; they just don't cover the lower end of the CPU perf/W spectrum well vs other competitors. The situation is going to be a lot worse if they release another 28nm part with the same problems shortly before Qualcomm releases 20nm parts (Intel's 22nm Atoms will probably already be on the market as well).

So the only way I could see 28nm making sense is if they change the CPU strategy to include big.LITTLE. But I don't think nVidia wants to start licensing more cores for only one generation of products, and they seem pretty committed to this 4+1 layout, at least until Denver.
 
Anandtech was purely speculating that Tegra 5 would use a 28nm fabrication process. Based on NVIDIA's performance expectations, and based on the link you provided above, I cannot imagine that Tegra 5 will use anything other than a 20nm fabrication process.

Nvidia openly complained about TSMC's 20nm process in terms of transistor cost vs 28nm, stating that there was little saving. Perhaps the situation has improved since, but given Nvidia's early issues with TSMC's 28nm process, a move to 20nm and a new architecture could be risky, especially if the risk is not offset by cost savings. 28nm HPM would offer improved performance on a mature process, so 20nm is not a certainty.

Regarding a mega-GFLOP, extreme-edition T5 tablet SoC, and ignoring engineering arguments: who would actually buy that chip, which would come with a price premium over a baby T5? Not Apple, Intel or Samsung, and the other Android vendors have shown no ability to sell tablets in serious volume, let alone premium-priced iPad competitors. Nexus tablets are popular, but more at the price-sensitive end of the market.

http://www.extremetech.com/computin...y-with-tsmc-claims-22nm-essentially-worthless
 
28LP in early 2012 looked good enough for Qualcomm to launch with it. You wouldn't argue that S4 should have been 40nm, would you?

I don't recall Qualcomm being satisfied with their yields, though, nor them meeting their goals back then, which was official. If yields had been better we might have seen higher Krait frequencies earlier than just recently.

I don't know anything about Maxwell but Logan at 28nm just doesn't make sense. Getting two major uarchs out of GPUs on a given process is pretty standard and if AMD doesn't move with aggressively with 20nm they'll have less incentive to. But the stakes are pretty different for Tegra. Tegra 4 is already not really looking great on the power consumption front; they just don't cover the lower end of the CPU perf/W spectrum well vs other competitors. The situation is going to be a lot worse if they release another 28nm part with the same problems shortly before Qualcomm releases 20nm parts (Intel's 22nm Atoms will probably already be on the market as well).

Why exactly, in your opinion, is Tegra 4 not looking great on the power consumption front?

So the only way I could see 28nm making sense is if they change the CPU strategy to include big.LITTLE. But I don't think nVidia wants to start licensing more cores for only one generation of products, and they seem pretty committed to this 4+1 layout, at least until Denver.

And 20nm is going to save the day? It'll just make things quite a bit easier, but it doesn't have only advantages; frankly, given the risk involved with a new GPU architecture on a very new and probably very immature process, I'm not so sure the advantages outweigh the disadvantages after all.
 
Cost savings or not, nVidia can't afford to lag far behind in power efficiency for mobile SoCs. So unless their complaints have extended to include TSMC 20nm offering no power savings, they're going to need it.

nVidia complains about and blames TSMC every chance they get, it doesn't really change anything because they're the only real option.
 
I don't recall Qualcomm being satisfied with their yields, though, nor them meeting their goals back then, which was official. If yields had been better we might have seen higher Krait frequencies earlier than just recently.

You're right, Qualcomm was unhappy with TSMC 28nm and it could have been better. Does that mean they would have been better off sticking with 40nm? I don't think so and I doubt they think so. You take the best you can get. TSMC currently holds all the cards.

Why exactly, in your opinion, is Tegra 4 not looking great on the power consumption front?

I've commented on this before, but the statement from nVidia is that the 825MHz power saver Cortex-A15 core will use 40% less power than a Tegra 3 Cortex-A9 core at 1.6GHz.

That's pretty much a best-case scenario for them. That's not good. They may claim that Cortex-A15 offers nearly twice the IPC of Cortex-A9, and that may be true for SPEC2k, but for most programs it won't be.

Starting at 1.6GHz for Tegra 3 is a very bad place to make a comparison: the power consumption curve drops a lot quicker going down from the Cortex-A9 at 1.6GHz than it does for a Cortex-A15 going down from 825MHz. And you can only run that one power saver core in isolation. We don't know how much worse the other cores are but hopefully nVidia wasn't daft enough to make a separate mutually exclusive core that barely uses less power.

So my expectation is that Tegra 4's perf/W curve will barely look better than Tegra 3's, if at all for the lower regions.
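
Quick back-of-envelope on why (the 0.6x power figure is nVidia's; the IPC ratios below are nothing more than my own illustrative guesses):

```python
# Compare the two quoted operating points: Tegra 4 Cortex-A15 @ 825 MHz claimed
# to draw 40% less power than Tegra 3 Cortex-A9 @ 1.6 GHz.
a9_freq_ghz = 1.6
a15_freq_ghz = 0.825
power_ratio = 0.6   # A15 point uses 60% of the A9 point's power (nVidia's claim)

for label, ipc_ratio in [("SPEC2k-like (A15 ~2x A9 IPC)", 2.0),
                         ("typical code (A15 ~1.4x A9 IPC, my guess)", 1.4)]:
    perf_ratio = ipc_ratio * a15_freq_ghz / a9_freq_ghz
    perf_per_watt_ratio = perf_ratio / power_ratio
    print(f"{label}: perf = {perf_ratio:.2f}x, perf/W = {perf_per_watt_ratio:.2f}x")

# SPEC2k-like:  perf ~1.03x, perf/W ~1.72x  (roughly the marketing number)
# typical code: perf ~0.72x, perf/W ~1.20x  (much less impressive)
```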

No idea about GPU but a lot of an SoC's common use time is with the GPU under light load.

And 20nm is going to save the day? It'll just make things quite a bit easier, but it doesn't have only advantages; frankly, given the risk involved with a new GPU architecture on a very new and probably very immature process, I'm not so sure the advantages outweigh the disadvantages after all.

I'm not saying 20nm will save the day or that there won't be significant challenges in making it work. I'm saying nVidia can't afford to release Logan on 28nm (but that doesn't mean they won't).
 
You're right, Qualcomm was unhappy with TSMC 28nm and it could have been better. Does that mean they would have been better off sticking with 40nm? I don't think so and I doubt they think so. You take the best you can get. TSMC currently holds all the cards.

No, they obviously wouldn't have been better off with 40nm, but NVIDIA could theoretically have gone for 28nm with Tegra 3 too.

I've commented on this before, but the statement from nVidia is that the 825MHz power saver Cortex-A15 core will use 40% less power than a Tegra 3 Cortex-A9 core at 1.6GHz.

That's pretty much a best-case scenario for them. That's not good. They may claim that Cortex-A15 offers nearly twice the IPC of Cortex-A9, and that may be true for SPEC2k, but for most programs it won't be.

Starting at 1.6GHz for Tegra 3 is a very bad place to make a comparison: the power consumption curve drops a lot quicker going down from the Cortex-A9 at 1.6GHz than it does for a Cortex-A15 going down from 825MHz. And you can only run that one power saver core in isolation. We don't know how much worse the other cores are but hopefully nVidia wasn't daft enough to make a separate mutually exclusive core that barely uses less power.

So my expectation is that Tegra 4's perf/W curve will barely look better than Tegra 3's, if at all for the lower regions.

No idea about GPU but a lot of an SoC's common use time is with the GPU under light load.

Can I claim, then, that the "power problem" for T4 is for the most part not tied to the manufacturing process?

I'm not saying 20nm will save the day or that there won't be significant challenges in making it work. I'm saying nVidia can't afford to release Logan on 28nm (but that doesn't mean they won't).

According to rumors, TSMC has not granted any of the bigger competitors priority for 20nm. That earlier link of yours, which paints the typical pretty marketing picture TSMC draws before each of its new process releases, almost makes me think otherwise when it comes to Apple. It's not even a given that the majority of the ramblings in that writeup are accurate.

And no I'm not going to ignite any funky conspiracy theories, but if Apple truly has a major chunk of its production volume scheduled for 20nm at TSMC it is going to affect quite a few other IHVs at least to some degree.

I'm not saying that 20nm doesn't make more sense for Logan/NV; I just can't believe yet the 20nm and early 2014 combination that's all.
 
No, they obviously wouldn't have been better off with 40nm, but NVIDIA could theoretically have gone for 28nm with Tegra 3 too.

True, and maybe they should have, but the situation was a lot different going from Tegra 2 to Tegra 3 than it is going from Tegra 4 to Logan. Tegra 2 was a tiny core with fair room to grow while still keeping a competitive price. With the NDK taking off like it did, nVidia had to correct the boneheaded decision to keep NEON off the chips, and they had to make something that could individually power-gate separate cores and therefore be usable in phones. The tablet market was still burgeoning, and Tegra 2, poorly paired with immature software, had hardly locked in that market. They had to do something about memory bandwidth; while they kept it single-channel, they moved to support much faster DDR3, which was vital for tablets. Because of the growth in tablets there was still a fair amount of untapped power headroom that nVidia could grow into.

And going for quad cores was an easy marketing and review-site win. nVidia knew that neither Qualcomm nor any of its other serious competitors was going to offer quad cores any time soon (Samsung eventually did, but quite a bit later). And they knew that stupid benchmarks and PR stunts work. I have no doubt that going quad first was a big part of what success Tegra 3 had. The power saver core was also in a much better light than it is today because no one had anything comparable, and nVidia ramped up the marketing behind it.

Fast forward to early 2014 and almost none of this will apply to Tegra 4. The only big problems on the table that Logan could fix without changing process node are the CPU organization, which I really doubt nVidia will change, and the GPU. The GPU is big, but IMO not big enough to drive things by itself. There are no cheap and easy wins on the CPU side. They could make an octa-core Cortex-A15 or A57, but I think we both know that would be outrageous.

Can I claim, then, that the "power problem" for T4 is for the most part not tied to the manufacturing process?

Sure. But it doesn't matter. nVidia has basically one option here: big.LITTLE. If you think Tegra 5 will contain it then fine - but I doubt nVidia will swallow its pride on this one. Not when Denver is pending. They may have no faith in the technology to begin with.

So let's just say that if they can't move away from a Cortex-A15 or A57 4+1 arrangement, then they need 20nm to make it worthwhile.

According to rumors, TSMC has not granted any of the bigger competitors priority for 20nm.

Do they tend to give priorities? What bearing does this have on schedule?

That earlier link of yours, which paints the typical pretty marketing picture TSMC draws before each of its new process releases, almost makes me think otherwise when it comes to Apple. It's not even a given that the majority of the ramblings in that writeup are accurate.

A lot of sites reported the same thing. It's pretty straightforward. I don't think TSMC is claiming anything different.

Yes, their schedule claims are not to be trusted, but if they really are two months ahead for something that was pretty near term that's positive news and should only improve the prospects for 20nm in 2014.

And no I'm not going to ignite any funky conspiracy theories, but if Apple truly has a major chunk of its production volume scheduled for 20nm at TSMC it is going to affect quite a few other IHVs at least to some degree.

I'm not saying that 20nm doesn't make more sense for Logan/NV; I just can't believe yet the 20nm and early 2014 combination that's all.

Apple could disrupt things, although rumors are that Apple wasn't able to buy priority allotments like everyone else.

I agree that 20nm seems hard for early 2014. But I also don't consider that to be a hard date at all. Why deny TSMC's schedules but take nVidia's at face value? Neither have ever kept close to them :p
 
Nvidia openly complained about TSMC's 20nm process in terms of transistor cost vs 28nm, stating that there was little saving. Perhaps the situation has improved since, but given Nvidia's early issues with TSMC's 28nm process, a move to 20nm and a new architecture could be risky, especially if the risk is not offset by cost savings. 28nm HPM would offer improved performance on a mature process, so 20nm is not a certainty.

I think the writer of the Extremetech article sensationalized far too much. NVIDIA never claimed that 20nm is "essentially worthless" as the writer suggests. NVIDIA privately discussed the challenges with moving to newer and more advanced fabrication process nodes. Wafer prices are increasing exponentially while newer fabrication processes no longer provide lower normalized transistor cost over time vs. older fabrication processes. That said, there is no question that power efficiency increases when moving to newer and more advanced fabrication process nodes, so companies such as NVIDIA will continue to invest in these nodes.
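
To put the cost-per-transistor argument in concrete terms, here is a sketch with deliberately made-up numbers (neither the wafer prices nor the densities and yields below are TSMC's real figures; they just show the shape of the problem):

```python
# Illustrative only: a ~1.8x density gain can be offset by a pricier wafer and
# worse early yield, leaving cost per transistor nearly flat from 28nm to 20nm.

def dollars_per_billion_transistors(wafer_price, density_mtr_mm2, usable_mm2, yield_frac):
    good_transistors_bn = density_mtr_mm2 * usable_mm2 * yield_frac / 1000.0
    return wafer_price / good_transistors_bn

cost_28nm = dollars_per_billion_transistors(5000, density_mtr_mm2=10, usable_mm2=60000, yield_frac=0.85)
cost_20nm = dollars_per_billion_transistors(8000, density_mtr_mm2=18, usable_mm2=60000, yield_frac=0.75)

print(f"28nm: ${cost_28nm:.2f} per billion transistors")  # ~$9.80
print(f"20nm: ${cost_20nm:.2f} per billion transistors")  # ~$9.88 -- essentially no saving
```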

Regarding a mega-GFLOP, extreme-edition T5 tablet SoC, and ignoring engineering arguments: who would actually buy that chip, which would come with a price premium over a baby T5? Not Apple, Intel or Samsung, and the other Android vendors have shown no ability to sell tablets in serious volume, let alone premium-priced iPad competitors. Nexus tablets are popular, but more at the price-sensitive end of the market.

I never suggested 1000 GFLOPS throughput for Tegra 5! The expectation is that Tegra 5 will have ~ 200 GFLOPS throughput at a minimum and up to ~ 400 GFLOPS throughput at a maximum. Using a 20nm fabrication process may allow them to achieve these performance goals while still having reasonably good power efficiency.
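
Just to show where a ~200-400 GFLOPS range could come from: assuming, purely hypothetically, a single Kepler-style block of 192 FP32 lanes doing one FMA per clock (the lane count and clocks are my guesses, not anything announced), the arithmetic is trivial:

```python
# GFLOPS = lanes * 2 (FMA counts as mul + add) * clock in GHz
def gflops(fp32_lanes, clock_ghz):
    return fp32_lanes * 2 * clock_ghz

lanes = 192  # hypothetical single Kepler-style SMX; not confirmed for Tegra 5
for clock_ghz in (0.52, 0.78, 1.04):
    print(f"{lanes} lanes @ {clock_ghz:.2f} GHz -> {gflops(lanes, clock_ghz):.0f} GFLOPS")

# -> roughly 200, 300, and 400 GFLOPS, which brackets the range above
```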
 
I never suggested 1000 GFLOPS throughput for Tegra 5! The expectation is that Tegra 5 will have ~ 200 GFLOPS throughput at a minimum and up to ~ 400 GFLOPS throughput at a maximum. Using a 20nm fabrication process may allow them to achieve these performance goals while still having reasonably good power efficiency.

I meant mega as in more badboy rather than an absolute figure; a 400 GFLOPS T5 would fall into the badboy category IMO :D
 
Tegra 4 is already not really looking great on the power consumption front; they just don't cover the lower end of the CPU perf/W spectrum well vs other competitors

Actually Tegra 4 does look good on the power consumption front in scenarios where the battery saver core can be used. When comparing Tegra 4 in a reference 1080p phone to the Snapdragon S4 Pro in a Droid DNA 1080p phone, Tegra 4 uses 24% less power with video playback, 20% less power with video recording, 15% less power with ebook/web reading, 55% less power with audio playback, and 47% less power during standby: http://hothardware.com/newsimages/Item24654/small_Tegra4-Battery-Claim.jpg

So my expectation is that Tegra 4's perf/W curve will barely look better than Tegra 3's, if at all for the lower regions.

Tegra 4 should have significantly better power efficiency in general compared to Tegra 3 because the CPU can complete tasks much more quickly and then switch over to the battery saver core for longer periods of time to save power. I would expect web browsing battery life to be significantly better on Tegra 4 compared to Tegra 3 for precisely this reason.
 
Yeah, and did you see the power numbers nVidia claimed for Tegra 3 vs contemporary SoCs? Uncontrolled tests that are impossible for third parties to repeat mean little to me. Lumping together "ebook/web browsing" says to me that the CPU is probably halted for most of the time, not merely running at a lowish frequency on the so-called battery saver core. The other stuff will be dominated by components that aren't the CPU.

The claim that the lower-power core at 825MHz uses 40% less power than a Tegra 3 Cortex-A9 at 1.6GHz is a hard figure. And it doesn't sound good at all. It doesn't matter if a core rushes to sleep in half the time if it uses three times as much power while active, unless a lot of power is burned by everything else in the system during that time.
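
Here's the race-to-sleep arithmetic spelled out (all of the power numbers are placeholders I made up; only the inequality matters):

```python
# Race-to-sleep sketch: when does "3x the power, 2x the speed" actually help?
def task_energy_mj(cpu_mw, active_s, platform_mw, idle_mw=30, window_s=1.0):
    # platform_mw = everything else that stays awake only while the task runs
    awake = (cpu_mw + platform_mw) * active_s
    asleep = idle_mw * (window_s - active_s)
    return awake + asleep

for platform_mw in (0, 1000):
    slow = task_energy_mj(cpu_mw=400, active_s=1.0, platform_mw=platform_mw)
    fast = task_energy_mj(cpu_mw=1200, active_s=0.5, platform_mw=platform_mw)
    print(f"platform {platform_mw} mW: slow core {slow:.0f} mJ, fast core {fast:.0f} mJ")

# platform 0 mW:    slow 400 mJ,  fast 615 mJ  -> racing to sleep loses
# platform 1000 mW: slow 1400 mJ, fast 1115 mJ -> it only wins when the rest of
#                   the system burns a lot of power while the task is running
```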
 
Yeah, and did you see the power numbers nVidia claimed for Tegra 3 vs contemporary SoCs? Uncontrolled tests that are impossible for third parties to repeat mean little to me.

Those were measured results on 1080p phones. If you are willing to believe the measured SPECInt results from NVIDIA, then why would you be unwilling to believe the measured audio/video playback results from NVIDIA too?

The claim that the lower-power core at 825MHz uses 40% less power than a Tegra 3 Cortex-A9 at 1.6GHz is a hard figure. And it doesn't sound good at all.

That doesn't make sense. Tegra 4 has 75% greater SPECInt performance per watt compared to Tegra 3, and 40% lower power for a given SPECInt performance level, and that "doesn't sound good at all"? What were you expecting, 100% lower power?

It doesn't matter if a core rushes to sleep in half the time if it uses three times as much power while active, unless a lot of power is burned by everything else in the system during that time.

What makes you think that quad-core Cortex A15 at 28nm fabrication process uses "three times as much power while active" compared to quad-core Cortex A9 at 40nm fabrication process with web page rendering and loading?
 
I don't recall Qualcomm being satisfied with their yields, though, nor them meeting their goals back then, which was official. If yields had been better we might have seen higher Krait frequencies earlier than just recently.
I remembered it as Qualcomm not being satisfied with the available manufacturing capacity, not necessarily the yields. I could be wrong of course. Anyone got a source on this?

As for Apple on TSMC: if/when we see Apple use TSMC to manufacture their chips, I expect them to first do a relatively low-volume test chip on the process, something like they did with the 32nm A5 in the Apple TV and eventually the iPad 2. That means we'll first see a low production run of some sort of 20nm Apple chip, which will not really deprive other companies of fab capacity, at least not initially. It could cause problems when Apple starts ramping up volume, but by then chances are that TSMC will have sufficiently increased its available manufacturing capacity. All speculation of course. I'm still not entirely convinced that Apple will switch to TSMC. It's quite plausible for them to do so, but I'll believe it when I see it.

Eh. That was all a bit off topic for this thread unfortunately....
 
Those were measured results on 1080p phones. If you are willing to believe the measured SPECInt results from NVIDIA, then why would you be unwilling to believe the measured audio/video playback results from NVIDIA too?

I don't doubt the measurements, I'm saying they mean little to me because they're under arbitrary test conditions of which we know almost nothing and can't reproduce.

That doesn't make sense. Tegra 4 has 75% greater SPECInt performance per watt compared to Tegra 3, and 40% lower power for a given SPECInt performance level, and that "doesn't sound good at all"? What were you expecting, 100% lower power?

75% greater perf/W in SPECInt? According to whom? How did you get this number?

The 40% number was strictly for 825MHz Tegra 4 vs 1.6GHz Tegra 3, or at least that's what Anandtech said. It's misleading because few programs will perform at the same speed on an 825MHz Cortex-A15 and a 1.6GHz Cortex-A9 - I speak from personal experience on this one. SPECInt 2k is an outlier here (without seeing the individual tests I can't say why). This applies even more so to the Krait scores.

While it wasn't said outright that the 825MHz power number was on the power saver core, it's heavily implied given that that's its max clock speed. Why else would they give a number at this particular speed?

What makes you think that quad-core Cortex A15 at 28nm fabrication process uses "three times as much power while active" compared to quad-core Cortex A9 at 40nm fabrication process with web page rendering and loading?

Uhh, I was just using that as an example, not literal numbers....
 
That doesn't make sense. Tegra 4 has 75% greater SPECInt performance per watt compared to Tegra 3, and 40% lower power for a given SPECInt performance level, and that "doesn't sound good at all"? What were you expecting, 100% lower power?
Exophase believes (quite rightly I suspect) that SPECInt is basically a best case for the A15 and performance will not increase as much in most benchmarks - therefore the perf/W improvement would also be less impressive.

The *huge* question is whether 825MHz is actually on the power saver core or not. Everyone is assuming that it is, but I'm not sure that's the case. It's a little-known fact that Tegra 3's high-speed cores are actually slightly *lower power* at 500MHz than the battery saver core. The only reason the battery saver core can clock that high is so that it can handle short bursts and, more importantly, not be too slow during the 2ms it takes to wake up the other cores. And it might make sense to have the same minimum frequency on the high-speed cores as the maximum frequency on the battery saver core.

The high-speed A15s on Tegra 4 are very likely lower power at their lowest voltage than the battery saver core at the same frequency (and/or at its highest voltage). That's just the way 4+1 works in practice. So given that NVIDIA obviously wanted the lowest possible power for Tegra 4 on that slide, I suspect it's actually a high-speed core at its lowest voltage, and Tegra 4's battery saver core can clock significantly lower. Of course the A15s still need to be at their lowest possible voltage to significantly beat Tegra 3 at its highest voltage, so it's almost certainly less efficient at the process's nominal voltage, but it might work out fine with a bit of luck. I still wish they'd move to a 4xA15+(4+1)xA7 architecture in Tegra 5, but we'll see.
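
For anyone unfamiliar with how 4+1 switching tends to work in practice, here's a grossly simplified sketch of the kind of decision the cluster switch makes. This is just my own illustration of the general idea, not NVIDIA's actual governor logic, and the thresholds and frequencies are invented:

```python
# Grossly simplified 4+1 cluster-switch logic; all numbers are illustrative.
COMPANION_MAX_KHZ = 825_000   # assumed battery-saver core ceiling
SWITCH_LATENCY_MS = 2         # rough cost of waking the main cluster

def pick_cluster(load_pct, requested_khz, on_companion):
    """Return which cluster should run next: 'companion' or 'main'."""
    if on_companion:
        # Leave the companion core once the governor asks for more frequency
        # than it can deliver, or the load stays pegged.
        if requested_khz > COMPANION_MAX_KHZ or load_pct > 85:
            return "main"      # pay the ~2 ms switch to get real performance
        return "companion"
    # Only drop back when the main cores are already at their floor and mostly
    # idle; as noted above, a main core at its lowest voltage can be as cheap
    # as (or cheaper than) the companion core at the same clock.
    if requested_khz <= COMPANION_MAX_KHZ and load_pct < 30:
        return "companion"
    return "main"

print(pick_cluster(load_pct=90, requested_khz=1_200_000, on_companion=True))   # main
print(pick_cluster(load_pct=15, requested_khz=500_000, on_companion=False))    # companion
```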
 
I don't doubt the measurements, I'm saying they mean little to me because they're under arbitrary test conditions of which we know almost nothing and can't reproduce.

That's fine, but at least with respect to video playback, NVIDIA did demonstrate at MWC 2013 that power consumption hovers close to 920mW on a Tegra 4 reference phone: http://www.youtube.com/watch?v=Ne1nT_g5_vs

75% greater perf/W in SPECInt? According to whom? How did you get this number?

This is according to NVIDIA: http://images.anandtech.com/doci/6787/Screen Shot 2013-02-24 at 2.56.13 PM.png

While it wasn't said outright that the 825MHz power number was on the power saver core, it's heavily implied given that that's its max clock speed. Why else would they give a number at this particular speed?

According to Anandtech, the battery saver fifth CPU core will run up to 700-800MHz depending on SKU: http://www.anandtech.com/show/6550/...00-5th-core-is-a15-28nm-hpm-ue-category-3-lte . So this implies that the SPECInt data is from one of the main Cortex A15 CPU cores.
 