NVIDIA Tegra Architecture

I'm thinking Qualcomm has the sure bet this/next generation...NVIDIA has taken this trillion-GPU-core nonsense too far...Tegra 3 was not an efficient chip...how the hell is Tegra 4, with 6x the old-style ALUs and four power-hungry A15s (five?) clocked substantially higher, going to fit into a smartphone with good battery life?

The battery-saver core on Tegra 3 didn't seem that impressive in the real world, did it?...Tegra 4 ups it to an Eagle (A15) core...dual-channel memory controller...more cache?...is the process improvement enough to reduce power and heat whilst increasing performance to the claimed 6x?

Impressive if so, but I have my doubts.

And we seem to have left Intel's mobile offerings out of any performance equation...why? Cloverview+ looks, at least on paper, to be very competitive with the current/last gen...if not THE LEADER...in day-to-day performance and battery life...hell, that would have been unthinkable even a year ago...remember when we all (yes, yes, including me :/) laughed Intel's Medfield leak out of town?...check out the hands-on reviews of it...it is an extremely good chip...the Motorola RAZR i won several comparison shootouts against the RAZR M!...and it came in affordable, slim smartphones...incredible if you ask me, and seemingly overlooked around here...wonder why?

And that was their old, written-off technology...an old, high-clocked SGX 540...Cloverview+ is much better...just think if Intel pulls out all the stops with ValleyView and equips it with Rogue...LPDDR3...a fully OoO, redesigned Atom that resembles something more akin to a proper modern Intel processor...all on 22nm tri-gate, baseband included...

That, my friends, will be the best mobile chip around when it debuts...it just won't sell like Qualcomm...at first.

Oh, and 22nm seemed to me, at least, to have been pretty average at best on Ivy Bridge...but we now know that process technology works better at lower voltages...

Just imagine a multi-cluster Rogue setup on that process, only more mature...exciting or what?
 

Don't forget that Tegra 3 was a 40nm chip whereas Krait was 28nm. No doubt that has an impact. I'm sure even an Exynos 4 Quad at 32nm with higher clocks would have had battery issues, but Samsung always had full control of the stack, from the kernel used to the size of the battery and the aggressiveness of power-saving features in the OS. It's all about optimizations.
 

The next Atom is ValleyView, and it uses Intel's own GPU architecture, not Rogue.

And by the time that comes out, it will be competing with Tegra 5, not Tegra 4.
 

I know :) but are you sure about that? Tegra 3 was demoed at CES 2011, was it not? It's 2013 and still no Tegra 4 devices have shown up...I thought ValleyView was a late-2013 release?

I didn't realise they were using their own GPU uarch...that's disappointing, to be honest...let's see what they manage to cook up...they may surprise some people.
 

Yes, Tegra 3 used a mature 40nm process...but I'm not sure how useful that baby core is...perhaps the benefits will be more pronounced with the A15 uarch, although this is somewhat mitigated by the fact that the shadow core is also an A15...

Switching to 28nm HKMG is going to bring some good power-consumption advantages...however, upping the ALU count by 6x with the same inefficient non-unified uarch, and switching to quad A15s clocked at a high 1.9GHz, is going to really suck more power, especially playing games, unless some serious thermal throttling is going on...

The Exynos 4412 was an excellent chip, power-consumption and performance wise...Samsung has stated 75% power savings with their new Exynos 5 Octa...which, with Cortex-A7s in big.LITTLE and 28nm HKMG, seems believable...how is NVIDIA going to beat that when they suggest only 45% savings over the inefficient 40nm Tegra 3?

NVIDIA has a history of inaccurate hyperbole...perhaps ASUS has tested it and found this out.
 

I'm not sure whether NVIDIA ever demoed Tegra 3 at CES 2011, but I do know they released it in November of 2011 and it was less than 3 months later that devices started showing up. They just released Tegra 4 and NVIDIA itself has said that we should expect devices based on it in Q2 of this year. Seems pretty reasonable to me, given how things have been for them in the past.

ValleyView for Tablets will be released this year, along with some other version I believe. I haven't really read anything about some sort of version of ValleyView for phones. Has anyone got any information relating to Intel's future smartphone plans? What will come after Clovertrail+?
 
...but I'm not sure how useful that baby core is...perhaps the benefits will be more pronounced with the A15 uarch, although this is somewhat mitigated by the fact that the shadow core is also an A15...

The A15 companion core in Tegra 4 will run at a max clock operating frequency of only 700-800MHz, which is nearly 1/3 the max clock operating frequency of the four main A15 cores. The companion core should be very useful with light tasks.

Switching to 28nm HKMG is going to bring some good power-consumption advantages...however, upping the ALU count by 6x with the same inefficient non-unified uarch

The non-unified shader architecture of Tegra 4 may actually be relatively efficient when looking at die area, transistor count, and power consumption. Note that when there are no vertex shading tasks being performed, the vertex shaders can be clock-gated and put in a low power state. And when the pixel shaders are working on calculations that do not require texture fetches, the texture units can be clock-gated. And when the GPU is not actively rendering anything, the system memory can be put in a low power state. Tegra also has a pretty sophisticated DVFS technique which surely has been refined and improved over time. See whitepaper here: http://www.nvidia.com/content/PDF/t...ing_High-End_Graphics_to_Handheld_Devices.pdf
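
To make the gating/DVFS idea a bit more concrete, here is a minimal sketch of that kind of policy in Python. The frequency table, utilization thresholds and unit names are made up purely for illustration; this is not NVIDIA's actual implementation, just the general shape of it:

```python
# Minimal sketch of a DVFS + clock-gating policy of the sort described above.
# All frequencies, thresholds and unit names are hypothetical illustrations,
# not values taken from any real Tegra driver.

GPU_FREQS_MHZ = [104, 208, 312, 416, 520]   # assumed GPU operating points

def pick_gpu_frequency(utilization, current_freq):
    """Step the GPU clock up or down one operating point based on utilization."""
    idx = GPU_FREQS_MHZ.index(current_freq)
    if utilization > 0.85 and idx < len(GPU_FREQS_MHZ) - 1:
        return GPU_FREQS_MHZ[idx + 1]       # busy: raise the clock
    if utilization < 0.40 and idx > 0:
        return GPU_FREQS_MHZ[idx - 1]       # mostly idle: lower the clock
    return current_freq

def unit_gating(workload):
    """Leave a block clocked only if it has work this frame; gate it otherwise."""
    return {
        "vertex_shaders": workload.get("vertex_jobs", 0) > 0,
        "pixel_shaders":  workload.get("pixel_jobs", 0) > 0,
        "texture_units":  workload.get("texture_fetches", 0) > 0,
    }

if __name__ == "__main__":
    freq = 312
    for util in (0.20, 0.50, 0.90, 0.95, 0.30):
        freq = pick_gpu_frequency(util, freq)
        print(f"utilization {util:.0%} -> GPU clock {freq} MHz")
    # A frame with pixel work but no texture fetches: texture units get gated.
    print(unit_gating({"pixel_jobs": 128, "texture_fetches": 0}))
```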

...and switching to quad A15s clocked at a high 1.9GHz is going to really suck more power, especially playing games, unless some serious thermal throttling is going on...

For those who want to use a handheld device for heavy gaming, Shield would be the best option. That said, the four A15 main cores in Tegra 4 will naturally have clock-gating and thermal overload protection circuitry too. NVIDIA should have some room to adjust CPU/GPU clock operating frequencies to better fit thermal/heat requirements for different form factors and for different handheld devices.
 
Considering power and/or clock gating of units on mobile SFF GPUs: who says that it's not common practice on GPUs with USC ALUs anyway?
 

Thanks for the white paper...interesting read :) Yes, thermal limits will be in effect, and yes, Tegra does have various power-saving techniques.
However, we are comparing it to the previous Tegra, likely with only minor improvements to those features.

I believe that to be the case, because if they were going to invest serious engineering resources into the Tegra 4 GPU, surely they would just update the whole uarch, no?

The fact is, the claimed performance of Tegra 4 vs Tegra 3 seems to far outweigh the process improvements and the minor power-saving improvements that come along with them...

NVIDIA themselves claim a 45% improvement over Tegra 3...how the hell is this possible with the performance enhancements they promise?
 
NVIDIA themselves claim a 45% improvement over Tegra 3...how the hell is this possible with the performance enhancements they promise?

Simple; those are probably the scenarios where more performance is already redundant, and you can obviously gain that kind of advantage from higher frequencies combined with a smaller process (hello, it's from 40nm/T3 to 28nm/T4 after all).

Did NVIDIA claim that the 45% improvement will hold under stressful 3D, for example? :rolleyes:
 
The A15 companion core in Tegra 4 will run at a max clock operating frequency of only 700-800MHz, which is nearly 1/3 the max clock operating frequency of the four main A15 cores. The companion core should be very useful with light tasks.

But the only relevant question is how much less power it uses at 800MHz when compared to the other cores. Supposedly the only difference between the two cores is layout. I have no idea how big of a difference this makes, maybe someone else can give some kind of idea. I wonder if they were both heavily hand optimized?

TSMC says that their 28nm HPM process delivers similar leakage to the LP process (if that's really so, I wonder what the disadvantage is), so it may not have made sense to have LP parts. TSMC still seems to push the HPL process for devices like phones.

french toast said:
NVIDIA themselves claim a 45% improvement over Tegra 3...how the hell is this possible with the performance enhancements they promise?

I take the claim to mean 45% lower power consumption while providing the same level of performance. Definitely not while maxing out the GPU, much less the CPU. The battery life vs capacity figures given for Shield make this obvious.

This number is pretty realistic, all things considered. The process shrink should provide more than usual since it's transitioning to HKMG. And for some part of the perf/W curve a typical Cortex-A15 may do better than a typical Cortex-A9 even on the same process because of how power consumption scales with frequency. A lot of people are ragging on Cortex-A15's power consumption (on Exynos 5) but no one has really tried analyzing a perf/W curve, they've only looked at power consumption at peak.

This applies much more so for the GPU. Perf will scale well with functional unit count, especially in these ranges, so they get a more immediately obvious win of trading area for efficiency on top of the process advantage. And they probably have improved the efficiency of the intrinsic design. Render target compression alone will probably save power.
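
To put some (entirely made-up) numbers behind the perf/W-curve point, here is a toy model built on the usual first-order dynamic-power relation P ≈ C·V²·f. The IPC ratios, voltages, frequencies and capacitances below are invented for illustration only; they are not measured Cortex-A9 or Cortex-A15 figures:

```python
# Toy perf/W comparison using the first-order dynamic-power model P ~ C * V^2 * f.
# Every number here is invented for illustration; none are measured A9/A15 values.

def dynamic_power_w(c_eff_nf, v_volts, f_mhz):
    # nF * V^2 * MHz gives milliwatts; divide by 1000 for watts
    return c_eff_nf * v_volts ** 2 * f_mhz / 1000.0

# Hypothetical DVFS tables: (frequency in MHz, voltage in V)
a9_points  = [(800, 0.95), (1200, 1.05), (1600, 1.20)]
a15_points = [(800, 0.90), (1200, 1.00), (1700, 1.15)]

A9_IPC,  A15_IPC  = 1.0, 1.5     # assumed relative per-clock performance
A9_CEFF, A15_CEFF = 1.0, 1.6     # assumed effective switched capacitance (nF)

def curve(points, ipc, c_eff):
    for f, v in points:
        perf  = ipc * f                          # arbitrary performance units
        power = dynamic_power_w(c_eff, v, f)
        yield f, perf, power, perf / power

print("core  freq   perf  power(W)  perf/W")
for name, pts, ipc, c in (("A9",  a9_points,  A9_IPC,  A9_CEFF),
                          ("A15", a15_points, A15_IPC, A15_CEFF)):
    for f, perf, p, eff in curve(pts, ipc, c):
        print(f"{name:>4} {f:>5} {perf:>6.0f} {p:>9.2f} {eff:>7.1f}")

# The takeaway: at its peak point the A15 burns far more power than the A9 does
# at its peak, but at matched performance (A15 @ 800MHz vs A9 @ 1200MHz in this
# toy table) the perf/W comparison can come out in the A15's favour.
```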
 

Thanks, this and Ailuros' description make sense...however, ASUS stumbling on Tegra adoption (if true) points to something...maybe only cost or availability, and nothing to do with performance?

However, although they don't specify power at maximum...playing a game would tax the processors, would it not? If it's a Tegra-optimised game, more so, as it would likely use all the cores...in that gaming scenario (an obvious NVIDIA selling point) it is, in my limited opinion, likely to draw more power than Tegra 3, and we won't be seeing 6x the performance in smartphones at all...more multi-core performance hyperbole from NVIDIA...

If you compare this to Qualcomm and even Samsung...they don't come out with crazy 6x performance figures for their next-gen hardware...it's more like 50-70%, or 2x...

I expect NVIDIA will do well to see a 3x GPU improvement in smartphones with lower-clocked A15s.
 
TSMC says that their 28nm HPM process delivers similar leakage to the LP process (if that's really so, I wonder what the disadvantage is), so it may not have made sense to have LP parts. TSMC still seems to push the HPL process for devices like phones.
Tegra 4 is completely 28nm HPL isn't it?
 
Tegra 4 is completely 28nm HPL isn't it?

My mistake; Anandtech initially reported HPM and I didn't realize they had corrected it to HPL. Thanks for the correction.

Thanks, this and Ailuros' description make sense...however, ASUS stumbling on Tegra adoption (if true) points to something...maybe only cost or availability, and nothing to do with performance?

These reports are way too preliminary and vague to draw much from, I wouldn't jump to any conclusions just yet. Although I doubt Tegra 4 is going to be a big hit for phones, much like I don't think Samsung will put Exynos 5 Dual in one..

However, although they don't specify power at maximum...playing a game would tax the processors, would it not? If it's a Tegra-optimised game, more so, as it would likely use all the cores...in that gaming scenario (an obvious NVIDIA selling point) it is, in my limited opinion, likely to draw more power than Tegra 3, and we won't be seeing 6x the performance in smartphones at all...more multi-core performance hyperbole from NVIDIA...

You really think phone games are going to overnight increase their CPU requirements by something like 3x? As it is now, I'm told most are heavily GPU limited and don't need a lot of CPU, even in SoCs with stronger GPUs. That's the norm because it's a lot easier to scale to use available GPU power than to scale CPU demands. That goes in both directions. The game can turn down resolution, effects, frame rate, etc to lower GPU load, but lowering CPU load is a lot harder. And while some games may be Tegra optimized, I'm skeptical as to how many really want to develop exclusively for nVidia...and even if they do, exclusively for Tegra 4 and not Tegra 3.

But they will burn more power with the GPU unless you cap the settings, and AFAIK most mobile games don't give you a lot of choices to manually lower quality.

nVidia already admitted this, they say almost outright that Shield can draw around 8W under normal use cases (gaming probably, probably not with an especially heavy CPU load).
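
Just to put that ~8W figure in perspective, here's a quick back-of-the-envelope runtime estimate. The battery capacity used below is an assumed number for illustration, not a confirmed Shield spec:

```python
# Back-of-the-envelope runtime for a handheld drawing ~8 W under load.
# The pack capacity is an assumption for illustration, not a confirmed spec.

ASSUMED_BATTERY_WH = 28.0            # hypothetical battery capacity in watt-hours
scenarios = {
    "heavy gaming (~8 W)": 8.0,      # the figure discussed above
    "light use (assumed)": 1.5,      # hypothetical light-load draw
}

for label, watts in scenarios.items():
    hours = ASSUMED_BATTERY_WH / watts
    print(f"{label}: ~{hours:.1f} h from a {ASSUMED_BATTERY_WH:.0f} Wh pack")
```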

If you compare this to Qualcomm and even Samsung...they don't come out with crazy 6x performance figures for their next-gen hardware...it's more like 50-70%, or 2x...

I expect NVIDIA will do well to see a 3x GPU improvement in smartphones with lower-clocked A15s.

Where has nVidia actually claimed 6x? I thought I remembered a 3-4x performance improvement claim..
 
But the only relevant question is how much less power it uses at 800MHz when compared to the other cores. Supposedly the only difference between the two cores is layout. I have no idea how big of a difference this makes, maybe someone else can give some kind of idea. I wonder if they were both heavily hand optimized?

Is it possible to reduce the voltage of the companion core further, separately from the main cores? If so, a pretty hefty power reduction would be possible. Dropping clocks reduces power consumption substantially; dropping voltage even more so.

I have to admit to being a complete layman when it comes to knowing anything about split voltage planes in a chip and what is possible. :smile:
 

On Tegra 3 it isn't possible to run the companion core simultaneously with any of the other four cores in the first place, so the question of split voltage or frequency planes becomes moot. While it's possible nVidia changed this for Tegra 4 I doubt it, because they haven't mentioned anything to that effect.

I don't know if the companion core can run at lower voltages than the main cores or not...on Tegra 3 I think the case was actually the opposite, because the companion core was on a more leakage-optimized process that needs higher core voltages for the same clock speed. It's all on the same process for Tegra 4...I don't know if you can do anything to optimize for voltage in layout. But if it does run at a lower voltage, I'm sure the same rails will be able to provide it.
 

I think they said the shadow core uses less power than Tegra 3's...which does seem possible with the better process and likely better optimisations to the companion-core scheme.
 

Wow, 8W is massive for a mobile chip!...wonder how much the iPad 4 consumes?...

Anandtech reports that NVIDIA has indicated to them it will be faster than the iPad 4...which means it will have to be around 5-6 times faster than Tegra 3 to do that...and that's assuming they use the same GPU clocks in a tablet design @ 520MHz.
http://www.anandtech.com/show/6666/the-tegra-4-gpu-nvidia-claims-better-performance-than-ipad-4
Wikipedia:
http://en.wikipedia.org/wiki/Tegra
"The SoC is said to be about 20 times faster than Tegra 2 and 6 times faster than Tegra 3"
Found a slide which backs up this claim...
[attached slide]

I think it's fair to say they have claimed, or at least suggested, that Tegra 4 will hit 6x the performance of Tegra 3, and they have also said it uses less power...I'm fairly sure, having watched the CES unveiling, that they said the GPU itself uses less power...although don't quote me on that, as I would have to re-watch it to be sure.
I really don't think this is achievable in a smartphone with the design they showed...no way!

Besides, Qualcomm's Snapdragon 800 is shaping up to be a hell of a chip...a quite believable 75% performance increase with spruced-up and clocked-up Krait 400s...that is, up to 2.3GHz per core...not the burst mode that NVIDIA implies...the cores should be able to run at that speed until thermal/power limits put the stoppers on, whereas NVIDIA will use some turbo mode for the A15s (quite sensible)...so I'm guessing CPU power will belong to the Snapdragon.

On the GPU they claim a 50% performance increase (I'm guessing in gaming/GLBenchmark 2.5?) and a 2x compute increase...with the GPU using half the power of "last gen" (whether that means the Adreno 225 or 320, who knows?).

What we do know is that the Adreno 320, on very early drivers, is still the most powerful smartphone GPU out there (something I predicted would be the case last summer, naysayers ;) ).
So add the performance increase of the new GPU core, combined with likely new drivers for that new uarch and increased bandwidth (Snapdragon 600 will be interesting for this very reason...is the Adreno 320 also bandwidth starved? We get to find out!).
I'm guessing that at least until Apple's A7/ValleyView, Qualcomm will hold its lead in smartphone GPUs...a massive achievement if that turns out to be a year :)...similar claims can also be made for its CPUs (S4 Pro) and its modems...

Samsung has released details of its Exynos 5 Octa, which I'm guessing you will have known since day one...they claim, again quite believably for low-end tasks, that their big.LITTLE uarch consumes up to 75% less power than, I assume, the Exynos 4 Quad...with an on-stage demo pointing this out by showing each core's usage...very impressive, and much more believable than NVIDIA, seeing as the Exynos 4 Quad turned out to be so bloody good.
Their GPU in the Exynos 5 Octa is rumoured to be an SGX 544MP3 @ 500MHz or so...very disappointing if true...I was hoping for some next-gen hardware like a Mali-T604, or even better a T654...although that last one may be jumping the gun somewhat...still, they claim GPU power is twice that of the Exynos 4 Quad...again very believable, and lower power consumption on the GPU is also believable, as they would spread the performance across more ALUs on a better process.

At those power-consumption targets I will guarantee the Exynos 5 Octa will be hitting the Galaxy S4...and I'm going to bet on 1.7-1.8GHz, judging by how they like some continuity between SoCs.
 
When companies say that their new SoC will be X times more powerful and Y times more power efficient they never mean those two things at the same time. If they actually imply this it's some marketing person screwing it up. Usually they're careful not to.

The claims also tend to be pretty vague, especially the power consumption ones. And often end up being just plain wrong. You shouldn't take them that seriously.

I expect that all four of Exynos 5 Octa's Cortex-A15 cores running at even 1.7GHz will be too much for a phone to bear. That'll probably use over 5W just for the CPUs, unless the bins have much better characteristics (I doubt the 28nm process has strikingly better power consumption vs the 32nm one). So I don't think you'll see that clock speed unless some cores are turned off, maybe all but one.
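
For what it's worth, the "over 5W" figure is easy to reproduce with a rough per-core guess. The per-core number below is an assumption for illustration, not a measured Exynos 5 value:

```python
# Rough check of the "over 5 W just for the CPUs" estimate.
# The per-core draw is an assumed figure, not a measured Exynos 5 Octa number.

ASSUMED_WATTS_PER_A15_AT_1P7GHZ = 1.4   # hypothetical fully-loaded per-core draw
cores = 4
total_w = cores * ASSUMED_WATTS_PER_A15_AT_1P7GHZ
print(f"{cores} x Cortex-A15 @ 1.7GHz ~= {total_w:.1f} W")   # ~5.6 W before the GPU
```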
 