NVIDIA Tegra Architecture

french toast · Jan 17, 2012

With NVIDIA having just released Tegra 3, it has a very competitive chip with a powerful GPU.

Nvidia has also released a rather impressive roadmap, which at first glance would give the impression that its rivals should give up.

However, on closer inspection, things may not be quite so simple. The Tegra 3 GPU is not as powerful as NVIDIA would have you believe; it does not beat a GPU that was released nearly a year earlier (the A5's SGX543MP2).
And if you look at that impressive roadmap, it also seems less impressive when you dig a little deeper.
NVIDIA claimed 5x better graphics with Tegra 3, but that number included the quad-core A9s, when in fact the GPU gain was closer to 2x.

The above chart then probably follows a similar pattern, with Wayne probably closer to 2x Tegra 3. And if you assume that a GPU only 2x faster can't be a new architecture, then Wayne may not have full compatibility with future APIs.

IMG Tech, for instance, will have far more powerful GPUs out this year (543MP4? 554MP2? @ 400MHz?) based on their 'old' Series 5 tech, and will introduce its game-changing Series 6 'Rogue' within 12 months.

If you look at the other GPU vendors (Qualcomm with their Adreno 3xx series, ARM with their Mali T604 & T658, Vivante and Broadcom), all are providing advanced GPUs offering next-gen APIs such as OpenGL ES 3.0 'Halti' as a minimum, plus DX11. Where does that leave NVIDIA?

Is that roadmap ambitious enough for NVIDIA to succeed?

Ailuros · Jan 17, 2012

french toast said:
The above chart then probably follows a similar pattern, with the 5x Wayne probably closer to 2x Tegra 3. And if you assume that a GPU only 2x faster can't be a new architecture, then Wayne may not have full compatibility with future APIs.

Why? Taking it at face value, and since it's only a performance roadmap on a SoC basis, I could assume that because they might support future APIs, a lot of transistors go into supporting those instead of being invested in performance. I'm not saying that it will be like that; I'm merely wondering what drove you to that conclusion.

Wayne will most likely be on 28nm, and between that and the 40G process used for Tegra3 there's quite some headroom, since TSMC doesn't have a 32nm process.
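As a rough illustration of that headroom: under idealized scaling, where die area goes with the square of the feature size, a straight move from 40nm to 28nm roughly halves the area of the same logic. This is a back-of-the-envelope sketch only; the square-law assumption ignores that node names no longer map exactly onto physical dimensions and that analog/IO blocks shrink poorly. The ~80mm² Tegra3 figure is the one quoted later in this thread.

```python
def ideal_area_scale(old_node_nm: float, new_node_nm: float) -> float:
    """Ideal area scaling factor for a direct shrink.

    Assumption: area scales with (new/old)^2; real processes fall short of this.
    """
    return (new_node_nm / old_node_nm) ** 2


scale = ideal_area_scale(40, 28)
print(f"area factor: {scale:.2f}")                  # (28/40)^2 = 0.49
print(f"density gain: {1 / scale:.2f}x")            # ~2x the transistors in the same area
print(f"80mm^2 die shrunk: {80 * scale:.1f} mm^2")  # ideal-case area of a straight shrink
```

In other words, even before any architectural change, 28nm ideally offers about twice the transistor budget at the same die size.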

IMG Tech, for instance, will have far more powerful GPUs out this year (543MP4? 554MP2? @ 400MHz?) based on their 'old' Series 5 tech, and will introduce its game-changing Series 6 'Rogue' within 12 months.

If you look at the other GPU vendors (Qualcomm with their Adreno 3xx series, ARM with their Mali T604 & T658, Vivante and Broadcom), all are providing advanced GPUs offering next-gen APIs such as OpenGL ES 3.0 'Halti' as a minimum, plus DX11. Where does that leave NVIDIA?

IMG, ARM and Vivante are merely selling GPU IP; they're not competing with NV on the same level, since the latter operates as a SoC manufacturer. For the time being, and up until Wayne, NV doesn't have more than one SoC per year because they're fairly new to the market, and it's natural that they use a rather conservative strategy at least until the business takes off. Around the Wayne timeframe they'll have a smaller smartphone-oriented SoC codenamed Grey.
Until NV releases any Wayne-related details there's no real answer to that question, but since NV intends to serve not only Android devices but also Win8 devices, failing to deliver something DX11-capable on time doesn't sound like a smart idea.

Because NV isn't selling any GPU IP to third parties like IMG, ARM and Vivante do, it doesn't make much sense for them to go multi-core: they can't develop a gazillion SoCs for different markets, and with multi-core there's always some redundancy involved. It's better for them to just increase the number of units within each SoC generation's GPU, with as few SoCs as possible for each timeframe. GPU IP IHVs have it a lot easier there. They design a core that's multi-core capable and scale from there for each market's needs, e.g. one core for mainstream smartphones, two cores for high-end smartphones, four cores for tablets, etc.
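A toy area model of that redundancy trade-off: replicating a complete core (the IP vendor's route) duplicates the per-core fixed logic every time, while widening a single GPU (the route described for NV) keeps one copy of it. The area numbers below are made-up units chosen only to show the shape of the trade-off, and treating the fixed logic as not growing at all in the widened design is itself a simplification.

```python
# Hypothetical area units, purely illustrative; not figures for any real GPU.
ALU_AREA = 1.0    # area of one core's shader ALU array
FIXED_AREA = 0.5  # per-core fixed-function/control logic (setup, caches, etc.)

# Core counts per market, as described in the post above.
MARKETS = {"mainstream smartphone": 1, "high-end smartphone": 2, "tablet": 4}


def multicore_area(cores: int) -> float:
    """IP-vendor approach: replicate a complete core, fixed logic included."""
    return cores * (ALU_AREA + FIXED_AREA)


def widened_area(alu_multiple: int) -> float:
    """Single-GPU approach: widen the ALU array, keep one copy of the fixed logic."""
    return alu_multiple * ALU_AREA + FIXED_AREA


for market, n in MARKETS.items():
    redundant = multicore_area(n) - widened_area(n)  # equals (n - 1) * FIXED_AREA
    print(f"{market}: {n} core(s), redundant area {redundant:.1f}")
```

The redundant area grows with core count, which is the price IP vendors pay for being able to reuse one core design across every market.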

Is that roadmap ambitious enough for NVIDIA to succeed?

It's merely a rough performance estimate. Since the number of CPU cores in small form factor SoCs can't scale endlessly, and high frequencies are a bad idea in a primarily power-constrained environment, I'd say that they'll draw a large portion of future SoCs' performance scaling from the GPU block. I'd even suspect that the Wayne placement in that graph is complete nonsense, as are the parts after it: Wayne should be higher and the following parts somewhat lower. But that's marketing, and the estimated performance increases probably come from marketing folks in order to entertain their own engineers.

french toast · Jan 17, 2012

Why? Taking it at face value, and since it's only a performance roadmap on a SoC basis, I could assume that because they might support future APIs, a lot of transistors go into supporting those instead of being invested in performance. I'm not saying that it will be like that; I'm merely wondering what drove you to that conclusion.

Well, I was thinking next-gen APIs do need extra transistors like you point out, but if they were going to the trouble of designing a GPU with those extra abilities, surely it would make sense to bump up the performance by more than 2x?
Take ARM, for example: their Mali T604 claims 5x AND the extra APIs, with the following T658 another 2-4 times faster than that.
IMG Tech claims something similar, and likely Adreno as well.

So from that point of view, if they were adding new APIs I would think they would do a complete redesign, unless what you propose is true and they just add OpenGL ES 3.0 and leave out the expensive OpenGL 4.0 & DX11.

But even in that case, would only 2x be competitive? After all, NVIDIA prides itself on having the outright best GPU performance, and 2x is not going to cut it for that mantle...

Ailuros · Jan 17, 2012

french toast said:
Well, I was thinking next-gen APIs do need extra transistors like you point out, but if they were going to the trouble of designing a GPU with those extra abilities, surely it would make sense to bump up the performance by more than 2x?

I'll just quote myself then:
I'd even suspect that the Wayne placement in that graph is complete nonsense, as are the parts after it: Wayne should be higher and the following parts somewhat lower. But that's marketing, and the estimated performance increases probably come from marketing folks in order to entertain their own engineers.

If Wayne as a SoC is only 2x as fast as Tegra3, then they probably have a problem. Conversely, if Logan is up to 75x faster than Tegra2, then they're probably using some sort of pixie dust to achieve that. That's marketing for you.
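Because the roadmap plots every part relative to Tegra2, the implied generation-over-generation jump is just the ratio of neighbouring points. The figures below are the ones quoted in this thread (Tegra3 at 5x, Wayne read as about 2x Tegra3, Logan at up to 75x), not values re-read from the chart; the sketch only makes the lopsided steps explicit.

```python
# Cumulative speed-ups relative to Tegra2, as quoted in this thread (not re-read from the chart).
roadmap = [("Tegra2", 1.0), ("Tegra3", 5.0), ("Wayne", 10.0), ("Logan", 75.0)]


def step_ratios(points):
    """Generation-over-generation ratio between consecutive roadmap entries."""
    return {
        f"{prev_name}->{name}": value / prev_value
        for (prev_name, prev_value), (name, value) in zip(points, points[1:])
    }


for step, ratio in step_ratios(roadmap).items():
    print(f"{step}: {ratio:.1f}x")
```

A 2x step followed by a 7.5x step is exactly the kind of imbalance that suggests the intermediate point is misplaced rather than that the engineering really works out that way.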

Take ARM, for example: their Mali T604 claims 5x AND the extra APIs, with the following T658 another 2-4 times faster than that.
IMG Tech claims something similar, and likely Adreno as well.

Again, in that graph NVIDIA is comparing total SoC performance increases, not just GPU performance increases like the others you mention. Apples vs. oranges, and yes, there's a healthy portion of marketing involved in those claims too.

NV claims that the ULP GeForce in Tegra3 is up to 3x faster than the ULP GeForce in Tegra2. In terms of pixel shader ALU FLOPs on paper it is, but that's about it.

So from that point of view, if they were adding new APIs I would think they would do a complete redesign, unless what you propose is true and they just add OpenGL ES 3.0 and leave out the expensive OpenGL 4.0 & DX11.

I didn't say or imply anything like that. I said that in light of Win8 on small form factor platforms, they'd be foolish not to go DX11 eventually.

But even in that case, would only 2x be competitive? After all, NVIDIA prides itself on having the outright best GPU performance, and 2x is not going to cut it for that mantle...

See above.

french toast · Jan 18, 2012

Lol, there is no need to re-quote yourself; I understood you perfectly well the first time. Just because you have an opinion doesn't make it a certain fact. It's all subjective speculation; that's what both of us are doing here, isn't it?

If we take every technology roadmap and just dismiss it as 'marketing', then I'm afraid we would all have very little to talk about with regard to future chips, wouldn't we?

The roadmap has to be based on a certain amount of truth. Of course it's only an outline; things get changed, dropped or moved forward, and the above roadmap is not nailed on to be 100% accurate... but it does give us an idea, doesn't it?

As you already know, chips take years to plan out, so they can only change so much. They can't, for instance, swap a GPU that's DX9.3 and only 2x more powerful for one that's 5x more powerful with DX11 in a short time span, can they?

So within those tight parameters we can assume, at least for the next 2-3 years, that the above is vaguely accurate, bar clock speeds, API support, etc.

My point is that to increase API support above 9.3, they will need a complete architecture change to accommodate things like unified shaders and tessellation.
If they were going to make such a change, NVIDIA with their GPU know-how would be extracting a lot more than 2-3 times the performance for the GPU alone, which the above slide does not seem to indicate.

So if the above is accurate, then IMHO it is not ambitious enough; they may have shot their bolt.

Just my own opinion, mind; that's why I started this thread, to hear others'.

Ailuros · Jan 18, 2012

french toast said:
Lol, there is no need to re-quote yourself; I understood you perfectly well the first time. Just because you have an opinion doesn't make it a certain fact. It's all subjective speculation; that's what both of us are doing here, isn't it?

If we take every technology roadmap and just dismiss it as 'marketing', then I'm afraid we would all have very little to talk about with regard to future chips, wouldn't we?

It depends on whether you want to take marketing word for word or are willing to read behind what they're stating. It's the actual job of any marketing department to present matters in the most optimistic fashion, regardless of whether on average things are far more balanced and make far more sense.

What's worse is that I recall reading a quote (unfortunately I didn't keep a link) from someone at NV, sarcastically asking what anyone would want to do with a 75x increase in small form factor devices in such a short timeframe.

The roadmap has to be based on a certain amount of truth. Of course it's only an outline; things get changed, dropped or moved forward, and the above roadmap is not nailed on to be 100% accurate... but it does give us an idea, doesn't it?

Those slides usually get created by marketing people, and some mistakes are commonplace. Since the scale in the graph goes from 10x to 100x, all you'd need is to place Wayne a notch higher. 2x more SoC performance on 28nm compared to Tegra3@40nm sounds like a joke. If it were true, all they'd need is a direct shrink of T3 with somewhat higher frequencies and the claim is reached. Not impossible, but the trouble then is that it conflicts with another, more recent slide from when they announced Grey. In that one, Grey is meant to serve mainstream/smartphone markets and Wayne to serve tablets, reaching into netbook if not higher realms.

As you already know, chips take years to plan out, so they can only change so much. They can't, for instance, swap a GPU that's DX9.3 and only 2x more powerful for one that's 5x more powerful with DX11 in a short time span, can they?

If NV were developing it from ground zero (i.e. a tabula rasa), it would sound nearly impossible. Tegra3 is already DX9 Level 3, irrespective of what OGL_ES exposes so far, so their next generation for Wayne/Tegra4 is simply a dark spot for now. A possible scenario would be that Wayne's ULP GeForce is DX10-level (which I guess would be enough to cover early Win8 needs, and they could finally start dealing with GPGPU on their Tegras), with the succeeding generation then being DX11.

And yes, if you're going by sterile FLOP counts, Wayne could very well have 5x more FLOPs and be on average 2x or 3x faster than the T3 GPU. It depends how creative your marketing is and how you're counting FLOPs. For Tegra3 vs. Tegra2 they counted only pixel shader ALU FLOPs for the GPU, because they went from 1 Vec4 PS ALU in T2 to 2 Vec4 PS ALUs, while both T2 and T3 have just 1 Vec4 vertex shader ALU.
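To see how much the counting method matters, here is a per-clock comparison of ALU lanes using only the unit counts given above (T2: 1 Vec4 PS + 1 Vec4 VS ALU; T3: 2 Vec4 PS + 1 Vec4 VS ALU). It deliberately ignores clock speed, precision (FP20 vs FP32) and every non-ALU limit; the `UlpGeForce` type is a hypothetical helper for illustration only.

```python
from dataclasses import dataclass

VEC4_LANES = 4  # each Vec4 ALU processes four components per clock


@dataclass(frozen=True)
class UlpGeForce:
    name: str
    ps_vec4_alus: int  # pixel shader Vec4 ALUs
    vs_vec4_alus: int  # vertex shader Vec4 ALUs

    def lanes(self, include_vs: bool) -> int:
        """ALU lanes per clock, optionally counting the vertex shader ALUs too."""
        alus = self.ps_vec4_alus + (self.vs_vec4_alus if include_vs else 0)
        return alus * VEC4_LANES


T2 = UlpGeForce("Tegra2", ps_vec4_alus=1, vs_vec4_alus=1)
T3 = UlpGeForce("Tegra3", ps_vec4_alus=2, vs_vec4_alus=1)

ps_only = T3.lanes(include_vs=False) / T2.lanes(include_vs=False)  # 8 / 4
all_alus = T3.lanes(include_vs=True) / T2.lanes(include_vs=True)   # 12 / 8
print(f"per clock, PS ALUs only: {ps_only:.1f}x; PS + VS ALUs: {all_alus:.1f}x")
```

The same hardware change reads as 2x or 1.5x per clock depending purely on which ALUs you choose to count.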

My point is that to increase API support above 9.3, they will need a complete architecture change to accommodate things like unified shaders and tessellation.

If they were going to make such a change, NVIDIA with their GPU know-how would be extracting a lot more than 2-3 times the performance for the GPU alone, which the above slide does not seem to indicate.

So if the above is accurate, then IMHO it is not ambitious enough; they may have shot their bolt.

Just my own opinion, mind; that's why I started this thread, to hear others'.

I didn't give you any particular facts, but rather food for thought. Again, if on a SoC basis Wayne is only up to 2x faster than Tegra3, they're in deep shit.

french toast · Jan 18, 2012

It depends on whether you want to take marketing word for word or are willing to read behind what they're stating. It's the actual job of any marketing department to present matters in the most optimistic fashion, regardless of whether on average things are far more balanced and make far more sense.

Well, let's hope for NV's sake it is just some marketing. If it is optimistic marketing, it has to be about the worst 'optimism' I've seen.

We're basically on the same page; it seems unbelievable that the slide is a truly accurate representation of their lineup, because if it were, they would be in deep shit indeed.

I think they have to do a redesign for Wayne, as it is coming in Q4 2012/2013.
Not rehash old GeForce designs, or else they are going to be the graphics specialist with the worst graphics.

How difficult would it be for NV to just go DX10/GL ES 3.0? That would still involve unified shaders, right? And yeah, in that case the 'FLOPs' would take the performance past the 5x mark.

Ailuros · Jan 18, 2012

french toast said:
Well, let's hope for NV's sake it is just some marketing. If it is optimistic marketing, it has to be about the worst 'optimism' I've seen.

It could just be a simple mistake. As I said above, since that stuff comes from marketing guys, errors aren't a surprise.

I think they have to do a redesign for Wayne, as it is coming in Q4 2012/2013.
Not rehash old GeForce designs, or else they are going to be the graphics specialist with the worst graphics.

Performance aside, they don't have the hottest GPU in town at the moment either. In terms of graphics capabilities, the current ULP GeForces are anything but worthy of one of the world's largest GPU IHVs.

How difficult would it be for NV to just go DX10/GL ES 3.0? That would still involve unified shaders, right?

I'd say that going beyond DX9 without going for a USC (unified shader core) would be nonsense. I don't see NV having any kind of difficulty with any DX support; what could in theory limit them to something DX10-only would be die area, for example. It depends on what exactly they're planning for the rest of the SoC. Up to Tegra3 they've dedicated a relatively small portion of the SoC's die area to the GPU. Let's see when that will change.

french toast · Jan 18, 2012

For Tegra3 vs. Tegra2 they counted only pixel shader ALU FLOPs for the GPU, because they went from 1 Vec4 PS ALU in T2 to 2 Vec4 PS ALUs, while both T2 and T3 have just 1 Vec4 vertex shader ALU.

Question: is a pixel shader/pipe the same as the Mali fragment processor?

Exophase · Jan 18, 2012

Ailuros said:
For Tegra3 vs. Tegra2 they counted only pixel shader ALU FLOPs for the GPU, because they went from 1 Vec4 PS ALU in T2 to 2 Vec4 PS ALUs, while both T2 and T3 have just 1 Vec4 vertex shader ALU.

What about clock speed? Also, was there any scaling of TMUs, ROPs, triangle setup limits, etc.? If they were never really vertex shader limited to begin with, it'd be a pretty fair claim. And having as many VS as PS ALUs in Tegra 2 seems really unbalanced, especially if the former are FP32 and the latter only FP20. I always wondered what was up with that: whether the vertex shaders had worse performance metrics in some other way or just supported a lot fewer ops directly.

Ailuros · Jan 19, 2012

french toast said:
Question: is a pixel shader/pipe the same as the Mali fragment processor?

Tegra1 and 2 have 1 Vec4 FP20 PS ALU and Tegra3 2 Vec4 FP20 (?) ALUs (at different frequencies).

Mali400 has 1 Vec4 FP16 PS ALU per core, with a maximum of 4 Vec4 FP16 PS ALUs in the Mali400MP4.

Exophase said:
What about clock speed? Also, was there any scaling of TMUs, ROPs, triangle setup limits, etc.? If they were never really vertex shader limited to begin with, it'd be a pretty fair claim. And having as many VS as PS ALUs in Tegra 2 seems really unbalanced, especially if the former are FP32 and the latter only FP20. I always wondered what was up with that: whether the vertex shaders had worse performance metrics in some other way or just supported a lot fewer ops directly.

NV isn't very clear about those things, unless they've released anything for T3 in the meantime that I've missed. I always assumed that Tegra1 and 2 had 2 TMUs, but I'm not so sure about it anymore. It could very well be just 1 TMU, but whether 1 or 2, I'd say that the amount is unchanged across all 3 SoCs. Since the ULP GeForce block in Tegra3 seems to take a smaller share of the total SoC die area than the GPU block in T2 did, I'd assume that they might have just added another Vec4 PS ALU and increased the frequency from 333MHz (T20) to 520MHz (T30) (AP20=300MHz, AP30=416MHz). Since the frequency increased by roughly 56% between the two, there's frankly no need to change anything for the VS ALUs, and probably not for the triangle setup either. And yes, the VS ALUs in all Tegras are FP32.
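Putting these clock figures together with the ALU counts: a second Vec4 PS ALU plus the higher clock gives roughly 3x on-paper pixel shader throughput, while vertex throughput grows only with the clock. This sketch assumes throughput scales linearly with unit count and clock; it says nothing about TMUs, ROPs, bandwidth or triangle setup, which could be the real limits.

```python
# GPU clocks in MHz as quoted in the post (tablet T-parts and smartphone AP-parts).
CLOCKS_MHZ = {"T20": 333, "T30": 520, "AP20": 300, "AP30": 416}


def clock_gain(old: str, new: str) -> float:
    """Ratio of the new part's GPU clock to the old part's."""
    return CLOCKS_MHZ[new] / CLOCKS_MHZ[old]


t_gain = clock_gain("T20", "T30")     # ~1.56, i.e. the "roughly 56%"
ap_gain = clock_gain("AP20", "AP30")  # ~1.39

# Assumption: ALU throughput = Vec4 ALU count x clock.
ps_gain = (2 / 1) * t_gain  # PS ALUs go 1 -> 2: ~3.1x, in line with the "up to 3x" claim
vs_gain = (1 / 1) * t_gain  # VS ALUs stay 1 -> 1: vertex throughput grows by clock alone

print(f"T20->T30 clock: +{(t_gain - 1) * 100:.0f}%, AP20->AP30 clock: +{(ap_gain - 1) * 100:.0f}%")
print(f"on-paper PS ALU throughput: {ps_gain:.2f}x, VS ALU throughput: {vs_gain:.2f}x")
```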

In GLBenchmark 2.1, T30 scores 2.1x higher than T20 (from the same vendor) in Egypt 720p offscreen. I've also added the Lenovo K2, which runs at nearly 1080p, where T30 scores almost as much in 1080p Egypt standard as T20 does in 720p Egypt standard:

http://www.glbenchmark.com/compare.... Eee Pad Transformer TF101&D3=Lenovo LePad K2
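The two data points can be cross-checked by pixel count: if T30 renders 1080p at about the rate T20 manages at 720p, its pixel throughput is about the ratio of the two resolutions. This sketch assumes the K2 renders exactly 1920×1080 (the post only says "nearly 1080p") and that Egypt is purely fill-rate bound, which it is not entirely; no benchmark scores are reproduced here.

```python
def pixels(width: int, height: int) -> int:
    """Pixels per frame at a given resolution."""
    return width * height


P720 = pixels(1280, 720)    # 921,600 pixels
P1080 = pixels(1920, 1080)  # 2,073,600 pixels (assumed; the K2 is only "nearly 1080p")

# If T30 @ 1080p scores about what T20 scores @ 720p, the implied throughput ratio is:
implied_ratio = P1080 / P720
print(f"implied T30/T20 pixel throughput: {implied_ratio:.2f}x")
```

That lands close to the 2.1x measured offscreen at 720p, so the two comparisons are at least consistent with each other.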

In case you haven't read it yet: http://www.nvidia.com/content/PDF/t...ing_High-End_Graphics_to_Handheld_Devices.pdf

Page 7:

The GeForce GPU includes four pixel shader cores and four vertex shader cores for high speed vertex and pixel processing. The GPU pipeline uses an 80-bit RBGA pixel format with FP20 data precision in the pixel pipeline, and FP32 precision in the vertex pipeline.

I don't see any other GPU-related whitepapers on the Tegra whitepaper page here: http://www.nvidia.com/object/white-papers.html, which strengthens my suspicion that either nothing or very little has changed between T2 and T3.
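The whitepaper's "80-bit RGBA" figure is simply four FP20 channels. The sketch below lines up the per-channel precisions mentioned in this thread against the FP24 minimum for DX9 pixel shaders that is cited later in the thread. It only counts bits; how NVIDIA's FP20 splits them between exponent and mantissa isn't given in the quoted whitepaper, so it isn't modeled.

```python
# Bits per channel for the shader precisions mentioned in this thread.
FORMATS = {
    "FP16 (Mali400 PS)": 16,
    "FP20 (Tegra PS)": 20,
    "FP32 (Tegra VS)": 32,
}
DX9_MIN_PS_BITS = 24  # the FP24 minimum cited in the thread for DX9 pixel shaders

for name, bits in FORMATS.items():
    rgba_bits = 4 * bits  # four channels: R, G, B, A
    meets_dx9 = bits >= DX9_MIN_PS_BITS
    print(f"{name}: {rgba_bits}-bit RGBA, meets FP24 minimum: {meets_dx9}")
```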

french toast · Jan 19, 2012

Tegra1 and 2 have 1 Vec4 FP20 PS ALU and Tegra3 2 Vec4 FP20 (?) ALUs (at different frequencies).

Mali400 has 1 Vec4 FP16 PS ALU per core, with a maximum of 4 Vec4 FP16 PS ALUs in the Mali400MP4.

Cheers!

anexanhume · Jan 20, 2012

You also have to keep in mind their claims of performance don't hold much weight because they'll be working with the same IP that all the other SoC manufacturers have. Maybe their GPUs will pull ahead, but right now it seems they have their hands full with beating ImgTec on efficiency in the mobile space.

Exophase · Jan 20, 2012

anexanhume said:
You also have to keep in mind their claims of performance don't hold much weight because they'll be working with the same IP that all the other SoC manufacturers have. Maybe their GPUs will pull ahead, but right now it seems they have their hands full with beating ImgTec on efficiency in the mobile space.

Not forever; one would assume that eventually they'll be using Project Denver ARM cores. I'm also under the impression that they use their own memory controller IP (makes sense, they would have a lot of experience with it from GPU and northbridge development), whereas other vendors may be using third-party IP, probably including IP provided by ARM. nVidia has claimed in the past that they use a more efficient (while, currently, narrower) memory interface. Based on numbers I've seen for some SoCs, I could see this being true at least some of the time.

Of course then you have other vendors like Qualcomm which is throwing as much first party stuff on their SoCs as they can manage.

ltcommander.data · Jan 20, 2012

If Tegra 3's pixel shaders are still FP20, how come they are demoing DX9 and Windows 8 tablets since DX9 requires at least FP24? Are they emulating FP24 or has Microsoft given them an exemption?

anexanhume · Jan 20, 2012

Exophase said:
Not forever; one would assume that eventually they'll be using Project Denver ARM cores. I'm also under the impression that they use their own memory controller IP (makes sense, they would have a lot of experience with it from GPU and northbridge development), whereas other vendors may be using third-party IP, probably including IP provided by ARM. nVidia has claimed in the past that they use a more efficient (while, currently, narrower) memory interface. Based on numbers I've seen for some SoCs, I could see this being true at least some of the time.

Of course then you have other vendors like Qualcomm which is throwing as much first party stuff on their SoCs as they can manage.

Well, if anyone knows how expensive wide memory buses can be, it's Nvidia. Wouldn't surprise me if they had optimizations for narrower buses as a consequence. They're also the first to use LPDDR3 AFAIK. BTW, go back to the gp32x/pandora boards

french toast · Jan 21, 2012

Wayne is going to have to be at least a unified shader core, with support for DX10 & GL ES 3.0. They can leave out DX11 until a later date, as it is not going to be used in real games on mobile for a couple of years, I would think...

You would think they have factored this in; after all, they are very experienced at this. I don't think they would leave out as important a marketing feature as 'Halti'.

I think they may have underestimated the competition. The die area for the GPU in Tegra is actually quite small, am I correct? So they could easily have afforded more power if they had wanted.
Tegra 1 had a reputation for being a real jump in graphics power over what we were used to; since then they have been very conservative on the GPU side.

But I would presume their unified shader architecture will be very efficient, as they have loads of experience. Isn't an IMR architecture better than TBDR for more powerful GPUs?

Ailuros · Jan 21, 2012

ltcommander.data said:
If Tegra 3's pixel shaders are still FP20, how come they are demoing DX9 and Windows 8 tablets since DX9 requires at least FP24? Are they emulating FP24 or has Microsoft given them an exemption?

FP20 and 16-bit Z are probably exposed in OGL_ES partially for performance reasons. T3 at least should be >FP20, and that's why I used a question mark.

Ailuros · Jan 21, 2012

french toast said:
Wayne is going to have to be at least a unified shader core, with support for DX10 & GL ES 3.0. They can leave out DX11 until a later date, as it is not going to be used in real games on mobile for a couple of years, I would think...

Most likely yes; no idea about DX11, though, or when. If they wait for real mobile games to use DX11, they'll wait a mighty long time.

You would think they have factored this in; after all, they are very experienced at this. I don't think they would leave out as important a marketing feature as 'Halti'.

API compliance in such a case is less about marketing and more about meeting future API requirements, so as not to run into shortcomings against the competition. OGL_ES doesn't have to go as high in requirements for mobile OSs like iOS and Android; for Win8 and beyond, though, it'll be a totally different chapter. Supporting DX11 for Win8 or succeeding Windows versions won't be just about marketing either.

I think they may have underestimated the competition. The die area for the GPU in Tegra is actually quite small, am I correct? So they could easily have afforded more power if they had wanted.

Tegra3 is around 80mm² on TSMC 40nm. Considering manufacturing costs, NV can't afford a much bigger SoC than that. Granted, the focus is more on the CPU side, but that's a clear design choice.

Tegra 1 had a reputation for being a real jump in graphics power over what we were used to; since then they have been very conservative on the GPU side.

The differences between the ULP GeForces in all 3 Tegras are relatively small. No real change in architecture.

But I would presume their unified shader architecture will be very efficient, as they have loads of experience. Isn't an IMR architecture better than TBDR for more powerful GPUs?

Show me one recent high-end desktop TBDR GPU I can investigate in terms of efficiency against a high-end IMR, and then you'd have at least some data for conclusions. Obviously IMG has been concentrating on small form factor markets for many years now, but the last high-end graphics design from PowerVR that saw the light of day was the Dreamcast GPU. Given its meager specifications and the die area its GPU block used, comparisons with the PS1 & PS2 are easy.

I.S.T. · Jan 27, 2012

You're forgetting the Kyro series...
