NVIDIA Tegra Architecture

I'm not sure anymore, but I think it's somewhat over a factor of 5x, for which I'd gladly stand corrected if it's more.

http://www.glbenchmark.com/compare.jsp?D1=Apple+iPad+4&D2=Apple+iPad&D3=Apple+iPad+2&cols=3

Unfortunately there's no GL2.5 score available for the iPad1. But in GL2.1 at 1080p the difference is >24x. The iPad4 compared to the iPad2 in GL2.5 is at a factor of 3.5x.

On a purely theoretical hw level, between an SGX535@200MHz and an SGX554MP4@280MHz the highest factor of increase is in arithmetic at >=45x, and the lowest is fillrate at 5.6x.
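For what it's worth, a quick back-of-the-envelope check of those two factors; the per-core pixel rates and GFLOPS figures below are my own assumptions based on commonly quoted specs, not official numbers:

[code]
# Rough comparison, SGX535@200MHz vs SGX554MP4@280MHz.
# Pixels/clock and GFLOPS values are assumptions, not official ImgTec specs.
sgx535_mhz, sgx554mp4_mhz = 200, 280

# Fillrate: pixels per clock per core * cores * clock (MHz -> Mpix/s)
sgx535_fill = 2 * 1 * sgx535_mhz           # ~400 Mpix/s
sgx554_fill = 2 * 4 * sgx554mp4_mhz        # ~2240 Mpix/s
print(f"fillrate factor: {sgx554_fill / sgx535_fill:.1f}x")     # ~5.6x

# Arithmetic: assumed peak FP32 GFLOPS
sgx535_gflops = 1.6        # at 200MHz
sgx554_gflops = 76.8       # at 280MHz
print(f"arithmetic factor: {sgx554_gflops / sgx535_gflops:.0f}x")  # ~48x, i.e. >=45x
[/code]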

I was referring to power consumption.

Another metric that Apple grew a lot was how much die space they were willing to dedicate to GPU but that's hitting reasonable limits too..
 
Another metric that Apple grew a lot was how much die space they were willing to dedicate to GPU but that's hitting reasonable limits too..

Why? What's a reasonable limit for a GPU's die space in a SoC and why would there be such a thing?
 
Why? What's a reasonable limit for a GPU's die space in a SoC and why would there be such a thing?

You mean, why would there be a reasonable limit on die size? I'm not answering that obvious question.

Other parts of the die, like media engines, display controllers, I/O, image processing, etc might not demand the same kind of scaling so GPU + CPU can pick up some extra room from that. But this isn't something that'll keep happening. And don't expect overall die size to get that much bigger.
 
My statements were deliberately pretty vague; I'm not going too deep into such details on purpose, since it's not easy to compare the two with real-time measurements. As for the specifications, what makes you think I'm not aware of them?



Well then let's hear why. I'm sure you also have a full specification list of Apple's next generation SoC GPU :rolleyes:

Apple A5 (iPad 2) ---> March 2011
Apple A6x ---> October 2012.
About a 4x increase in pure graphics power in about one and a half years.
So, I think (and it's a personal opinion, not the absolute truth :rolleyes: ), it's possible that xBox 360 level could be reached soon.
 
Apple A5 (iPad 2) ---> March 2011
Apple A6x ---> October 2012.
About a 4x increase in pure graphics power in about one and a half years.
So, I think (and it's a personal opinion, not the absolute truth :rolleyes: ), it's possible that xBox 360 level could be reached soon.

http://xkcd.com/605/

(although I don't disagree that XBox 360 level will be reached pretty soon, just that I think it'll slow down and take a lot longer to reach Durango and Orbis)
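Just to illustrate how sensitive that kind of extrapolation is, here's a crude sketch; the GFLOPS figures (roughly 77 for the A6X GPU, roughly 240 for Xenos) and the assumption that the 4x-per-1.5-years pace simply continues are exactly the kind of guesses the xkcd strip is poking fun at:

[code]
import math

# Crude trend extrapolation; both GFLOPS figures and the assumption that the
# historical 4x-per-1.5-years pace continues are rough assumptions.
a6x_gflops = 77.0        # SGX554MP4 @ 280MHz, approximate peak FP32
xenos_gflops = 240.0     # Xbox 360 GPU, approximate peak FP32

growth_per_period = 4.0  # observed A5 -> A6X jump
period_years = 1.5

years_needed = math.log(xenos_gflops / a6x_gflops, growth_per_period) * period_years
print(f"~{years_needed:.1f} years at that pace")   # ~1.2 years, i.e. "pretty soon"
[/code]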
 
You mean, why would there be a reasonable limit on die size? I'm not answering that obvious question.

From your post I thought you meant there should be a limit on the proportion of the SoC that is dedicated to the GPU.

"Larger" GPUs (more transistors) should be iterativy compensated with smaller process nodes.
 
Of course there's a limit to the proportion of die area that the GPU can take, it can't exceed 100% :p (in other words, this too can't keep scaling, and probably won't keep growing that much more - CPU is important too)

All one needs to do is look at discrete GPUs for comparison. In the 90s and early 2000s they grew a lot faster than in the last couple iterations, because power consumption and die size (not just number of transistors, actual size) grew a lot. For the last few generations things have slowed down a lot now that they hit a power consumption limit they don't want to exceed, a die size limit that makes yields unmanageably bad if you exceed it, and slower process node advancement.

The exact same thing is going to happen with nVidia's SoCs unless a new market opens up for devices with much higher power consumption than tablets. And no, OUYA won't be the market that justifies an entirely different SoC; even if it got the volumes, it's way too low-cost.
 
The 554MP4@280MHz should easily level with the XBox1 GPU, and by 2014 at the latest their GPUs in tablets will level with the XBox360 GPU. The difference in processes between the iPad1 and iPad4 is merely from 45 to [strike]28nm[/strike] (***edit: 32 and not 28nm), and while the scaling pace will slow down at some point, it affects GPUs far less than CPUs, exactly because for the former performance can scale linearly with added cores or clusters, and because they don't necessarily need exotic frequencies given their nature of concentrating on high parallelism.

That high parallelism comes at a cost in both die space and power consumption however. And will affect mobile GPUs far more than desktop GPUs. Mobile SOCs don't have the option, IMO, of going over 400 mm^2 or even 200 mm^2 just for the GPU. Likewise they aren't likely to ever exceed even 20 watts power consumption where desktop GPUs can hit well over 200. And even budget ones hit over 50.

As well, GPUs on the desktop already hit their power wall a few years ago. Which is why progress has slowed quite significantly. And if rumors of delays for both Nvidia and AMD are true then the situation has gotten considerably worse.

Mobile GPUs still have some room, but not all that much, IMO.

NV is still a corner case for the SFF mobile market; they're a SoC manufacturer, and up to now they haven't had the luxury of more than one SoC per year. Only now with T4 will they go for Grey, a mainstream SoC, but they still don't have the luxury of designing and selling a higher number of SoCs on a yearly cadence. GPU IP providers are a completely different story there, as any of their partners can choose from a wide portfolio of performance levels. Especially someone like Apple, which doesn't necessarily mind having big SoCs that exceed the 160mm2 level for a tablet.

In short, taking NV as a paradigm for the entire SFF mobile market is more than a little odd, since tendencies usually follow where the real volume is.

Hence why I used Exynos 5 as an example as it at least has had a fairly nice study on the power characteristics of the chip, the CPU and the GPU.

The GPU in that can already exceed 4 watts. That's already getting to the point where it will make lightweight mobile devices (tablets) more difficult to design. It's basically at a wall for mobile designs unless battery power density increases to match increased power consumption. Otherwise we'll have devices getting heavier and heavier. Or battery life getting shorter.

I asked on purpose if anyone dares to predict how big upcoming console SoCs could be two node shrinks down the road, but no one seems to be willing to bite the bullet.

It's impossible to guess since we don't even have the smallest clue as to how large the Durango or Orbis SOCs are.

Regards,
SB
 
Silent_Buddha said:
As well, GPUs on the desktop already hit their power wall a few years ago. Which is why progress has slowed quite significantly. And if rumors of delays for both Nvidia and AMD are true then the situation has gotten considerably worse.
While this is true, I actually expect the situation to improve somewhat in the future. A not insignificant amount of power is tied up providing bandwidth to GPUs today. If something like HMC comes along and dramatically increases bandwidth/watt, then that will dramatically increase performance/watt as the power can either just be saved or be spent doing actual work.
 
While this is true, I actually expect the situation to improve somewhat in the future. A not insignificant amount of power is tied up providing bandwidth to GPUs today. If something like HMC comes along and dramatically increases bandwidth/watt, then that will dramatically increase performance/watt as the power can either just be saved or be spent doing actual work.

That will help somewhat with overall power consumption. But reducing board consumption doesn't necessarily equate to being able to increase GPU consumption. Both vendors are faced with a fairly involved task of trying to keep their monolithic GPUs cool under load without the noise level of the cooling subsystem becoming so intrusive that a user would rather not use it.

Increasing the power envelopes of the GPUs much higher than they currently are isn't a real option, IMO.

It represents a similar design constraint in the consumer space to the battery weight, bulk, and heat constraint of mobile device design.

It could be possible that one vendor or the other will take a gamble on more exotic cooling solutions, but that also potentially increases cost and complexity and introduces more areas for component failure.

Meanwhile for the mobile SOC makers and their associated GPUs there's not much you can do about their design constraints other than hope a shrink brings meaningful improvements. And with shrinks scaling less and less well as you move to smaller and smaller nodes, there aren't going to be many miracles available there either.

Regards,
SB
 
Silent_Buddha said:
But reducing board consumption doesn't necessarily equate to being able to increase GPU consumption.
Well, it isn't just board consumption. There are ways to reduce energy use for moving data around the chip as well. I think there are some fairly decent gains to be made over the next several years.
 
That high parallelism comes at a cost in both die space and power consumption however. And will affect mobile GPUs far more than desktop GPUs. Mobile SOCs don't have the option, IMO, of going over 400 mm^2 or even 200 mm^2 just for the GPU. Likewise they aren't likely to ever exceed even 20 watts power consumption where desktop GPUs can hit well over 200. And even budget ones hit over 50.

As well, GPUs on the desktop already hit their power wall a few years ago. Which is why progress has slowed quite significantly. And if rumors of delays for both Nvidia and AMD are true then the situation has gotten considerably worse.

Mobile GPUs still have some room, but not all that much, IMO.

There's a new generation of mobile GPUs about to unfold (for NV it might take one or more Tegras, depending on their roadmap), with each new process taking an estimated 2-3 years per full node for the time being, and a new hw generation every 4-5 years. We're on the verge of 28nm for those and a new GPU generation at the same time, so there's one major step ahead, scaling at a slower pace until the next process node, and then in about 5 years or so another generation.

No, it's pretty much impossible that SFF SoCs will scale beyond 200mm2, and that 400mm2 example sounds exaggerated even for upcoming console SoCs. Let's say they're in the 300-350mm2@28nm league; with 2 shrinks down the road console manufacturers will be able to get down to 200mm2 or south of it, and no, those SoCs weren't designed from the get-go for low power envelopes either.
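A rough sketch of that shrink math; the per-node area scaling factors are my assumptions (0.5x being the ideal full-node factor, ~0.75x a pessimistic one since I/O and analog scale poorly):

[code]
# Rough die-area estimate over two full-node shrinks.
# Scaling factors are assumptions: 0.5x ideal per node, ~0.75x pessimistic.
start_area_mm2 = 325.0            # assumed 300-350mm^2 console SoC at 28nm
for per_node in (0.5, 0.75):
    after_two_shrinks = start_area_mm2 * per_node ** 2
    print(f"per-node factor {per_node}: ~{after_two_shrinks:.0f} mm^2")
# ideal: ~81 mm^2; pessimistic: ~183 mm^2 -> "200mm^2 or south of it"
[/code]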

Assume that under 28nm some SoC designers again go as far as 160mm2, and hypothetically "just" 10mm2 gets dedicated to GPU ALUs. Based on the data I've gathered, synthesis alone needs only 0.01mm2 per FP32 unit at 1GHz on 28HP. Consider that synthesis obviously isn't the entire story, and that you neither necessarily need 1GHz frequencies nor would you use HP for an SFF SoC. Under 28HP at 1GHz, however, the theoretical peak is 2 TFLOPs, and you can scale down from there. Note that I'm not confident those GPUs will reach or come close to the TFLOP range under 28nm, but I'd be VERY surprised if they don't under 20LP, though obviously not at its start.
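The 2 TFLOPs figure falls straight out of that area budget, assuming 2 FLOPs per clock per FP32 unit (one multiply-add), which is my assumption:

[code]
# Peak-FLOPs estimate from a synthesis-area budget.
# Assumes 2 FLOPs/clock per FP32 unit (one MAD); area figure is the quoted
# 0.01 mm^2 per unit at 1GHz on 28HP, synthesis only.
alu_budget_mm2 = 10.0
area_per_fp32_mm2 = 0.01
clock_ghz = 1.0
flops_per_unit_per_clock = 2

units = alu_budget_mm2 / area_per_fp32_mm2            # 1000 FP32 units
peak_gflops = units * clock_ghz * flops_per_unit_per_clock
print(f"{peak_gflops / 1000:.1f} TFLOPs peak")        # 2.0 TFLOPs
[/code]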

NV and AMD are slowing things down, but it's a strategic move, since it's better to milk the remaining crop of gaming enthusiasts at high margins than to go for volume. However, GK110 arrives within this month and it'll have somewhere north of 4.5 TFLOPs FP32 (3x the GF110 peak value), and the Maxwell top dog in either late 2014 or early 2015 isn't going to scale by any less if projections are met, and that's still not with all clusters enabled.

Hence why I used Exynos 5 as an example as it at least has had a fairly nice study on the power characteristics of the chip, the CPU and the GPU.
What is its peak power envelope, 8W, with roughly half going to the CPU and half to the GPU? Another nice uber-minority example to judge from, exactly? Not only is T604 market penetration for the moment uber-ridiculous, but we'd have to ask why on earth ARM was so eager to integrate FP64 units into the GPU that early. I'll leave it up to anyone's speculation how much exactly it affects die area and power consumption, but in extension to the former synthesis rate of 0.01mm2@28HP for an FP32 unit, you need 0.025mm2 per FP64 unit at the same process and frequency. That's a factor of 2.5x, and while they obviously have only a limited number of FP64 units in T604, it's no particular wonder that they're stuck at "just" 72GFLOPs theoretical peak while an upcoming G6400 Rogue would, on estimate, be at over 170GFLOPs at the same frequency.

***edit: note that I have not a single clue how ARM integrated FP64, but whether they used FP32 units with loops or dedicated FP64 units, it's going to affect die area either way.

The GPU in that can already exceed 4 watts. That's already getting to the point where it will make lightweight mobile devices (tablets) more difficult to design. It's basically at a wall for mobile designs unless battery power density increases to match increased power consumption. Otherwise we'll have devices getting heavier and heavier. Or battery life getting shorter.
See above. It yields around 4100 frames in GL2.5 while the 554MP4@280MHz is at ~5900. Still wondering why Samsung picked a 544MP3 at high frequencies for the octacore? Even more ridiculous, the latter will be somewhat faster than the Nexus10 GPU. An even dumber question from my side would be why Samsung didn't choose a Mali4xxMP8 instead for the octacore; both die area and power consumption would have been quite attractive.

It's impossible to guess since we don't even have the smallest clue as to how large the Durango or Orbis SOCs are.

Regards,
SB
See first paragraph above; as a layman I have the luxury of not minding ridiculing myself. I'm merely waiting to stand corrected on my chain of thought. Mark that I NEVER supported the original poster's 5-year notion for Tegras. I said more than once that if you stretch that timespan it's NOT impossible.
 
From your post I thought you meant there should be a limit on the proportion of the SoC that is dedicated to the GPU.

"Larger" GPUs (more transistors) should be iterativy compensated with smaller process nodes.

Yep, but 5 years for Tegras (especially for something like the OUYA consoles) is quite a bit of a stretch. Make that a minimum of 8 years and we have a deal. I was willing to bet that we'd see USC SIMDs in T4; I'd like to bet on those for T5, but can you or anyone else guarantee it at this point? You're losing one year from OUYA using yesteryear's SoCs, another year of generational difference, and another healthy bit from the fact that NV cannot yet easily go for >100mm2 SoCs and therefore dedicate as much die area to GPUs. If you clock a G6400 at 600MHz you get around 200GFLOPs; that's almost 3x what you get from a T4 GPU.
 
Of course there's a limit to the proportion of die area that the GPU can take, it can't exceed 100% :p (in other words, this too can't keep scaling, and probably won't keep growing that much more - CPU is important too).

Could it be that you're all forgetting that tablet manufacturers might be reluctant to go beyond 10" displays? If so, that would IMHO be the primary limiting factor for future tablets, besides all the others. Obviously there might be some low-hanging fruit for display technology to tackle, but by how much exactly?

Let's assume they stick to a maximum of 10" for the foreseeable future and mobile games don't render at resolutions higher than 1080p (irrespective of display native resolution); yes, those hypothetical SoCs will be CPU bound, but there's a vsync cap in place too. I can think of quite a few IQ-improving tricks one could use in such cases. IMG's Venice demo for Rogue showcases 8xMSAA; no, it's not publicly available yet, but if you do a search on youtube it shouldn't be too hard to find.

All one needs to do is look at discrete GPUs for comparison. In the 90s and early 2000s they grew a lot faster than in the last couple iterations, because power consumption and die size (not just number of transistors, actual size) grew a lot. For the last few generations things have slowed down a lot now that they hit a power consumption limit they don't want to exceed, a die size limit that makes yields unmanageably bad if you exceed it, and slower process node advancement.
The primary limiting factor for current desktop GPUs is manufacturing processes and their respective roadmaps. Processes get increasingly problematic. Exactly because power consumption is more critical than die area itself, NVIDIA went with dedicated FP64 units in Kepler instead of a 3:1 SP/DP ratio on the same FP32 ALUs. On GK110 there are 960 FP64 SPs, and K20X has 896 of them at 732MHz with an entire-board TDP of 235W.
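Those numbers line up with K20X's advertised double-precision peak, assuming each dedicated FP64 unit retires one FMA (2 FLOPs) per clock:

[code]
# K20X double-precision peak from the quoted unit count and clock.
# Assumes each dedicated FP64 unit retires one FMA (2 FLOPs) per clock.
fp64_units = 896
clock_ghz = 0.732
fp64_tflops = fp64_units * 2 * clock_ghz / 1000
print(f"~{fp64_tflops:.2f} TFLOPS FP64 peak")   # ~1.31 TFLOPS
[/code]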

Process nodes aside, if anyone goes out and tells me that GPU architectures still don't have a LOT of low-hanging fruit to pick, I'd be very eager to hear why. Yes, it takes longer for GPUs to appear these days due to the process problems, but a GTX680 today is in real time almost twice as fast as a 560Ti, and no, those aren't peak values. The fact that IHVs keep coming along with uber-prices isn't, at least IMHO, due solely to manufacturing costs having increased by some margin. There's also a "let's milk the cow as long as we can" factor playing a role there.

***edit: by the way, to avoid any misunderstandings about all that "GFLOP craze", I still think that many might be underestimating GPGPU for SFF. How about someone fills me in on why Vivante, as a quite young IP startup, has sold more GPU IP than ARM itself according to JPR statistics, and further why they've won the BMW embedded automotive deal where NV was also contending with Tegras.
 
Apple A5 (iPad 2) ---> March 2011
Apple A6x ---> October 2012.
About a 4x increase in pure graphics power in about one and a half years.
So, I think (and it's a personal opinion, not the absolute truth :rolleyes: ), it's possible that xBox 360 level could be reached soon.

What kind of frequencies is Apple using for its GPUs up to now, and in what way exactly are they related to CPU frequencies? In theory a 4-cluster Rogue could be on Xenos GPU level, but for that you'd need to reach or exceed the 600MHz frequency level.

I wish I knew myself what exactly Apple is cooking, but your assumption is completely worthless without knowing what Apple has planned. Besides, an Nx increase in GFLOPs doesn't necessarily mean an equal increase in final GPU performance. At a theoretical 600MHz a G6400 has, for example, just a tad over 2x the fillrate of a 554MP4@280MHz.
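For illustration only; the pixels-per-clock figures below are my assumptions (2 per core for the 554MP4's four cores, 8 total for a four-cluster G6400), not confirmed specs:

[code]
# Fillrate comparison; pixels-per-clock figures are assumptions, not specs.
sgx554mp4_fill = 4 * 2 * 280      # ~2240 Mpix/s at 280MHz
g6400_fill     = 8 * 600          # ~4800 Mpix/s at a theoretical 600MHz
print(f"{g6400_fill / sgx554mp4_fill:.2f}x")   # ~2.14x, "a tad over 2x"
[/code]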
 
Adreno 320 is a very, very good smartphone chip... Qualcomm doesn't change clocks on their smartphone GPUs compared to tablets... what they claim will happen with SoC X (i.e. the S4 Pro) will be the same for either smartphone or tablet... unlike NVIDIA, who put out ridiculous performance figures which will never see the light of day in a smartphone.

Do you really think that Snapdragon S4 Pro (with quad-core Krait CPU and Adreno 320 GPU) is a "very very good" chip for smartphones? Here is what Anandtech had to say about the S4 Pro (http://www.anandtech.com/show/6425/google-nexus-4-and-nexus-10-review/2): "The Nexus 4 was really hot by the end of our GLBenchmark run, which does point to some thermal throttling going on here. I do wonder if the Snapdragon S4 Pro is a bit too much for a smartphone, and is better suited for a tablet at 28nm".

Any performance projections that Qualcomm (or any other company including Apple, Samsung, NVIDIA, etc.) makes should be taken with a grain of salt. Note that Qualcomm's CMO Anand Chandrasekher recently claimed that Tegra 4 (with quad-core A15 CPU and non-unified 72 "core" GPU) looks a lot like the S4 Pro (!?!). Qualcomm's CEO Paul Jacobs recently claimed that Exynos 5 Octa is a publicity stunt. Qualcomm's Snapdragon VP Raj Talluri recently claimed that there is no difference between Project Shield and a Moga controller. Qualcomm is spreading FUD on their competitor's future products that have not even been released yet.
 
Do you really think that Snapdragon S4 Pro (with quad-core Krait CPU and Adreno 320 GPU) is a "very very good" chip for smartphones? Here is what Anandtech had to say about the S4 Pro (http://www.anandtech.com/show/6425/google-nexus-4-and-nexus-10-review/2): "The Nexus 4 was really hot by the end of our GLBenchmark run, which does point to some thermal throttling going on here. I do wonder if the Snapdragon S4 Pro is a bit too much for a smartphone, and is better suited for a tablet at 28nm".

Ironically the Nexus4 is the only smartphone affected by the thermal throttling problem. Are you sure it's fair to stretch one problem of one singled-out device over a long list of other solutions that don't seem to be plagued by anything similar?

Any performance projections that Qualcomm (or any other company including Apple, Samsung, NVIDIA, etc.) makes should be taken with a grain of salt. Note that Qualcomm's CMO Anand Chandrasekher recently claimed that Tegra 4 (with quad-core A15 CPU and non-unified 72 "core" GPU) looks a lot like the S4 Pro (!?!). Qualcomm's CEO Paul Jacobs recently claimed that Exynos 5 Octa is a publicity stunt. Qualcomm's Snapdragon VP Raj Talluri recently claimed that there is no difference between Project Shield and a Moga controller. Qualcomm is spreading FUD on their competitor's future products that have not even been released yet.

Let's take that one from the root of the problem: when you're a SoC manufacturer like Qualcomm or NVIDIA, you typically don't need any fancy marketing and/or PR stunts, since your customers obviously aren't the end consumers but device manufacturers/OEMs. NVIDIA introduced that kind of marketing silliness into the SFF mobile market, so I don't think anyone seriously expected everyone else to just sit back and not react in the end. Search for Mr. Chandrasekher's CV on the net and it won't be too hard to find where he used to work before and what his field of experience is.

Qualcomm doesn't sell based on FUD, marketing stunts or anything else. When this year runs out we'll see who has sold how much exactly and it will be fairly easy to try to find another line of excuses why any other company couldn't meet its own expectations.
 
Ironically the Nexus4 is the only smartphone affected by the thermal throttling problem. Are you sure it's fair to stretch one problem of one singled-out device over a long list of other solutions that don't seem to be plagued by anything similar?

Quote from Anandtech on the same page linked above: "the [LG] Optimus G can't complete a single, continuous run of GLBenchmark 2.5 - the app will run out of texture memory and crash if you try to run through the entire suite in a single setting. The outcome is that the Optimus G avoids some otherwise nasty throttling". At the end of the day, whether it is crashing or throttling, the Snapdragon S4 Pro appears to do a poor job handling one continuous run of GLBenchmark 2.5 on the two most high profile S4 Pro smartphone products.

Qualcomm doesn't sell based on FUD, marketing stunts or anything else.

One would be incredibly naive to think this. The three examples I gave above promote FUD, pure and simple. And with respect to marketing stunts, clearly you didn't see Qualcomm's CES 2013 keynote presentation this year (http://www.youtube.com/watch?v=v7qTHbOEiDY and http://www.theverge.com/2013/1/8/3850056/qualcomms-insane-ces-2013-keynote-pictures-tweets) :D

When this year runs out we'll see who has sold how much exactly and it will be fairly easy to try to find another line of excuses why any other company couldn't meet its own expectations.

Number of units sold is completely irrelevant to the discussion at hand (in fact, sales leaders are just as likely if not more likely to promote FUD than anyone else). Qualcomm's strength in smartphone design wins is largely due to lack of alternative options for a 4G LTE baseband processor bundled with a CPU/GPU.
 
Quote from Anandtech on the same page linked above: "the [LG] Optimus G can't complete a single, continuous run of GLBenchmark 2.5 - the app will run out of texture memory and crash if you try to run through the entire suite in a single setting. The outcome is that the Optimus G avoids some otherwise nasty throttling". At the end of the day, whether it is crashing or throttling, the Snapdragon S4 Pro appears to do a poor job handling one continuous run of GLBenchmark 2.5 on the two most high profile S4 Pro smartphone products.

Do the devices also throttle in games, and has Anand revisited the issue since? Has anyone bothered so far to try to find the source of the problem and who or what exactly is to blame? I won't point any fingers without having an answer to the last question, but ironically LG is the Nexus 4 manufacturer.

One would be incredibly naive to think this. The three examples I gave above promote FUD, pure and simple. And with respect to marketing stunts, clearly you didn't see Qualcomm's CES 2013 keynote presentation this year :D

I haven't (thank God), and I don't want to.

Number of units sold is completely irrelevant to the discussion at hand. Qualcomm's strength in smartphone design wins is largely due to lack of alternative options for a 4G LTE baseband processor.

Oh, it would be, if sales volumes weren't a complete minority report for NVIDIA. Qualcomm used to be vastly successful in its fields even before LTE existed. But if you want, we can steer the debate toward which of the two can and does invest more resources in the SFF mobile market. In that regard NV is bound by its low volumes; they could theoretically go for very bold investments as early as now, with the only difference being that it would burn a huge hole in their financial results.

If there's one thing I won't have second thoughts about applauding NV for with their Tegras, it would be their GPU drivers (not really a wonder given their vast sw team and huge experience); I couldn't, on the other hand, claim the same for Qualcomm just yet.
 
Do the devices also throttle in games, and has Anand revisited the issue since? Has anyone bothered so far to try to find the source of the problem and who or what exactly is to blame? I won't point any fingers without having an answer to the last question, but ironically LG is the Nexus 4 manufacturer.

Anandtech hasn't posted any followup as far as I know.

I haven't (thank God), and I don't want to.

I just edited my post above to include a link to a short video of the CES 2013 Qualcomm keynote highlights. Quite a spectacle really.

Oh, it would be, if sales volumes weren't a complete minority report for NVIDIA.

What does Qualcomm promoting FUD have anything to do with NVIDIA's sales volumes?

Qualcomm used to be vastly successful in its fields even before LTE existed.

Qualcomm was "born" mobile, remember? :D Qualcomm's strength in baseband processors (both 3G and 4G LTE) and focus on mobile computing are largely responsible for their success in the smartphone market today. The realization of a 4G LTE baseband processor from Icera should be a game changer for NVIDIA with respect to getting design wins in the smartphone space, but this baseband processor has only recently started sampling, so it will be some months before any commercial product is ready.
 