NVIDIA Tegra Architecture

It's a different chapter because wherever GPGPU makes sense, a GPU will usually be more efficient at it and burn less power at the same time. The frequency differences alone between mobile CPUs and GPUs speak volumes on that matter.

Of course a couple of things are possible on a mobile CPU; however, GPUs in SoCs are getting increasingly bigger and more complex in order to offload CPUs as much as possible, not the other way around.

I'd like to stand corrected, but AFAIK even in the most demanding current mobile games quad-core CPUs might have all their cores enabled, yet they're nowhere near their peak frequencies. The more work you add, the higher the load on the CPU and inevitably the higher the power consumption. At the very least it's quite obvious that A15 cores aren't exactly modest when it comes to power consumption, rather the contrary.
 
Well, I'm still sitting on the fence somewhat, Ailuros... I was hoping the big 3 Android SoC manufacturers would make a complete transition to next-gen APIs such as Halti and OpenCL.

That, combined with likely Tegra 4-optimised games, would have really moved gaming forward.

Developers/ISVs usually concentrate on the predominant hw when coding games. What was the GPU market share of Tegras in spring 2012 again, according to JPR? 3.2%? Yes, NV's DevRel has done an excellent job driving a few ISVs toward a limited number of optimized games, yet NV is still light-years away from Qualcomm or Samsung market share, or smartphone design wins if you prefer.

NV at this point needed something to score a few impressions; if you think it over, you can win quite a few of those with top performance, and far fewer with a very high feature set but quite a bit less performance.

Again, from a user perspective (and outside the usual technically oriented debates we have here), what I at least will be looking for in an upcoming device is design quality, sw stability, power consumption and performance, i.e. typically what most users look for. No device manufacturer is going to glue a sticker reading "OGL_ES3.0 GPU inside" onto the smartphone box, is it?

That said, NV got quite a bit of criticism for the limited API support in T4 (and I'm not excluding myself from it); however, from a business POV their decision isn't all that absurd. If the T4 successor in about a year's time is still OGL_ES2.0-only, they deserve to get shot ;)
 
It's a different chapter because wherever GPGPU makes sense, a GPU will usually be more efficient at it and burn less power at the same time. The frequency differences alone between mobile CPUs and GPUs speak volumes on that matter.

Let me refresh your memory, then:

Yes, we were running Havok cloth demos on multi-core CPU as well as GPU via OpenCL, all with the same OpenCL code underneath the Havok API. As was said above, there is no visible difference between the OpenCL code on either the CPU or the GPU and Havok's native code. The dancer dances off screen if you don't have the camera follow enabled, but the camera follow has a "bob" to it that makes some people sick after watching it for awhile. ;-)

We had a few demos we were cycling between. All OpenCL with no specific AMD functions or native code.
(...)

I can't find the exact hardware being used in the demo, but I'd bet the CPU was a ~3GHz Phenom II w/ 125W TDP, 758M transistors @ 45nm, and the GPU was a regular 750MHz HD4870 w/ 150W TDP, 956M transistors @ 55nm.


Regarding the power efficiency, I don't know how you see frequency as an absolute measure of power consumption. There's obviously more to it than that.

A quick look at Anandtech's CPU and GPU power measurements on modern SoCs with balanced CPU/GPU performance (dual Krait + Adreno 225; dual A15 + Mali T604) shows that in 3D games the GPU is already consuming a lot more than the CPU despite being clocked lower.

If I were to implement advanced physics and AI in those games, I would obviously try to use more of the CPU instead of pushing the GPU's power resources even higher.
It's not like these GPUs have tens/hundreds of ALUs that go mostly unused, like the desktop parts.



Of course a couple of things are possible on a mobile CPU; however, GPUs in SoCs are getting increasingly bigger and more complex in order to offload CPUs as much as possible, not the other way around.

I don't agree with that.
GPUs in SoCs are getting increasingly bigger and more complex in order to provide better performance in 2D and 3D, mostly because screen resolutions in smartphones and tablets across all ranges are getting larger at a ridiculously fast pace.
And they mustn't become fill-rate limited, because people are getting awfully anal about snappiness in their phones and tablets.

Then again, the CPUs in SoCs are also getting increasingly bigger and more complex. And although getting a "quad-core" stamp looks good for PR, the fact is that all that horsepower can be used for something other than just the background services from the e-mail, Facebook, Twitter and weather apps.
 
Let me refresh your memory, then:
I can't find the exact hardware being used in the demo, but I'd bet the CPU was a ~3GHz Phenom II w/ 125W TDP, 758M transistors @ 45nm, and the GPU was a regular 750MHz HD4870 w/ 150W TDP, 956M transistors @ 55nm.

Developers will use the GPU for heterogeneous computing wherever it makes sense. IMG had shown image processing on a humble SGX540 compared to a CPU, where the former was twice as fast with significantly lower power consumption. And yes, that might be just one case example, but it stands against your case example, which isn't necessarily relevant either.

Regarding the power efficiency, I don't know how you see frequency as an absolute measure of power consumption. There's obviously more to it than that.

It isn't? Obviously a core clocked at a couple of GHz and a core clocked at a few hundred MHz should have comparable power consumption, then.
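To put a rough first-order formula behind that jab (the standard CMOS dynamic-power approximation, ignoring leakage, where α is the activity factor, C the switched capacitance, V the supply voltage and f the clock frequency):

```latex
P_{\text{dyn}} \approx \alpha \, C \, V^{2} f
```

And since higher clocks usually also require higher supply voltage, power tends to grow a good deal faster than linearly with frequency.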

A quick look at Anandtech's CPU and GPU power measurements on modern SoCs with balanced CPU/GPU performance (dual Krait + Adreno 225; dual A15 + Mali T604) shows that in 3D games the GPU is already consuming a lot more than the CPU despite being clocked lower.

Yep, 3D games, where most of the graphics-related tasks fall on the GPU and not the CPU. Why don't you throw the very same tasks at that very same CPU with no tradeoffs, watch the glory of seconds per frame (if it ever starts rendering, that is, and doesn't just dump you back to the desktop), and then we'll see what power consumption, or better yet perf/W, looks like. A GPU is built for high parallelism for a reason, and that's exactly why it doesn't need frequencies as extravagant as a CPU's. To increase single-threaded performance (which would be abysmally bad on a GPU, au contraire) there are architectural changes that help from time to time, but the major leaps in CPU performance come from frequency.

If I were to implement advanced physics and AI in those games, I would obviously try to use more of the CPU instead of pushing the GPU's power resources even higher.
It's not like these GPUs have tens/hundreds of ALUs that go mostly unused, like the desktop parts.

It's not like mobile games are anywhere near as complex and demanding as desktop games either. ALU counts in mobile GPUs are increasing, and at quite a fast pace at that. A few years ago mobile GPUs had a couple of ALU lanes; nowadays the iPad 4 already has 128, and it's not going to slow down anytime soon either. In 2014, give or take, mobile GPUs will have roughly the on-paper specs of an Xbox 360 GPU.

I don't agree with that.
GPUs in SoCs are getting increasingly bigger and more complex in order to provide better performance in 2D and 3D, mostly because screen resolutions in smartphones and tablets across all ranges are getting larger at a ridiculously fast pace.
And they mustn't become fill-rate limited, because people are getting awfully anal about snappiness in their phones and tablets.

The downside being that fillrates don't increase at the pace arithmetic throughput does in mobile GPUs. Since we're in a Tegra thread: I assume the ULP GF in T2 has 2 TMUs and 8 ALU lanes, all at 333MHz, i.e. 666 MTexels/s texel fillrate and 5.33 GFLOPs. Further assuming T4 has 4 TMUs and is clocked at at least 520MHz, that's 2080 MTexels/s texel fillrate and 74.88 GFLOPs. In other words, a 3.1x increase in fillrate against a 14x increase in GFLOPs in just two years' time.
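Spelled out, with the assumptions above plus one MADD (two FLOPs) per ALU lane per clock and 72 lanes for T4 (the lane count implied by the 74.88 GFLOPs figure):

```latex
\begin{aligned}
\text{T2: } & 2~\text{TMUs} \times 333~\text{MHz} = 666~\text{MTexels/s}, & 8 \times 2 \times 0.333~\text{GHz} &= 5.33~\text{GFLOPS} \\
\text{T4: } & 4~\text{TMUs} \times 520~\text{MHz} = 2080~\text{MTexels/s}, & 72 \times 2 \times 0.520~\text{GHz} &= 74.88~\text{GFLOPS} \\
\text{Ratios: } & 2080 / 666 \approx 3.1\times, & 74.88 / 5.33 &\approx 14\times
\end{aligned}
```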

Then again, the CPUs in SoCs are also getting increasingly bigger and more complex. And although getting a "quad-core" stamp looks good for PR, the fact is that all that horsepower can be used for something other than just the background services from the e-mail, Facebook, Twitter and weather apps.

If future SoCs should dedicate more die area to GPUs than to CPUs, it's not because I have some sort of crystal ball, but because neither CPU core count nor CPU frequency can scale endlessly in such power-conscious SoCs.
 
Because OpenCL is a platform for heterogeneous computing: if some developer decides to use OpenCL in his game/app, Tegra 4 will be able to run it. Nonetheless, OpenCL on ARM isn't anything new.
I know all that. But why should, of all companies, Nvidia invest in an OpenCL compiler for an ARM CPU??? Especially when their current library that'd benefit the most doesn't even use it?
 
Developers will use the GPU for heterogeneous computing wherever it makes sense. IMG had shown image processing on a humble SGX540 compared to a CPU, where the former was twice as fast with significantly lower power consumption. And yes, that might be just one case example, but it stands against your case example, which isn't necessarily relevant either.
In terms of image processing, Anand found in his iPad 3 review that iPhoto could use some help in speed and responsiveness when applying brushes, since it pegs the CPUs at 100%. Now, a dual Cortex-A9 at 1GHz wasn't exactly fast even back in early 2012, so throwing a faster CPU at it would certainly help performance, but this would seem like a use case where more GPU acceleration would be helpful. Presumably iPhoto is already making as much use of OpenGL ES as it can through Core Image, but the increased flexibility of OpenCL could allow more of the algorithms to run on the GPU.
 
I know all that. But why should, of all companies, Nvidia invest in an OpenCL compiler for an ARM CPU??? Especially when their current library that'd benefit the most doesn't even use it?

Because the GeForce ULP in Tegra 4 doesn't support OpenCL; so in case there are apps and games using OpenCL in the future, Tegra 4 would still be able to run them.
 
Because the GeForce ULP in Tegra 4 doesn't support OpenCL; so in case there are apps and games using OpenCL in the future, Tegra 4 would still be able to run them.
In 99% of the use cases, CUDA or OpenCL is used as an accelerator. That is: small kernels that replace scraps of existing general CPU code.

There is no point in running OpenCL when there is already a C version of the same code.

So the market you're trying to create is never going to be very large...
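For what it's worth, this is the sort of thing "small kernels replacing scraps of CPU code" means in practice: a minimal, generic OpenCL sketch that offloads a plain SAXPY loop to whatever OpenCL device the driver exposes (error handling omitted for brevity; nothing here is tied to any particular SoC or vendor, and on Apple platforms the header is <OpenCL/opencl.h> instead):

```c
#include <stdio.h>
#include <CL/cl.h>

/* The "scrap of CPU code" being replaced is the loop: y[i] = a*x[i] + y[i] */
static const char *kernel_src =
    "__kernel void saxpy(const float a,\n"
    "                    __global const float *x,\n"
    "                    __global float *y) {\n"
    "    size_t i = get_global_id(0);\n"
    "    y[i] = a * x[i] + y[i];\n"
    "}\n";

int main(void)
{
    enum { N = 1024 };
    float a = 2.0f, x[N], y[N];
    for (int i = 0; i < N; ++i) { x[i] = (float)i; y[i] = 1.0f; }

    /* Boilerplate: pick the first platform/device the runtime offers. */
    cl_platform_id plat; cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

    /* Copy the two arrays into device buffers. */
    cl_mem bx = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof x, x, NULL);
    cl_mem by = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, sizeof y, y, NULL);

    /* Build the kernel from source and bind its arguments. */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &kernel_src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "saxpy", NULL);
    clSetKernelArg(k, 0, sizeof a, &a);
    clSetKernelArg(k, 1, sizeof bx, &bx);
    clSetKernelArg(k, 2, sizeof by, &by);

    /* Run one work-item per element, then read the result back. */
    size_t global = N;
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, by, CL_TRUE, 0, sizeof y, y, 0, NULL, NULL);

    printf("y[10] = %.1f (expected 21.0)\n", y[10]);

    clReleaseKernel(k); clReleaseProgram(prog);
    clReleaseMemObject(bx); clReleaseMemObject(by);
    clReleaseCommandQueue(q); clReleaseContext(ctx);
    return 0;
}
```

Which also illustrates the point above: for a loop this small the plain C version is perfectly fine, and the offload only pays off once the kernel does enough work to amortize the setup and the copies.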
 
All I'm saying is that if Nvidia had moved to Halti along with Exynos and Adreno... we would be more likely to be seeing Halti-based games... and that's without any Apple A7/A7X SoC in the picture... if everyone moves to advanced APIs and game developers decide the broad SoC-manufacturer compatibility makes it cost effective... then Nvidia will see its puny market share stagnate.

That's a big if... and maybe a long shot... but I was just looking forward to some Tegra Zone Halti-optimised games being played on my #hacked# Galaxy S4!! :)
 
I think we have to consider what OGL ES 3.0 features are really critically useful/in high demand for new games and what features Tegra 4 actually fails to provide. For instance, nVidia already offered MRTs, mipmap level specific rendering, 16 texture units in fragment shaders, NPOT textures, texture arrays, and possibly other stuff I'm not catching. This was from a March 2012 document so it could have been updated to include more, and who knows what Tegra 4 may add.

While I'm sure it's annoying to have to use an OGL ES 2 extension instead of a standard feature, especially if it's an nVidia one, if it's useful I have no doubt you'll see it used. Developers are still going to have to be able to fall back on ES 2 feature sets if they want to reach any reasonable market share for quite a while into the future.
 
Since you mention MRTs, I'm curious what the performance penalties for them are on T3 and T4.

All I'm saying is that if Nvidia had moved to Halti along with Exynos and Adreno... we would be more likely to be seeing Halti-based games... and that's without any Apple A7/A7X SoC in the picture... if everyone moves to advanced APIs and game developers decide the broad SoC-manufacturer compatibility makes it cost effective... then Nvidia will see its puny market share stagnate.

That's a big if... and maybe a long shot... but I was just looking forward to some Tegra Zone Halti-optimised games being played on my #hacked# Galaxy S4!! :)

At the moment we have Exynos 5 and Adreno 3xx supporting OGL_ES3.0, Series5XT partially via additional extensions, and soon Rogue. If you add all of them up in terms of market share, I don't see what could stop developers from using OGL_ES3.0 functionality in upcoming games if they want to, whether NV fully supports it for the time being or not. Besides, as Exophase notes, NV can, if they want, add custom OGL_ES2.0 extensions for those 3.0 functionalities that are already possible on T3/T4.

Not only will OGL_ES3.0 game development take its time, but while hw typically has to support N functionalities with every new API, it's not necessarily the case that the resulting performance is the best you can get. With early API support, developers have tools to code against, and IHVs can break their heads over actual performance for those N functionalities by the time the games actually ship.
 
I think we have to consider what OGL ES 3.0 features are really critically useful/in high demand for new games and what features Tegra 4 actually fails to provide. For instance, nVidia already offered MRTs, mipmap level specific rendering, 16 texture units in fragment shaders, NPOT textures, texture arrays, and possibly other stuff I'm not catching. This was from a March 2012 document so it could have been updated to include more, and who knows what Tegra 4 may add.

While I'm sure it's annoying to have to use an OGL ES 2 extension instead of a standard feature, especially if it's an nVidia one, if it's useful I have no doubt you'll see it used. Developers are still going to have to be able to fall back on ES 2 feature sets if they want to reach any reasonable market share for quite a while into the future.
It's notable that OpenGL ES 3.0 is backwards compatible with OpenGL ES 2.0, unlike OES 2.0 with respect to OES 1.1. Khronos seems to allow targeting the OES 3.0 API using OES 2.0-era GLSL ES 1.0, which provides more capabilities than the language has available in OES 2.0, but less than OES 3.0's native GLSL ES 3.0. If the initial wave of next-gen mobile games sticks with GLSL ES 1.0 to provide hybrid OES 2.0/3.0 games in order to maximize code sharing, this should benefit Tegra 4: such games will remain compatible with OES 2.0 and won't take full advantage of OES 3.0, so the visual difference won't be as great. After those initial games, nVidia's developer relations will no doubt, as you say, try to convince developers that the user-base benefits of maintaining an OES 2.0/GLSL ES 1.0 code path in addition to an OES 3.0/GLSL ES 3.0 code path outweigh the cost/time efficiencies of going OES 3.0/GLSL ES 3.0 only.
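Purely as an illustration of that hybrid approach (the shader strings and helper below are hypothetical, a minimal sketch of how a renderer might keep a shared GLSL ES 1.00 path and only opt into GLSL ES 3.00 when it actually gets an ES 3.0 context):

```c
#include <stdio.h>

/* GLSL ES 1.00 fragment shader: accepted under both ES 2.0 and ES 3.0 contexts. */
static const char *frag_glsl_es100 =
    "precision mediump float;\n"
    "uniform sampler2D tex;\n"
    "varying vec2 uv;\n"
    "void main() { gl_FragColor = texture2D(tex, uv); }\n";

/* GLSL ES 3.00 fragment shader: requires "#version 300 es" and an ES 3.0 context. */
static const char *frag_glsl_es300 =
    "#version 300 es\n"
    "precision mediump float;\n"
    "uniform sampler2D tex;\n"
    "in vec2 uv;\n"
    "out vec4 frag_color;\n"
    "void main() { frag_color = texture(tex, uv); }\n";

/* Hypothetical helper: pick the shader dialect from the context's major
 * version (e.g. parsed from glGetString(GL_VERSION) at startup). */
static const char *pick_fragment_shader(int es_major_version)
{
    return es_major_version >= 3 ? frag_glsl_es300 : frag_glsl_es100;
}

int main(void)
{
    printf("ES 2.0 context gets:\n%s\n", pick_fragment_shader(2));
    printf("ES 3.0 context gets:\n%s\n", pick_fragment_shader(3));
    return 0;
}
```

On ES 2.0-only parts like Tegra 4 only the first string would ever be used; an ES 3.0 part can run either, which is what makes a shared GLSL ES 1.00 path attractive for the first wave of games.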
 
Sounds like it might not be a disaster then... but it still would have been nice for Nvidia to live up to its desktop graphics reputation.

Can't help but feel they are doing just enough.
 
Sounds like it might not be a disaster then... but it still would have been nice for Nvidia to live up to its desktop graphics reputation.

Can't help but feel they are doing just enough.

Just enough would be a better description for Intel's mobile strategy :LOL:
 
Sounds like it might not be a disaster then... but it still would have been nice for Nvidia to live up to its desktop graphics reputation.

Can't help but feel they are doing just enough.

All things considered, I am quite impressed so far by Tegra 4. Tegra 4 has a die area of ~80mm², compared to ~123mm² for the A6X. Even though the A6X die is roughly 50% larger (!), Tegra 4 is reportedly able to outperform the A6X in both CPU and GPU performance, all while having [45%?] lower average power consumption than Tegra 3 (in addition to a much faster CPU than Tegra 3, and 6x more GPU pixel shader and vertex shader execution units than Tegra 3 too). That is a pretty impressive achievement, even with the new fabrication process taken into account.

Another thing to note is that Tegra 4 appears to be suitable for use in both high-end smartphones and tablets. Compared to the GPU used in the latest and greatest iPhone 5, the Tegra 4 GPU should be much faster (2x faster?), with a significantly smaller SoC die area too (the A6 die is roughly 20% larger than Tegra 4's). Couple that with new smartphone-friendly features such as a super-fast HDR camera (where Tegra 4 is reportedly ~10x faster than the iPhone 5), and Tegra 4 looks very nice for use in a variety of different handheld devices. Even time to market is not too bad, considering that Tegra 4 devices will appear on the market just a few months after the latest and greatest iPhone and iPad.
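Taking the quoted die sizes at face value, the ratios behind those percentages work out roughly as:

```latex
\frac{123~\text{mm}^2}{80~\text{mm}^2} \approx 1.5 \;\;(\text{A6X vs. Tegra 4}), \qquad 80~\text{mm}^2 \times 1.2 \approx 96~\text{mm}^2 \;\;(\text{implied A6 die size from the ``roughly 20\% larger'' figure})
```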
 
Is that roadmap ambitious enough for NVIDIA to succeed?

I just took a closer look at the roadmap listed in the first post of this thread and noticed that the baseline comparison is to a Core 2 Duo CPU. That means this roadmap is valid only for the CPU, not the GPU!
 
I just took a closer look at the roadmap listed in the first post of this thread and noticed that the baseline comparison is to a Core 2 Duo CPU. That means this roadmap is valid only for the CPU, not the GPU!

Good point! Yeah, I think Tegra will be a good chip taking those factors into consideration; it's just the API that's not quite cutting edge...
 
I just took a closer look at the roadmap listed in the first post of this thread and noticed that the baseline comparison is to a Core 2 Duo CPU. That means this roadmap is valid only for the CPU, not the GPU!

The roadmap also puts Tegra 3 above said Core 2 Duo, so…
 
All things considered, I am quite impressed so far by Tegra 4. Tegra 4 has a die area of ~80mm², compared to ~123mm² for the A6X. Even though the A6X die is roughly 50% larger (!), Tegra 4 is reportedly able to outperform the A6X in both CPU and GPU performance, all while having [45%?] lower average power consumption than Tegra 3 (in addition to a much faster CPU than Tegra 3, and 6x more GPU pixel shader and vertex shader execution units than Tegra 3 too). That is a pretty impressive achievement, even with the new fabrication process taken into account.

It's a matter of perspective. NV can't afford SoCs bigger than 80mm², while Apple can with the volumes it's dealing with. As for the respective claims, we will find out soon in real measurements if and to what degree all of it is true, and we'll also see in due time what other upcoming 28nm SoCs are capable of in more apples-to-apples comparisons.

They managed for the first time, after 6 generations of mobile parts, to have the fastest SoC in terms of GPU performance, and that for a limited amount of time; i.e. count from device availability until the next Apple tablet launch. It is an achievement, but not an 8th wonder either. Put more simply: finally something is moving on the GPU side of things for NVIDIA, but the path is still long on quite a few fronts, not just performance.

Another thing to note is that Tegra 4 appears to be suitable for use in both high-end smartphones and tablets.

Are you willing to bet that frequencies will be exactly the same between T40 and AP40?

Compared to the GPU used in the latest and greatest iPhone 5, the Tegra 4 GPU should be much faster (2x faster?), with a significantly smaller SoC die area too (the A6 die is roughly 20% larger than Tegra 4's).

2x faster? :LOL: OK... in any case, as above, I'd rather compare AP40 when it appears in final devices against the iPhone 6 or anything else competitors release in the meantime. Just in case you've missed it, there are differences in power envelopes between a tablet SoC and a smartphone SoC.

Couple that with new smartphone-friendly features such as a super-fast HDR camera (where Tegra 4 is reportedly ~10x faster than the iPhone 5), and Tegra 4 looks very nice for use in a variety of different handheld devices. Even time to market is not too bad, considering that Tegra 4 devices will appear on the market just a few months after the latest and greatest iPhone and iPad.

Ultra yawn for the HDR camera stuff. Let's see final devices appear on shelves, and until then I reserve any judgement in comparison to anything else. And just in case you haven't noticed, the market doesn't revolve around Apple alone, not by far. NVIDIA can for the moment only dream of the smartphone design wins and sales volumes of either Qualcomm or Samsung.
 