Tegra 3 officially announced; in tablets by August, smartphones by Christmas

It'll be interesting to see if they come up with a bombshell GPU in the future. They certainly have significant graphics technology to leverage against everyone else. They seem to be getting pretty serious about games on these devices, as Tegra Zone shows. But I'm sure there is a limit, as with desktop IGPs, where at some point it won't make sense to add more GPU power because few users would appreciate it, so the value is nil.

The whole thing could very well parallel Atom vs ARM. Intel has way more expertise in high-performance CPUs but is still struggling to come close to high-end ARM on perf/W. nVidia at least has the good sense to keep the wattage down, but that may well come at the cost of perf. Just knowing what we do about the arch, it's not hard to imagine that Tegra isn't a perf/W leader in the mobile GPU space, even before you take into account any possible lack of low-power engineering experience.

It's interesting that you mention limits of desktop IGPs, because Trinity seems like it's going to push the boundaries of how much GPU capability they can shove alongside a CPU, and even Llano is already bandwidth-limited much of the time. Seems like there's user demand for putting in as much as is commercially practical. That's way more than what we saw on motherboard IGPs, though.

If nVidia didn't see something like SGX543MP2 taking off, then they underestimated the market rather than the GPU vendors, since it was obvious that that level of capability (and much higher) existed in future PowerVR designs. To their credit, it really is only one company pushing this, and normally that wouldn't cause a big rift: too bad it happened to be Apple. At the same time, if they were really trying to sell to Sony for PSP2, then there's no way they could have thought the ULP GeForce as it currently exists would be sufficient.

On the other hand, I agree with Ailuros that nVidia seems to have way more experience writing high-quality drivers. And in the GPU domain, that isn't influenced much by power consumption.
 
On the other hand, I agree with Ailuros that nVidia seems to have way more experience writing high-quality drivers. And in the GPU domain, that isn't influenced much by power consumption.

Yes and no. Remember that the driver stack runs on the CPU. So while it doesn't affect the power consumption of the GPU, if your optimization loop is too aggressive, it could easily take that Cortex A15 to peak frequency; that's a lot of power.

One of the Smartbench 3D "benchmarks" -- I use quotes because it can hardly be called that -- was actually limited by the CPU's binning loop on one of our chips.
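For what it's worth, it's not hard to spot that kind of thing from inside an app: compare the CPU time the process burns per frame against the wall-clock frame time. A rough sketch of my own (assuming a POSIX clock; the function names are made up, not from any driver or benchmark):

```c
/* Rough sketch of how to tell a CPU/driver-bound frame from a GPU-bound one:
 * compare the CPU time the process burns per frame against the wall-clock
 * frame time. If they track each other, the CPU side (app + driver, e.g. a
 * tiler's binning loop) is the limiter, not the GPU. */
#include <stdio.h>
#include <time.h>

static double wall_now(void)          /* wall-clock seconds */
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

static double cpu_now(void)           /* process CPU seconds (all threads) */
{
    return (double)clock() / CLOCKS_PER_SEC;
}

/* Call once per frame, right after the buffer swap. */
void frame_profile_tick(void)
{
    static double last_wall, last_cpu;
    double w = wall_now(), c = cpu_now();

    if (last_wall != 0.0)
        printf("frame: %.2f ms wall, %.2f ms cpu\n",
               (w - last_wall) * 1000.0, (c - last_cpu) * 1000.0);

    last_wall = w;
    last_cpu  = c;
}
```

If the CPU milliseconds stay glued to the wall-clock milliseconds, the GPU isn't your bottleneck.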
 
A crappy driver could very well drain your battery in no time and, at the same time, cause severe system instabilities.

Look at the Tegra3 ULP GF performance, which is likely to be give or take what I'd expect from something like Qualcomm's Adreno 225. The former has 2 Vec4 PS ALUs + 1 Vec4 VS ALU, probably clocked at 500MHz, and the latter 8 Vec4 USC ALUs at 400MHz (according to Anand's relevant article, at least). Despite the latter having more than twice the ALU lanes, with unified shader ALUs on top of that, it seems both will roughly end up in the same 3D performance ballpark, at least in the benchmarks available so far.
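To put rough numbers on those paper specs (purely back-of-the-envelope, assuming 4 lanes per Vec4 ALU and one MADD = 2 FLOPs per lane per clock):

```c
/* Back-of-the-envelope peak ALU throughput from the paper specs above.
 * Assumptions (mine, not vendor numbers): each Vec4 ALU is 4 lanes and
 * each lane can issue one MADD (2 FLOPs) per clock. */
#include <stdio.h>

int main(void)
{
    /* Tegra 3 ULP GeForce: 2 Vec4 PS + 1 Vec4 VS = 3 Vec4 ALUs @ ~500 MHz */
    double tegra3_gflops    = 3 * 4 * 2 * 0.5;   /* = 12.0 GFLOPS */
    /* Adreno 225: 8 Vec4 unified ALUs @ ~400 MHz */
    double adreno225_gflops = 8 * 4 * 2 * 0.4;   /* = 25.6 GFLOPS */

    printf("Tegra 3 ULP GF : %.1f GFLOPS peak\n", tegra3_gflops);
    printf("Adreno 225     : %.1f GFLOPS peak\n", adreno225_gflops);
    return 0;
}
```

Over twice the theoretical peak on the Adreno side, on paper, which is what makes rough parity in the actual scores interesting.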

For NV I suspect not only better optimised drivers, but probably also better pipeline utilization.

In any case, I'll say it again: we need mobile games to enable frame counters, and why not timedemos too. The public benchmarks used so far are few, and they are and will remain synthetic benchmarks, which are also quite easy to tune your drivers for.
 
Look at the Tegra3 ULP GF performance, which is likely to be give or take what I'd expect from something like Qualcomm's Adreno 225. The former has 2 Vec4 PS ALUs + 1 Vec4 VS ALU, probably clocked at 500MHz, and the latter 8 Vec4 USC ALUs at 400MHz (according to Anand's relevant article, at least). Despite the latter having more than twice the ALU lanes, with unified shader ALUs on top of that, it seems both will roughly end up in the same 3D performance ballpark, at least in the benchmarks available so far.
If your predictions turn out to be right, then wouldn't that mean that the Adreno 225 is terribly inefficient? Oh, and if so, where do you think most of that inefficiency comes from? Is it mainly driver-related, or mainly some bottleneck in the hardware?
 
One of the Smartbench 3D "benchmarks" -- I use quotes because it can hardly be called that -- was actually limited by the CPU's binning loop on one of our chips.

But good drivers on desktops are supposed to use as little CPU time as possible too, and if nVidia didn't know how to do that they wouldn't have been recognized as writing good GPU drivers up until now.

The point I was making is that there isn't a special talent needed to go from writing good GPU drivers on desktop to good GPU drivers on mobile, if one architecture reasonably resembles the other.
 
In any case, I'll say it again: we need mobile games to enable frame counters, and why not timedemos too.

Are you advocating benchmarking with games over dedicated benchmarks? It's surely super important to know whether game X runs smoother on device A than on device B, but it's synthetic benchmarks that tell you where the actual bottleneck in the overall HW is. IMO both are important in order to understand the perf characteristics of the HW (and SW).
 
If your predictions turn out to be right, then wouldn't that mean that the Adreno 225 is terribly inefficient?

I wouldn't call it terribly inefficient, but it's not as efficient as you'd expect looking at the pure paper specifications. Besides, it's nothing new on the desktop either, where a raw count of N units is by far not something to draw conclusions from; it's rather efficiency/capabilities per unit * number of units * frequency, and even that can be a questionable equation.
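Spelled out, the rule of thumb I mean is roughly:

$$
\text{performance} \;\approx\; \text{(efficiency/capabilities per unit)} \times \text{(number of units)} \times \text{(clock frequency)}
$$

with the first factor being the part that's hardest to pin down from the outside.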

Please note that I merely looked at Adreno220 GL benchmark results and speculated based on the claim in Anand's S4 article that 225 will be roughly 50% faster than the former.

Oh, and if so, where do you think most of that inefficiency comes from? Is it mainly driver-related, or mainly some bottleneck in the hardware?

I haven't the slightest clue what pipeline utilization looks like in each architecture; under normal conditions, USC ALUs (and even more so if the unit count is higher than in a compared non-unified design) should win. To find that out, a mobile developer would have to sit down, write a few lines of shader code and pull it through the different architectures.

So far it sounds suspiciously like SW-related problems; in that regard it could be that current mobile applications don't get along well with USC architectures (which would be weird, since the majority of them are unified), or the shader compiler simply stinks, or the driver itself does, or all three and more combined.
 
Are you advocating benchmarking with games over dedicated benchmarks? It's surely super important to know whether game X runs smoother on device A than on device B, but it's synthetic benchmarks that tell you where the actual bottleneck in the overall HW is. IMO both are important in order to understand the perf characteristics of the HW (and SW).

Yes they are. But if you put a frame counter in a mobile game and each reviewer takes his measurements in different locations of the game, the chances are better that it hasn't been specifically optimized for, the way any timedemo or synthetic benchmark can be.
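And the counter itself is trivial; something along these lines would already do, as long as the game calls it every frame (a rough sketch of my own, assuming a POSIX clock and with made-up names):

```c
/* Minimal sketch of the kind of frame counter I mean: the game tags the
 * current scene/location and the counter reports average fps per tag, so a
 * reviewer can sample whatever spots of the game they like. */
#include <stdio.h>
#include <string.h>
#include <time.h>

static double now_s(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

/* Call once per frame with a label for where the player currently is. */
void frame_counter_tick(const char *location)
{
    static char   cur[64];
    static double start;
    static long   frames;

    if (strcmp(cur, location) != 0) {                /* entered a new area */
        if (frames > 0)
            printf("%s: %.1f fps over %ld frames\n",
                   cur, frames / (now_s() - start), frames);
        strncpy(cur, location, sizeof(cur) - 1);
        cur[sizeof(cur) - 1] = '\0';
        start  = now_s();
        frames = 0;
    }
    frames++;
}
```

With a per-location average like that, nobody has to rely on the handful of canned scenes the synthetic benchmarks ship with.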

Another point I'd like to make is that measuring the performance numbers is fine and dandy, but I'd also advocate checking that the rendered output is exactly the same. Personally, for the past decades I've always found it easier to detect something questionable while playing a game in real time, instead of running some timedemo and going for a pee in the meantime ;)
 
http://www.engadget.com/2011/11/14/exclusive-lenovo-to-release-a-10-1-inch-ice-cream-sandwich-tabl/


Engadget is claiming Lenovo will release a Tegra 3 tablet with 2GB of 1600MHz DDR3.
Supposedly, Tegra 3 only supports DDR3L up to 1500MHz, so it may be a typo... or flat-out wrong.


Nonetheless, using 1500MHz DDR3L would lower the SoC's bandwidth handicap compared to the competition using dual-channel 800MHz LPDDR2.
It should be interesting to see how the performance varies with memory speed in this SoC.
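As a rough sanity check (my assumptions, not from the article: Tegra 3 keeps a single 32-bit memory channel, the LPDDR2 competition runs two 32-bit channels, and the quoted "MHz" figures are effective transfer rates):

$$
1500\,\text{MT/s} \times 4\,\text{B} \approx 6.0\,\text{GB/s}
\quad\text{vs.}\quad
2 \times 800\,\text{MT/s} \times 4\,\text{B} \approx 6.4\,\text{GB/s}
$$

So the faster DDR3L would mostly close the gap rather than open up a lead.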
 
People have different definitions of "MHz" when it comes to RAM. Some of them refer to the clock speed, others to the data transfer rate, which is 2x the clock speed.
 
People have different definitions of "MHz" when it comes to RAM. Some of them refer to the clock speed, others to the data transfer rate, which is 2x the clock speed.
Thanks, I know this. What's the point of referring to the command rate instead of the data rate in presentations and specs?
 
Yup, those 400MHz LPDDR2 parts in the PDFs you found are usually listed as 800MHz LPDDR2 because of the double data rate (the "DDR" in LPDDR2).
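i.e. for those parts:

$$
400\,\text{MHz I/O clock} \times 2\,\text{transfers/clock} = 800\,\text{MT/s}\ (\text{commonly written "800MHz LPDDR2"})
$$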

I don't know if 400MHz (200MHz * 2) LPDDR2 chips were ever made, as that was the fastest available rate for LPDDR1.


Thanks, I know this. What's the point of referring to the command rate instead of the data rate in presentations and specs?

Actually, LPDDR2's command rate also runs at twice the base clock speed (check the above link's last paragraph).

Anyways, the data rate is representative of performance (marketing, high-level software dudes), while the base clock speed is more important for low-level integration (engineers).
 
It's obvious, please stop mentioning this :smile: I was saying that data clocks are common in DDR nomenclature: DDR2-400B/C, ..., DDR2-1066E/F all refer to data clocks.

I don't know if 400MHz (200MHz * 2) LPDDR2 chips were ever made, as that was the fastest available rate for LPDDR1.
Don't know about LPDDR, but 400MHz DDR2 definitely was in production

Actually, LPDDR2's command rate also runs at twice the base clock speed (check the above link's last paragraph).
I know this as well

while the base clock speed is more important for low-level integration (engineers)
The base clock is important for latencies, and thus for the processor's memory subsystem from an engineering POV; however, it's not so important for the end user, which is why data rates are common in RAM nomenclature.
 
Don't know about LPDDR, but 400MHz DDR2 definitely was in production
Pretty sure LPDDR2 chips never went into production below 533MHz (it wouldn't make sense given the process technology evolution), but there are several SoCs (e.g. the Samsung S5PC110 aka Exynos 3110) that support LPDDR2 only up to 200MHz (400MHz data). Those chips basically didn't need the bandwidth but could benefit from the lower power consumption.
The base clock is important for latencies, and thus for the processor's memory subsystem from an engineering POV; however, it's not so important for the end user, which is why data rates are common in RAM nomenclature.
To be fair, latency is often more important than bandwidth for CPUs. It's also harder for consumers to understand though...
 
To be fair, latency is often more important than bandwidth for CPUs. It's also harder for consumers to understand though...
It's pretty clear, considering the huge bandwidth gains and poor latency scaling since the nineties; however, it's the engineers' job to design a memory subsystem that hides the latencies. Once that's done, memory bandwidth becomes the defining characteristic of processor performance for the end user (simply because he can radically change the memory configuration from a bandwidth point of view, but cannot reduce latencies by anywhere near the same amount).

PS: it's hard for me to convey all my ideas clearly because of my poor English
 
To be fair, latency is often more important than bandwidth for CPUs. It's also harder for consumers to understand though...

I think in the context of mobile SoCs, the memory subsystem is less skewed towards servicing the CPU. The video processor, the GPU and the frame buffer would seem to be the biggest hogs of the memory bus, and those are all very latency-tolerant things.

This does present a problem for CPU performance, I agree, but on SoCs one can argue that CPU performance is far less important than on a standard PC architecture.
 