Tegra 3 officially announced; in tablets by August, smartphones by Christmas

Do you have any more information on saturating vmax/vmin instructions? I can only find reference to FMA and half precision conversion extensions (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0473c/CIHJEBCE.html)

It's more accurate to say that the implementation of VMAX and VMIN on A5 doesn't quite match the documentation (or rather, what one would expect). It doesn't say what happens to the source register (the smaller of the two in VMIN, larger of the two in VMAX). In A5 and I believe A15 as well, IIRC, the sources are saturated to max/min values.
 
Yes. My point being that die area is at far more of a premium than on 4MB L3+ desktop chips. And if you had the die area, spending it on extra GPU pipelines may be more helpful...
In Yamato's case that may not be so obvious. Actually, I'd say it's far from obvious there. IMO, the architecture does need a balance between crunching power and tiling constraints - buffer resolves are not exactly free there. Increasing GMEM is a very low-hanging fruit performance-wise, so there will always be the temptation to pick it, by whoever designs the next Yamato iteration.

At the end of the day, Yamato is a spawn of an architecture that took tiling as a second priority. Many of the characteristics of today's Yamato iterations stem from the original design priorities of that precursor.
 
It's more accurate to say that the implementation of VMAX and VMIN on A5 doesn't quite match the documentation (or rather, what one would expect). It doesn't say what happens to the source register (the smaller of the two in VMIN, larger of the two in VMAX). In A5 and I believe A15 as well, IIRC, the sources are saturated to max/min values.
The architecture manual looks clear to me: there's no reason for the source register(s) to be modified (unless of course one of them is also the dest reg). If A5 or A15 changes them then it looks like a bug.
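
To make the expectation concrete, here is a minimal Python model of what the post above reads the manual as saying: the destination gets the lane-wise maximum and the sources are left alone unless one of them is also the destination. The register names, the dictionary register file and the `vmax_f32` helper are all made up for illustration, and NaN and denormal handling are ignored, so this is only a sketch of the expected behaviour and not a description of what A5 or A15 silicon does.

```python
def vmax_f32(regs, dest, src_a, src_b):
    """Write the lane-wise maximum of two source registers into dest.

    regs is a hypothetical register file: a dict mapping a register
    name to a list of float lanes. Only regs[dest] is assigned.
    """
    result = [max(a, b) for a, b in zip(regs[src_a], regs[src_b])]
    regs[dest] = result
    return regs


# Hypothetical register contents, two lanes each.
regs = {"d0": [0.0, 0.0], "d1": [1.0, -2.0], "d2": [0.5, 3.0]}
vmax_f32(regs, "d0", "d1", "d2")
print(regs["d0"])  # lane-wise max of d1 and d2
print(regs["d1"], regs["d2"])  # sources, unchanged in this model
```

A test on real hardware would follow the same shape: load known values, execute VMAX or VMIN with a destination distinct from both sources, then read all three registers back and compare.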
 
First performance benchmarks:

antooooto300x500.png


transformerprimeantutu0.jpg
 
Is that frequency scaling or core-scaling for the "CPU" portion? I had no idea T3's memory system was clocked so high; it's still single-channel, right?
Wild guessing as I don't know that benchmark...

The integer part result is probably both freq and core scaling though it should be higher if there was perfect scaling.

The memory part might be due to either better bandwidth utilization thanks to the 4 cores (if the part of the benchmark isn't able to defeat L2 caching) or perhaps they just widened the DDRn memory interface.

The 3D graphics seems to be the same as before, that's odd.
 
The big difference in 3D scores between Xoom and Optimus 2X suggests that it's resolution dependent. Hence it's better to compare these results, most likely from a tablet, to Xoom's, where the 3D score is much higher. But it could still be influenced by vsync.

Kal-El doesn't support a larger DRAM width, and the supported frequency is only somewhat higher.
 
Kal-El doesn't support a larger DRAM width, and the supported frequency is only somewhat higher.
Ha yes indeed; according to Anandtech it's still 32-bit wide and frequency went from 600 to 800 MHz for LPDDR2. So I guess that's some benchmark artefact.
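
As a back-of-the-envelope check on those numbers, the sketch below computes theoretical peak bandwidth for a 32-bit interface. It assumes the quoted 600 and 800 figures are effective transfer rates in mega-transfers per second, which the thread does not spell out, and `peak_bandwidth_gb_s` is just an illustrative helper name.

```python
def peak_bandwidth_gb_s(bus_width_bits, mega_transfers_per_s):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


# Assumption: 600 and 800 are effective data rates in MT/s.
tegra2 = peak_bandwidth_gb_s(32, 600)
kal_el = peak_bandwidth_gb_s(32, 800)
print(f"Tegra 2: {tegra2:.1f} GB/s, Kal-El: {kal_el:.1f} GB/s")
print(f"ratio: {kal_el / tegra2:.2f}x")
```

Under that assumption the peak goes from 2.4 to 3.2 GB/s, about 1.33x, so a much larger jump in a memory score would have to come from somewhere other than raw interface speed.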
 
The Galaxy S2 and Galaxy Nexus scores are interesting; the graphics portions are similar even though the Mali400MP4 should far outpace the OMAP4's SGX540.
 
I think the 2D, 3D and database scores are most telling about how the overall experience will feel relative to other devices. I'd like to see some javascript / browsing benchmarks.

I'm surprised to see memory bandwidth improved considering that I read it would be very similar to Tegra 2.
 
I'm surprised to see memory bandwidth improved considering that I read it would be very similar to Tegra 2.
As I wrote, this might be a benchmark issue: it might be unable to saturate the T2 memory interface when running on only two cores. Hard to say given it seems to only provide a score and not an MB/s or GB/s figure.
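
For comparison, a measurement that reports an actual rate could look like the rough sketch below. It times a large buffer copy and prints MB/s; the buffer size, repeat count and the `copy_bandwidth_mb_s` name are arbitrary choices for illustration, it is single-threaded, and interpreter overhead and caching mean it is not a proper saturation test of any memory interface.

```python
import time


def copy_bandwidth_mb_s(size_bytes=64 * 1024 * 1024, repeats=5):
    """Time a buffer copy and return the best observed rate in MB/s."""
    src = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)  # reads src once and writes a new buffer once
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    # Count both the bytes read and the bytes written.
    return 2 * size_bytes / best / 1e6


if __name__ == "__main__":
    print(f"{copy_bandwidth_mb_s():.0f} MB/s")
```

A figure in MB/s can be compared directly against the theoretical peak of the interface, which a unitless score does not allow.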
 
As I wrote, this might be a benchmark issue: it might be unable to saturate the T2 memory interface when running on only two cores. Hard to say given it seems to only provide a score and not an MB/s or GB/s figure.

Incidentally, you've been able to saturate said Tegra 2 interface, right?

I've been thinking about memory bandwidth vs latency on these devices more. I think on something like OMAP3 it's not possible to come very close to saturating bandwidth with just the CPU, because the store queue is not big enough to hide latency for stores, and the preload queue is not big enough to hide it for loads. The latencies are just too high for the CPU/cache design. But you can get a lot closer using the pre-load engine on Cortex-A8, since it involves programmed sequences instead of explicit requests.

In a situation like this more cores would be an obvious help, like you say.
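
The latency argument can be put in numbers with Little's law: sustained bandwidth is roughly the bytes in flight divided by the memory latency. The sketch below uses made-up values (4 outstanding 64-byte lines per core, 150 ns latency) purely to show the shape of the relationship; none of them are measured figures for OMAP3 or Tegra, and `sustained_bandwidth_mb_s` is a hypothetical helper.

```python
def sustained_bandwidth_mb_s(outstanding_requests, line_bytes, latency_ns,
                             cores=1, peak_mb_s=None):
    """Estimate sustained bandwidth from requests in flight (Little's law).

    outstanding_requests is per core. If peak_mb_s is given, the result
    is capped at that theoretical interface limit.
    """
    bytes_in_flight = outstanding_requests * line_bytes * cores
    mb_s = bytes_in_flight / latency_ns * 1e9 / 1e6
    if peak_mb_s is not None:
        mb_s = min(mb_s, peak_mb_s)
    return mb_s


# Hypothetical numbers: 4 lines in flight per core, 64-byte lines, 150 ns.
one_core = sustained_bandwidth_mb_s(4, 64, 150)
two_cores = sustained_bandwidth_mb_s(4, 64, 150, cores=2)
print(f"1 core: {one_core:.0f} MB/s, 2 cores: {two_cores:.0f} MB/s")
```

With the queue depth fixed, the only ways to raise the result are lower latency, more requests in flight per core (deeper queues, prefetching), or more cores issuing requests, which is the point being made about extra cores.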

A9 should bring in a similar advantage by having automatic prefetching built into the core. Then it's a matter of whether or not the benchmark triggers it. But quality could still vary with Tegra 2/Kal-El having a potential advantage if the L2 cache and/or memory controller has its own additional prefetching or deeper issue queues.
 
I heard Tegra 3's memory controller is extremely good - I wouldn't be surprised at all if most of the gains came from that direction. I was expecting the gains to be less significant in a synthetic benchmark though, so I wonder what it's doing exactly...
 
Kal-El doesn't support a larger DRAM width, and the supported frequency is only somewhat higher.

Ha yes indeed; according to Anandtech it's still 32-bit wide and frequency went from 600 to 800 MHz for LPDDR2. So I guess that's some benchmark artefact.

I think the 2D, 3D and database scores are most telling about how the overall experience will feel relative to other devices. I'd like to see some javascript / browsing benchmarks.

I'm surprised to see memory bandwidth improved considering that I read it would be very similar to Tegra 2.


Anand's Tegra 3 article from February actually says:
NVIDIA also said that effective/usable memory bandwidth will nearly double with Kal-El vs. Tegra 2. Some of this doubling in bandwidth will come from faster LPDDR2 (perhaps up to 1066?) while the rest will come as a result of some changes NVIDIA made to the memory controller itself.

They're using Elpida LPDDR2 chips, which exist in either 800 or 1066MHz variants.

So where did the 800MHz assumption come from?
 
So where did the 800MHz assumption come from?
I assume most people are just speculating, but I looked into it and they're actually right. The OMAP4 Pandaboard uses Elpida chips with the same performance class (end of the marking) and OMAP4 doesn't support more than 800MHz so it seems unlikely they paid for something faster.
 
Weren't there some OMAP3 based HTC smart-phones in the past also? I'm not in the least surprised if phone manufacturers don't want to depend on just one SoC or technology source if you prefer.

SoC manufacturers lose in one spot and gain in another; the map is constantly changing and stakes seem to be higher than ever before.

What remains now to be seen is what a quad core CPU will accomplish in a smart-phone.
 
I'm far more optimistic on the CPU side of Tegra 3 than I was before from the standpoint of efficiency (but not performance), after listening more closely to their explanation of how processing will be balanced.

Tegra 3 will still be a let-down, though, due to the underwhelming graphics performance. Tegra 3+ is kind of funny when you think about it in context.

But, wow... the specs in those HTC Edge rumors are fascinating in a number of ways.
 
Weren't there some OMAP3 based HTC smart-phones in the past also? I'm not in the least surprised if phone manufacturers don't want to depend on just one SoC or technology source if you prefer.

SoC manufacturers lose in one spot and gain in another; the map is constantly changing and stakes seem to be higher than ever before.

Google tells me there was this HTC Qilin for China Mobile back in 2009, but it was a WM6.x device, not Android (many of the old Windows Mobile HTC smartphones used OMAP SoCs, BTW).
Nonetheless, it's been at least two years since HTC used anything other than a Qualcomm SoC for any of their smartphones.


What remains now to be seen is what a quad core CPU will accomplish in a smart-phone.

18975304.png
 