Tegra 3 officially announced; in tablets by August, smartphones by Christmas

Right, that's what I assumed they would do based on our previous conversation about this: FMA+ADD+L/S with chained MADD being executed as MUL-via-FMA+ADD with an intermediary FIFO, à la A8/A9. Oh well, a bit disappointing...

It is. But again, it's not wholly without merit. If all you did was chained and/or individual multiplies and adds, a chained implementation is more efficient energy-wise and possibly latency-wise. And looking at what armcc is producing, it appears that's exactly what they're planning for.
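To illustrate what I mean (a toy sketch, nothing armcc-specific; the function name is made up), this is the kind of multiply-feeding-an-add loop where, from what I've seen, a compiler like armcc will happily emit a chained multiply-accumulate rather than a separate multiply and add:

Code:
void saxpy(float *restrict y, const float *restrict x, float a, int n)
{
    /* y[i] += a * x[i]: a multiply whose result feeds straight into an add,
     * i.e. a natural candidate for a chained MAC (VMLA-style) instruction. */
    for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
}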

Oh right, I agree it's a bigger disadvantage on smartphones. I meant specifically ARM netbooks/notebooks and (to a lesser extent) tablets running Windows 8.

I don't see Windows 8 for ARM being benchmarked any differently from current tablets. Remember that these are supposed to be MacBook Air-level devices. I doubt most people expect or care about productivity performance on such devices. Most people editing video (which is the only CPU-intensive task I can see on such a device) will most likely be happy with a codec that's hardware-accelerated and not necessarily x264.

NVIDIA is good at making people focus on what they want them to focus on, but if Wayne is basically a faster shrink of Kal-El and also keeps the same old GPU arch, they won't get away with it against the competition. They can say they're targeting lower cost but that doesn't even make sense with Grey coming at nearly the same time anyway.

I don't know why the idea of Wayne being a shrink of Kal-El is assumed but all I'll say is, assume makes an ass out of you and me :)

I doubt anyone is ever going to run 7-zip on a proprietary DSP. As for x264, it's still the best software solution by far in terms of both maximum quality and quality-at-a-given-computational-cost.

Ya but how prevalent would its use be on an ultralight notebook, tablet or smartphone?

And that SGS3 is supposed to have a 1280x1024 screen... right, whatever. I also doubt the next Nexus will use a 1.5GHz Exynos. This looks like a very elaborate hoax to me, although I suppose we'll (kinda) know when we see some ICS screenshots to check his other claims. It's true that the old Samsung roadmap had a quad-core A9 "Aquila" chip, but many things have changed since then (e.g. the clock target for Orion/Exynos was 800MHz!) and that was before they became a lead licensee for A15. I could see Samsung going for quad-A15, but not before high-k 28nm, which won't be ready for a phone shipping in mid-2012.

FWIW, getting back to the topic of this thread, I think there's a >50% probability that Samsung's 1H12 flagship will use Kal-El (or maybe Kal-El+ depending on the timeframe, assuming it's just a clockbump and/or respin). I have no idea whether that will be the SGS3 or if Samsung's going to keep that brand for devices with their own chips.

I highly doubt that. With the exception of the T-Mobile version of the SGS2 -- which I believe was down to some issue with interfacing with a 42Mbps modem -- Samsung has pretty much stuck to its own chips for its handsets. And nothing I've seen of Kal-El (or even a 28nm Kal-El) makes me believe it's either suitable or competitive with smartphone chips at its time of release (release for smartphones, that is).

It'll make a nice tablet chip in 2011. But I really have a hard time seeing someone pass up the MSM8960 in favour of Kal-El for a superphone.
 
I don't see Windows 8 for ARM being benchmarked any differently from current tablets. Remember that these are supposed to be MacBook Air-level devices. I doubt most people expect or care about productivity performance on such devices.
Right, I'm just pointing out that Windows-based MacBook Air-level devices are benchmarked *very* differently from today's tablets, and the benchmarks don't always match the use cases.

Most people editing video (which is the only CPU-intensive task I can see on such a device) will most likely be happy with a codec that's hardware-accelerated and not necessarily x264.
If it's exposed with a nice API and/or UI, with higher performance than x264 for the same low quality (as opposed to only better battery life), then I probably agree with you. That remains to be seen, though, and even if nobody used it, I can still see websites benchmarking high-quality x264.

Just to be clear again: I agree 4xA9 is a large real-world performance disadvantage and it's also worse in terms of marketing and benchmarks. I'm just pointing out that the difference in marketing and benchmarks is likely significantly smaller than the technical difference, even if it's still significant. I could be wrong, but I don't think we fundamentally disagree, so let's leave it at that. :)

I don't know why the idea of Wayne being a shrink of Kal-El is assumed but all I'll say is, assume makes an ass out of you and me :)
Hehe, fair enough! ;)

With the exception of the T-Mobile version of the SGS2 -- which I believe was down to some issue with interfacing with a 42Mbps modem
*snicker* Sorry, that's such a nice way to say Qualcomm is using its baseband leadership to artificially limit application processor choice. Don't get me wrong, I don't mean this in a bad way, since it's a very smart business move as long as they can get away with it.

I realise it might be a "genuine technical limitation" (e.g. audio I/O) - I know Icera once told me they found an MDM8200-based data module with an application processor next to it, and their understanding was that it was for voice certification, which is the same reasoning I heard for the MDM8220. I could be wrong, but I still suspect the decision came from management rather than engineering...

Samsung has pretty much stuck to its own chips for its handsets. And nothing I've seen of Kal-El (or even a 28nm Kal-El) makes me believe it's either suitable or competitive with smartphone chips at its time of release (release for smartphones, that is).

It'll make a nice tablet chip in 2011. But I really have a hard time seeing someone pass up the MSM8960 in favour of Kal-El for a superphone.
Agreed the MSM8960 is clearly superior (even with the clockbump I'm hearing about, whether that's Kal-El+ or not) if Qualcomm delivers (which I assume they pretty much will), but Kal-El is still faster than just about everything else in that timeframe, and I have yet to see Samsung use Qualcomm for any smartphone flagship ever (the I7500 doesn't count since it's hardly high-end and the MSM7200A was basically the only choice for TTM). When Samsung started aggressively using Infineon basebands a few years ago, several analysts claimed it was a conscious move to get away from Qualcomm. Maybe those analysts were wrong or Samsung changed their mind, though; I honestly wouldn't know.
 
Right, I'm just pointing out that Windows-based MacBook Air-level devices are benchmarked *very* differently from today's tablets, and the benchmarks don't always match the use cases.

True, but Windows 8 on ARM won't be treated like current Windows laptops or tablets. I'm sure someone will try to benchmark x264 encoding on them, but I suspect that the Metro-style apps will be benchmarked more similarly to modern smartphones than to modern laptops.

If it's exposed with a nice API and/or UI, with higher performance than x264 for the same low quality (as opposed to only better battery life), then I probably agree with you. That remains to be seen, though, and even if nobody used it, I can still see websites benchmarking high-quality x264.

On x86 perhaps. I don't even know how much software there will be for Metro on ARM for video encoding.

Just to be clear again: I agree 4xA9 is a large real-world performance disadvantage and it's also worse in terms of marketing and benchmarks. I'm just pointing out that the difference in marketing and benchmarks is likely significantly smaller than the technical difference, even if it's still significant. I could be wrong, but I don't think we fundamentally disagree, so let's leave it at that. :)

Fair enough. I suppose we'll see.

*snicker* Sorry, that's such a nice way to say Qualcomm is using its baseband leadership to artificially limit application processor choice. Don't get me wrong, I don't mean this in a bad way, since it's a very smart business move as long as they can get away with it.

I honestly don't know what the situation was there. I've only heard that there was an issue with the 42Mbps modem. But there are other vendors with 42Mbps HSPA+ modems, and they hit the market before Qualcomm did...

Agreed the MSM8960 is clearly superior (even with the clockbump I'm hearing about, whether that's Kal-El+ or not) if Qualcomm delivers (which I assume they pretty much will), but Kal-El is still faster than just about everything else in that timeframe, and I have yet to see Samsung use Qualcomm for any smartphone flagship ever (the I7500 doesn't count since it's hardly high-end and the MSM7200A was basically the only choice for TTM). When Samsung started aggressively using Infineon basebands a few years ago, several analysts claimed it was a conscious move to get away from Qualcomm. Maybe those analysts were wrong or Samsung changed their mind, though; I honestly wouldn't know.

Well, my point is, if they're going to use a non-Samsung chip and go for a third party, it might as well be the 8960 over Kal-El. I don't know about the corporate relationship there or the politics of it, but I don't know that there's any more love for NVIDIA on Samsung's side.

But I think someone linked to a preliminary GLBenchmark result of what was thought to be Kal-El. It looked on par with Exynos, and there are potentially a lot of improvements to be had from now until product. I don't know that the 8960's Adreno 225 will compete with that, or at the very least be as clearly better as Krait is compared to A9.
 
I'll wait for a more apples-to-apples comparison from a third party then. If the results differ as much as you say, it's not because of vsync, and I'm sure driver enhancements haven't had that kind of alarming impact either - either Vivante did it wrong or the benchmark is not really the same.
Yes, a fair thing to do would be to wait for actual device comparisons; let's face it, a company marketing PDF is never going to be a good, trusted source of benchmark data. It's also quite clear that Vivante's claims don't quite gel with reality, i.e. irrespective of the accuracy of their own data (also questionable), we _know_ iPad 2 does better than the numbers they're giving in that PDF.

That, and can you really assert that this benchmark delivers perfect scaling with resolution for iPad2?
Yes, this is something I can assert within a narrow error margin.

If it is because of vsync, then the performance in Egypt is way too erratic. Makes me wish we had some frame-period vs. time plots. But noting that the results for "high" in 2.1 are a lot different from (and more clearly vsync-limited than) the results for 2.0, I'm going to go with it being unfair to compare the two.

I can say that the speed increase is a mixture of the removal of vsync and other improvements related to the benchmark being run on iOS 5. One thing different in the offscreen benchmark is that it will be running a 32bpp framebuffer on all devices instead of 16bpp (which is what Vivante will be quoting performance for), but that will only work to iPad 2's advantage.
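Just to put the 16bpp vs. 32bpp point in rough numbers (my own back-of-the-envelope arithmetic, assuming the 1280x720 offscreen target, 60fps and colour writes only, ignoring overdraw/depth/texture traffic):

Code:
#include <stdio.h>

int main(void)
{
    /* Colour-write traffic per second for a 720p offscreen target. */
    const double pixels = 1280.0 * 720.0;                    /* 921,600 pixels/frame */
    const double fps    = 60.0;
    printf("16bpp: %.0f MB/s\n", pixels * 2 * fps / 1e6);    /* ~111 MB/s */
    printf("32bpp: %.0f MB/s\n", pixels * 4 * fps / 1e6);    /* ~221 MB/s */
    return 0;
}

Since a TBDR resolves tiles on-chip and writes each framebuffer pixel out roughly once, the fatter format should cost an immediate-mode renderer comparatively more external bandwidth, which is presumably why it only works to iPad 2's advantage.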
 
Besides the vsync-on vs. vsync-off trick, what exactly are they comparing here considering die area? The A5 as an SoC has 2.5x the die area of a Tegra 2 (whereby the GPU block in the former is just a tad below the entire T2 SoC estate) and 1.5x the die area of a Tegra 3.

TSMC 40nm obviously has a huge density advantage vs Samsung's 45nm (and probably other fabs at 45nm). Compare A4 vs Tegra 2 and you still see Tegra 2 coming in way smaller despite being better in every way.

i.MX6, presumably also fabbed at TSMC, will have the same density advantage, which makes the PDF's size comparison with A5's GPU unfair.

JohnH said:
Yes, a fair thing to do would be to wait for actual device comparisons; let's face it, a company marketing PDF is never going to be a good, trusted source of benchmark data. It's also quite clear that Vivante's claims don't quite gel with reality, i.e. irrespective of the accuracy of their own data (also questionable), we _know_ iPad 2 does better than the numbers they're giving in that PDF.

The current numbers on GLBenchmark's site are only 2-3 FPS higher for Egypt in GLBenchmark 2.0, which shows that either 2.1 is not the same benchmark or the two haven't been tested with the same OS revision. But it's enough to see that Vivante isn't misrepresenting Apple/IMG in these benchmarks; the slight deviation is also probably due to driver improvements.
 
TSMC 40nm obviously has a huge density advantage vs Samsung's 45nm (and probably other fabs at 45nm). Compare A4 vs Tegra 2 and you still see Tegra 2 coming in way smaller despite being better in every way.

i.MX6, presumably also fabbed at TSMC, will have the same density advantage, which makes the PDF's size comparison with A5's GPU unfair.

Of course there is a density advantage, but I wouldn't call it huge, or big enough to justify how big the A5 turned out after all. I'd have to search the forums here, but I think the GPU block in the PS Vita SoC, also on Samsung 45nm, was estimated at 35-40mm2 by someone. If that's true, that would be an SGX543MP4+@200MHz vs. an SGX543MP2@250MHz (?) at roughly the same die size.

The current numbers on GLBenchmark's site are only 2-3 FPS higher for Egypt in GLBenchmark 2.0, which shows that either 2.1 is not the same benchmark or the two haven't been tested with the same OS revision. But it's enough to see that Vivante isn't misrepresenting Apple/IMG in these benchmarks; the slight deviation is also probably due to driver improvements.

The point is that Vivante is either way comparing vsynced against non-vsynced results. Otherwise, how do you get an average of over 70fps with a typical vsync of 60Hz?

Oh and yes a split into another thread would be best.
 
Of course there is a density advantage, but I wouldn't call it huge, or big enough to justify how big the A5 turned out after all. I'd have to search the forums here, but I think the GPU block in the PS Vita SoC, also on Samsung 45nm, was estimated at 35-40mm2 by someone. If that's true, that would be an SGX543MP4+@200MHz vs. an SGX543MP2@250MHz (?) at roughly the same die size.

Which I guess could be the case based on configuration, if A5 has a lot more RAM/queues/whatever. It's hard to compare density without more direct examples of shrinks going from Samsung (or other 45nm processes) to TSMC 40nm, but Tegra 2 being smaller than A4 is still pretty telling, IMO. Just look at the 512KB of L2 in A4 vs the 1MB of L2 in Tegra 2.

The point is that Vivante is either way comparing vsynced against non-vsynced results. Otherwise, how do you get an average of over 70fps with a typical vsync of 60Hz?

The point I've been trying to make is that if iPad 2 gets 45fps under vsync, it's probably spending most frames under the limit, or the GPU load is extremely erratic. In that case, it doesn't matter that much that one is under vsync and the other isn't.
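To spell out the arithmetic (a toy model; the frame-time mix is picked purely to hit the quoted averages): with vsync on, every frame takes an integer number of 60Hz refresh intervals, so the average can never exceed 60fps, and a 45fps vsynced average implies roughly two one-interval frames for every two-interval frame.

Code:
#include <stdio.h>

int main(void)
{
    const double interval_ms = 1000.0 / 60.0;   /* one 60Hz refresh: ~16.67ms */

    /* Two thirds of frames finish within one interval, one third slips to two. */
    double p_fast = 2.0 / 3.0;
    double avg_ms = (p_fast * 1.0 + (1.0 - p_fast) * 2.0) * interval_ms;
    printf("vsynced average: %.1f fps\n", 1000.0 / avg_ms);  /* ~45 fps */

    /* An average above 60fps (like the 70+ fps quoted for GC2000) is simply
     * impossible under this quantisation, so vsync must have been off there. */
    return 0;
}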

Maybe GLBenchmark 2.0 was doing something particularly SGX-unfriendly that has been fixed in 2.1, hence the big change -- or maybe 2.1 was optimized in general and would be faster on GC2000 too. We don't know. But I think there's enough evidence to conclude that, if Vivante's numbers are legitimate, i.MX6 can beat A5 in the 2.0 Egypt benchmark. With 2.1 there's too little to go on.
 
The current numbers on GLBenchmark's site are only 2-3 FPS higher for Egypt in GLBenchmark 2.0, which shows that either 2.1 is not the same benchmark or the two haven't been tested with the same OS revision. But it's enough to see that Vivante isn't misrepresenting Apple/IMG in these benchmarks; the slight deviation is also probably due to driver improvements.
The 2.0 results remain vsync-limited; they also haven't been run against iOS 5, which would make them even more vsync-limited.

The fact is that comparing vsynced against non-vsynced results is misrepresentation by any sane standard.
 
GLBenchmark 2.1 assets and visuals look identical to 2.0 to me. Changes to the software environment, ranging from driver configuration to the API and the benchmark implementation, can of course make huge differences to results. Judging from the lower-level tests like pixel and triangle fill, 2.1 seems to provide a significantly more accurate representation of hardware potential.

I've seen enough inexplicably extreme results in individual tests for individual phones across many different benchmark suites to know that trends among results are far more meaningful than any particular result.

I'm not sure why iPad 2 scored so erratically low on some of the prior tests. While the initial 3GS results on GLBenchmark seemed to gauge the hardware's potential well relative to Android hardware benchmark results, later OS and driver revisions seemed to leave iOS devices consistently underperforming their hardware's potential. The latest iPad 2 results seem to indicate some level of correction to that disparity.
 
The GLBenchmark 2.0 listing for iPad 2 includes iOS 5 in the system information tab. Does this still mean that 2.1 is tested on newer drivers?
 
For the iPad 2, 2.1 was tested using a mid-life iOS 5 beta (I think b5, but I'm not 100% sure). Not that I'm claiming b6+ is any faster/slower/better/worse; I mention b5 only for recreational purposes.
 
Which I guess could be the case based on configuration, if A5 has a lot more RAM/queues/whatever. It's hard to compare density without more direct examples of shrinks going from Samsung (or other 45nm processes) to TSMC 40nm, but Tegra 2 being smaller than A4 is still pretty telling, IMO. Just look at the 512KB of L2 in A4 vs the 1MB of L2 in Tegra 2.

Tegra 2 doesn't have any NEON AFAIK, and it also doesn't have power and clock gating as advanced as the A5's. Power islands aren't free either.

Before you say it: T3 obviously has 4 instead of 2 A9 CPU cores (clocked at 1.5GHz), but each A9 core doesn't take up anything close to a huge amount of die area, has NEON this time and more sophisticated power/clock gating, plus a GPU core with 50% more ALUs than in T2 at a higher frequency, and the whole thing ends up at around 80mm2 on TSMC 40nm.

Of course it would be a quite complicated plus/minus equation if you compared any of those SoCs in terms of die area, but considering the above, the A5 doesn't sound as "huge" anymore, especially since the A5 has two GPU cores versus just one in T3 (stuff like twice the TMUs, a large number of z/stencil units, twice the rasterizers/triangle setups etc., which aren't exactly cheap either).
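For what it's worth, here's the quick plus/minus arithmetic using only the figures quoted above (the ~80mm2 T3 estimate and the 2.5x/1.5x ratios from earlier in the thread, not measured die shots):

Code:
#include <stdio.h>

int main(void)
{
    const double tegra3 = 80.0;            /* mm2, TSMC 40nm (figure quoted above) */
    const double a5     = 1.5 * tegra3;    /* A5 ~1.5x Tegra 3 -> ~120 mm2         */
    const double tegra2 = a5 / 2.5;        /* A5 ~2.5x Tegra 2 -> ~48 mm2          */
    printf("T3 ~%.0f mm2, A5 ~%.0f mm2, T2 ~%.0f mm2\n", tegra3, a5, tegra2);
    return 0;
}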



Maybe GLBenchmark 2.0 was doing something particularly SGX-unfriendly that has been fixed in 2.1, hence the big change -- or maybe 2.1 was optimized in general and would be faster on GC2000 too. We don't know. But I think there's enough evidence to conclude that, if Vivante's numbers are legitimate, i.MX6 can beat A5 in the 2.0 Egypt benchmark. With 2.1 there's too little to go on.
With vsync on at 1024x768, the iPad 2 doesn't show any huge difference between 2.0 and 2.1. The major addition in 2.1 is that it now allows rendering offscreen at 1280x720 (720p) in order to avoid any vsync limitations.

GLBenchmark2.0/iPad2:
Egypt standard: 43.4 fps
http://www.glbenchmark.com/phonedetails.jsp?D=Apple+iPad+2&benchmark=glpro20

GLBenchmark2.1/iPad2:
Egypt standard: 58.5 fps
http://www.glbenchmark.com/phonedetails.jsp?benchmark=glpro21&D=Apple+iPad+2&testgroup=overall

Driver improvements kick in for all sides all the time. It wasn't too long ago that T20 Tegras gained somewhere around 30% in Egypt performance.

When Vivante ran those benchmarks, they obviously couldn't run the iPad 2 under 2.0 without vsync, yet could run their own hardware without it. I can't figure out any reasonable explanation for how Vivante surpassed the typical 60Hz limit and ended up at over 70fps at 1024x768 unless vsync was disabled. As you said, let's see the independent measurements arrive with final products, but so far the supplied data doesn't point at the i.MX6 being faster in GLBenchmark 2.0, rather the exact opposite.

Even more so since TBDRs are hardly at a disadvantage at increasing resolutions with increasing demands for fill-rate and bandwidth.
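Put in numbers, the resolution jump the offscreen mode adds isn't even that big (simple pixel-count arithmetic, nothing measured):

Code:
#include <stdio.h>

int main(void)
{
    const double onscreen  = 1024.0 * 768.0;   /* 786,432 pixels */
    const double offscreen = 1280.0 * 720.0;   /* 921,600 pixels */
    printf("offscreen/onscreen pixel ratio: %.2fx\n", offscreen / onscreen);  /* ~1.17x */
    return 0;
}

So a TBDR that scales cleanly with pixel count gives up very little going to 720p.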
 
Yeah, I expected PowerVR's performance advantage to grow along with the higher 720p resolution, in true TBDR fashion. Also, finally having an application market where graphics engines are designed to at least account for TBDR is great to see (a big difference from the desktop PowerVR days) -- visual design that can target higher image and texture quality is welcome in my eyes.

i.MX6 is a year or more behind 543MP2 for products, so it should win in some performance cases.
 
I don't see how it's usable by Windows though, unless it supports CPU hotplug. Anyone know if that's the case?
 
Not too different from what other semis are planning. Seems like it'll result in a good win in CPU efficiency for the design.
 
It's transparent to software. The companion core, when swapped, appears as core0.
But how do threads running on the main array migrate to the companion core when it kicks in, without the OS doing it? The OS also has to know how many cores there are to actively schedule on. On Android it seems to be done with hotplug, and the kernel is actively notified.
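For reference, the Android-side mechanism I'm referring to is just the standard Linux sysfs CPU hotplug interface; a minimal sketch (the policy here is entirely made up) of how a userspace governor can take secondary cores offline and bring them back:

Code:
#include <stdio.h>

/* Write "0" or "1" to /sys/devices/system/cpu/cpuN/online (root required).
 * cpu0 typically cannot be offlined. */
static int set_cpu_online(int cpu, int online)
{
    char path[64];
    snprintf(path, sizeof(path), "/sys/devices/system/cpu/cpu%d/online", cpu);
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    fputc(online ? '1' : '0', f);
    return fclose(f);
}

int main(void)
{
    /* Hypothetical policy: drop to one core for a low-load period, then restore. */
    for (int cpu = 1; cpu <= 3; ++cpu)
        set_cpu_online(cpu, 0);
    /* ... light workload runs on the remaining core ... */
    for (int cpu = 1; cpu <= 3; ++cpu)
        set_cpu_online(cpu, 1);
    return 0;
}

If the swap to the companion core really is transparent, something below the OS has to be doing the equivalent of this invisibly, which is what I'm struggling to picture.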
 
But how do threads running on the main array migrate to the companion core when it kicks in, without the OS doing it? The OS also has to know how many cores there are to actively schedule on. On Android it seems to be done with hotplug, and the kernel is actively notified.

I can't give more details, I'm afraid.
 