Tegra 3 officially announced; in tablets by August, smartphones by Christmas

I don't think I'd take the Intel model comments too seriously. They are likely far more influenced by 1. what their competition is doing and 2. engineering constraints than by any philosophy on when to release products.
I'm not saying they explicitly decided to follow such a cycle - just that's what it turned out to be, and so Jen-Hsun is explaining it that way. He made similar specific "rhythm" claims in the pre-G92/GT200 era, which turned out to be basically true (but arguably only for that specific generation). However I agree his comments are vague enough that they don't even need to mean that, so yes, the evidence points in that direction but it is rather weak.

And when he made these comments and NVIDIA released their 'performance roadmap', they were still nearly one year from tape-out. Not strictly too late to change from A9 to A15 (hi QSC7230/MSM7230! ;)) - then again this is the company that released a 750MHz ARM11 to compete with OMAP3 then later downgraded it to 600MHz.

Sadly, that's the case for many of A15's new instructions; including fused MAC. A15's implementation of FMA was an afterthought (grafted onto the chained implementation) so ARM's compiler doesn't like to -- rather, just doesn't -- spit out FMA instructions.
Can you elaborate? Grafted onto the chained implementation HW-wise? I think we discussed this in another thread and concluded that would be more expensive and have no benefit whatsoever, or do you mean something else entirely? Also are you saying the compiler not spitting out FMA hints at bad HW performance, or it's just immature SW or trying to maintain compatibility?

But I suspect most applications that perform workloads like x264 have dedicated (either fixed function or DSP) processors to do that and won't be using the CPU.
x264 is the world's best encoder, period. Most handheld encoders are significantly below average quality. PowerVR VXE is pretty damn good from what I've seen (and can scale to higher performance through not only clocks but also encode core count) but still not competitive with the maximum quality (i.e. slowest) x264 profile afaict. Many handheld encoders get humiliated IQ-wise by very fast x264 profiles, and with a dual-core 2.5GHz A15 x264 might actually achieve more than the HW encoder's 30fps in some cases (at massively higher power).

I don't think there is a use case for x264 on smartphones or even much on tablets, but there is a very strong one on Windows 8 ARM clamshells. So it's definitely not just a theoretical gimmick long-term IMO. And even if it was, I'd expect it to get benchmarked because of its importance on x86 machines. Compression benchmarks (e.g. 7-Zip/Winrar) would also benefit from multi-core but not as much from NEON.

Once again, I agree 2xA15 is clearly superior, but I'm not sure the press/enthusiast reaction to 4xA9 would be as negative as you think it would be. Hard to say though.

I'm not sure how that would really speed anything up. Most teams are divided into dedicated physical/synthesis groups and logic/architecture groups. Shifting physical design to India would only cause complications.
Agreed, the complexity of that kind of outsourcing is a big problem. But actually now that I reread the article, I think they might be handling much more of the logic/architecture as well: http://www.livemint.com/2011/06/01224527/Nvidia8217s-India-unit-to-s.html

Also I realise there is plenty of parallelism to the chip development process already, but what I meant is that if it was completely separate teams, it would be theoretically possible for Logan to be lagging behind Wayne by only, say, 6 months. I don't think that's very likely but it would certainly make it more reasonable to stick with A9 on Wayne. Then again with Grey added to the roadmap, I don't see how they could pull it off, it just doesn't make sense.

Looking at the market, 13-15" notebooks are where it's at. Even Intel has shifted their strategy to mainly target processors for that market. And A15 would fit perfectly in that area, more so than Haswell I'd say.
Hmm, that's true. The real question on whether it's worth the trouble is screen power consumption though obviously.

Hell, going from 45LP to 45LPG was a pretty big shift; die area ballooned somewhere on the order of 20%.
Woah, 20%? Just for the parts using the G transistors, or the entire die? I don't really understand how it could be that much either way.
 
If the thread gets pegged on the div-less core for a while a reasonable approach would be to trap the instruction when it's executed and patch the program to call a VFP-based division routine instead, if VFP is available.
Hmm that's a pretty reasonable solution, I assume the trap wouldn't have too big of an overhead?

BTW, do you have a source that indicates that A5 doesn't have integer division? It seems like a strange omission, everything else considered.
Well the A5 is a cost play and it was released before the A15, the stranger omission is that there's no integer division on the original ARMv7-A really. And here's some evidence I found with a quick google: http://permalink.gmane.org/gmane.linux.ports.arm.kernel/119538 (look for HWCAP_IDIV)
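
If you want to check at runtime rather than digging through the kernel source, something along these lines should work on Linux (hedged sketch: the HWCAP_IDIVA/HWCAP_IDIVT macro names come from the kernel's <asm/hwcap.h> per that patch, and getauxval() needs a recent glibc, so treat it as illustrative only):

```c
#include <stdio.h>
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_IDIVA / HWCAP_IDIVT on ARM Linux */

int main(void)
{
    unsigned long hwcaps = getauxval(AT_HWCAP);

    /* HWCAP_IDIVA = SDIV/UDIV usable in ARM state,
       HWCAP_IDIVT = usable in Thumb-2 state. */
    printf("SDIV/UDIV (ARM state):   %s\n", (hwcaps & HWCAP_IDIVA) ? "yes" : "no");
    printf("SDIV/UDIV (Thumb state): %s\n", (hwcaps & HWCAP_IDIVT) ? "yes" : "no");
    return 0;
}
```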
 
Hmm that's a pretty reasonable solution, I assume the trap wouldn't have too big of an overhead?

Hardware-wise it wouldn't be a lot, kernel-wise it'd probably be huge. I was thinking more that the trap would modify the divide instruction in the caller's code stream, to call the divide routine. Then to change them back to divide instructions the divide routine itself can be modified to backpatch the caller. The latter would be pretty expensive since you can't flush icache from user code, but I imagine so long as you're not constantly swapping threads from slow to fast cores this wouldn't be too bad.
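
To make it concrete, the routine the patched call sites would branch to could be as dumb as this (just a sketch in C, the name is made up, and it obviously assumes the div-less core still has VFP with double precision):

```c
/* VFP-backed fallback for unsigned 32-bit division on a core without
   SDIV/UDIV. A double's 53-bit mantissa holds any 32-bit operand exactly,
   and the rounded quotient always truncates to floor(n/d), so no
   correction step is needed. Illustrative only, not the real EABI helper. */
unsigned int udiv32_vfp(unsigned int n, unsigned int d)
{
    return (unsigned int)((double)n / (double)d);
}
```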

Well the A5 is a cost play and it was released before the A15, the stranger omission is that there's no integer division on the original ARMv7-A really. And here's some evidence I found with a quick google: http://permalink.gmane.org/gmane.linux.ports.arm.kernel/119538 (look for HWCAP_IDIV)

I agree that the omission of divide in ARMv7-A altogether is bizarre, considering that ARMv7-M got it. Still, the A5 got VFPv4 and the FMA instructions in NEON, and the R5 got divides. So ARM has still been revising the ISAs beyond the A15, and you'd think the A5 would have been a good candidate for this (for precisely the reasons listed). It doesn't exactly have to be a fast divider or anything; 1-bit/cycle should be sufficient.
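
For reference, 1-bit/cycle is basically just the textbook restoring algorithm; here's a quick C model of it (mine, purely to illustrate how little logic is involved, not anything from ARM):

```c
/* Restoring (1-bit-per-iteration) unsigned division: shift in one dividend
   bit per step, try to subtract the divisor, set the quotient bit if it
   fits. Assumes d != 0. */
unsigned int udiv32(unsigned int n, unsigned int d, unsigned int *rem)
{
    unsigned int q = 0, r = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1);  /* bring down the next dividend bit */
        if (r >= d) {                   /* trial subtraction succeeds */
            r -= d;
            q |= 1u << i;
        }
    }
    if (rem)
        *rem = r;
    return q;
}
```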
 
I'm not saying they explicitly decided to follow such a cycle - just that's what it turned out to be, and so Jen-Hsun is explaining it that way. He made similar specific "rhythm" claims in the pre-G92/GT200 era, which turned out to be basically true (but arguably only for that specific generation). However I agree his comments are vague enough that they don't even need to mean that, so yes, the evidence points in that direction but it is rather weak.

And when he made these comments and NVIDIA released their 'performance roadmap', they were still nearly one year from tape-out. Not strictly too late to change from A9 to A15 (hi QSC7230/MSM7230! ;)) - then again this is the company that released a 750MHz ARM11 to compete with OMAP3 then later downgraded it to 600MHz.

True, I guess there's no telling what they'd aim for. Tegra 2 without NEON, T3 smartphone on 40nm, etc.

Can you elaborate? Grafted onto the chained implementation HW-wise? I think we discussed this in another thread and concluded that would be more expensive and have no benefit whatsoever, or do you mean something else entirely? Also are you saying the compiler not spitting out FMA hints at bad HW performance, or it's just immature SW or trying to maintain compatibility?

The former. Use of FMA is discouraged right now due to the poor HW implementation (done just so the instruction wouldn't cause an exception). My guess is they initially did both a chained and fused implementation and ran out of die area; so they had to choose one and half-ass an implementation for the other. Current software obviously only uses chained (except for A5 software) so it'd make sense that they'd choose chained.
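
For anyone wondering why compatibility is even an issue: chained and fused are not bit-identical, because chained rounds the intermediate product while fused rounds only once. A quick C illustration with plain fma() from math.h (nothing A15-specific, just the numerics; compile with contraction disabled, e.g. -ffp-contract=off, so the chained line stays chained):

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
    double a = 1.0 + 0x1p-27, b = 1.0 - 0x1p-27, c = -1.0;

    double chained = a * b + c;     /* a*b = 1 - 2^-54 rounds to 1.0 (ties-to-even), so this gives 0.0 */
    double fused   = fma(a, b, c);  /* single rounding keeps the -2^-54 term */

    printf("chained = %g, fused = %g\n", chained, fused);
    return 0;
}
```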

x264 is the world's best encoder, period. Most handheld encoders are significantly below average quality. PowerVR VXE is pretty damn good from what I've seen (and can scale to higher performance through not only clocks but also encode core count) but still not competitive with the maximum quality (i.e. slowest) x264 profile afaict. Many handheld encoders get humiliated IQ-wise by very fast x264 profiles, and with a dual-core 2.5GHz A15 x264 might actually achieve more than the HW encoder's 30fps in some cases (at massively higher power).

I don't think there is a use case for x264 on smartphones or even much on tablets, but there is a very strong one on Windows 8 ARM clamshells. So it's definitely not just a theoretical gimmick long-term IMO. And even if it was, I'd expect it to get benchmarked because of its importance on x86 machines. Compression benchmarks (e.g. 7-Zip/Winrar) would also benefit from multi-core but not as much from NEON.

I thought most compression algorithms were more storage bound than CPU bound these days. But I suppose that's on modern desktop chips. Also, I don't think x264 encoding is really something x86 machines in the "ultrabook" or "slim notebook" segments are looking for. I've seen it benchmarked mostly to show "hey, look how much faster this new Core iInfinity XE Ultra-Extreme Unicorn Edition is!"

However, I agree that in the future, perhaps as people move towards small, light laptops as their main computer, being able to do things like x264 encoding in a relatively fast and real-time way may become important.

Once again, I agree 2xA15 is clearly superior, but I'm not sure the press/enthusiast reaction to 4xA9 would be as negative as you think it would be. Hard to say though.

It is. But look at the reaction of A9 vs A8.

Also I realise there is plenty of parallelism to the chip development process already, but what I meant is that if it was completely separate teams, it would be theoretically possible for Logan to be lagging behind Wayne by only, say, 6 months. I don't think that's very likely but it would certainly make it more reasonable to stick with A9 on Wayne. Then again with Grey added to the roadmap, I don't see how they could pull it off, it just doesn't make sense.

I can see Grey being shifted to the India team. There's a certain cadence of development that your traditional teams usually have that you don't really want to break up.

Hmm, that's true. The real question on whether it's worth the trouble is screen power consumption though obviously.

Well, we're no longer talking about the 6.6WHr battery range now. Nor are we talking about a 500mW processor. A15 implementations that go into the clamshells will likely be on the order of 2-4W that will compete against the ~10W notebook chips Intel puts out. That will definitely be noticeable in the overall power equation.

Woah, 20%? Just for the parts using the G transistors, or the entire die? I don't really understand how it could be that much either way.

The core that used G transistors for its critical path went up by 20%. Some of it was cell sizes growing due to changes in FET structure. Some was due to more restrictive design rules when routing.
 
Is Grey named after some scientist, or is it a play on names referring to gnomes, which are typically clothed in grey? :LOL: Heck, if they started with "Superphones" and superheroes like Kal-El, why didn't they just name Wayne = Batman and Grey = Robin?
 
Is Grey named after some scientist, or is it a play on names referring to gnomes, which are typically clothed in grey? :LOL: Heck, if they started with "Superphones" and superheroes like Kal-El, why didn't they just name Wayne = Batman and Grey = Robin?

It's named after Jean Grey from X-Men.
 
I don't think any of us expected Kal-el to be faster than a SGX543MP2.

Maybe not faster, but at least not that much slower.

It's quite a fail for nVidia to launch a Q4-2011 SoC that's more than 50% slower than a Q1-2011 SoC from the competition. Even more so considering it'll be the GPU for the whole of 2012.

GPU performance is consistent with the ~2*Tegra 2 claims, though.
 
Maybe not faster, but at least not that much slower.

Why not? The A5 drove ~2x the pixels at ~2x the frame rate of Tegra 2, i.e. roughly 4x the throughput. So even with Tegra 3's ~2x speedup over Tegra 2, the A5 is still about 2x faster, which is what it shows here.

It's quite a fail for nVidia to launch a Q4-2011 SoC that's more than 50% slower than a Q1-2011 SoC from the competition. Even more so considering it'll be the GPU for the whole of 2012.

GPU performance is consistent with the ~2*Tegra 2 claims, though.

iPad isn't really competition. Tegra has no chance of getting inside iStuff and the A5 is not going to be distributed to other OEMs. Nor are people choosing between an Android tablet and an iOS tablet using graphics processing speed as their litmus test.

I highly doubt most other SoC vendors are going to come close to the A5 within the next year. My guess is it won't be until late 2012, when other 543MP parts, the Rogue series or Adreno 320 come out, that they'll actually match or perhaps exceed A5-level performance.
 
Some folks probably think that IHVs have magic wands; may I remind everyone that the iPad2 GPU block consists of 2 GPU cores and not one? Clock that MP2 at the frequency the ULP GF in T3 is clocked at and the gap gets even worse for Tegra.

And metafor, you're wrong; OMAP4470 should be quite close in performance to the iPad2 due to ~twice the frequency. By the time that one appears in devices, though, Apple will already be rolling out its SoC refresh.
 
Nor are people choosing between an Android tablet and an iOS tablet using graphics processing speed as their litmus test.

Agree with that in general. There are no graphics-based killer apps driving sales of these devices, at least not yet.

While they have a big graphics advantage, Apple might think about cutting a deal to bring a big game over to the platform, something of the stature of WoW or Starcraft.
 
And metafor, you're wrong; OMAP4470 should be quite close in performance to the iPad2 due to ~twice the frequency. By the time that one appears in devices, though, Apple will already be rolling out its SoC refresh.

When are devices with OMAP4470 due to be available?
 
As computing platforms, mobiles are starting to stretch their legs and will do so even more with OpenCL.

Semis that took a pass on PowerVR and Series5XT will have to focus that much more on their other selling points.
 
When are devices with OMAP4470 due to be available?

Trouble is there's nothing definite to go by in this market, just projections that haven't been kept lately, and that from all sides. I'd tip somewhere in H1 '12 and I severely doubt we'll see anything from Rogue this year. ST Ericsson doesn't sound as optimistic as in the past, and quite frankly expecting it to ship within 2012 was quite a tall order.

***edit: by the way, that's a very good early showcase for Tegra 3 in one synthetic benchmark. Any estimates on frequency from anyone?
 