Tegra 3 officially announced; in tablets by August, smartphones by Christmas

Don't you think this kind of frequency will make it too power hungry (or too hot?) for smartphones?
Although, I don't think Kal-El was ever really meant for smartphones.

Vivante's GC1000 goes up to 1100MHz on 40nm G+ while consuming 134 mW...

Kal-El is definitely headed to smartphones. nVidia doesn't have any other SoC planned for 2012.
 
Vivante's GC1000 goes up to 1100MHz on 40nm G+ while consuming 134 mW...

Kal-El is definitely headed to smartphones. nVidia doesn't have any other SoC planned for 2012.

Ok, so maybe not too power hungry. But too hot? Considering that current dual-core SoCs sometimes get hot, wouldn't a quad-core CPU be hotter?
 
Ok, so maybe not too power hungry. But too hot? Considering that current dual-core SoCs sometimes get hot, wouldn't a quad-core CPU be hotter?

Only if they're all being used at once near their peak operational frequency/voltage. In most cases, 2 cores will likely be shut off.
 
Wishmaster said:
Ok, so maybe not too power hungry. But too hot? Considering that current dual-core SoCs sometimes get hot, wouldn't a quad-core CPU be hotter?
If it's not too power hungry but it is too hot, all you're saying is that the heat removal would be different, since power and heat (as in: temperature) are otherwise the same thing.

Why would a 4-core chip have worse cooling than a 2-core chip?

In theory, a 4-core SoC can consume less than a 2-core one *for the same absolute workload*, because each core can individually run slower, so voltage can be lowered.

It remains to be seen how that works out in practice; it depends heavily on how well you can spread work among cores, but fundamentally it seems like a sound principle.
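A back-of-envelope way to see it (illustrative numbers of my own, assuming dynamic power dominates, the work really does spread across all four cores, and halving per-core frequency lets the supply voltage drop by roughly 20%):

P_{\mathrm{dyn}} \approx N \, C \, V^2 f,
\qquad
\frac{P_{4\,\mathrm{cores}}}{P_{2\,\mathrm{cores}}}
\approx \frac{4 \, C \, (0.8V)^2 \, (f/2)}{2 \, C \, V^2 f} = 0.64

And on the heat side, steady-state temperature rise just tracks total power for a given package and cooling, \Delta T \approx P_{\mathrm{total}} \cdot \theta_{JA}, so equal or lower total power shouldn't mean a hotter die.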
 
Vivante's GC1000 goes up to 1100MHz on 40nm G+ while consuming 134 mW...

According to Vivante's table, TSMC 40G+ devices get 80% better perf/MHz than TSMC 40LP ones. I get that 40LP is leakage-optimized, but is this really realistic?

Of course, this is totally apples to oranges compared to something like Kal-El; Vivante's GPUs strike me as area-optimized at the expense of high clocks (for instance, just one pixel/clock).
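Just to put the quoted numbers in perspective (plain multiplication, nothing more):

1\ \mathrm{pixel/clock} \times 1100\ \mathrm{MHz} = 1.1\ \mathrm{Gpixels/s}

so a single pixel per clock caps the raw fillrate at roughly 1.1 Gpixels/s even at that frequency.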
 
Vivante's GC1000 goes up to 1100MHz on 40nm G+ while consuming 134 mW...

Kal-El is definitely headed to smartphones. nVidia doesn't have any other SoC planned for 2012.

I checked the last few pages and, unless I missed it entirely, no one's posted the updated roadmap yet:
http://www.anandtech.com/show/4769/...ap-kalel-in-2012-wayne-in-late-2012early-2013
http://images.anandtech.com/doci/4769/Screen Shot 2011-09-12 at 1.50.39 PM_575px.png

As per the roadmap, there is a Kal-El+ coming in mid-2012. My guess is that Kal-El+ is a 28nm shrink of Kal-El and that they had to push Wayne back because they decided to go from A9 to A15. Or maybe they renamed the existing Wayne config as Kal-El+ :p (as it was A9 anyway, right?) and went with a new design for Wayne.

And my guess is that Grey is a dual A15, as it is targeted mainly at smartphones. Given the timeframe, it's going to be approximately a year behind MSM8960.
 
Vivante's GC1000 goes up to 1100MHz on 40nm G+ while consuming 134 mW...

I wouldn't suggest it's at those wild frequencies; otherwise it would kill the iPad 2 with ease. A quick estimate based on the results (and considering that they just doubled the PS ALU count) would be in the =/>450MHz region.

Kal-El is definitely headed to smartphones. nVidia doesn't have any other SoC planned for 2012.

The results from the Asus TF201 are obviously from a T30 SoC, which is aimed at tablets with higher frequencies; AP30 for smartphones (due in early 2012 anyway) will come with lower frequencies and possibly a choice between quad- and dual-core A9s, depending on what the OEM wants.
 
As per the roadmap, there is a Kal-El+ coming in mid-2012. My guess is that Kal-El+ is a 28nm shrink of Kal-El and that they had to push Wayne back because they decided to go from A9 to A15. Or maybe they renamed the existing Wayne config as Kal-El+ :p (as it was A9 anyway, right?) and went with a new design for Wayne.

I'd say that Kal-El+ is a 28nm shrink, as you say, with higher frequencies.

And my guess is that Grey is a dual A15, as it is targeted mainly at smartphones. Given the timeframe, it's going to be approximately a year behind MSM8960.

Grey is supposed to serve mainstream smartphones and not higher end smartphones ("superphones" as NV calls them). Why not re-use the existing Kal-El design and make a bigger buck?
 
Grey is supposed to serve mainstream smartphones and not higher end smartphones ("superphones" as NV calls them). Why not re-use the existing Kal-El design and make a bigger buck?

A dual A15 config would be a far better option than a quad A9; I think that has been covered extensively by metafor and Arun already. From what I remember, dual A15s are better in performance for most apps than quad A9s and are also better both in terms of perf/mm2 and perf/W. And given that it's going to come out in early 2013, dual A15s are going to be somewhat mainstream by then (TI, ST, Samsung and Apple should all have chips out around that timeframe, and Qualcomm will have dual Kraits out long before).
 
A dual A15 config would be a far better option than a quad A9; I think that has been covered extensively by metafor and Arun already. From what I remember, dual A15s are better in performance for most apps than quad A9s and are also better both in terms of perf/mm2 and perf/W. And given that it's going to come out in early 2013, dual A15s are going to be somewhat mainstream by then (TI, ST, Samsung and Apple should all have chips out around that timeframe, and Qualcomm will have dual Kraits out long before).

I don't expect to see dual-A15 devices from, say, TI or ST-Ericsson before 2013. I don't know what NV's design wins for AP30 look like, yet I'd dare to speculate that the majority might go for the dual-core variant instead of the quad-core variant. It could very well be that Grey contains A15 cores, but considering the projected timeframe it's not as cost-effective as alternative solutions. Grey isn't aiming at "superphones" but at mainstream smartphones, as the roadmap itself indicates. By today's measures a high-end smartphone will contain an OMAP4 and a mainstream smartphone an OMAP3.
 
If TI manages to see 4470 devices before H2 2012 goes too far, I suppose it's possible to squeeze an OMAP5 device in before year's end, but I'm never optimistic on a TSMC process ramp up when combined with the usual chance for delays of phone productization.
 
If TI manages to see 4470 devices before H2 2012 goes too far, I suppose it's possible to squeeze an OMAP5 device in before year's end, but I'm never optimistic on a TSMC process ramp up when combined with the usual chance for delays of phone productization.

I thought TI moved some parts of their production to UMC?
 
Speaking of which, what's Samsung up to? It would seem that they're usually late with their silicon delivery but able to outperform their competition every year. What's on the slate for 2012 after Exynos?
 
metafor said:
The former. Use of FMA is discouraged right now due to the poor HW implementation (done just so the instruction wouldn't cause an exception). My guess is they initially did both a chained and a fused implementation and ran out of die area, so they had to choose one and half-ass the implementation of the other.
Any idea what that would look like HW-wise? I thought the intermediary results meant you couldn't really half-ass a FMA (but you could obviously half-ass a chained MADD like Fermi does). I also thought we had determined the chained MADD could be the hardest part of the chip to implement in its latency budget, not the FMA.
metafor said:
I thought most compression algorithms were more storage bound today than they are CPU bound. But I suppose that's for modern desktop chips. Also, I don't think x264 encoding is really something x86 machines in the "ultrabook" or "slim notebook" portions are looking for. I've seen it benchmarked mostly to show "hey, look how much faster this new Core iInfinity XE Ultra-Extreme Unicorn Edition is!"
I see your point, but no. There simply aren't that many good non-synthetic non-game benchmarks for review sites to run, so many sites run x264/7-zip/encryption/etc on practically everything. See this page: http://techreport.com/articles.x/21551/6

I think 4xA9 would do very well against 2xA15 on that page's benchmarks even if it wouldn't do as well for many common use-cases. As I said, I really don't think 4xA9 is a huge marketing disadvantage. If anything I'm more worried about the GPU on Wayne if it's still based on their current architecture (gosh I hope not).

Speaking of which, what's Samsung up to? It would seem that they're usually late with their silicon delivery but able to outperform their competition every year. What's on the slate for 2012 after Exynos?
They're a lead licensee for both the Cortex-A15 and the Mali-T604MP, and they've made a lot of noise about their 32nm High-K process (implying it will definitely have products on it and not just on 28nm), so 2xA15/4xT604/32HPL is a very safe bet IMO.

---

Ailuros: I thought OMAP4470 was also 384MHz and the iPad 2 was 250MHz? That's clearly not 2x the frequency and not fast enough.

As for Vivante, Freescale claims some *extremely* impressive benchmark numbers for the GC2000 in i.MX61, but I'd like to see numbers on a platform with VSync on before passing judgement... Still much more impressive than anyone expected I think.
 
Any idea what that would look like HW-wise? I thought the intermediary results meant you couldn't really half-ass a FMA (but you could obviously half-ass a chained MADD like Fermi does). I also thought we had determined the chained MADD could be the hardest part of the chip to implement in its latency budget, not the FMA.

Yes and yes. You could half-ass a chained design by storing the multiply result at its full width rather than shift-round-truncating it down to 23 bits (or 52 bits for DP). This essentially means one of your pipeline stages will have ~416 bits for its data operand on top of the exponent data and control bits.

You'd want to silence those extra registers in chained mode, of course, to conserve power. And yes, it's incredibly wasteful. That isn't to say there aren't advantages of having a chained implementation; stand-alone multiplies and adds will take less power and possibly have less latency depending on the implementation.

That's assuming you only have enough die area for either a fused or a chained implementation and not both; or at the very least, a separate dedicated adder.
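For anyone who wants to see why that intermediate rounding matters, here's a minimal C sketch (standard fmaf() versus a plain chained multiply-add; the inputs are made up purely so the single- vs. double-rounding difference shows up, this isn't modelling any particular pipeline):

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* Pick values whose exact product (1 - 2^-26) needs more precision
       than a 24-bit float significand can hold. */
    float a = 1.0f + 0x1p-13f;   /* 1 + 2^-13 */
    float b = 1.0f - 0x1p-13f;   /* 1 - 2^-13 */
    float c = -1.0f;

    float chained = a * b + c;      /* product rounded to float first -> 0       */
    float fused   = fmaf(a, b, c);  /* single rounding at the end    -> -2^-26   */

    printf("chained: %g\n", chained);
    printf("fused:   %g\n", fused);
    return 0;
}

Build with something along the lines of cc fma_demo.c -lm; whether fmaf() maps to a real fused instruction or a library fallback obviously depends on the target.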

I see your point, but no. There simply aren't that many good non-synthetic non-game benchmarks for review sites to run, so many sites run x264/7-zip/encryption/etc on practically everything. See this page: http://techreport.com/articles.x/21551/6

I think 4xA9 would do very well against 2xA15 on that page's benchmarks even if it wouldn't do as well for many common use-cases. As I said, I really don't think 4xA9 is a huge marketing disadvantage. If anything I'm more worried about the GPU on Wayne if it's still based on their current architecture (gosh I hope not).

If we're talking market advantage, one should take a look at the benchmarks typically run on mobile devices. Anandtech's benchmark suite, for instance, features Sunspider, Browsermark, Linpack and Vellamo.

None of those are very multi-thread friendly; even Linpack doesn't spread its workload very well.

But you have a point in that in today's world, reviewing the CPU speed of PCs has pretty much been reduced to compression and encoding work. But again, with DSPs on SoCs, I question how much the CPU is really applicable.

They're a lead licensee for both the Cortex-A15 and the Mali-T604MP, and they've made a lot of noise about their 32nm High-K process (implying it will definitely have products on it and not just on 28nm), so 2xA15/4xT604/32HPL is a very safe bet IMO.

And just as I ask, /b/ (of all places) talks about a rumored 2.0GHz quad-core Galaxy S3 next year. Possibly 4xA9 but I suppose quad-A15 is possible.
 
That's assuming you only have enough die area for either a fused or a chained implementation and not both; or at the very least, a separate dedicated adder.
Right, that's what I assumed they would do based on our previous conversation about this: FMA+ADD+L/S with chained MADD being executed as MUL-via-FMA+ADD with an intermediary FIFO à la A8/A9. Oh well, a bit disappointing...

If we're talking market advantage, one should take a look at the benchmarks typically run on mobile devices. Anandtech's benchmark suite, for instance, features Sunspider, Browsermark, Linpack and Vellamo.
Oh right, I agree it's a bigger disadvantage on smartphones. I meant specifically ARM Netbooks/Notebooks and (to a lesser extent) Tablets running Windows 8. NVIDIA is good at making people focus on what they want them to focus on, but if Wayne is basically a faster shrink of Kal-El and also keeps the same old GPU arch, they won't get away with it against the competition. They can say they're targeting lower cost but that doesn't even make sense with Grey coming at nearly the same time anyway.

But again, with DSPs on SoCs, I question how much the CPU is really applicable.
I doubt anyone is ever going to run 7-zip on a proprietary DSP. As for x264, it's still the best software solution by far in terms of both maximum quality and quality-at-a-given-computational-cost.

And just as I ask, /b/ (of all places) talks about a rumored 2.0GHz quad-core Galaxy S3 next year. Possibly 4xA9 but I suppose quad-A15 is possible.
And that SGS3 is supposed to have a 1280x1024 screen... right, whatever. I also doubt the next Nexus will use a 1.5GHz Exynos. This looks like a very elaborate hoax to me although I suppose we'll (kinda) know when we see some ICS screenshots to check his other claims. It's true that the old Samsung roadmap had a quad-core A9 "Aquila" chip but many things changed since then (e.g. the clock target for Orion/Exynos was 800MHz!) and that was before they became a lead licensee for A15. I could see Samsung going for quad-A15 but not before High-K 28nm which won't be ready for a phone shipping in mid-2012.

FWIW, getting back to the topic of this thread, I think there's a >50% probability that Samsung's 1H12 flagship will use Kal-El (or maybe Kal-El+ depending on the timeframe, assuming it's just a clockbump and/or respin). I have no idea whether that will be the SGS3 or if Samsung's going to keep that brand for devices with their own chips.
 
I've read a couple of reports where the same 384 MHz was assumed or maybe even tipped for the SGX544 in the OMAP4470, but nothing sounded firsthand to me. TI keeps ramping up speeds aggressively, going from 1.0 to 1.5 to 1.8 GHz on the CPU; I suspect the GPU won't miss out on some scaling between the OMAP4460 and 4470 either.

I think time-to-market has been one of Samsung's strongest assets, first with Hummingbird being the first SGX540 SoC and then with Mali-400MP4 the following year, ahead of Adreno 220 SoCs.

Implementations have been aggressive, too; I think the CE/product side of the company could do worse than relying on them. Of course, the product side is just as aggressive, trying to be among the first to launch just about anything and everything when it comes to phones and tablets and form factors and screens, etc.

My mistake on TI's lead foundry partner for 28nm... UMC it is, then. I still don't have a lot of faith in the projected timetables for non-Intel process transitions (and they miss here and there, too).
 
Ailuros: I thought OMAP4470 was also 384MHz and the iPad 2 was 250MHz? That's clearly not 2x the frequency and not fast enough.

Based on what Anand wrote, or on some at least reasonable theory? Because if it's the former, he simply took the GFLOPs of an SGX540 and multiplied them by 2.5x. In the meantime TI has confirmed that the 2.5x difference is based on the average of a set of internal tests.

Just for the record, there was a public statement somewhere a while ago by someone from IMG that their partners are working on their GPU IP and reaching well over 400MHz. If a dual-A9 CPU goes from 1.0GHz to 1.8GHz on the same process, where exactly is the problem in going from 305 to ~500MHz? That's 80% more frequency in the first case versus =/>50% in the latter.
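Spelling that last comparison out with the numbers already quoted:

\frac{1.8\ \mathrm{GHz}}{1.0\ \mathrm{GHz}} = 1.8\ (+80\%),
\qquad
\frac{\sim 500\ \mathrm{MHz}}{305\ \mathrm{MHz}} \approx 1.64\ (+64\%)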
 