Tegra 3 officially announced; in tablets by August, smartphones by Christmas

Why is Nvidia, which made its bones in graphics, lagging behind in mobile GPUs?

I guess the game is pretty different when it comes to saving power and saving bandwidth as much as possible. Scaling up and scaling down are pretty different challenges. You can almost see a similar scenario with ARM and Atom right now.

Meanwhile, the common perception is that Tegra 2 was the most powerful mobile GPU out there prior to the SGX543MP2 in the A5, a perception nVidia may have won with marketing and perhaps more efficient drivers. I don't expect this perception to last.
 
It will most likely last until the Tegra ULP GeForces get more aggressive in the future. They could invest more in the GPU side if they wanted to, IMHO. I'm not particularly fond of weird theories, but at this point ARM is a vessel for NVIDIA to scale up into higher-end SoCs, and it's not like that's particularly inconvenient for ARM either; rather the contrary. Invest more on the CPU side until you reach a theoretical point where you start investing more resources on the GPU side.

I was hoping we wouldn't see all that much of the Wintel hegemony in the embedded space, but it seems Microsoft will have a strong foothold in it in the long run.

On another note, and yes it's completely OT but I'm too bored to open a separate thread: HTC has bought S3 Graphics from VIA.
 
Does this mean we'll never see a GPU integrated on Nano dies? Guess they could license... enh we all know Nano isn't going anywhere anyway :/
 
I haven't dug up anything concerning S3 for a long time now. I don't think they have anything ready yet for embedded. If it's not GPUs for SoCs, I can't really imagine what HTC would want them for.

If so, it'll be interesting to watch more and more phone manufacturers enter the SoC business. After LG, now HTC?
 
Qualcomm owns a small but influential piece of HTC, so I don't think they'll be building their own SoCs at this point.
 
Uh, this patent positioning tactic as it relates to the suit against Apple in mobile graphics is potentially outrageous.

Is PVRTC the point of issue? If so, are they essentially trying to lay claim to block truncation coding in general?
 
BSN's Wayne "leak" is completely ridiculous. I could come up with something more credible in five minutes. 8xA9 isn't theoretically possible (without essentially making them two separate nodes - arguably viable for servers but even Calxeda didn't bother) and 8xA15 on 28nm is going to take *way* too much power. Even a quad-core A15 is very aggressive before 20nm.

The common consensus is that Wayne will be 4xA9, correct? Given that we're gonna see dual-core A15s and Kraits at the same time, how do you think it's going to fare? Wayne could possibly run at ~2GHz, given that Kal-El on its 40nm process is at 1.5GHz. Also, with regard to your statement that a quad-core A15 is too aggressive before 20nm, isn't that what's expected for Logan on 28nm? (There's also APQ8064 with quad-core Krait on 28nm.)

I think NV is missing a trick by not designing a dual A15 chip targeted at smartphones. It looks like they are aiming to capture more of the tablet market instead with Kal-El/Wayne.
 
While their quad A9s will fall quite short of the dual A15s of the time, few would notice or care.

Wouldn't it come down to the quad A9 frequencies vs. dual A15 frequencies of the time?

If hypothetically both end up at around 2.0GHz, I don't see the quad A9 falling that much short, if at all. Tegra 3@40nm sounds like 4xA9@1.5GHz; what exactly speaks against an even higher frequency at 28nm?
 
If it's a matter of dual A15 at 2GHz vs quad A9's at 2GHz, 99.99999% of the applications out there will run faster on the A15 solution.
 
It seems NVIDIA is evangelizing Tegra to some game developers, so that may figure more in their marketing, with some games advertising Tegra support.

Perhaps they have an edge over other SoC vendors in that they can leverage their PC business to work with PC developers who are porting to mobile.

So far Tegra 2 has a big share of the Android tablets, and aren't a couple of phones also using Tegra 2 at this point?
 
... Yeah, the T-Mobile LG G2X and the Atrix and some others.

Qualcomm wasn't far behind in phones, though, with their 1.2 GHz dual core + Adreno 220 in the HTC Sensation and Evo 3D (and soon myTouch Slide 4G among others).
 
Is Tegra 2 the only SoC without NEON?

Are there apps that benefit from NEON yet?

IOW, is it worth getting devices with NEON, or are the advantages mostly on paper at this point?
 
Yes, Tegra 2 is the only Cortex-A9 SoC without NEON and with A15 it'll return to being mandatory. Even Amlogic's single-core Cortex-A9 with a miserable L2 cache size of 128KB has NEON. I would not be terribly surprised if Tegra 2 remains the only NEON-less A9, and even nVidia clearly realized they fumbled on this one.

I'm sure all sorts of things use NEON. You can find articles about it or questions pertaining to it all over the place, and furthermore on Cortex-A8 it's the only way to get acceptable FPU performance. Anyone turning on tree-vectorize on an ARMv7-A build in GCC stands to have NEON inserted somewhere, even if it doesn't improve performance a lot.
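
To make that concrete, here's a minimal sketch (my own, not from any of the posts here) of the kind of loop shape GCC's vectorizer will turn into NEON code; the toolchain name and flags below are just illustrative of a typical ARMv7-A build:

```c
/* saxpy.c -- a loop shape GCC's tree vectorizer can map onto NEON.
 * Illustrative build line (assumes an ARMv7-A cross-GCC with NEON support):
 *   arm-linux-gnueabi-gcc -std=c99 -O2 -march=armv7-a -mfpu=neon \
 *       -mfloat-abi=softfp -ftree-vectorize -funsafe-math-optimizations \
 *       -c saxpy.c
 * (-funsafe-math-optimizations is typically needed before GCC will vectorize
 *  float math with NEON, since NEON isn't fully IEEE compliant.)
 */
void saxpy(float *restrict y, const float *restrict x, float a, int n)
{
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];   /* candidate for 4-wide NEON vectorization */
}
```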

There are a lot of game libraries out there for phones, and I imagine NEON is standard for at least some of them. Even Android's libc uses NEON for memcpy where available. PS1 and N64 emulators for ARM use NEON, and I expect this trend to expand into more emulators. SIMD has been part and parcel of console programming for a long time now, so I fully expect NEON to be utilized on the PS Vita, and from there pieces of the same game code to circulate to phones.

A better question would be how much software actually needs NEON, i.e. isn't also built with a non-SIMD path. I'm sure nVidia's earlier insertion of Tegra 2 into the market (vs. the other Cortex-A9 SoCs) helped prepare software developers for this compatibility issue. Then the second question becomes: how much does software written for both benefit from the NEON paths?
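
For what it's worth, the "both paths" setup usually looks something like this sketch: a NEON kernel plus a scalar fallback, picked at runtime. The scale_*() helpers are hypothetical, the Android NDK's cpufeatures query is just one way to detect support, and in practice the NEON file usually gets compiled separately with -mfpu=neon:

```c
#include <cpu-features.h>   /* Android NDK cpufeatures module (link the cpufeatures static lib) */

static void scale_scalar(float *dst, const float *src, float k, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

#ifdef __ARM_NEON__
#include <arm_neon.h>
static void scale_neon(float *dst, const float *src, float k, int n)
{
    int i = 0;
    float32x4_t vk = vdupq_n_f32(k);
    for (; i + 4 <= n; i += 4)
        vst1q_f32(dst + i, vmulq_f32(vld1q_f32(src + i), vk));
    for (; i < n; i++)                 /* scalar tail */
        dst[i] = src[i] * k;
}
#endif

/* Dispatch per call; real code would cache the feature check. */
void scale(float *dst, const float *src, float k, int n)
{
#ifdef __ARM_NEON__
    if (android_getCpuFeatures() & ANDROID_CPU_ARM_FEATURE_NEON) {
        scale_neon(dst, src, k, n);
        return;
    }
#endif
    scale_scalar(dst, src, k, n);      /* e.g. Tegra 2 ends up here */
}
```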
 
Yes, Tegra 2 is the only Cortex-A9 SoC without NEON and with A15 it'll return to being mandatory. Even Amlogic's single-core Cortex-A9 with a miserable L2 cache size of 128KB has NEON. I would not be terribly surprised if Tegra 2 remains the only NEON-less A9, and even nVidia clearly realized they fumbled on this one.

The problem with NEON and Cortex-A9 is that the A9's ROB can't track data dependencies for NEON instructions and has to stall on potential RAW hazards from NEON instructions. Use of NEON instructions effectively turns an A9 into an in-order processor.

The omission of NEON in Tegra 2 makes perfect sense from a performance point of view. From a software developer's point of view it is a PITA having to support multiple code paths: more work for the developer and app bloat for everybody. I agree with you that overall it was a bad call by Nvidia.

As you say, Cortex-A15 solves all these problems. NEON is mandatory and is integrated with the out-of-order scheduling machinery. Performance-wise the A15 should be equivalent to an Intel P-III Coppermine.

Cheers
 
The problem with NEON and Cortex-A9 is that the A9's ROB can't track data dependencies for NEON instructions and has to stall on potential RAW hazards from NEON instructions. Use of NEON instructions effectively turns an A9 into an in-order processor.

Interesting. Does this mean that a single uncommitted NEON instruction can stall younger non-NEON instructions as well, or simply that NEON instructions must be in-order with respect to each other?

I can't imagine ARM breaking their design so badly as to make it the former.
 
The problem with NEON and Cortex-A9 is that the A9's ROB can't track data dependencies for NEON instructions and has to stall on potential RAW hazards from NEON instructions. Use of NEON instructions effectively turns an A9 into an in-order processor.

Yes, NEON is in-order. And for in-order it doesn't really have enough registers to cover its latency, which can be substantial (and you don't get single-cycle results for almost anything). It requires aggressive hand scheduling to get good utilization out of it, but for the type of data-parallel, data-linear algorithms you often use here it's fairly viable to get good utilization without OoOE and with generous prefetching. Just painful.

NEON is actually further handicapped on Cortex-A9, since it isn't staggered against the integer pipeline the way it was on A8, so while you don't get the same penalty going from NEON registers to ARM ones, you also don't get the same latency hiding for loads. You also can't dual-issue loads/stores/permutes anymore, so you'll probably see quite a bit lower per-clock performance for very highly optimized code.
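
To give a rough idea of the hand scheduling I mean (my own sketch, nothing vendor-specific): on an in-order NEON unit you want independent chains in flight so a multiply-accumulate's result latency isn't on the critical path every iteration.

```c
#include <arm_neon.h>

/* Dot product with two independent accumulator chains. On an in-order NEON
 * unit a single chain stalls on each vmla's result latency; splitting the
 * work gives the pipeline something to issue in the meantime. How much this
 * buys you depends on the core and the surrounding code. */
float dot_neon(const float *a, const float *b, int n)
{
    float32x4_t acc0 = vdupq_n_f32(0.0f);
    float32x4_t acc1 = vdupq_n_f32(0.0f);
    int i = 0;
    for (; i + 8 <= n; i += 8) {
        acc0 = vmlaq_f32(acc0, vld1q_f32(a + i),     vld1q_f32(b + i));
        acc1 = vmlaq_f32(acc1, vld1q_f32(a + i + 4), vld1q_f32(b + i + 4));
    }
    float32x4_t acc = vaddq_f32(acc0, acc1);

    /* Horizontal sum of the four lanes. */
    float32x2_t half = vadd_f32(vget_low_f32(acc), vget_high_f32(acc));
    half = vpadd_f32(half, half);
    float sum = vget_lane_f32(half, 0);

    for (; i < n; i++)                 /* scalar tail */
        sum += a[i] * b[i];
    return sum;
}
```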

But I'd still want to have it on all Cortex-A9s.

The omission of NEON in Tegra 2 makes perfect sense from a performance point of view. From a software developer's point of view it is a PITA having to support multiple code paths: more work for the developer and app bloat for everybody. I agree with you that overall it was a bad call by Nvidia.

The thing is, those NEON units took up very little die space, although maybe nVidia was more concerned with capping max TDP. Or maybe NEON constrained their clock potential more. I still say that for the right code NEON, even on A9, can improve performance by a highly substantial amount, but maybe nVidia would have needed more of that code out in the wild already to see it. If you use it naively it won't give you much at all.

As you say, Cortex-A15 solves all these problems. NEON is mandatory and is integrated with the out-of-order scheduling machinery. Performance-wise the A15 should be equivalent to an Intel P-III Coppermine.

Yeah that'd be pretty nice. It's not quite as wide, though (afaik 2 ALUs vs 3), but then again neither is Bulldozer.

metafor said:
Interesting. Does this mean that a single uncommitted NEON instruction can stall younger non-NEON instructions as well, or simply that NEON instructions must be in-order with respect to each other?

I can't imagine ARM breaking their design so badly as to make it the former.

On A8, NEON was decoupled via an instruction queue that would only stall dispatch if it filled up (which it wouldn't if you interspersed non-NEON instructions), and the only stall I was aware of in the other direction is if you transferred from NEON to ARM registers, at which point any instruction touching the ARM register file at all would stall.
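
To show what I mean by that transfer stall (a made-up example; exact costs vary): the first version below pulls a NEON result back into ARM registers every iteration via vgetq_lane, which compiles to the kind of NEON-to-ARM vmov that hurts on A8, while the second stays on the NEON side and only crosses over once at the end. Tail handling is omitted for brevity.

```c
#include <stdint.h>
#include <arm_neon.h>

/* Painful on A8: a NEON->ARM register transfer every iteration. */
int any_nonzero_slow(const int32_t *p, int n)
{
    for (int i = 0; i + 4 <= n; i += 4) {
        int32x4_t v = vld1q_s32(p + i);
        if (vgetq_lane_s32(v, 0) | vgetq_lane_s32(v, 1) |
            vgetq_lane_s32(v, 2) | vgetq_lane_s32(v, 3))
            return 1;
    }
    return 0;
}

/* Friendlier: accumulate on the NEON side, cross to ARM registers once. */
int any_nonzero_fast(const int32_t *p, int n)
{
    int32x4_t acc = vdupq_n_s32(0);
    int i;
    for (i = 0; i + 4 <= n; i += 4)
        acc = vorrq_s32(acc, vld1q_s32(p + i));
    int32x2_t folded = vorr_s32(vget_low_s32(acc), vget_high_s32(acc));
    folded = vorr_s32(folded, vrev64_s32(folded));
    return vget_lane_s32(folded, 0) != 0;   /* single NEON->ARM transfer */
}
```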

A9 is probably still using a queue like this, and the integer core probably does even less work with NEON instructions than A8 did in order to keep that functionality in the optional module.

Looking here on page 12 you can see there's an instruction FIFO on the compute side:

http://www.arm.com/files/downloads/Cortex-A9_Devcon_2007_Microarchitecture.pdf

It's just that the dispatch to here occurs a lot earlier in the pipeline than it did on A8.
 