So uhh, I haven't replied in a while and I see a lot of very good arguments and at least as many incredibly dubious ones (I'd say probably a bit more, but that wouldn't be nice) - anyway, this took quite some time to write and I don't expect anyone to read it, but here goes!
BTW, for what it's worth once again: I don't know for sure whether this Tegra2 chip has NEON or not. I'm pretty sure it doesn't, but what I heard from multiple people might theoretically only have applied to the other chip (I'm not sure about the codenames, but I assume AP20 vs T20).
What is your estimate of the area of NEON (for a single core, ofc) at 40nm?
0.45-0.55mm² depending on the target frequency etc., going by a recent ARM presentation with an Osprey floorplan that I found a few weeks ago - and yes, that's not very big; it seems to have gotten strangely more efficient since the A8, where I've seen figures as high as >3mm² on 65nm. We're literally talking 2mm² on 40nm for a full implementation versus 9mm² for the A8 in OMAP3 on what is, frankly, a surprisingly low-density 65nm process (but still!)
I really need to poke around a bit to figure out how such a massive gap is possible. Maybe the new NEON, unlike the old one, has far fewer truly separate paths (i.e. multi-purpose ALUs versus many specialized ones), which on a low-leakage process would make it a bit less power efficient, but given the huge area savings that would clearly be worth the cost.
Isn't 256K of cache for an embedded GPU too much, especially one that is an IMR? AFAIK, even beefy desktop GPUs don't have that much cache to go around.
It's not a cache per se AFAIK; think of it more as a buffer. I've always speculated that it's dual-mode: it either owns Hier-Z info for a tile or, if the compression ratio is good enough, the entire Z data, which would allow the chip to not even *write* it to external memory. That's just how I'd use such a buffer, which probably means it doesn't work that way at all, though.
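Just to put the 256K figure in perspective, here's a quick back-of-the-envelope sketch; the WVGA resolution and 32-bit Z/stencil are purely my own assumptions for illustration, not anything confirmed about the chip:

```c
#include <stdio.h>

int main(void)
{
    /* Purely hypothetical numbers: WVGA screen, 32-bit Z/stencil per pixel. */
    const double z_bytes   = 800.0 * 480.0 * 4.0;  /* raw Z data, ~1.5 MB     */
    const double buf_bytes = 256.0 * 1024.0;       /* the 256K on-chip buffer */

    printf("Raw Z data:          %.2f MB\n", z_bytes / (1024.0 * 1024.0));
    printf("On-chip buffer:      %.2f MB\n", buf_bytes / (1024.0 * 1024.0));
    printf("Compression to fit: ~%.1f:1\n", z_bytes / buf_bytes);
    return 0;
}
```

So under those assumptions you'd need roughly a 6:1 Z compression ratio for the whole thing to stay on-chip; anything less and you'd presumably fall back to the per-tile Hier-Z mode.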
Why do you think that? 32-bit LPDDR2-667 should offer 2.7GB/s of bandwidth. That might be "enough" for the CPU, but it looks to me like it could be pretty limiting for a (non-TBDR, at least) renderer. I think the last NVIDIA GPUs with such low memory bandwidth on the desktop were about 5 years ago (GeForce 6100/6200), and the versions which only had that little memory bandwidth (the normal ones had about twice as much) were very limited by the lack of it.
There is still this bizarre misconception that Tim Sweeney's claims are anything more than marketing... Apparently this is a 240MHz 2xTMU chip; that's less raw texel rate than a GeForce2 MX! (2x2 TMUs @ 175MHz) - which also had 2.7GB/s of bandwidth for the top SKU, but with no framebuffer compression or other bandwidth-saving techniques.
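For anyone who wants to check the arithmetic behind both the bandwidth figure and the fillrate comparison, a quick sketch (clocks and TMU counts are simply the ones quoted in this thread):

```c
#include <stdio.h>

int main(void)
{
    /* Raw texel rate = TMUs * core clock, using the figures quoted above. */
    const double tegra2_mtex = 2.0 * 240.0;  /* 2 TMUs @ 240MHz   -> 480 Mtexels/s */
    const double gf2mx_mtex  = 4.0 * 175.0;  /* 2x2 TMUs @ 175MHz -> 700 Mtexels/s */

    /* 32-bit LPDDR2-667: 667 MT/s * 4 bytes per transfer. */
    const double lpddr2_gbps = 667e6 * 4.0 / 1e9;

    printf("Tegra2:            %.0f Mtexels/s\n", tegra2_mtex);
    printf("GeForce2 MX:       %.0f Mtexels/s\n", gf2mx_mtex);
    printf("32-bit LPDDR2-667: %.2f GB/s\n", lpddr2_gbps);
    return 0;
}
```

So ~480 versus ~700 Mtexels/s, on roughly the same ~2.7GB/s of memory bandwidth.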
Ailuros said:
Best in class in terms of what exactly? I'm not very good at estimating die area from die shots for the graphics part, but if my rather dumb estimate is close to reality and the GPU on T2 is roughly over 10mm² @ 40LP while it's still only 2PS/2VS/2TMUs @ 240MHz, then the perf/mm² isn't exactly what I'd call ideal.
Wow Ail, I guess you answered yourself there...
I really don't know how you got to 10mm² for the GPU part. There's very little to base any estimate on there except our own subjectivity. My own biases led me to three possible partitionings for the GPU, with die size going from 3.5 to 6mm² IIRC. 10mm² really seems like a massive stretch to me!
I suspect the less efficient blocks are video decode & encode, personally. Plus there's some weird I/O on the chip (e.g. the vertical bar on the left) whose purpose I can't figure out at all, and I'm sure some things like LVDS and IDE didn't scale from 65 to 40nm - both of which are on the chopping block for the smaller chip. We'll see how that one turns out area-wise, since it'll be the high-volume part for phones, in theory.
BTW, when is 28LP coming up? Late this year, mid next year, late next year....?
28LP/LPT has had advanced test chips taped out from lead partners (i.e. Qualcomm, NVIDIA, etc.) and should have real tape-outs in Q2 or Q3.
Ailuros said:
The iPhone 3GS is a smartphone; what you're seeing as Tegra2 for the moment is the highest-end variant, which isn't necessarily targeted at smartphones unless you intend to run around with a battery backpack for them.
Given that it's in some ways more smartphone-like than OMAP4 (e.g. 32-bit memory vs 64-bit), good to know that OMAP4 won't compete in that market either, then.
More seriously, it's obviously a flagship product for phones, which will attract various design wins but maybe not that much volume. The hope is probably that customers who have experience with their driver/software stack will decide to use the smaller chip in more mainstream models.
There's also the fact that baseband/connectivity/touchscreen/etc. costs have gone down quite a bit, so for a given price segment it's possible to afford a more expensive application processor. In terms of power consumption, there's little reason for this to take a lot more power for doing *the same task* (e.g. VGA video decoding) than a much smaller chip. Certainly the peak power would be high, but that's not unprecedented; OMAP3 HD video encode via DSP+A8, anyone?
unless of course you're willing to believe the fairy tales that Tegra3 might be a GF100 grandchild just as Tegra2 was supposed to be a GF9 grandchild. The reality check for the latter will tell you that T2 might not even reach GF6 capabilities in the end.
Only a complete fool would have believed that NV would throw an entire GPU architecture in the trash after a single generation; this would be unprecedented in the company's entire history, if not the industry's. We both know that's not a fair basis to judge things on...
from what's known so far, tegra2 does not seem to be any more OCL/CUDA friendly than tegra1 is, or, IOW, not friendly at all. combined with no SIMD, that would position tegra2 as one of the very few (the only?) pocket computing platforms without viable provision for running custom number-crunching software, whereas virtually all the competition has some means to do some heavy lifting, be that OCL, SIMD, proprietary vector engines or arbitrary combinations thereof. being the only one who can't do something which everybody else on the market can is not an advantageous position - there's no guarantee that tomorrow a killer number-crunching app for handhelds won't come out (say, hypothetically, a new AV codec). what will nv do then - come up with 'CUDA for tegra'? perhaps release a NEON-enabled SKU?
Okay, I think that's a very good point, and my understanding of the answer should also clear up a lot of other things.
NV's strategy is velocity, rapidly moving from generation to generation. That means they don't feel they need to be exceedingly future-proof; they can be just in time (no pun intended!) - so the answer (assuming once again that Tegra2 does not have NEON, which I'm pretty sure but not entirely certain is the case) is that they will have CUDA for their next-gen embedded GPU arch, probably for Tegra3. Obviously, given their background, they'd much rather push that to devs than NEON.
Of course there's the problem that design-win cycles are long and customers don't change their phones every month, so during the chip's lifecycle in users' hands it's likely to become a limitation. They probably believe it'll remain a niche thing and practically nobody will notice, so it doesn't matter; clearly many people here don't believe that. Both positions are a bit too extreme for my taste, but heh.
If you narrow it all down to media playback and web browsing, I don't think even a GPU was necessary after all.
I think even IMG would gladly set you straight on the idea that a GPU isn't useful for web browsing. Of course, if you literally meant 'necessary', I guess maybe you don't need more than an old microcontroller...
I wonder if SIMD can't be put to good use for various small graphics tasks for which sending commands to some 2d/3d hardware would be more expensive. That'd make Web browsing faster.
Intriguing; I don't know. My guess is it wouldn't save much given that the GPU is so close to the CPU on a SoC, especially on the newer ones (Tegra1 was the first) that have an L2 cache controller which allows you to bypass external memory when sending GPU commands. On SGX the context switch would be very cheap; on Tegra probably not so much, although I don't know the specifics. Either way, I doubt it'd be worth the software complexity & possible bugs.
the network stack
image decode
font drawing / antialiasing
and many standard C lib functions (memcpy, sort, search)
see
http://www.freevec.org/
I've seen a few times on B3D people who appear to be labouring under the assumption that vector units are only good for media processing. This is untrue; you can use them for all sorts of things.
Nice examples, cheers. The standard C lib functions are something I don't remember as often as I should! For image decode, I guess on the web that's mostly JPEG & PNG. The former has a dedicated block on any modern SoC, and the latter is mostly zlib I think; out of curiosity, is there actually any worthwhile SIMD implementation of that out there? Also, I'm curious what you mean by 'network stack' - I assume processing the IP packets themselves? The entire MAC for not only 3G but also WiFi (unlike on PCs) is done in the connectivity chips for handhelds/smartbooks, so I assume you mean IP; is that really still significant nowadays? (I'd assume not, but I'm genuinely ignorant.)
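On the C lib point, this is roughly the kind of thing I have in mind - a toy NEON-style copy loop (on a chip that actually has NEON, obviously), purely illustrative; the function name is made up, and a real SIMD memcpy would also deal with alignment, overlap rules, small-copy fast paths and so on:

```c
#include <arm_neon.h>
#include <stddef.h>
#include <stdint.h>

/* Toy example only: copy 'len' bytes 16 at a time using 128-bit NEON
   loads/stores, with a scalar loop for the leftover tail. */
static void neon_copy(uint8_t *dst, const uint8_t *src, size_t len)
{
    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        uint8x16_t v = vld1q_u8(src + i);  /* load 16 bytes  */
        vst1q_u8(dst + i, v);              /* store 16 bytes */
    }
    for (; i < len; i++)                   /* remaining bytes */
        dst[i] = src[i];
}
```

The same pattern extends to things like memchr-style search or byte comparisons, which is presumably what the freevec stuff does in a much more polished way.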
Also photo stitching and pixman are two nice/worthwhile ideas, cheers Simon/Laurent.
What differentiates NVIDIA from Qualcomm or PowerVR is the software support I expect for the GPU side, i.e. what differentiates them in the PC market too.
The HTC thing is a fair bit more complicated than what even some industry insiders seem to believe; either way, hopefully Qualcomm GPU drivers will improve now that they've fully integrated the AMD Handheld GPU team.
Tegra 3 may turn out to have a single-precision 32-SP Fermi and dual or quad ARM cores (probably two variants). In essence, something similar to an AMD Fusion or dual-core Sandy Bridge, but in the low-power market segment.
32 SP Fermi clearly seems too optimistic to me, but we'll see!