Tegra 2 Announcement

Discussion in 'Mobile Devices and SoCs' started by Arun, Jan 7, 2010.

  1. Lazy8s

    Veteran

    Joined:
    Oct 3, 2002
    Messages:
    3,100
    Likes Received:
    18
    Like Texas Instruments with their upgrade from OMAP3 to OMAP4, NVIDIA would likely be claiming a 4x graphics improvement had they added to their TMU count, considering the near-doubling in clock rate that comes with the process transition.
     
  2. Blazkowicz

    Legend Veteran

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    Tegra 2 is a huge CPU boost already. A good general-purpose CPU matters more than GPGPU or SIMD for now: it lets you run CPU-hungry applications such as Firefox, and it leaves an iPhone 3GS in the dust, along with the likes of the Geode and Xcore86.
    Besides, you even get two CPU cores, so you can number-crunch a bit using regular multithreaded code.
    I believe you need a good general-purpose CPU first, be it for a tablet, a phone, a laptop or anything else ;)


    There's one thing I don't understand that much: if you wish to run a regular OS on a Tegra tablet or laptop, and it can't be Windows, what will you run? Even Google's Android runs on a Linux kernel.
    I'd guess even lay people will want to run regular software on a Tegra laptop, just as they do with an Atom laptop.
    Running Ubuntu with supported hardware would be useful and would look functionally identical to a Core 2 Duo laptop running the same software.

    Flash acceleration is an easy sell: you only have to say it makes YouTube run better.
    Video decoding is a feature many people care about even on desktops, so that's an easy sell as well: play any video with long battery life, no need to transcode anything or care about the encoding.
     
  3. Blazkowicz

    Legend Veteran

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    Is there a hint this hardware may be released on a mini-ITX board?

    I'd like a small fanless, always-on box in the living room. It would have everything needed: very high perf/watt, media decoding, and a decent (if slow) GPU to run emulators.
     
    #63 Blazkowicz, Jan 14, 2010
    Last edited by a moderator: Jan 14, 2010
  4. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,416
    Likes Received:
    178
    Location:
    Chania
    Hardware has to precede software for something to get a kick-start, and that's perfectly in line with NV's usual philosophy. How sure are you exactly that CUDA/OpenCL would be without any benefit for multimedia/general-purpose work, especially with all the background investment NVIDIA is making there on all other platforms?

    Multithreading is such a vague term that it can mean vastly different things on any unit, and while it's been an understood term for GPUs for eons now, there are still vast differences among recent architectures, just as in the past. The closer you get to zero overhead for context switches, the better you can run multiple threads without parts of the pipeline sitting idle; and yes, of course that helps graphics too.

    Personally I wouldn't worry about how it compares in theory today to an iPhone 3GS until it gets integrated into a smartphone more advanced than the latter and reaches its sales figures. By that time Apple or any other manufacturer will have a refresh ready.

    I merely said that I don't see competing parts being at a disadvantage there. If we were talking about a Windows OS, and considering how underwhelming Intel's netbook graphics drivers were/still are, then yes, NV can and will make a sizeable difference. And that's something the entire market will benefit from, since it'll force every other competitor to finally wake up and fix their drivers.
     
  5. tangey

    Veteran

    Joined:
    Jul 28, 2006
    Messages:
    1,436
    Likes Received:
    193
    Location:
    0x5FF6BC
    Comparing Tegra2 with the 3GS is not terribly useful... Tegra2 is on paper; the chip inside the 3GS has been out for 8 months. It is likely that the successor to the 3GS, with a possible change of SoC (or a higher-clocked one), will appear on shelves long before Tegra2 ends up in a product.
     
  6. Laurent06

    Regular

    Joined:
    Dec 14, 2007
    Messages:
    714
    Likes Received:
    33
    For sure Tegra2 isn't on paper only. Now for end-user products it's a different question :) And I bet any info on the next-gen 3GS is pure speculation.
     
  7. ADEX

    Newcomer

    Joined:
    Sep 11, 2005
    Messages:
    231
    Likes Received:
    10
    Location:
    Here
    well, apart from:

    the network stack
    image decode
    font drawing / antialiasing
    and many standard C library functions (memcpy, sort, search)
    see http://www.freevec.org/


    I've seen people on B3D a few times who appear to be labouring under the assumption that vector units are only good for media processing. This is untrue; you can use them for all sorts of things.
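    The point about vector units helping plain C-library work can be illustrated even without real SIMD hardware: the classic "SIMD within a register" trick behind vectorised strlen/memchr detects a zero byte among eight bytes at once using ordinary integer arithmetic. A minimal sketch of the idea (illustrative only, not code from the thread):

```python
# SWAR ("SIMD within a register") zero-byte detection: the same data-parallel
# idea NEON and friends exploit, done with plain 64-bit integer arithmetic.
LO = 0x0101010101010101  # 0x01 in every byte lane
HI = 0x8080808080808080  # 0x80 in every byte lane

def has_zero_byte(word):
    # A byte lane of `word` is zero iff its 0x80 bit survives this expression.
    return ((word - LO) & ~word & HI) != 0

def swar_strlen(buf):
    # Length up to the first NUL, scanning 8 bytes per iteration.
    i, n = 0, len(buf)
    while i + 8 <= n:
        word = int.from_bytes(buf[i:i + 8], "little")
        if has_zero_byte(word):
            break  # a NUL is somewhere in this word; finish byte-by-byte
        i += 8
    while i < n and buf[i] != 0:
        i += 1
    return i
```

    A real NEON or Altivec implementation does the same thing 16 bytes at a time with actual vector registers, which is why these "boring" library functions see real speedups.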
     
  8. tangey

    Veteran

    Joined:
    Jul 28, 2006
    Messages:
    1,436
    Likes Received:
    193
    Location:
    0x5FF6BC
    Undoubtedly, but it'll be better than the current-gen 3GS AND likely to market before Tegra2. Saying Tegra2 is better than the 3GS SoC is like saying Tegra2 is better than Tegra1... it would be awful if it wasn't.
     
  9. Laurent06

    Regular

    Joined:
    Dec 14, 2007
    Messages:
    714
    Likes Received:
    33
    I suddenly feel less lonely :grin:

    A nice example is the pixman library.
     
  10. Blazkowicz

    Legend Veteran

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    The 3GS comparison is intended as a comparison with existing good hardware.
    I have no doubt SoCs from other vendors will be excellent as well; that's what you get with an ultra-small process and Cortex cores.
     
  11. Blazkowicz

    Legend Veteran

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    What differentiates NVIDIA from Qualcomm or PowerVR is the software support I expect on the GPU side, i.e. what differentiates them on the PC market too.

    Sure, Tegra 2 isn't there for the PhysX/CUDA/OpenCL trend, but as I said I expect first-class Linux drivers, similar to the binary driver for their GPUs (the only one that goes as far as replacing many parts of the X11 server). Not needed for a phone or a console, but it brings credibility for a multipurpose platform.

    NVIDIA has been pushed out of PC chipsets and doesn't have a CPU of its own; but Tegra is a platform of their own. It may be quite strategic for them, used not only in the mobile market but also for low-end, low-power and third-world segments.

    Tegra 3 may turn out to have a single-precision 32-SP Fermi and dual or quad ARM cores (probably two variants). In essence, a chip similar to an AMD Fusion or a dual-core Sandy Bridge, but in the low-power market segment.

    I don't know if it's wishful thinking (power-wise). That would be a new twist in the history of GPUs: an architecture scaling down to a new, lower market segment.
    Or maybe the Fermi would be simplified (fewer SPs, no dual scheduler)... I don't know.
     
    #71 Blazkowicz, Jan 15, 2010
    Last edited by a moderator: Jan 15, 2010
  12. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    299
    Location:
    UK
    So uhh, I haven't replied in a while and I see a lot of very good arguments and at least as many incredibly dubious ones (I'd say probably a bit more, but that wouldn't be nice ;)). Anyway, this took quite some time to write and I don't expect anyone to read it, but here goes!

    BTW, for what it's worth once again: I don't know for sure whether this Tegra2 chip doesn't have NEON. I'm pretty sure it doesn't, but theoretically what I heard from multiple people might only have been for the other chip (I'm not sure about the codenames, but I assume AP20 vs T20).

    0.45-0.55mm² depending on the target frequency etc., if I base myself on a recent ARM presentation with an Osprey floorplan that I found a few weeks ago. And yes, that's not very big; it seems to have gotten strangely more efficient since the A8, where I've seen figures as high as >3mm² on 65nm. We're literally talking 2mm² on 40nm for a full implementation versus 9mm² for the A8 in OMAP3 on what is, frankly, a surprisingly low-density 65nm process (but still!)

    I really need to poke around a bit to figure out how such a massive gap is possible. Maybe the new NEON, unlike the old one, has many fewer truly separate paths (i.e. multi-purpose ALUs versus many specialized ones), which on a low-leakage process would make it a bit less power-efficient, but given the huge area savings it would clearly be worth the cost.
    It's not a cache per-se AFAIK; think of it more as a buffer. I've always speculated that it's dual-mode: it either owns Hier-Z info for a tile or, if the compression ratio is good enough, the entire Z data which would allow the chip to not even *write* to external memory. That's just how I'd use such a buffer, which probably means it doesn't work that way at all though :D
    There is still this bizarre misconception that Tim Sweeney's claims are anything more than marketing... Apparently this is a 240MHz 2-TMU chip; that's less raw texel rate than a GeForce2 MX! (2x2 TMUs @ 175MHz), which also had 2.7GB/s of bandwidth for the top SKU, but with no framebuffer compression or other bandwidth-saving techniques.
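    For reference, the texel-rate arithmetic behind that comparison (clocks and TMU counts are the figures quoted in this thread, not official specs):

```python
# Peak texel rate = core clock (MHz) x number of TMUs, in Mtexels/s.
def mtexels_per_s(clock_mhz, tmus):
    return clock_mhz * tmus

tegra2_gpu  = mtexels_per_s(240, 2)  # quoted: 240 MHz, 2 TMUs
geforce2_mx = mtexels_per_s(175, 4)  # 175 MHz, 2 pipes x 2 TMUs each

print(tegra2_gpu, geforce2_mx)  # 480 vs 700 Mtexels/s
```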
    Wow Ail, I guess you answered yourself there... ;) I really don't know how you got to 10mm² for the GPU part. There's very little to base any estimate there except our own subjectivity. My own biases led me to three possible partitionings for the GPU, with die size going from 3.5 to 6mm² iirc. 10mm² really seems like a massive stretch to me!

    I suspect the less efficient blocks are video decode & encode, personally. Plus there's some weird I/O on the chip (e.g. vertical bar on the left) that I can't understand the meaning of at all, and I'm sure some of the things like LVDS and IDE didn't scale from 65 to 40nm - both of which are on the chopping block for the smaller chip. We'll see how that one turns out area-wise, since it'll be the high-volume part for phones in theory.
    28LP/LPT has had advanced test chips taped-out from lead partners (i.e. Qualcomm, NVIDIA, etc.) and should have real tape-outs in Q2 or Q3.
    Given that it's in some ways more smartphone-like than OMAP4 (e.g. 32-bit memory vs 64-bit), good to know that OMAP4 won't compete in that market either then ;) More seriously, it's obviously a flagship product for phones, which will attract various design wins but maybe not that much volume. The hope is probably that customers that have experience with their driver/software stack decide to use the smaller chip on more mainstream models.

    There's also the fact that baseband/connectivity/touchscreen/etc. costs have gone down quite a bit, so for a given price segment it's possible to afford a more expensive application processor. In terms of power consumption, there's little reason for this to take a lot more power for doing *the same task* (i.e. VGA video decoding) than a much smaller chip. Certainly the peak power would be high, but that's not unprecedented; OMAP3 HD video encode via DSP+A8, anyone?
    Only a complete fool would have believed that NV would throw an entire GPU architecture in the trash after a single generation; this would be unprecedented in the company's entire history, if not the industry's. We both know that's not a fair basis to judge things on... :)
    Okay, I think that's a very good point, and my understanding of what the answer is should also answer a lot of things.

    NV's strategy is velocity, rapidly moving from generation to generation. That means they don't feel they need to be exceedingly future-proof; they can be just in time (no pun intended!). So the answer (assuming once again that they do not have NEON in Tegra2, as I'm pretty sure but not entirely certain is the case) is that they will have CUDA for their next-gen embedded GPU arch, probably for Tegra3. Obviously they'd much rather push CUDA to devs than NEON, given their background.

    Of course there's the problem that design win cycles are long and customers don't change their phones every month; so during the chip's lifecycle in users' hands, it's likely to become a limitation. Probably they believe it'll remain a niche thing and practically nobody will realize anything so it doesn't matter, and clearly many people here don't believe that. Both positions are a bit too extreme to my tastes, but heh.
    I think even IMG would gladly set you straight on the GPU not being useful for web browsing. Of course, if you literally meant 'necessary', I guess maybe you don't need more than an old microcontroller... :twisted:
    Intriguing, I don't know. My guess is it wouldn't save much given that the GPU is so close to the CPU on a SoC, especially on the newer ones (Tegra1 was the first) that have a L2 cache controller that allows you to bypass external memory to send GPU commands. On SGX the context switch would be very cheap, on Tegra probably not so much although I don't know the specifics. Either way I doubt it'd be worth the software complexity & possible bugs.
    Nice examples, cheers. The standard C lib functions are something I don't remember as often as I should! For image decode, I guess on the web that's mostly JPEG & PNG. The former has a dedicated block on any modern SoC; the latter is mostly zlib I think; out of curiosity, is there actually any worthwhile SIMD implementation of that out there? Also I'm curious what you mean by 'network stack' - I assume processing the IP packets themselves? Obviously the entire MAC for not only 3G but also WiFi (unlike on PCs) is done in the connectivity chips for handhelds/smartbooks, so I assume you meant IP. Is that really still significant nowadays? (I'd assume not, but I'm genuinely ignorant.)

    Also photo stitching and pixman are two nice/worthwhile ideas, cheers Simon/Laurent.
    The HTC thing is a fair bit more complicated than what even some industry insiders seem to believe; either way, hopefully Qualcomm GPU drivers will improve now that they've fully integrated the AMD Handheld GPU team.
    32 SP Fermi clearly seems too optimistic to me, but we'll see!
     
  13. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,416
    Likes Received:
    178
    Location:
    Chania
    Intel's GMA 500 has problematic Direct3D drivers, but to be fair it's probably both Intel's and IMG's fault that better drivers haven't been delivered. Indications of whether they could be better or not here:

    http://www.mitrax.de/?cont=artikel&aid=36&page=1

    With decent drivers it should at least be in the KYRO performance ballpark.

    http://beagleboard.org/ Just an example.

    NVIDIA has been "pushed out" of those platforms by AMD/Intel, but as a SoC manufacturer they're still competing against Intel, Samsung, Renesas+NEC etc., and I figure probably in the long run against AMD too (Ontario?).

    Ok, that one cracked me up. You're too damn optimistic. Make that a G9x-class grandchild and we'll get an agreement, and even there I don't see more than 8 SPs, but that's just me. Take the Fermi die size and divide it by 16 (which is rather dumb, but it helps to get an idea) to see what each cluster accounts for.

    32 SPs will be in Fermi's future ION replacements, and I think you're quite confusing the markets Tegra and ION are supposed to address. And that's not even tomorrow either, but on future manufacturing processes.

    More than a handful of wishful thinking, area- and power-wise. TDP for the high-end GF100 should be just one notch below that of a 5970. The only thing a hypothetical single cluster of GF100 would do in the 2010/11 embedded market is break every imaginable record in terms of power consumption.

    And just to give you an idea (because I'm afraid many of you still haven't realized how "low" even theoretical figures are for GPUs like the one in Tegra2), the maximum theoretical GFLOP count from the PS ALUs should be a tad south of 9 GFLOPs. And while FLOPs are another fairly meaningless measurement, since there are vast differences in floating-point throughput across architectures, that's still a single-digit value.
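    A single-digit peak figure like that falls out of arithmetic of this shape. The ALU count, per-ALU width and clock below are placeholder values chosen only to land near the quoted ballpark; they are not confirmed Tegra2 specs:

```python
# Peak GFLOPs = ALUs x flops issued per ALU per clock x clock (MHz) / 1000.
def peak_gflops(alus, flops_per_alu_per_clock, clock_mhz):
    return alus * flops_per_alu_per_clock * clock_mhz / 1000.0

# Hypothetical: 4 PS ALUs, each a vec4 MADD (8 flops/clock), at 270 MHz.
print(peak_gflops(4, 8, 270))  # 8.64 -- "a tad south of 9 GFLOPs"
```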

    No, those GPUs will scale into the desktop, and some sort of magic wand will make the drivers cover all possible shortcomings of the hardware. Never mind that graphics performance is tuned for ultra-low resolutions and adapted to those. It simply isn't possible to squeeze more into 5-10 square millimetres for "high end" embedded GPUs than they already have.
     
  14. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,416
    Likes Received:
    178
    Location:
    Chania
    I hardly grasp things out of thin air, especially since I'm very unsure when estimating from die shots. I had some help, but yes, that someone could be wrong too. For the time being (and considering there are other similar mysteries with die area and the Intel stuff we talked about) I'll leave it blank. On a completely OT note, I may or may not owe you a big apology over Fermi, and you know I don't have a hard time apologizing for my mistakes. In any case, if so, and considering the flak I gave you, it'll end up being a phat apology :p

    Well, their stuff on their homepage still seems quite vague (Tegra2 = Tegra 250). I'm expecting they might differentiate the lineup just as with T1, with the APX and 650/600 variants.

    *ahem* http://www.imgtec.com/News/Release/index.asp?NewsID=498
    http://www.imgtec.com/News/Release/index.asp?NewsID=496
     
  15. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    299
    Location:
    UK
    Fair enough, this stuff certainly isn't easy to estimate - well unless you have a chip and a Very-Precise-Thermometer(TM) at your disposal! :)

    Hehe. Well given how many things I got wrong in my pre-release/announcement speculation (including a few non-speculation bits actually) for Fermi... don't make too big a deal out of it since it'll just make me look bad too ;) jk

    Oh yeah, I was responding in terms of the different chips, but I'm sure there will be different SKUs for this chip too... just like I expect the main OMAP4 SKU for smartphones will be the OMAP4430 at 720MHz, not the 1GHz OMAP4440. NV seems to claim there's still some headroom so there could be a low-volume even higher-clocked SKU down the road (like TI eventually released the OMAP3440) so hopefully the gap between the main tablet and phone SKUs won't be too massive.

    Hehe, yeah, forgot that for a second :D But those are hardly 'old microcontrollers' are they? *grins* Either way IMG does claim SGX can help a lot for the rendering part of web browsing, especially making things like scrolling smoother. The same claims are made by NV, ARM (for Mali), etc. - so it's hardly a controversial thing to say as I'm sure you realize.
     
  16. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Lots of answers, lots to ask.

    For a dual core, NEON area is ~1 mm². :shock: Why bother removing it from a chip that will likely be used in gaming-oriented devices? If it's really that small, I'd expect it to be there in T3.

    Do you expect the video decode/encode blocks to take a smaller fraction of area in T3 than in T2? T2 pretty much covers the video use cases, so I'm expecting an area-budget bump for the rest of the blocks next gen.

    BTW, since a dual-core A9 + 1MB L2 only costs ~8mm² on 40nm, I'm fully expecting a few of these babies in GF200. When your area budget is ~550mm², why not slap them in to avoid the memory round trip across the PCIe bus? At 2GHz, a dual-core A9 with 4MB L2 is going to cost <10mm² on 28nm and would remove the final argument against GPUs, namely that they're very inefficient at control-oriented and branchy tasks. Driver latency would also go way down.

    CUDA on mobile phones: interesting and exciting. :wink:
     
  17. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,416
    Likes Received:
    178
    Location:
    Chania
    No, of course they aren't old microcontrollers, and while the META processor isn't a recent invention of IMG's, they only recently decided to market it outside its traditional domains. If you cut out the heart of an SGX (i.e. the USSE) and leave out the surrounding enchilada (yes, oversimplified), what you'll have in the end is a general-purpose processor, in other words a META.

    If a SoC already has a GPU, a co-processor like the META will be redundant. If, though, you have a device that concentrates exclusively on web browsing, media playback or even a camera (and there's no fancy 3D GUI on the device's screen, that is), then you might be better off with something far simpler and cheaper, like a GPP.
     
  18. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,012
    Likes Received:
    112
    Ok, I guess I slightly overestimated the performance of this thing. Still, those 2.7GB/s are shared between CPU and GPU. Assuming the chip has a theoretical pixel fill rate of 2 pixels/clock (?), that would still be enough for full rate including blending (even at 4 bytes per pixel). That's not including Z reads/writes, but if it really has advanced compression schemes (and that large L2 might help there) I guess it shouldn't be too bad. Textures will almost always be compressed too, I guess, so they don't consume that much bandwidth either.
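    The back-of-the-envelope numbers behind that, using the 240 MHz clock and ~2.7 GB/s bandwidth figures quoted earlier in the thread (assumptions, not measured data):

```python
clock_hz        = 240e6  # GPU clock quoted in the thread
pixels_per_clk  = 2      # assumed peak pixel fill rate
bytes_per_pixel = 4      # 32-bit colour

# Pure colour writes at full fill rate:
write_traffic = clock_hz * pixels_per_clk * bytes_per_pixel
print(write_traffic / 1e9)  # 1.92 GB/s, within the ~2.7 GB/s shared budget

# Blending also reads the destination pixel, doubling the colour traffic:
blend_traffic = write_traffic * 2
print(blend_traffic / 1e9)  # 3.84 GB/s if every pixel blends at peak
```

    So colour writes alone fit comfortably; sustained full-rate blending plus Z is where compression and that on-chip buffer would have to pick up the slack.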
    NV1?
     
  19. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,416
    Likes Received:
    178
    Location:
    Chania
    It even turns my guts when TC gets skipped on the PC; a gazillion times more so on an embedded device where the amount of memory is so limited.

    You may excuse the exaggeration, but back then they weren't the giant they are today, addressing that many markets; they were something more in the direction of a couple of friends gluing together a GPU in a garage :lol:
     
  20. JohnH

    Regular

    Joined:
    Mar 18, 2002
    Messages:
    586
    Likes Received:
    2
    Location:
    UK
    Are you including that 256KB of RAM in that area? It seems to be pretty integral to their performance, so I don't think you can separate it from the GPU core itself.

    John.
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.