NVIDIA Tegra Architecture

Discussion in 'Mobile Graphics Architectures and IP' started by french toast, Jan 17, 2012.

  1. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,380
    With TK1 as is, they have something that nobody else has. A TK0.5 with dumbed down CPU and GPU would be an also ran among many others where you compete only on price. How would that be more successful?
     
  2. Picao84

    Veteran Regular

    Joined:
    Feb 15, 2010
    Messages:
    2,003
    Likes Received:
    1,053
    And what exactly did they achieve in the mobile phone market by having "something that no one else has"? Almost nothing. As far as I know, nVIDIA is retreating completely from mobile phones precisely because they cannot compete on price with TK1 :roll: Competition in the mainstream mobile phone market is mostly about pricing; features are barely relevant there, outside niches like photography.
     
  3. ams

    ams
    Regular

    Joined:
    Jul 14, 2012
    Messages:
    914
    Likes Received:
    0
    The mainstream "smartphone" market has commoditized to the point where low pricing for SoC+LTE modem matters more than anything else. It makes zero sense for NVIDIA to pursue this and devote engineering resources to it. Note that Tegra K1 is still expected to find its way into some high-end smartphones. The "tablet" market, on the other hand, has not commoditized in the same way, nor is an LTE modem needed or desired for base tablet models. The Xiaomi Mi Pad alone will sell millions of units per year, and I'm sure various other resourceful companies will use Tegra K1 too. In the markets and segments that Tegra K1 targets (i.e. tablets, high-end smartphones, micro gaming devices, automotive, high-res monitors and TVs), the GK20A GPU's performance and feature set are most certainly not overkill. In fact, the opposite is true: its stellar performance, feature set and perf/watt are very desirable in these target areas. And having the Tegra GPU architecture in step with the main desktop architecture is huge, because resources can be leveraged much better throughout the company. NVIDIA absolutely made the right decision here.
     
    #2503 ams, Jun 18, 2014
    Last edited by a moderator: Jun 18, 2014
  4. silent_guy

    Did I deny that?

    With TK1 they have something to differentiate other than price. With TK0.5 they have nothing. So why bother if you don't want to participate in this race to the bottom?

    It's too early to tell whether TK1 will be popular in the mobile space, but at least it has a chance in the automotive space (though we'll only know five years from now). With TK0.5, much less so.

    Clearly Nvidia hasn't found its profitable niche yet, but flogging the dead horse of mid-range SoCs didn't work, so why would they keep trying?
     
  5. wco81

    Legend

    Joined:
    Mar 20, 2004
    Messages:
    6,694
    Likes Received:
    544
    Location:
    West Coast
    Really? The iPhone and the Samsung Galaxy and Note are not commoditized, and those models move well over 100 million units a year.

    Of course the lower-priced phones are commoditized, but the sheer volume of the phone market dwarfs the tablet market.

    Meanwhile, tablets start at $99.
     
  6. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,469
    Likes Received:
    187
    Location:
    Chania
    You know the world doesn't spin around 3D only, especially in that market. Now do the same exercise again with something compute-oriented, for instance: you'd have to fiddle endlessly with weird compiler optimisations just to keep half of those Vec4 ALUs in the Malis busy, while GK20A, thanks to its scalar ALUs, will come as close as possible to its peak FLOP values.

    Besides, it was just a case example; no IHV would be so idiotic as to develop a GPU that wide only to clock it at just 150MHz on 28HPM. The likelier case would be a fraction of the 192 SPs the GK20A currently has, at a frequency of =/>700MHz, which of course doesn't change the above one bit.

    For the record's sake, Xiaomi seems to be experimenting with different frequencies, since the results keep bouncing up and down; look at the ones here: http://gfxbench.com/subtest_results_of_device.jsp?D=Xiaomi+MiPad&id=555&benchmark=gfx30

    At a theoretical 150MHz, GK20A delivers 57.6 GFLOPS FP32; how many GFLOPS does the Mali-T628MP6 in the Galaxy S5 have, exactly?

    http://community.arm.com/thread/5688 (no it isn't clocked at "just" 533MHz afaik)
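    The 57.6 figure above is just the standard peak-throughput arithmetic. A quick sketch (the 192-SP count and the clocks are the ones quoted in this thread; the 2-FLOPs-per-lane-per-clock factor assumes one FMA issued per scalar lane per clock, and both function names are made up for illustration):

```python
def peak_gflops_fp32(alu_lanes, clock_mhz, flops_per_lane_per_clock=2):
    """Theoretical peak FP32 throughput in GFLOPS.

    flops_per_lane_per_clock=2 assumes one fused multiply-add
    (2 FLOPs) issued per lane per clock.
    """
    return alu_lanes * flops_per_lane_per_clock * clock_mhz / 1000.0

def effective_gflops(peak_gflops, avg_lane_utilisation):
    """Scalar ALUs can approach utilisation ~1.0 on scalar-heavy
    compute code; a Vec4 design only hits its peak when the compiler
    keeps all four lanes of every ALU busy."""
    return peak_gflops * avg_lane_utilisation

# GK20A: 192 scalar SPs at the hypothetical 150MHz -> 57.6 GFLOPS
print(peak_gflops_fp32(192, 150))
# The same ALU count at the likelier =/>700MHz -> 268.8 GFLOPS
print(peak_gflops_fp32(192, 700))
# A Vec4 machine keeping only half its lanes busy loses half its peak
print(effective_gflops(peak_gflops_fp32(192, 150), 0.5))
```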

    Melodramatic? How about you give me a viable percentage of how many smartphones Samsung actually sells with Exynos in total versus how many with Qualcomm's SoCs. They're not using fewer S8xx SoCs lately but increasingly more, and it sure must be because of those engineering marvels that no one wants to have, not even Samsung Mobile.


    There's a comment from JohnH about it here in the forum, and as an experienced hw engineer I trust him more than you, for reasons I hopefully don't have to explain.

    ***edit: to save you from searching

    http://forum.beyond3d.com/showpost.php?p=1834992&postcount=23
    http://forum.beyond3d.com/showpost.php?p=1835124&postcount=26

    Besides that, I mentioned the die area for the tessellation unit alone, which obviously flew completely over your head.

    Good for them. For Windows I'll personally take a real Windows machine with a proper desktop GPU (even a low-end one) instead. The S805 doesn't sound like a solution I'd want for such a case.
     
  7. Ailuros

    Even if T4i had been successful, its GPU would have been criticised in the long run for its lack of compute capabilities.

    You have to think that between taking the Aurora/T4 GPU, changing the entire thing to USC ALUs, giving it the extensive compute capabilities that OGL_ES3.1 and future relevant APIs would require, etc. etc., versus taking an existing design like Kepler and shrinking it down to ULP mobile SoC levels, the latter sounds many times easier and cheaper to me. One could argue that they could have removed tessellation from the Kepler design, but I as a layman suspect it's integrated too deeply into the pipeline, and removing it would cost more resources than leaving it be.

    Besides, they won't have to change much in the feature set in future iterations and can concentrate mostly on efficiency increases; eventually there's no escaping DX11 for anyone either.
     
  8. Picao84

    Ailuros and silent_guy, my point is not about what TK1 is or should have been. My point is that they bet the house on only one design per generation (the sole exception being the T4i), all the while shooting for the roof. I'm probably simplifying, but they mostly used off-the-shelf ARM designs if we don't count the companion core. Would it have been so hard to make a SoC without it, with a less beefy GPU, while still supporting Tegra-enhanced titles?

    In any case, they should have known from the start that phones are a low-price market compared to desktop. What were they expecting? Mobile phones for 1K dollars/euros, selling like hot cakes? The mobile phone market could never command the sort of prices nVidia is used to, and it was blatantly foolish if they believed otherwise. That's why I stand by my opinion that if they expected to make a dent in the market, they needed a lower-cost, more humble option. Don't blame me, I'm just pointing out the obvious flaw in their initial strategy :p

    ams, of course it doesn't make sense now, and they did the right thing in admitting defeat. I'm talking about years ago, when they started Tegra. Their vision of the future was too optimistic and they underestimated how the market would evolve.
     
  9. ams

    Keep in mind that the only area in ultra mobile that NVIDIA is not pursuing right now is mainstream smartphones (for a variety of reasons including price sensitivity, time to market for certifications, bundled modems available from competitors). Tegra's focus and breadth has shifted and evolved as the ultra mobile market has evolved. A very low cost "mainstream" SoC such as T4i was envisioned many years ago (and the existence of that SoC was likely due in part to Samsung and Apple absolutely dominating the higher end smartphone space), but that strategy just didn't pan out for the reasons mentioned above. Anyway, Tegra is about much more than simply smartphones, and Tegra as a whole is growing again after the lull with the T4 generation.

    And FWIW, just because Tegra K1 is an ambitious SoC design (with 32-bit Cortex A15 R3 and 64-bit fully custom Denver CPU variants to boot!) doesn't mean that it is not a cost-effective SoC design (remember that the SoC die size is no bigger than Snapdragon 800, and there is no additional cost associated with a built-in LTE modem, and the fabrication is done on a now mature and high yielding 28nm HPM process). The Xiaomi Mi Pad will be selling in China for less than $300 USD. Tegra is obviously not in any way targeted at $1000 USD devices (other than some super premium and limited edition 128GB Google 3D Dev Tablet with advanced sensors on board).

    Anyway, creating two or more totally different SoCs solely for use in the ultra mobile space is easier said than done; it is a very resource- and time-intensive task that is only justifiable if there is a high probability that volumes will be consistently high.
     
    #2509 ams, Jun 18, 2014
    Last edited by a moderator: Jun 18, 2014
  10. Ailuros

    Yes, it would have been possible; however, I was and am among those who spent all those years up to Logan pushing for NV to concentrate more on the GPU side of things.

    When they started with Tegra 1 around half a decade ago they could have tried a Mediatek-like business scheme from the get-go, that much is true, and they might have been successful with it. Tegra 4i was way too late for that, while in the meantime low-cost SoC manufacturers like Rockchip, Allwinner, Mediatek and Samsung, amongst others, established themselves above all in the Chinese market.

    Now, the recipes for possible past success, or a bunch of "might have beens", could be many; the reality is that here and today they're still struggling for market share. If something like GK20A cannot turn any heads in any of the consumer markets, then I'm not so sure they'll ever manage to gain any worthwhile market share there.

    Take the MiPad as just one example: if it ends up at =/>$240 as is rumored, then you have next-generation tablet performance at a much more affordable price than from Apple or any other tablet vendor.

    By the way, in case part of your reasoning concerns the GK20A frequencies: NV never stated that we'll get a 950 or 850MHz GPU in an ultra-thin tablet form factor. Under the right conditions even 950MHz should be reachable, in markets where extensive active cooling measures aren't taboo.
     
  11. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,492
    Likes Received:
    471
    You shouldn't assume QCOM added tessellation for phones. They thought WinRT would be more successful, and they were not happy when Microsoft released the Surface. Depending on the implementation, supporting tessellation isn't a huge area adder. It's something, but a small percentage of the overall GPU.
     
  12. Ailuros

    Compared to DX10 or even DX9L3? I severely doubt that, especially in the latter case. Apart from Qualcomm and NVIDIA, I'm not aware of any other ULP mobile DX11 GPU shipping yet. Considering Qcom managed to give the Adreno 420 barely 40% more arithmetic efficiency than the Adreno 330, I have to wonder where all the additional die area went.

    Speaking of which, Microsoft should skip the nonsense and lighten up a bit on whatever GPU requirements they have in mind for the future of the ULP SoC space; Tegras were able to make it into WinRT devices with barely DX9L1, after all. It's Microsoft that needs a ticket to establish itself in that market, not the other way around.

    That said, Windows mobile is an excellent mobile OS; I've tried it and like it as a sw platform. If it had more extensive support from third parties I wouldn't mind going for it at all.
     
  13. 3dcgi

    Compared to DX10 or even other DX11 features. If you don't have LDS and the ability to run sophisticated shaders, you shouldn't bother with DX11 tessellation yet. A hardware tessellator is minimal additional area for a GPU larger than 50 mm^2.
     
  14. Ailuros

    Where do we have any GPU blocks in SoCs yet that account for even as much as 50 square millimetres? And no, I still can't believe that the difference between a DX10.0 and a DX11.x ALU is anywhere close to "minuscule".

    Or else IMG is simply lying when it says that even "just" improved rounding support has a significant hw cost: http://blog.imgtec.com/powervr/powervr-gpu-the-mobile-architecture-for-compute

    As a layman I understand "significant" to mean at least 10%. Now, if my original estimate of at least +50% from 10.0 to 11.x counts as "minimal" additional area in your book, so be it.
     
  15. silent_guy

    I always thought that tessellation in DX11 was mostly a shader affair and that the pure tessellation hardware functionality is actually very minimal: you have all these hull and domain shaders with a HW expander in the middle that creates indices and some attributes. Basically, HW that may be quite tricky from a design point of view, but not in terms of area.
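    That split can be sketched in a toy model (entirely a simplification for illustration, not any real GPU's algorithm: `expand_quad_domain` stands in for the fixed-function expander, which is counter/divider-style logic, while `domain_shader` stands in for the per-vertex math that runs on the regular shader ALUs):

```python
def expand_quad_domain(tess_factor):
    """Toy fixed-function tessellator stage: given a tess factor,
    emit normalized (u, v) domain points over a quad patch. Real
    hardware also emits connectivity (indices); the point is that
    this is cheap index-generation hardware, not ALU area."""
    n = tess_factor
    return [(i / n, j / n) for j in range(n + 1) for i in range(n + 1)]

def domain_shader(patch_corners, uv):
    """The per-point work runs on the normal shader ALUs: here just
    bilinear interpolation of four patch corners at (u, v); real
    domain shaders evaluate arbitrary surfaces such as Bezier patches."""
    (p00, p10, p01, p11), (u, v) = patch_corners, uv
    lerp = lambda a, b, t: tuple(x + (y - x) * t for x, y in zip(a, b))
    return lerp(lerp(p00, p10, u), lerp(p01, p11, u), v)

corners = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
verts = [domain_shader(corners, uv) for uv in expand_quad_domain(4)]
print(len(verts))  # 25 vertices for tess factor 4
```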
     
  16. Ailuros

    And who says that hull & domain shaders and all the other logic you need for programmable tessellation come for free?
     
  17. 3dcgi

    If you support compute shaders, the additional area for tessellation is a fraction of a percent. I was not referring to modifying the ALUs, which is why I said "if you already support compute shaders". If you're talking about a 5 mm^2 GPU then anything can be considered significant.
     
  18. Ailuros

    With most high-end ULP SoCs ranging from 100 to 130mm2, how can the GPU blocks in them be anywhere near 50mm2? Not that it's impossible, but I wouldn't imagine the worst-case share for a GPU block being higher than a quarter of the entire SoC die area, even on K1.
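    The arithmetic behind that upper bound, as a quick sketch (the 100-130mm2 die sizes are the ones quoted in the thread; the one-quarter GPU share is the worst-case guess from the post, not a measured figure):

```python
def max_gpu_area_mm2(soc_area_mm2, gpu_share=0.25):
    """Upper bound on the GPU block's area, assuming it occupies at
    most `gpu_share` of the whole SoC die."""
    return soc_area_mm2 * gpu_share

# High-end ULP SoCs quoted at 100-130 mm^2: the GPU block then tops
# out at 25-32.5 mm^2, well short of the 50 mm^2 figure above.
for soc_area in (100, 130):
    print(soc_area, max_gpu_area_mm2(soc_area))
```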

    As for the rest, I wish one of the ULP GPU hw engineers would clarify that story, but that's hardly likely, since they'd reveal too much they probably shouldn't.
     
  19. 3dcgi

    I just threw 50 mm^2 out there thinking about SoCs that are over 100 mm^2. I'm not going to dig for numbers since I'm on a phone in an airport, but if someone is interested in a comparison they can look at the ATI RV610 for an example of when ATI added a fixed-function tessellator.

    There have been other AMD designs of tablet size which contain multiple tessellation blocks. It's hard to remove features once they've been added...

    Anyway, my point in entering the conversation was to state my understanding of Qualcomm's mindset in supporting DX11, as learned from talking to Qualcomm employees.
     
  20. Ailuros

    Not in the least relevant, since it was never designed for a ULP SoC.

    That's what I told Picao a few posts above about the GK20A in K1.

    Well, ask them off the record, then, what the hypothetical result would have looked like if they had designed an Adreno 330 successor that was only OGL_ES3.1 compliant but had exactly the same die area as the Adreno 420. Something tells me the difference between those two wouldn't have been just 40% peak arithmetic efficiency.
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.