NVIDIA Tegra Architecture

Discussion in 'Mobile Graphics Architectures and IP' started by french toast, Jan 17, 2012.

  1. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    They're already ARM. The internal instruction format they use isn't really any more material to the CPU architecture than the format of the uops in Intel's uop cache. Besides, they can already decode and execute ARM code without caching it as something else first.
     
  2. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    If they'd use only custom CPU cores based on ARM ISA (Denver-whatever) in their SoCs then I would understand better the point of going through the quite high added cost for a custom core. Apple has its own custom cores based on ARM ISA, however they use them exclusively in their SoCs (no mix & match with garden variety ARM CPU IP) and they also have their own OS. I'm not sure, but I've heard that Qualcomm might plan to abandon KRYO down the line again....I wouldn't be in the least surprised if that's true....

    For the record (and that's only hearsay) the original Denver core supposedly could not get ISO26262 & ASIL certifications, hence the former question if the most important changes for Denver2 are in that direction.

    Overall I agree with you: there's no SOUND reason for NV to have a unique custom CPU core if they're going to use it only for the minority of demanding CPU tasks and since there's no unique underlying software platform present either. And before someone says it: yes NV's diagrams show Parker to be roughly 35% ahead in efficiency in one synthetic benchmark against other SoCs, but fortunately for marketing they didn't have to include future SoCs that will appear in devices at the same time Parker will in automotive implementations.

    http://www.anandtech.com/show/10596/hot-chips-2016-nvidia-discloses-tegra-parker-details

    I don't see why HMP shouldn't work as advertised, but I can also understand the scepticism.

    Good news: the GPU actually clocks at 1.5GHz, meaning >=1.5 TFLOPs FP16 or >=750 GFLOPs FP32.
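    For anyone who wants to check that arithmetic, here is a rough sketch. It assumes 256 CUDA cores for Parker's GPU, one fused multiply-add (2 FLOPs) per core per clock, and FP16 running at twice the FP32 rate; none of those figures is stated in the post itself, and the names below are purely illustrative.

```python
# Peak-throughput estimate for a GPU at a given clock.
# Assumptions (not from the post): 256 CUDA cores, FMA = 2 FLOPs per clock,
# FP16 issued at double the FP32 rate.
CUDA_CORES = 256
FLOPS_PER_CORE_PER_CLOCK = 2
FP16_RATE = 2


def peak_gflops(clock_ghz, cores=CUDA_CORES, rate=1):
    """Return theoretical peak GFLOPs for the given clock in GHz."""
    return cores * FLOPS_PER_CORE_PER_CLOCK * rate * clock_ghz


fp32 = peak_gflops(1.5)                  # 768.0 GFLOPs
fp16 = peak_gflops(1.5, rate=FP16_RATE)  # 1536.0 GFLOPs, i.e. about 1.5 TFLOPs
print(f"FP32: {fp32} GFLOPs, FP16: {fp16} GFLOPs")
```

    Under those assumptions the numbers land at 768 GFLOPs FP32 and 1536 GFLOPs FP16, which is consistent with the figures quoted above.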

    Same writeup at anandtech for the GPU efficiency:

     
    #3762 Ailuros, Aug 26, 2016
    Last edited: Aug 26, 2016
  3. A1xLLcqAgt0qc2RyMz0y

    Veteran

    Joined:
    Feb 6, 2010
    Messages:
    1,589
    Likes Received:
    1,490
    Denver 2 should be a less expensive product for Nvidia in ARM licensing. Also it should not have cost that much to port Denver 1 from 20nm.

    The DCO of Denver 2 should really work well with the static code base that would be in automotive.
     
    #3763 A1xLLcqAgt0qc2RyMz0y, Aug 26, 2016
    Last edited: Aug 26, 2016
  4. ninelven

    Veteran

    Joined:
    Dec 27, 2002
    Messages:
    1,742
    Likes Received:
    152
    Eh..... It is more like "they're already ARM and anything else they could potentially want to be"...

    Yes, and performance goes down when it does that. I mean if the ARM decoder was already optimal in the first place then....

    Well, if you ignore R&D costs maybe....
     
    #3764 ninelven, Aug 26, 2016
    Last edited: Aug 26, 2016
  5. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    A couple things:

    1) The ARM decoders are an important part of the design and can't just target something else.
    2) They almost certainly steered many aspects of the design to allow as lightweight transformation to ARM as possible.

    For example, there are a lot of features in NEON that are not in various other SIMD instruction sets that they absolutely need to support at a low level in order to provide a sane mapping.

    This is the difference between binary translation targeting some random other ISA and designing a translation platform with one architecture in mind. Which is why something like Houdini tanks performance vs native code, while Denver can at least pull off respectable performance most of the time. I know they're said to have targeted x86 originally but they had a lot of time to redesign things to be better suited for ARM. If they tried to target something else now it wouldn't work very well, even ignoring the lack of decoders.

    The whole translation design is integral to many parts of the uarch; there are good reasons why they don't just run ARM. You can't really extract that much ILP from an in-order uarch without doing run-time translation that heavily reorders and renames the code, while also providing low-cost speculation and assertions. It also helps that they can expose the uarch more in the ISA and evolve it with time.
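    To illustrate the reordering point, here is a toy sketch, not a model of Denver itself. It assumes a hypothetical single-issue in-order pipeline where an instruction stalls until its source registers are ready, with made-up latencies (3 cycles for a load, 1 for an add). The registers are already distinct, so no renaming is needed in this tiny case.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str               # register written
    srcs: Tuple[str, ...]   # registers read
    latency: int            # cycles until the result is available


def in_order_cycles(program):
    """Cycles to finish `program` on a single-issue, in-order toy pipeline."""
    ready = {}       # register -> cycle at which its value becomes available
    next_issue = 0   # earliest cycle the next instruction may issue
    finish = 0
    for ins in program:
        operands_ready = max((ready.get(r, 0) for r in ins.srcs), default=0)
        issue = max(next_issue, operands_ready)  # stall until operands arrive
        ready[ins.dest] = issue + ins.latency
        finish = max(finish, ready[ins.dest])
        next_issue = issue + 1
    return finish


load_a = Instr("r1", (), 3)
add_a = Instr("r2", ("r1",), 1)
load_b = Instr("r3", (), 3)
add_b = Instr("r4", ("r3",), 1)

original = [load_a, add_a, load_b, add_b]    # each add waits on its load
reordered = [load_a, load_b, add_a, add_b]   # loads hoisted to overlap

print(in_order_cycles(original), in_order_cycles(reordered))  # 8 5
```

    Same four instructions, same hardware, but hoisting the second load above the first add takes the toy pipeline from 8 cycles to 5. An OoO core finds that overlap in hardware; an in-order one only gets it if something reorders the code beforehand.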
     
  6. ninelven

    Veteran

    Joined:
    Dec 27, 2002
    Messages:
    1,742
    Likes Received:
    152
    1) ????

    2) Ok.... but real-world performance results are real-world performance results

    I understand that, and that was more or less what I was getting at. If they want to do a custom core, fine. But just go ARM only and OoO. I mean I'm sure that DenverX is very good and perhaps unbeatable some of the time, but that is generally not what one wants or needs out of a CPU. Basically, you want the highest IPC + lowest latency arch possible. And Denver (in its current form) gives you that sometimes.
     
  7. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    Must be the reason why they hired additional engineers for Denver2 development.

    --------------------------------------------------------------------------------------------------------------------------------

    Exophase,

    That's all probably true for Parker/PX2/Denver2 and onwards only; if the story about Denver1 failing certain certifications is true, the Denver cores probably won't be used for more than infotainment in any Tegra K1 64bit automotive module. It really raises the question of whether Denver is a necessity today for automotive needs or whether they've just kept on investing in it because they had no other choice. Because if anyone tries to convince me that Denver was only meant for automotive from the get-go, don't expect me to believe it. If the story of Denver1 failing certifications is even true, then it was originally meant for anything BUT automotive.

    That said I would figure (and I'd be happy to stand corrected) that margins for Parker or Parker + Pascal modules should be high enough to justify the investment in a custom CPU core. I don't know what they're selling the 2+2 module at, but I can easily imagine it's quite an obscene price per module.

    ---------------------------------------------------------------------------------------------------------------------------------

    OT but since I just read it, a few interesting indicative perf/W figures per SoC and the according manufacturing process used:

    http://www.anandtech.com/show/10545/the-meizu-pro-6-review/7
     
    #3767 Ailuros, Aug 26, 2016
    Last edited: Aug 26, 2016
  8. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    Basically what you're saying is their approach failed, scrap everything and start over doing a core that's more like the other custom ARM cores.

    But I don't think the verdict is really out yet as to whether or not their approach makes sense. There isn't good data on what the actual power consumption is like, and we don't really know how far they can improve what they have. I would say that as far as performance is concerned Denver (in the like, single device we got it in) did very well almost all of the time. When it didn't do as well it still did pretty good. It's certainly not the only custom ARM core you could describe this way, I would say it tended to do better than Krait which didn't do so hot on a variety of things.

    Most benches Denver didn't do as well on were well threaded, so it suffered from having only two cores. The only single threaded one it did badly on that I can remember was Sunspider, but that's a useless set of microbenches.
     
  9. A1xLLcqAgt0qc2RyMz0y

    Veteran

    Joined:
    Feb 6, 2010
    Messages:
    1,589
    Likes Received:
    1,490
    Why the ninja edit on my response:

    See, I did NOT ignore R&D; I just stated it would be minimal.
     
  10. A1xLLcqAgt0qc2RyMz0y

    Veteran

    Joined:
    Feb 6, 2010
    Messages:
    1,589
    Likes Received:
    1,490
    I see you forgot to include a link.
     
  11. A1xLLcqAgt0qc2RyMz0y

    Veteran

    Joined:
    Feb 6, 2010
    Messages:
    1,589
    Likes Received:
    1,490
    Negative conclusion jumping seems to be rampant here.

    And as you state "no data" to back it up.
     
  12. ninelven

    Veteran

    Joined:
    Dec 27, 2002
    Messages:
    1,742
    Likes Received:
    152
    What the actual fuck? No. I did not call Denver 1 a failure. Please do not put words in my mouth or engage in strawmen. If you do that, I will not have a conversation with you.

    For the record, I think Denver is/was pretty good. Not outstanding, but certainly not bad. In fact, it is/was pretty unique and a remarkable achievement, IMHO. It is certainly no P4 or Bulldozer. Maybe a good comparison would be Core2. But as good as Core2 was Nehalem was that much better. Usually in the tech industry you are either moving forward or being left behind. And in business in general, the bottom line comes before ideals or else you won't be in business very long to practice your ideals.

    If Nvidia is able to improve upon Denver and make an in-order CPU competitive with or even better than out-of-order ones, then great. But would you really bet on that being the case? Zen is coming, whateverLake from Intel is coming, super-mega-typhoon from Apple is coming, the A72 is already pretty damn solid and I doubt ARM is going to stop there. Winter is coming. And I don't have faith in in-order to keep me warm. But that is just my opinion. You don't have to like it or agree with it.
     
  13. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    You said that they need to start executing ARM directly all of the time (what I guess you must mean by "ARM only") and go OoO.

    You might not be aware of this but that means a total scrapping of the design they have and practically starting over from scratch. It's nothing like the difference between Core 2 and Nehalem; it is not such an evolutionary, incremental change. It is closer to the difference between Netburst and Core 2, although I'd say probably greater. That was a total scrapping of a uarch family and probably not something they should do for anything less than a failed approach - something I would also say applied to Netburst.

    So while you weren't actually saying that they need to scrap the design because it was failed, those are the implications of your recommendations, hence why I said it's basically or effectively what you're saying.
     
  14. ninelven

    Veteran

    Joined:
    Dec 27, 2002
    Messages:
    1,742
    Likes Received:
    152
    And you continue to be condescending, and repeat the same strawman. We are done. Bye.
     
  15. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    I don't need one. You don't have any obligation to believe others either; you wouldn't anyway.

    --------------------------------------------------------------------------------------------------------------------------

    On the flipside, regardless of whether Denver is a masterpiece or just another garden variety design on its own merits, it should be noted that Tegra X1 was already quite a bit ahead of many competing solutions for automotive, so there's really no BIG need to over-design anything. NVIDIA now has the luxury to even slow down its Tegra roadmap and make it less aggressive, because updates aren't as frequent or cut-throat as in the consumer markets.

    Need more FLOPs? Add another SoC on the module or go up to 4 chips, two of which are dedicated GPU chips. NVIDIA is well positioned for the high end automotive market, primarily thanks to its GPU IP; the competitiveness of their solutions doesn't change dramatically whether they've "just" A57 cores in there or Denver cores on top of those.
     
    #3775 Ailuros, Aug 28, 2016
    Last edited: Aug 28, 2016
  16. A1xLLcqAgt0qc2RyMz0y

    Veteran

    Joined:
    Feb 6, 2010
    Messages:
    1,589
    Likes Received:
    1,490
     
  17. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    No, I didn't make anything up, but as I said you are free to believe anything you want, always within your highly biased perspective. And since I supposedly make things up: if my fantasy serves me well, Parker doubles Manhattan 3.0 and TRex scores compared to X1, and while in Geekbench it's >20% ahead of the Exynos8 for single threading, the tables turn by a much higher percentage in the latter's favour for multicore.
     
    #3777 Ailuros, Aug 29, 2016
    Last edited: Aug 29, 2016
  18. xpea

    Regular

    Joined:
    Jun 4, 2013
    Messages:
    551
    Likes Received:
    783
    Location:
    EU-China
  19. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    20,511
    Likes Received:
    24,411
  20. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    I might be wrong, but given the data provided by the author it sounds more like a Shield TV successor.
     