NVIDIA Tegra Architecture

Discussion in 'Mobile Graphics Architectures and IP' started by french toast, Jan 17, 2012.

  1. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    832
    Likes Received:
    505
  2. Picao84

    Veteran

    Joined:
    Feb 15, 2010
    Messages:
    2,109
    Likes Received:
    1,196
    I was not criticising you. It was merely an observation following my point that Android is a terrible gaming platform apart from casual titles, since there is not enough money to be made there. No one wants to pay full price for a mobile game be it a port or not.
     
  3. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,724
    Likes Received:
    195
    Location:
    Stateless
    My gut has me thinking that they will actually spend a lot more silicon on the CPU and caches than on the GPU cores. I suspect 512 cores organized as in GP100 (SMs of 64 cores, hence 8 SMs with full FP16 support).
    I find it interesting that Nvidia speaks of custom CPU cores without any reference to Denver. Knowing Nvidia's standard practices, I suspect the "custom ARM" refers to the SIMD units of the CPU they're going to use.
    I suspect Nvidia will go with either A72 or A73 backed by custom SIMD units (to match the proprietary API, software, etc.). I suspect they will spend lots of silicon on the L2 and L3, and on the GPU register files and caches too.
     
    iMacmatician likes this.
  4. And/Or the GPU proportion of the new SoC simply got smaller, like what happened with Parker. And/Or this particular version of Volta has additional hardware exclusively dedicated to INT8 operations (akin to PowerVR's FP16 units).
    If the Denver cores were huge, Denver's successors are probably pretty big on transistor count, too.

    News of Google's TPUs may have put a lot of pressure on getting dedicated hardware for neural networks. Repurposing ALUs that were originally made for floating point calculations may just not be competitive enough.


    If that was the case, I don't think they would call it "custom cores".
     
  5. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,724
    Likes Received:
    195
    Location:
    Stateless
    Well, if Nvidia sticks to Denver, all the better, as the CPU space is growing boring; diversity in approach keeps geeks entertained :)
     
  6. JF_Aidan_Pryde

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    601
    Likes Received:
    3
    Location:
    New York
    https://blogs.nvidia.com/blog/2016/09/28/xavier

    This is a pretty crazy chip:
    • 7 billion transistors
    • 512 cores
    • 20 Tera ops
    • 16nm FF
    • 20 watts
    • Due end of 2017
    NVIDIA claims this one chip packs all the power of a Drive PX 2 computer (2 × Parker SoCs + 2 × discrete Pascal GPUs).

    I haven't figured out how this is possible, given that this is still on the 16nm process.

    The power is a mystery. The GTX 1080, at 7B transistors, is 180 watts. Xavier is the same number of transistors at 20 watts. I assume the latter uses an LP process, but can that make so much difference?

    As for perf: there's no sane way to get to 20 TOPS based on the existing arch. It would take 512 cores clocked at 5 GHz with INT8 to get there, which is obviously absurd. My best guess is that the computer vision accelerator has some kind of programmable low-cost INT8 units that boost performance.
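    A quick sanity check of that "512 cores at 5 GHz with INT8" figure. The ops-per-clock and packing factors here are my assumptions (one FMA counted as 2 ops, 4× INT8 per 32-bit lane), not anything NVIDIA has stated:

```python
# Back-of-the-envelope check of the 20 TOPS estimate.
# Assumptions (mine, not NVIDIA's): each core does one FMA per clock
# (counted as 2 ops), and each 32-bit lane packs four INT8 operations.
cores = 512
fma_ops_per_clock = 2      # one fused multiply-add = 2 ops
clock_hz = 5e9             # the (absurd) 5 GHz clock from the post
int8_packing = 4           # four INT8 ops per 32-bit ALU op

tops = cores * fma_ops_per_clock * clock_hz * int8_packing / 1e12
print(tops)  # 20.48 -- roughly the quoted 20 TOPS
```

    So the arithmetic does land near 20 TOPS, but only at that implausible clock, which is the point being made above.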

    Thoughts?
     
    #3846 JF_Aidan_Pryde, Sep 28, 2016
    Last edited: Sep 28, 2016
  7. Tegra is being discussed here:

    https://forum.beyond3d.com/posts/1945831/

    Just some quick tidbits:

    - Number of transistors alone doesn't dictate power consumption. Skylake Y is probably around 1.5B transistors and it has a 4.5W TDP.
    - INT8 throughput in Xavier may not be entirely done on the GPU's "CUDA cores". In fact, there's a good chance they're not, since the same presentation said the SoC would have a GPU with 512 cores.
     
    JF_Aidan_Pryde likes this.
  8. Psycho

    Regular

    Joined:
    Jun 7, 2008
    Messages:
    746
    Likes Received:
    41
    Location:
    Copenhagen
    yeah, the majority of those TOPS likely don't come from the normal shader (or ARM) cores. Do we have any kind of TOPS or watt rating for the Google TPU? (which obviously lacks the more general cores, but..)
     
  9. JF_Aidan_Pryde

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    601
    Likes Received:
    3
    Location:
    New York
    Some indirect measurements of Google's TPU from Google's paper on neural machine translation.
    They note that because of workload mix / CPU-GPU transfer not being optimal, GPU is not performing optimally in these measurements.
     

  10. itaru

    Newcomer

    Joined:
    May 27, 2007
    Messages:
    156
    Likes Received:
    15
    Xavier has a CVA (Computer Vision Accelerator).
    Maybe the 20 TOPS deep learning figure is the spec of the CVA.

    Maybe the CVA is something like Eyeriss.
     
  11. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    OT, but I could imagine that Series7XT Plus https://imgtec.com/blog/powervr-series7xt-plus-gpus-advanced-graphics-computer-vision/ has additional dedicated INT logic. Everything before it was capable only of INT32; the Plus IP cores expand to up to 4× INT8 per INT32. That's why I asked Ryan why he thinks that 20 TOPS in a 20W power portfolio would be impossible. The pipelines would just need to be wide enough to reach a high enough throughput, and yes, I'd also consider it possible that other blocks of the SoC, like the CVA mentioned above, might contribute to those 20 TOPS.

    Either way and even apart from the INT pipeline I'd expect Volta ALUs to be significantly wider than we've seen so far in green architectures.
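    To illustrate what "4× INT8 per INT32" buys you, here is a toy model of a packed INT8 dot-product-accumulate, in the spirit of Pascal's dp4a instruction. The helper names are mine; real hardware does this in one ALU op per lane rather than in software:

```python
import struct

# Toy model of a dp4a-style op: four INT8 x INT8 products taken from
# two packed 32-bit words, added into a 32-bit accumulator. This is
# how one INT32 lane can deliver four INT8 ops per clock.

def pack_int8x4(a, b, c, d):
    """Pack four signed 8-bit values into one 32-bit little-endian word."""
    return struct.unpack("<i", struct.pack("<4b", a, b, c, d))[0]

def dp4a(x, y, acc=0):
    """Dot product of the four INT8 lanes of x and y, plus accumulator."""
    xs = struct.unpack("<4b", struct.pack("<i", x))
    ys = struct.unpack("<4b", struct.pack("<i", y))
    return acc + sum(xi * yi for xi, yi in zip(xs, ys))

x = pack_int8x4(1, 2, 3, 4)
y = pack_int8x4(5, 6, 7, 8)
print(dp4a(x, y))  # 1*5 + 2*6 + 3*7 + 4*8 = 70
```

    Four multiplies and four adds per 32-bit operand pair is exactly the 4× throughput multiplier being discussed.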
     
  12. pharma

    Veteran

    Joined:
    Mar 29, 2004
    Messages:
    4,894
    Likes Received:
    4,548

    https://techcrunch.com/2016/09/28/nvidias-new-xavier-soc-is-an-ai-supercomputer-for-cars/
     
  13. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    I find that last sentence particularly amusing from the author:

    Not really... but hey, whatever floats anyone's... errr, autonomous boat... *cough*
     
    #3853 Ailuros, Sep 29, 2016
    Last edited: Sep 29, 2016
    Picao84 and pharma like this.
  14. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    832
    Likes Received:
    505
    I would think that the neural net computation is done with a new kind of special function block (that can be optionally added to a core).
    There is a lot of efficiency to be gained compared to doing dot products via registers: the neuron inputs and accumulated values can be kept internally in those units, reducing the amount of data moved and thus power consumption.
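    A toy sketch of that idea (class and method names are hypothetical). The point is that the running sum stays inside the unit for the whole dot product, so only the final value ever crosses the register file:

```python
# Hypothetical MAC unit with an internal accumulator, as described
# above: partial sums never move through the register file, only the
# finished result does, which is where the power saving comes from.
class MacUnit:
    def __init__(self):
        self.acc = 0  # accumulator lives inside the unit

    def mac(self, weight, activation):
        # One multiply-accumulate step; no external data movement.
        self.acc += weight * activation

    def read_and_reset(self):
        # The only point where data leaves the unit.
        out, self.acc = self.acc, 0
        return out

unit = MacUnit()
for w, a in zip([1, -2, 3], [4, 5, 6]):   # weights, activations
    unit.mac(w, a)
print(unit.read_and_reset())  # 1*4 - 2*5 + 3*6 = 12
```

    With N inputs per neuron, this trades N register-file round trips for one, which is the efficiency gain being argued for.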
     
  15. itaru

    Newcomer

    Joined:
    May 27, 2007
    Messages:
    156
    Likes Received:
    15
    #3855 itaru, Sep 29, 2016
    Last edited: Sep 29, 2016
    xpea likes this.
  16. Erinyes

    Regular

    Joined:
    Mar 25, 2010
    Messages:
    808
    Likes Received:
    276
    Given that it's going to be in mass production only in 2018, I'm surprised that Xavier is not on 10nm. I was also expecting Nvidia to use some ARM R52 cores, but it looks like they have certified the Denver cores for ISO 26262.
    It's a huge market. The potential revenues from it could far exceed those from the GPU market.

    ModEdit: Irrelevant bits removed & copied to spin-off
     
    #3856 Erinyes, Sep 29, 2016
    Last edited by a moderator: Sep 30, 2016
  17. xpea

    Regular

    Joined:
    Jun 4, 2013
    Messages:
    552
    Likes Received:
    787
    Location:
    EU-China
    Bingo! That's a very good assumption. I don't see how you gain 4 times the power efficiency at the same node without a totally different uarch, especially since we are talking about a very specific kind of mathematical problem. GPUs' generic ALUs are not the most efficient way to solve this computation need. An Eyeriss-like accelerator (or co-processor) is the only way to stay competitive against the dedicated deep learning ASICs that are under development. And it also shows how much Nvidia wants this market...
     
  18. A1xLLcqAgt0qc2RyMz0y

    Veteran

    Joined:
    Feb 6, 2010
    Messages:
    1,589
    Likes Received:
    1,490
    How quickly you seem to have forgotten Maxwell.

    Maxwell was on the same 28nm process as Kepler yet made vast uarch improvements.
     
  19. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    20,516
    Likes Received:
    24,424
    He hasn't forgotten Maxwell. It sounds like you're in agreement with the second part of his sentence.
     
  20. cheapchips

    Veteran

    Joined:
    Feb 23, 2013
    Messages:
    2,493
    Likes Received:
    2,665
    Location:
    UK
    Is it realistic to expect Maxwell level gains again? You surely don't get to make efficiency/power optimisations on that scale twice?
     