NVIDIA Tegra Architecture

Discussion in 'Mobile Graphics Architectures and IP' started by french toast, Jan 17, 2012.

  1. xpea

    xpea Regular

    Good article at eetimes:
    http://www.eetimes.com/document.asp?doc_id=1331727
    This chart in particular is very impressive. Well done, Nvidia!
    [IMG]
     
  2. pharma

    pharma Veteran

    NVIDIA Gives Xavier Status Update ....
    https://www.anandtech.com/show/1187...orrt-3-announcement-at-gtc-china-2017-keynote
     
    Last edited: Sep 27, 2017
  3. wco81

    wco81 Legend

  4. Picao84

    Picao84 Veteran

  5. OlegSH

    OlegSH Regular

    With a SPECint2K score of 2700, Carmel is a 50% performance uplift over Denver in the Nexus 9 (based on these results).
    Carmel is also up to 43% wider with its 10-wide architecture; in comparison, Denver is capable of 7+ instructions per cycle.
    The 30 claimed DL TOPS are the GPU's 20 TOPS combined with the DLA's 10 int8 TOPS.
    GPU frequency is 1300 MHz (based on FP32 CUDA flops and 512 CC).
    With a 1.3 GHz frequency and GV100's tensor core configuration (8 TCs per SM), that would result in 10.6 teraflops.
    For int8 TOPS, TC throughput might be doubled to ~21.2 TOPS with GV100's register file bandwidth.
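
    The arithmetic above can be sanity-checked with a short sketch. It assumes GV100-style figures that the post does not spell out: 64 CUDA cores per SM, 64 FMAs per tensor core per clock, and 2 flops per FMA. The function names are made up for illustration.

```python
# Back-of-the-envelope check of the Xavier GPU figures quoted in the post.
# Assumptions (not stated in the post): 64 CUDA cores per SM and 64 FMAs per
# tensor core per clock, as on GV100; one FMA counts as 2 flops.

CUDA_CORES = 512            # from the post
CLOCK_HZ = 1.3e9            # from the post (1300 MHz)
TCS_PER_SM = 8              # from the post (GV100 configuration)
CORES_PER_SM = 64           # assumption: GV100-style SM
FMA_PER_TC_PER_CLOCK = 64   # assumption: GV100-style tensor core
FLOPS_PER_FMA = 2


def fp32_tflops():
    """Peak FP32 throughput of the CUDA cores, in TFLOPS."""
    return CUDA_CORES * FLOPS_PER_FMA * CLOCK_HZ / 1e12


def tensor_tflops():
    """Peak tensor-core throughput, in TFLOPS."""
    sms = CUDA_CORES // CORES_PER_SM
    tensor_cores = sms * TCS_PER_SM
    return tensor_cores * FMA_PER_TC_PER_CLOCK * FLOPS_PER_FMA * CLOCK_HZ / 1e12


def int8_tops():
    """Int8 estimate, assuming throughput simply doubles as the post suggests."""
    return 2 * tensor_tflops()


def width_gain_percent(new_width, old_width):
    """Relative increase in issue width, in percent."""
    return (new_width / old_width - 1) * 100


if __name__ == "__main__":
    print(f"FP32:   {fp32_tflops():.2f} TFLOPS")
    print(f"Tensor: {tensor_tflops():.2f} TFLOPS")
    print(f"Int8:   {int8_tops():.2f} TOPS")
    print(f"Width:  {width_gain_percent(10, 7):.1f}% wider")
```

    Under these assumptions the unrounded int8 estimate comes out slightly above the ~21.2 TOPS quoted, since the post doubles the already rounded 10.6 figure.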
     
  6. OlegSH

    OlegSH Regular

    It emulates ARM instructions in the same way as every other ARM processor does: by decoding them into an internal µop format. There are 2 hardware decoders in Denver to convert ARM instructions into µops.
     
  7. Picao84

    Picao84 Veteran

    But wasn't there something widely different in Denver compared to other ARM CPUs?
     
  8. OlegSH

    OlegSH Regular

    Instead of using OoO µop scheduling, Denver relies on a software optimization layer (Dynamic Code Optimization), which monitors perf counters and performs instruction reordering, loop unrolling, register renaming and so on for frequently used "hot" parts of code, then saves the optimized µop code into RAM for reuse. More info on DCO can be found here.
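
    The general idea of profiling for hot code and caching an optimized version can be illustrated with a toy model. This is only a sketch: the class, the threshold, and the string placeholders are hypothetical and say nothing about how Denver's actual DCO firmware is implemented.

```python
# Toy model of a profile-driven optimization cache, loosely following the
# description in the post. Every name and the threshold value are hypothetical;
# this is not Denver's real mechanism.

from collections import Counter

HOT_THRESHOLD = 3  # hypothetical: executions before a region counts as "hot"


class ToyOptimizer:
    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.exec_counts = Counter()  # stand-in for hardware perf counters
        self.opt_cache = {}           # stand-in for optimized code kept in RAM

    def optimize(self, region):
        # Placeholder for reordering, unrolling, renaming and so on.
        return f"optimized({region})"

    def run(self, region):
        """Return a label for the code path that executes for this region."""
        if region in self.opt_cache:
            # Hot region already optimized: reuse the cached version.
            return self.opt_cache[region]
        # Otherwise take the hardware-decoder path and keep counting.
        self.exec_counts[region] += 1
        if self.exec_counts[region] >= self.threshold:
            # Region just became hot: optimize it for later executions.
            self.opt_cache[region] = self.optimize(region)
        return f"decoded({region})"


if __name__ == "__main__":
    opt = ToyOptimizer()
    for i in range(5):
        print(i, opt.run("loop_body"))
```

    In this toy version, cold code always goes through the decoder path, and the optimized copy only takes over on the execution after the threshold is reached.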
     
  9. wco81

    wco81 Legend

    So what would this be for? They're pretty much out of the mobile devices market aren't they?

    Self-driving cars or auto manufacturers trying to develop SDCs?

    Or maybe Nintendo Switch 2?
     
  10. Entropy

    Entropy Veteran

    At 350 mm² and 30 W on what TSMC calls its 12nm process, hardly.
    At 7nm, well, it might not be impossible. At 7nm with EUV or at 5nm, it could well be possible. However, with the sales volumes of the Switch, a custom SoC may be the better option, and no longer as risky.
     
  11. Laurent06

    Laurent06 Veteran

    It's a 60% uplift. Quite a nice speedup.
     
  12. OlegSH

    OlegSH Regular

    Yep, the CES 2018 presentation deck and other materials explain this pretty clearly.
    The CES 2018 presentation deck also contains a pretty cool Xavier die shot (thankfully, not an artistic one this time :-D)

    Yep, somehow miscalculated this :oops:
     
  13. 3dilettante

    3dilettante Legend Alpha

    Architecturally the processor's behavior is externally different with respect to the hardware and optimization software.
    For example, the uops written out in the optimized code are a VLIW format that is not 1:1 to the internal format. A simpler decoder than is present for the ARM instructions is needed to expand the in-memory format, which multiplexes fields and lacks signals for which unit will execute a uop for the purposes of code density.
    The pipeline is also skewed so that a fetched bundle can contain a read-operation-write chain, which at least at the ISA level is not generally matched by standard cores outside of specific RMW instructions.

    The skewing itself is not entirely without precedent, although I'm not aware of any ARM microarchitectures that chose to skew sufficiently for both source memory reads and destination writes. More significant is how the architecture commits results to the architectural state and to memory in a variable-length "transaction" at the granularity of a whole optimized sub-routine, which neither ARM's ISA nor its uops permit.

    That transactional nature is something that I'm curious about in light of the recent Meltdown and Spectre disclosures, since Denver's architecture has an aggressive "runahead" mode for speculating data cache loads and an undisclosed method for tracking and unwinding speculative writes in the shadow of a cache miss. Per Nvidia circa 2014, the philosophy was to load whatever it could then invalidate any speculative operations and queued writes, thus specifically relying on cache side effects to carry over from a speculative path.
    Also unclear is how Denver tracked its writes, since its transactional memory method might have meant an in-core write buffer, or potentially updates to the L1 cache that could be undone. The latter case might mean additional side effects.

    The prefetch path alone seems like it could be susceptible to a Spectre variant, and even if the optimizer were changed to add more safeguards, there's a lag before it is invoked for cold code.
    Denver was also quoted as having the full set of typical direct, indirect, and call branch prediction hardware that could be leveraged for Spectre variant 2.
    Meltdown might depend on what's effectively a judgement call for when permissions are checked and transactions are aborted, and whether the pipeline's speculative writes affect the L1 in a way other ARM cores wouldn't.

    Unfortunately, I think the one Nexus device that used Denver aged out of security updates just shy of finding out what, if any, mitigations might be needed.
    According to the following, at least some of the above apply to Xavier.
    https://www.morningstar.com/news/ma...ystem-is-also-affected-by-security-flaws.html
     
  14. Xavier seems like an even greater departure from a gaming-oriented SoC than Parker.
    The GPU should be less than 100mm^2, so those CPU cores must be THICC.
    Did they take away the 2*FP16 capabilities of the previous iGPUs, since they now have the tensor units for deep learning?

    I also wonder why the PX Pegasus needs two Xavier SoCs. One would think the souped-up CPU and I/O from Xavier would enable it to drive two dedicated GPUs.


    One thing we can count on is that Xavier is never going into a Switch 2, ever.
     
  15. Laurent06

    Laurent06 Veteran

    Perhaps to enable some lockstep mode or some variant of it?
     
  16. CSI PC

    CSI PC Veteran

    I thought it was to do with how they split/balance the functionality and HW between the two SoCs, which also includes the two additional GPUs for the full-tier product.
     
  17. Picao84

    Picao84 Veteran

    At this point a Switch 2 will for sure include a real custom SoC. I bet Nvidia is working on it as we speak, even if only at the stage of defining requirements. The Switch's success in sales, and finally in getting third parties involved, all but assures a second iteration.

    Unless, of course, Nvidia deems such an endeavour not worthwhile and/or Nintendo is unwilling to pay as much as Nvidia wants for it. Tegra X1 was an existing part after all, so the R&D had been done and they needed to cover the cost. Will Nintendo pay Nvidia for a custom SoC? Remains to be seen.
     
  18. Absolutely. Would you really want to be in an autonomous vehicle where one part of the hardware could fail and you end up dead?
     
  19. wco81

    wco81 Legend

    Probably one of the coolest applications of these SoCs. It uses a Jetson with 4 GB RAM:

    https://www.skydio.com/technology/

    Overpriced for what it offers. They use a tiny sensor, so while you can probably recreate that Star Wars forest race sequence with this thing, the video won't be cinematic quality.

    But they use all the AI buzzwords, so some people might bite.
     
  20. Pressure

    Pressure Veteran

    Shame it doesn't feature a 3-axis gimbal; that would really give it an edge. Although the fact that it can follow someone without something strapped to them is really nice. Feature-wise it looks great.
     
    Last edited: Mar 26, 2018