NVIDIA Tegra Architecture

Voxilla · Sep 28, 2016

Ryan Smith said:
Xaiver: Volta + custom ARM CPUs, 7B transistors, sampling in late 2017.

http://www.anandtech.com/show/10714/nvidia-teases-xavier-a-highperformance-arm-soc

These Volta cores must be hugely larger as the Pascal cores as for 7B transistors you get 512 for Volta vs 2560 for Pascal.

Picao84 · Sep 28, 2016

Ailuros said:
See the post above; give me one good reason why I should pay as much for as antiquated material. Give me something that I wouldn't consider paying for, even if the sum is diametrically higher and then I'll let you know why gaming never has picked up until now on any mobile platform.

I was not criticising you. It was merely an observation following my point that Android is a terrible gaming platform apart from casual titles, since there is not enough money to be made there. No one wants to pay full price for a mobile game be it a port or not.

liolio · Sep 28, 2016

Voxilla said:
These Volta cores must be hugely larger as the Pascal cores as for 7B transistors you get 512 for Volta vs 2560 for Pascal.

My guts have me thinking that they will actually spend a lot more silicon on CPU and caches than the GPU cores. I supect 512 cores organized as in the GP100 (SM of 64 cores hence 8 SM with full FP16 support).
I find interesting that Nvidia speak of custom CPU and cores without any reference to Denver. Knowing Nvidia standard practicies, I suspect the "Custom ARM" refers to the SIMD units of the CPU they re going to use.
I suspect Nvidia will go with either A72 or A73 backed with custom SIMD units (to match the proprietary API, software, etc.). I suspect they will spent lots of silicon on the L2 and L3 and on the GPU register files an cache too.

Deleted member 13524 · Sep 28, 2016

Voxilla said:
These Volta cores must be hugely larger as the Pascal cores as for 7B transistors you get 512 for Volta vs 2560 for Pascal.

And/Or the GPU proportion of the new SoC simply got smaller, like what happened with Parker. And/Or this particular version of Volta has additional hardware exclusively dedicated to INT8 operations (akin to PowerVR's FP16 units).
If the Denver cores were huge, Denver's successors are probably pretty big on transistor count, too.

News of Google's TPUs may have put a lot of pressure on getting dedicated hardware for neural networks. Repurposing ALUs that were originally made for floating point calculations may just not be competitive enough.

liolio said:
I suspect Nvidia will go with either A72 or A73 backed with custom SIMD units (to match the proprietary API, software, etc.).

If that was the case, I don't think they would call it "custom cores".

liolio · Sep 28, 2016

Well if Nvidia sticks to Denver all the better as the CPU space is growing boring, diversity in approch keeps geeks entertained

JF_Aidan_Pryde · Sep 28, 2016

https://blogs.nvidia.com/blog/2016/09/28/xavier

This is a pretty crazy chip:

7 billion transistors
512 cores
20 Tera ops
16nm FF
20 watts
Due end of 2017

NVIDIA claims this one chip backs all the power of a Drive PX 2 computer (2 x Parker + 2 x Parker).

I haven't figured out how this is possible, given that this is still on the 16nm process.

The power is a mystery. The GTX 1080 @ 7B transistors is 180 watts. Xavier is same number of transistors at 20 watts. I assume the latter uses LP process. But can it make that much difference?

As for perf—there's no sane way to get to 20 TOPS based on the existing arch. It would take 512 cores clocked at 5 GHz + INT8 to get there. But that's obviously absurd. Best guess is the computer vision accelerator has some kind of programmable low cost INT8 units that boosts performance.

Thoughts?

Deleted member 13524 · Sep 28, 2016

Tegra is being discussed here:

https://forum.beyond3d.com/posts/1945831/

Just some quick tidbits:

- Number of transistors alone doesn't dictate power consumption. Skylake Y is probably around 1.5B transistors and it has a 4.5W TDP.
- INT8 throughput in Xavier may not be entirely done on the GPU's "CUDA cores". In fact, there's a good chance they're not, since the same presentation said the SoC would have a GPU with 512 cores.

Psycho · Sep 29, 2016

yeah, the majority of those TOPS don't likely come from the normal shader (or arm) cores. Do we have any kind of TOPS or Watt rating for the Google TPU? (which obviously lacks the more general cores, but..)

JF_Aidan_Pryde · Sep 29, 2016

Some indirect measurements of Google's TPU from Google's paper on neural machine translation.
They note that because of workload mix / CPU-GPU transfer not being optimal, GPU is not performing optimally in these measurements.

itaru · Sep 29, 2016

Xavier has CVA(Computer Vision Accelerator).
Maybe DL 20TOPs is spec of CVA.

maybe CVA is like a Eyeriss.

Ailuros · Sep 29, 2016

ToTTenTranz said:
And/Or the GPU proportion of the new SoC simply got smaller, like what happened with Parker. And/Or this particular version of Volta has additional hardware exclusively dedicated to INT8 operations (akin to PowerVR's FP16 units).

OT but I could imagine that Series7XT Plus https://imgtec.com/blog/powervr-series7xt-plus-gpus-advanced-graphics-computer-vision/ has additional dedicated INT logic. Everything else before that was capable only of INT32; the Plus IP cores expand to up to 4* INT8 per INT32. That's why I asked Ryan why he thinks that 20 TOPs in a 20W power portofolio would be impossible. The pipelines would just need to be wide enough to reach a high enough throughput and yes I'd also consider it possible that other blocks of the SoC like the CVA mentioned above might contribute to those 20 TOPs.

Either way and even apart from the INT pipeline I'd expect Volta ALUs to be significantly wider than we've seen so far in green architectures.

Deleted member 2197 · Sep 29, 2016

The Xavier SoC manages 20 trillion operations per second, while only using 20 watts of power.

Because it’s used in cars, Xavier was designed to meet the ISO 26262 functional safety spec, which is an international standard that sets expectations for electronics used in cars designed for road use. The SoC uses a 16nm manufacturing process, and just one can replace Nvidia’s current DRIVE PX 2 in-car computer, including a configuration of said component that includes two mobile SoCs and two discrete GPUs, while also using less power.

Xavier is intended for use by carmakers, suppliers, research organizations and startups looking to field and test their own self-driving cars. You won’t see it in any cars in the near future, however — Nvidia says it will start shipping the first samples in the fourth quarter of next year.

Meanwhile, Nvidia also teamed up with TomTom in a partnership that will see the two companies combining Nvidia’s AI platform and TomTom’s mapping data to provide real-time, localized mapping data for use in highway and freeway driving situations. Nvidia also demonstrated its own AI-based self-driving research vehicle, which learned how to drive itself based entirely on observing human driving behavior.

https://techcrunch.com/2016/09/28/nvidias-new-xavier-soc-is-an-ai-supercomputer-for-cars/

Ailuros · Sep 29, 2016

pharma said:
https://techcrunch.com/2016/09/28/nvidias-new-xavier-soc-is-an-ai-supercomputer-for-cars/

I find that last sentence particularly amusing from the author:

Clearly, Nvidia’s ambitions lie with autonomous driving, and it’s starting to look like its work in that area could eclipse its identity as a graphics hardware provider pretty quickly.

Not really....but hey whatever floats anyones....errr autonomous boat.... *cough*

Voxilla · Sep 29, 2016

Psycho said:
yeah, the majority of those TOPS don't likely come from the normal shader (or arm) cores. Do we have any kind of TOPS or Watt rating for the Google TPU? (which obviously lacks the more general cores, but..)

I would think that the Neural Net computation is done with a new kind of special function block (that can be optionally added to a core)
There is a lot of efficiency to be gained compared to doing dot products via registers, the neuron inputs and accumulated values can be kept internally in those units reducing the amount of moved data and thus power consumption.

itaru · Sep 29, 2016

Eyeriss: An Energy-Efficient Reconfigurable Accelerator
for Deep Convolutional Neural Networks

http://www.rle.mit.edu/eems/wp-content/uploads/2016/02/eyeriss_isscc_2016_slides.pdf

CVA is Eyeriss,maybe

Erinyes · Sep 29, 2016

Given that its going to be in mass production only in 2018, I'm surprised that Xavier is not on 10nm. I was also expecting Nvidia to use some ARM R52 cores as well but looks like they have certified the Denver cores for ISO 26262.

Ailuros said:
I find that last sentence particularly amusing from the author:

Not really....but hey whatever floats anyones....errr autonomous boat.... *cough*

Its a huge market. The potential revenues from it could far exceed those from the GPU market.

ModEdit: Irrelevant bits removed & copied to spin-off

xpea · Sep 30, 2016

itaru said:
Eyeriss: An Energy-Efficient Reconfigurable Accelerator
for Deep Convolutional Neural Networks

http://www.rle.mit.edu/eems/wp-content/uploads/2016/02/eyeriss_isscc_2016_slides.pdf

CVA is Eyeriss,maybe

Bingo ! That's a very good assumption. I don't see how you gain 4 times the power efficiency at the same node without a totally different uarch. Especially since we are talking about a very specific kind of mathematical problem. GPUs generic ALUs are not the most efficient to solve this computation need. This Eyeriss accelerator (or co-processor) is the only way to keep competitive against the dedicated deep learning ASICs that are under development. And it also proves how much Nvidia wants this market ...

A1xLLcqAgt0qc2RyMz0y · Oct 2, 2016

xpea said:
I don't see how you gain 4 times the power efficiency at the same node without a totally different uarch.

How quickly you seem to have forgotten Maxwell.

Maxwell was on the same 28nm process as Kepler yet made vast uarch improvements.

BRiT · Oct 2, 2016

xpea said:
I don't see how you gain 4 times the power efficiency at the same node without a totally different uarch.

A1xLLcqAgt0qc2RyMz0y said:
How quickly you seem to have forgotten Maxwell.

Maxwell was on the same 28nm process as Kepler yet made vast uarch improvements.

He hasn't forgotten Maxwell. It sounds like you're in agreement with the second part of his sentence.

cheapchips · Oct 2, 2016

Is it realistic to expect Maxwell level gains again? You surely don't get to make efficiency/power optimisations on that scale twice?

NVIDIA Tegra Architecture

Voxilla

Picao84

liolio

Aquoiboniste

Deleted member 13524

Guest

liolio

Aquoiboniste

JF_Aidan_Pryde

Deleted member 13524

Guest

Psycho

JF_Aidan_Pryde

Attachments

itaru

Ailuros

Epsilon plus three

Deleted member 2197

Guest

Ailuros

Epsilon plus three

Voxilla

itaru

Erinyes

xpea

A1xLLcqAgt0qc2RyMz0y

BRiT

(>• •)>⌐■-■ (⌐■-■)

cheapchips

Similar threads