NVIDIA Tegra Architecture

http://emit.tech/EMiT2016/Ramirez-EMiT2016-Barcelona.pdf
px2
2x Tegra X2
 4x ARM Cortex-A57
 2x NVIDIA Denver2
2x 3840-core Pascal GPU
 8 TFLOPS (64-bit FP)
 24 TFLOPS (16-bit FP)
1 GbE cluster interconnect

That must be a typo, or a different product than the PX2 presented at GTC 2016 (perhaps 3840 cores is a combined figure for all the dGPUs and iGPUs).

[Image: NVDA_DRIVEPX2_1_Computation.jpg]

[Image: NVDA_DRIVEPX2_4_Compute.jpg]


http://vrworld.com/2016/04/05/nvidia-drive-px2-next-gen-tegra-pascal-gpu/
http://www.techpowerup.com/218922/nvidia-announces-drive-px-2-mobile-supercomputer
 
Thanks, my head was hurting!

Back to something sane:

24 Deep Learning Tera-OPS
8 TeraFLOPS (FP32)

If this is the full-fat GPU (i.e. GP100), that'd do 4 TFLOPS of FP64; otherwise 0.25 TFLOPS of FP64.
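For reference, those two estimates follow from Pascal's FP64 ratios: GP100 runs FP64 at 1/2 the FP32 rate, while the consumer Pascal chips run it at 1/32. A quick check, assuming the slide's 8 TFLOPS figure is actually FP32:

```python
fp32_tflops = 8.0  # the slide's 8 TFLOPS, read as FP32

gp100_fp64 = fp32_tflops / 2      # GP100: FP64 at 1/2 the FP32 rate
consumer_fp64 = fp32_tflops / 32  # GP104/GP106-class: FP64 at 1/32 rate

print(gp100_fp64)     # 4.0
print(consumer_fp64)  # 0.25
```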

How exactly are the 24 trillion ops cooked up...
BTW, the PDF had several typos here and there.
 
The 64-bit figure for PX2 must be a mistake.
It should probably be 8 TFLOPS in 32-bit.
The 3840 cores are probably not 32-bit cores, but counted at the 16-bit rate.

Probably it is 1280 cores and 2560 cores counted in 16-bit.

Probably it breaks down like this:

FP16
Tegra X2: 2560 cores (16-bit), dGPU: 1280 cores (16-bit)
2 × (2560 + 1280) × 2 × 1.56 = 23,962 GFLOPS

FP32
Tegra X2: 640 cores (32-bit), dGPU: 640 cores (32-bit)
2 × (640 + 640) × 2 × 1.56 = 7,987 GFLOPS
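As a quick sanity check of that arithmetic (the 1.56 GHz clock and the core splits are the poster's guesses, not confirmed figures):

```python
def peak_gflops(cores, clock_ghz, units=2, flops_per_clock=2):
    """Peak GFLOPS for `units` identical processors doing FMA (2 flops/clock)."""
    return units * cores * flops_per_clock * clock_ghz

clock = 1.56  # GHz, assumed

# FP16: 2560 (iGPU) + 1280 (dGPU) FP16-rate cores per unit
fp16 = peak_gflops(2560 + 1280, clock)
# FP32: 640 + 640 cores per unit
fp32 = peak_gflops(640 + 640, clock)

print(round(fp16))  # 23962, i.e. ~24 TFLOPS
print(round(fp32))  # 7987, i.e. ~8 TFLOPS
```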
 

So why would they show the DGX-1 in the same context several pages later, also only with its FP64 figure?
Page 15:
8 × P100 = 40 TFLOPS FP64.

Regarding CUDA cores, FP16 should be handled by the FP32 cores, as on Tesla P100.
Even on Tegra X1, FP16 was handled by the FP32 CUDA cores as a single Vec2 operation.
So CUDA core counts should only be quoted at FP32 or FP64 (or a combination), not broken down to FP16.

But whichever way you look at it, something is strange with the figures.
Edit:
The figures may be Tera-OPS with INT8 for deep learning, alongside FP32.
So no idea how Alex Ramirez mixed a lot of this up, as his background is scientific/engineering research rather than marketing.
Cheers
 
Whatever it is, it's not what they've been showing around (not to mention it would be leaps and bounds too powerful for Drive PX2; heck, that thing is meant for cars).
The one they've been showing around has 2 smallish GPUs, probably GP106s.
I get a feeling it is in the context of implementing "mobile solutions" in HPC and supercomputers (where they may focus on FP64 for a fair few presentations), but even then it is wrong, because the two figures are straight out of the live event revealing PX2.
Quite a few presentations suggest this is potentially the path to exascale.
I need to read up again on what was done with Tegra as a concept in the HPC area; it has been tested a few times.
Background: http://www.electronicsweekly.com/news/products/micros/tesla-nvidias-hpc-processor-2014-11/

Edit:
And Alex Ramirez's background is HPC research at Nvidia, I think.
*Shrug*.
Cheers
 

At 1:58 into the presentation you see this:

12 CPU cores | Pascal GPU | 8 TFLOPS | 24 DL TOPS | 16nm FF | 250W | Liquid Cooled

That is for the combined (2x) Tegra and (2x) discrete Pascal GPU, so divide those numbers by 2 to get:

6 CPU cores | Pascal GPU | 4 TFLOPS | 12 DL TOPS for each Tegra X2 (with integrated Pascal) plus discrete Pascal

Each Tegra X2 contains four A57 and two Denver2 CPU cores, plus an integrated Pascal GPU with an unknown number of cores.

I also believe the 3840 cores is the combined integrated Pascal + discrete Pascal core count times 2, not the 2x 3840 listed earlier.

So 1920 cores from the combined integrated + discrete Pascal, and you are back to having a GP106 as the discrete Pascal.
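A quick sketch of this reading of the numbers (the GP106 core count is public; the integrated-GPU figure is just the remainder, an assumption):

```python
total_cores = 3840           # slide figure, read here as the whole-board total
per_side = total_cores // 2  # one Tegra X2 iGPU + one discrete GPU
gp106_cores = 1280           # CUDA core count of a full GP106
igpu_cores = per_side - gp106_cores  # what would be left for the integrated Pascal

print(per_side)    # 1920
print(igpu_cores)  # 640
```

If the discrete chip really is a GP106, a 640-core integrated Pascal also lines up with the FP32 core split guessed earlier in the thread.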
 
You are probably right, but the presentation given on 2 June at an HPC/supercomputer event explicitly shows it as 2x 3840.
What makes it more confusing is that the author of the presentation is a research engineer at Nvidia with a focus on HPC/supercomputers, not the sort of person you would expect to make such a list of mistakes.
And it was presented at a moderately senior engineering event.

As Kaotik says, the context must be different, but it does not help that the figures supplied in that presentation are also straight from the much earlier Pascal/PX2 announcement event, so what gives *shrug*.
Cheers
 
That would be the weirdest typo-chain ever if true. On page 10 they mention clearly:

2x
Tegra X1
.......
2.3 TFLOPS (32-bit FP)

So far so good, no typo here.

2x 3840-core Pascal GPU
8 TFLOPS (64-bit FP)
24 TFLOPS (16-bit FP)

IF the highlighted figures are typos, then it's most likely not one but two. Now either the author was consuming something hallucinogenic (to quote Jensen) or there's something essential I'm missing.
 
And then it would be three mistakes when you add the FP64 continuation on page 14, "This is what 40 TFLOPS looks like", which shows the DGX-1 on page 15 (8x P100 is 40 TFLOPS with FP64), clearly a continuation of the "This is what 8 TFLOPS looks like" slide.

Not a mistake, but on page 16 he is very explicit about HBM/NVLink, with "* Diagram does not imply any current or planned NVIDIA products".
It seems strange to make so many casual mistakes and then be so careful about this.
Ah well this is about as confusing with information as recent Polaris rumours-leaks :)
Cheers
 

That diagram is interesting in that Tegras will also have NVLink.

But why the question mark on "For GPU's only?"
 
That would be for Tegra-to-Tegra SoC and not just Pascal/Volta.

I am pretty sure he is talking about using the mobile-solution Drive PX2 as an HPC/supercomputer implementation and not for cars.
But that comes back to the figures Alex Ramirez is quoting, which directly match the automotive PX2; one would potentially expect them to be different.

Cheers
 
Just to add: Nvidia has been building that "mobile solution" as an HPC/supercomputer/exascale foundation going back to around 2011 with hybrid designs; Alex Ramirez at Nvidia was, I think, originally involved in the first hybrid design.
They employed various senior engineers involved with the Cray Aries project, some of whom also had the view of using Tegra+GPU as a scalable supercomputer/hyperscale solution.
Of course their primary focus was growing Tesla, but it did also include Tegra.
Cheers
 
The simplest answer to this confusion is simply that Drive PX2 is a configurable hosting platform. It's a motherboard hosting discrete swappable GPUs. The DPX2 boards they showed at GTC have the discrete GPUs mounted on MXM boards, so we know it's true. The base SKU of DPX2 has GP106, like the ones seen in person at GTC. A top end DPX2 might swap in MXMs with P100 chips, explaining the high specs in Ramirez's talk. This also extends the DPX2 platform lifetime for designers (later car models ship with even better Volta GPUs, for example) as well as end consumers (Tesla Motors services your GP106 DPX2-equipped car by swapping the MXMs to GV104, unlocking fancier optional features for AutoPilot 2019.)

The current top-end Tesla Motors car is the Tesla Model S P90D. P for performance, D for dual motors. It's equipped with an Nvidia Drive CX. It is quite plausible, even likely, for the successor model, probably called the P100D, to get an upgrade to the high-end SKU of Drive PX2.

 
Here is one of the earliest news articles I found about the Tegra+Tesla "mobile solution" HPC/hyperscale/exascale concept with regard to resolving power issues:
http://www.montblanc-project.eu/pre...-centre-build-arm-and-gpu-based-supercomputer
I think this project would have heavily involved Alex Ramirez, whose recent presentation we are talking about.
Very early days of development, back in 2011.
Anyway, parts of that presentation fit with this concept.
Cheers

Edit:
Doh, just realised it even quotes Alex Ramirez in that news brief.
He became a senior research scientist at Nvidia around 2013-2014.
 
So it looks like Tegra gets another lease on life as the processor to power the Nintendo NX.

Thus from phones, to tablets, to automotive and finally a handheld gaming device, a niche category.

But it's not like device makers are beating down NV's doors ...
 
Seriously though, that all spawned from one single rumour, and TweakTown at least has been treating it as gospel ever since. Is there any actual evidence of Tegra being in the NX?
I just find it really unlikely, considering it's supposedly "really easy to port PS4/XB1 games to NX" and the fact that Tegras just aren't that powerful, or at least weren't last gen. Can they more than double the performance in one generation and keep it "easy to port"?
 

Nintendo NX Is A Portable Console With Detachable Controllers Powered By Nvidia Tegra Processor

http://wccftech.com/nintendo-nx-por...le-controllers-powered-nvidia-tegra-processor
 
Nvidia will detail the 16FF Tegra, Parker (probably the one that goes into PX2), at Hot Chips in August:

http://www.pcworld.com/article/3097...neration-tegra-mobile-chip-is-on-its-way.html

To give you an idea of how long it takes from tapeout to final product, my post from May last year is below. This isn't always the case, and this looks like it has taken a bit longer than it should have, but you get the idea.
Some news on Parker. My information is that it is still on TSMC 16nm (FF/FF+?) and they have not shifted to Samsung as rumoured. It is also expected to tape out sometime this month, or has possibly already taped out, since my information is slightly dated. If this is correct, we're looking at availability in late Q1 '16 at best. My source indicated that it has Denver cores and not Cortex-A72.
 