NVIDIA Tegra Architecture

Weren't the Denver cores originally conceived as distributors for GPGPU cores, to be used inside high-performance GPUs for desktop graphics cards?
That contributes to kalelovil's hypothesis.
 
GPUs are traditionally not good at control code, so having a CPU working in the same memory space might help keep the GPU fed. Just a hypothesis :)

And isn't there an alternative hypothesis where it could be solved inside any GPU with an extra block of dedicated logic for control code? My reasoning would be that IHVs these days don't exactly have endless die area or transistor budgets, especially if they want to remain competitive against their major competitor. Denver cores don't strike me as small (especially if Parker contains an updated variant of the original cores), even more so two of them, and I wouldn't suggest that IHVs should devour all the resources today's FinFET processes can deliver in one breath. On top of that, 10FF isn't exactly around the corner.
 
Well, remember it would be one Denver core per GPU, not 2.

edit: LOL. I'm an idiot, of course it's two. For a second I forgot there are also 2 Tegras.

There's no need to complicate things as much; they could also have gone for a 2+2 combo. In reality, with their CEO stating that the Denver cores are meant for demanding single-threaded tasks, that's what they'll mostly be used for, with the majority of the work falling on the A57 cores either way. I doubt it's as much of an issue for the automotive market. When Parker makes it into mobile consumer devices we'll see if and how it actually works; I sure hope, though, that they aren't stuck with cluster migration again.

You're right, not much need for low-power cores in a 250W device, I guess.
 
Perhaps the Denver cores are part of the Pascal GPU dies, leaving the 2 Tegras with 4xA57 (and possibly inactive 4xA53) each.
(Credit for this possibility to Exophase on the RWT forum)

Turns out a reliable source told me I was wrong about this. Too bad, it seemed to explain a lot, but I guess it doesn't really take much imagination to see why nVidia would put A57 + Denver in Tegra.
 
Drive PX2

FP32: 8 TFLOPS
FP16: 24 TFLOPS

Let a = dGPU FP32 TFLOPS and b = Tegra P1 FP32 TFLOPS:

2a + 2b = 8
2(2a) + 2(4b) = 24

=> a = 2, b = 2

FP32
dGPU = 2 TFLOPS
Tegra P1 = 2 TFLOPS

FP16
dGPU = 4 TFLOPS
Tegra P1 = 8 TFLOPS
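A quick sketch in Python to double-check that split. Note the 2x and 4x FP16:FP32 ratios for the dGPU and Tegra are the post's assumptions, not confirmed specs:

```python
# Solve the per-die FP32 split from the PX2 totals:
#   2a + 2b = 8   (total FP32 TFLOPS; 2 dGPUs + 2 Tegras)
#   4a + 8b = 24  (total FP16 TFLOPS; assumes dGPU FP16 = 2x FP32,
#                  Tegra FP16 = 4x FP32)
from fractions import Fraction

# Eliminate a: (4a + 8b) - 2*(2a + 2b) = 24 - 16  ->  4b = 8
b = Fraction(24 - 2 * 8, 4)    # FP32 TFLOPS per Tegra
a = Fraction(8, 2) - b         # FP32 TFLOPS per dGPU
print(a, b)                    # 2 2
print(2 * a, 4 * b)            # FP16 per die: dGPU 4, Tegra 8
```

Which reproduces the numbers above: 2 TFLOPS FP32 each, with FP16 at 4 TFLOPS per dGPU and 8 TFLOPS per Tegra.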
 
Is that the real surprise, or is it that the PX2 most likely contains 2*GP102 cores and not, as we all thought up to now, 2*GP106? :p

The X1 GPU already had 2*FP16 capability; do they really need a different name for the X2 GPU? ;)
 
Yeah, and possibly due to the separate rumours coming out that there is a GP102 earlier than expected.
Maybe this, along with Tesla HPC and research, is pushing the early production of this GPU, the benefit being it would also be implemented as a Titan/Ti product.

Was the GP106 a smokescreen, or is there some kind of fundamental mistake?
From a design perspective I think they would have known quite some time ago if they needed to replace the GP106 with some kind of GP102.
For those thinking P100: just remember that GPU only has 3584 cores while the presentation says 3840, and the Pascal mentioned in the presentation has much less DP, as it no longer has the 1:2 relationship between double and single precision that one gets with the Tesla P100.

Anyway, great catch, Itaru. Kudos.
Sadly most will probably ignore this thread because of its title being about Tegra :)
Cheers
 

The confusion probably came from the 8 TFLOPS claim, which everyone assumed was FP32. If you wrongly consider 4 TFLOPS FP32, your closest candidate out of the Pascal family would be GP106. The moment it becomes clear that they meant FP64, it can only be a high-end core. It could very well also be a GP100; it's just that 2*610mm2 cores sound like quite a tall order.
 
Not sure I follow.
They use FP64 in the presentation as well, for the P100 DGX-1 example where they state 40 TFLOPS; that is exactly 8x P100.

Regarding the precision numbers, 24 TFLOPS FP16 and 8 TFLOPS FP64: whichever way you look at it, it is a reduced number of FP64 cores along with 3840 cores instead of 3584. [wrong assumption by me for FP32, as the 1:2 ratio is improved upon over P100; it could still be a GP100 variant]
Bear in mind it is 2x GPUs to get those figures.

Edit:
Ah, I think I get it, you mean the expectation/assumption of it being GP106; sorry, got you now :oops:
Yeah, you could be right; would need to check whether Nvidia ever mentioned a smaller GPU, or go by what was held up (so potentially a placeholder, or was it later shown with a real small GPU), or it could be like you say.
Cheers
 
http://emit.tech/EMiT2016/Ramirez-EMiT2016-Barcelona.pdf
PX2:
2x Tegra X2
  4x ARM Cortex-A57
  2x NVIDIA Denver2
2x 3840-core Pascal GPU
  8 TFLOPS (64-bit FP)
  24 TFLOPS (16-bit FP)
1 GbE cluster interconnect
Whatever it is, it's not what they've been showing around (not to mention it's leaps and bounds too powerful for the Drive PX2; heck, that thing is meant for cars).
The one they've been showing around has 2 smallish GPUs, probably GP106s.
 
Yeah, in theory you can get 3840 cores with the large-die GP100, but in reality this is not currently achievable, which is why the P100 has 3584; there is also the fact that whatever GPU is described in the presentation for the PX2 has a lower FP64-to-FP32 ratio.
Reducing the FP64 cores and increasing the FP32 cores to 3840 is what is expected of the GP102.

Maybe it is still a P100, but I'm not sure how they would make these changes to it.
Cheers

Edit:
OK, I am being a bit dense with my headache lol.
One cannot assume the FP32 is 12 TFLOPS, as that would improve on the 1:2 ratio of the P100.
Which may explain why they only mention FP64 and FP16.

Maybe this is after all a GP100 variant *shrug*
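To make that ratio point concrete, a tiny check (the 8 and 24 TFLOPS figures are from the presentation; treating FP16 as 2x FP32 is the assumption being questioned):

```python
# If FP16 = 2x FP32 held, the presentation's numbers would imply
# 24 / 2 = 12 TFLOPS FP32, and an FP64:FP32 ratio of 8/12 = 2:3,
# which is *better* than the P100's 1:2 -- hence the doubt above.
fp64_total = 8.0                      # TFLOPS (presentation)
fp16_total = 24.0                     # TFLOPS (presentation)

fp32_assumed = fp16_total / 2         # 12 TFLOPS under the assumption
ratio = fp64_total / fp32_assumed     # ~0.667, vs 0.5 on P100
print(ratio > 0.5)                    # True: implausibly high FP64 rate
```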
 
also there is the fact whatever GPU is described in the presentation for the PX2 has less FP64 to FP32 ratio.
Not sure where you are getting this... I only see 64-bit and 16-bit performance listed?
 

You're absolutely right about the confusion caused by the live event and how they presented the figures; oh man, no idea how I missed that.
There is no way a GP106 can do what they showed at the Pascal live event all those months ago.
Back then, live, they showed, as you say, 8 TFLOPS and the 24 DL TFLOPS.
The video shows the spec at 1min59secs.


As you say, I think not everyone got that the 8 TFLOPS figure is actually FP64, while the 24 DL TFLOPS is the FP16, which would still require 12 TFLOPS of FP32.
That FP64 is just not possible for 2x GP106, and even the FP32 on its own would be a stretch, compounded by the PX2 probably being spec'd with conservative clocking.
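A rough back-of-the-envelope supporting that. The GP106 figures here are my assumptions (roughly 1280 CUDA cores, consumer-style 1/32 FP64 rate, ~1.7 GHz), not numbers from the thread:

```python
# Estimate 2x GP106 throughput under the assumed specs above.
cores, clock_ghz, fp64_rate = 1280, 1.7, 1 / 32   # assumed, not confirmed

fp32_per_gpu = 2 * cores * clock_ghz / 1000   # TFLOPS: 2 FLOPs/core/cycle
fp64_two_gpus = 2 * fp32_per_gpu * fp64_rate  # both GPUs combined

print(round(fp32_per_gpu, 2))    # ~4.35 TFLOPS FP32 per GPU
print(round(fp64_two_gpus, 2))   # ~0.27 TFLOPS FP64 total, nowhere near 8
```

Even with generous clocks, two consumer-grade mid-range dies land orders of magnitude short of an 8 TFLOPS FP64 figure.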

Amazing how nearly everyone missed this.
Thanks again for giving that context, otherwise I would have missed it.

 
Maybe you guys will care about this, but maybe not. ;) NV has released a new Android 6.0.1 update for the Shield K1, with improvements to their Vulkan support, useless as Vulkan may be thus far. I don't know if there are even benchmarks for that yet. They also support something called Android Professional Audio, with low latency, USB/Bluetooth device support, and MIDI support.

https://shield.nvidia.com/support/shield-tablet-k1/release-notes/1

Updates for the original Shield Tablet tend to come with a lag, but they seem to continue to support it well too. Which is how it should be, considering the hardware is identical (I've run Shield Tablet firmware on my K1).
 