NVIDIA Tegra Architecture

Weren't the Denver cores originally conceived as distributors for GPGPU cores, to be used inside high-performance GPUs for desktop graphics cards?
That contributes to kalelovil's hypothesis.
 
GPUs are traditionally not good at control code, so having a CPU working in the same memory space might help keep the GPU fed. Just a hypothesis :)

And isn't there an alternative hypothesis where it could be solved inside any GPU with an extra block of dedicated logic for control code? My reasoning would be that IHVs these days don't exactly have endless die area or transistor budgets, especially if they want to remain competitive against their major competitor. Denver cores don't strike me as small (especially if Parker contains an updated variant of the original cores), even more so two of them, and I wouldn't suggest that IHVs should devour all the resources today's FinFET processes can deliver in one breath. On top of that, 10FF isn't exactly around the corner.
 
Well, remember it would be one Denver core per GPU, not 2.

edit: LOL. I'm an idiot, of course it's two. For a second I forgot there are also 2 Tegras.

There's no need to complicate things as much; they could also have gone for a 2+2 combo. In reality, with their CEO stating that the Denver cores are meant for demanding single-threaded tasks, that's what they'll mostly be used for, with the majority of the work falling on the A57 cores either way. I doubt it's as much of an issue for the automotive market. When Parker makes it into mobile consumer devices we'll see if and how it actually works; I sure hope, though, that they aren't stuck with cluster migration again.

You're right, not much need for low-power cores in a 250W device, I guess.
 
Perhaps the Denver cores are part of the Pascal GPU dies, leaving the 2 Tegras with 4xA57 (and possibly inactive 4xA53) each.
(Credit for this possibility to Exophase on the RWT forum)

Turns out a reliable source told me I was wrong about this. Too bad, it seemed to explain a lot, but I guess it doesn't really take much imagination to see why nVidia would put A57 + Denver in Tegra.
 
Drive PX2

FP32: 8 TFLOPS
FP16: 24 TFLOPS

Let a = dGPU FP32 TFLOPS and b = Tegra P1 FP32 TFLOPS:

2a + 2b = 8
2(2a) + 2(4b) = 24

=> a = 2, b = 2

FP32
dGPU = 2 TFLOPS
Tegra P1 = 2 TFLOPS

FP16
dGPU = 4 TFLOPS
Tegra P1 = 8 TFLOPS
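A quick sketch in Python to double-check that split. Note the 2x and 4x FP16:FP32 ratios for the dGPU and Tegra are the post's assumptions, not confirmed specs:

```python
# Solve the per-die FP32 split from the PX2 totals:
#   2a + 2b = 8   (total FP32 TFLOPS; 2 dGPUs + 2 Tegras)
#   4a + 8b = 24  (total FP16 TFLOPS; assumes dGPU FP16 = 2x FP32,
#                  Tegra FP16 = 4x FP32)
from fractions import Fraction

# Eliminate a: (4a + 8b) - 2*(2a + 2b) = 24 - 16  ->  4b = 8
b = Fraction(24 - 2 * 8, 4)    # FP32 TFLOPS per Tegra
a = Fraction(8, 2) - b         # FP32 TFLOPS per dGPU
print(a, b)                    # 2 2
print(2 * a, 4 * b)            # FP16 per die: dGPU 4, Tegra 8
```

Which reproduces the numbers above: 2 TFLOPS FP32 each, with FP16 at 4 TFLOPS per dGPU and 8 TFLOPS per Tegra.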
 
Is that the real surprise, or is it that the PX2 most likely contains 2*GP102 cores and not, as we all thought up to now, 2*GP106? :p

The X1 GPU already had 2*FP16 capability; do they really need a different name for the X2 GPU? ;)
 
Yeah, and possibly due to the separate rumours coming out that there is a GP102 earlier than expected.
Maybe this, along with Tesla HPC and research, is pushing the early production of this GPU, the benefit being it would also be implemented as a Titan/Ti product.

Was the GP106 a smokescreen, or is there some kind of fundamental mistake?
From a design perspective I think they would have known quite some time ago if they needed to replace the GP106 with some kind of GP102.
For those thinking P100: just remember that GPU only has 3584 cores while the presentation says 3840, and the Pascal mentioned in the presentation has much less DP, as it no longer has the 1:2 relationship between double and single precision that one gets with the Tesla P100.

Anyway, great catch, Itaru. Kudos.
Sadly most will probably ignore this thread because of its title being about Tegra :)
Cheers
 

The confusion probably came from the 8 TFLOPS claim, which everyone assumed was FP32. If you wrongly consider 4 TFLOPS FP32, your closest candidate out of the Pascal family would be GP106. The moment it becomes clear that they meant FP64, it can only be a high-end core. It could very well also be a GP100; it's just that 2*610mm2 cores sound like quite a tall order.
 
Not sure I follow.
They use FP64 in the presentation as well, for the P100 DGX-1 example where they state 40 TFLOPS; that is exactly 8x P100.

Regarding the precision numbers, 24 TFLOPS FP16 and 8 TFLOPS FP64: whichever way you look at it, it is a reduced number of FP64 cores along with 3840 cores instead of 3584. [wrong assumption by me for FP32, as the 1:2 ratio is improved upon over P100; it could still be a GP100 variant]
Bear in mind it is 2x GPUs to get those figures.

Edit:
Ah, I think I get it, you mean the expectation/assumption of it being GP106; sorry, got you now :oops:
Yeah, you could be right; would need to check whether Nvidia ever mentioned a smaller GPU, or go by what was held up (so potentially a placeholder, or was it later shown with a real small GPU), or it could be like you say.
Cheers
 
http://emit.tech/EMiT2016/Ramirez-EMiT2016-Barcelona.pdf
PX2:
2x Tegra X2
  4x ARM Cortex-A57
  2x NVIDIA Denver2
2x 3840-core Pascal GPU
  8 TFLOPS (64-bit FP)
  24 TFLOPS (16-bit FP)
1 GbE cluster interconnect
Whatever it is, it's not what they've been showing around (not to mention it's leaps and bounds too powerful for the Drive PX2; heck, that thing is meant for cars).
The one they've been showing around has 2 smallish GPUs, probably GP106s.
 
Yeah, in theory you can get 3840 cores with the large-die GP100, but in reality this is not currently achievable, which is why the P100 has 3584; there is also the fact that whatever GPU is described in the presentation for the PX2 has a lower FP64-to-FP32 ratio.
Reducing the FP64 cores and increasing the FP32 cores to 3840 is what is expected of the GP102.

Maybe it is still a P100, but I'm not sure how they would make these changes to it.
Cheers

Edit:
OK, I am being a bit dense with my headache lol.
One cannot assume the FP32 is 12 TFLOPS, as that would improve on the 1:2 ratio of the P100.
Which may explain why they only mention FP64 and FP16.

Maybe this is after all a GP100 variant *shrug*
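To make that ratio point concrete, a tiny check (the 8 and 24 TFLOPS figures are from the presentation; treating FP16 as 2x FP32 is the assumption being questioned):

```python
# If FP16 = 2x FP32 held, the presentation's numbers would imply
# 24 / 2 = 12 TFLOPS FP32, and an FP64:FP32 ratio of 8/12 = 2:3,
# which is *better* than the P100's 1:2 -- hence the doubt above.
fp64_total = 8.0                      # TFLOPS (presentation)
fp16_total = 24.0                     # TFLOPS (presentation)

fp32_assumed = fp16_total / 2         # 12 TFLOPS under the assumption
ratio = fp64_total / fp32_assumed     # ~0.667, vs 0.5 on P100
print(ratio > 0.5)                    # True: implausibly high FP64 rate
```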
 
also there is the fact whatever GPU is described in the presentation for the PX2 has less FP64 to FP32 ratio.
Not sure where you are getting this... I only see 64-bit and 16-bit performance listed?
 

You're absolutely right about the confusion caused by the live event and how they presented the figures; oh man, no idea how I missed that.
There is no way a GP106 can do what they showed at the Pascal live event all those months ago.
Back then, live, they showed, as you say, 8 TFLOPS and the 24 DL TFLOPS.
The video shows the spec at 1min59secs.


As you say, I think not everyone got that the 8 TFLOPS figure is actually FP64, while the 24 DL TFLOPS is the FP16, which would still require 12 TFLOPS of FP32.
That FP64 is just not possible for 2x GP106, and even the FP32 on its own would be a stretch, compounded by the PX2 probably being spec'd with conservative clocking.
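A rough back-of-the-envelope supporting that. The GP106 figures here are my assumptions (roughly 1280 CUDA cores, consumer-style 1/32 FP64 rate, ~1.7 GHz), not numbers from the thread:

```python
# Estimate 2x GP106 throughput under the assumed specs above.
cores, clock_ghz, fp64_rate = 1280, 1.7, 1 / 32   # assumed, not confirmed

fp32_per_gpu = 2 * cores * clock_ghz / 1000   # TFLOPS: 2 FLOPs/core/cycle
fp64_two_gpus = 2 * fp32_per_gpu * fp64_rate  # both GPUs combined

print(round(fp32_per_gpu, 2))    # ~4.35 TFLOPS FP32 per GPU
print(round(fp64_two_gpus, 2))   # ~0.27 TFLOPS FP64 total, nowhere near 8
```

Even with generous clocks, two consumer-grade mid-range dies land orders of magnitude short of an 8 TFLOPS FP64 figure.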

Amazing how nearly everyone missed this.
Thanks again for giving that context, otherwise I would have missed it.

 
Maybe you guys will care about this, but maybe not. ;) NV has released a new Android 6.0.1 update for the Shield K1, with improvements to their Vulkan support, useless as Vulkan may be thus far. I don't know if there are even benchmarks for that yet. They also support something called Android Professional Audio, with low latency, USB/Bluetooth device support, and MIDI support.

https://shield.nvidia.com/support/shield-tablet-k1/release-notes/1

Updates for the original Shield Tablet tend to come with a lag, but they seem to continue to support it well too. Which is how it should be, considering the hardware is identical (I've run Shield Tablet firmware on my K1).
 