NVIDIA Tegra Architecture

My pet theory is that Google did the X1 kernel and config for N9, turned them off for whatever reason, and that kernel and setup has persisted in X1 Android (X1 Chromebooks share the kernel) ever since because it works and nobody wants to revisit it.

I don't think the cluster is functionally broken. Someone should just test it.
N9 doesn't use Tegra X1. What is this X1 kernel you refer to?
 
Sorry, swap N9 for Shield Android TV. Swap Google for NVIDIA too.
 
My pet theory is that Google did the X1 kernel and config for N9, turned them off for whatever reason, and that kernel and setup has persisted in X1 Android (X1 Chromebooks share the kernel) ever since because it works and nobody wants to revisit it.

I don't think the cluster is functionally broken. Someone should just test it.

My gut feeling also tells me (as I've said a couple of times already) that for the markets/devices the X1 is targeting so far, the power-saving benefits the A53 cluster would bring are questionable. Otherwise I find it hard to imagine that they didn't bother to revisit the config IF your pet theory is correct (which sounds quite plausible).
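
For anyone who does want to follow the "someone should just test it" suggestion: on a rooted X1 device you could poke the standard Linux CPU-hotplug sysfs nodes and see whether any A53 cores are even exposed, let alone willing to come online. A rough sketch, assuming root access and a stock sysfs layout (paths may differ per kernel):

Code:
# Minimal sketch: list the CPUs the kernel exposes and try to hotplug any
# offline ones via the standard Linux cpu-hotplug sysfs nodes. Needs root;
# whether the A53s show up at all depends on the device tree / kernel config.
import glob

for cpu in sorted(glob.glob('/sys/devices/system/cpu/cpu[0-9]*')):
    node = cpu + '/online'
    try:
        with open(node) as f:
            state = f.read().strip()
    except IOError:
        print(cpu, 'always online')   # cpu0 usually has no 'online' node
        continue
    print(cpu, 'online' if state == '1' else 'offline')
    if state == '0':
        try:
            with open(node, 'w') as f:
                f.write('1')          # ask the kernel to bring the core up
            print(' ->', cpu, 'came online')
        except IOError as e:
            print(' ->', cpu, 'refused:', e)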
 
Seems like the tablet of choice to play all those AAA Android games with high-end graphics, like Angry Birds.
 
Don't overestimate what all the hot and exciting SOCs can actually handle. And also don't overestimate the driver quality of Android SOCs. I think you'd need Tegra K1 or X1 to run KOTOR without upscaling. ;) Baytrail and Tegra 4 certainly can't handle that at 1920x1200.

Actually I also noticed that KOTOR is missing some bump mapping effects. Surely done to make it playable on more SOCs.
 
[Image: Jetson TX1 block diagram]


One of the highlights of the Jetson software ecosystem is an incredible deep learning toolkit built on CUDA, providing Jetson with onboard inference and the ability to apply reasoning in the field. Included is NVIDIA’s cuDNN library, adopted by multiple deep learning frameworks including Caffe.

We ran a power benchmark using the Caffe AlexNet image classifier, comparing Jetson TX1 to an Intel Core i7-6700K Skylake CPU. The table below shows the results. Read more about these results in the post “Inference: The Next Step in GPU-Accelerated Deep Learning”.

Platform         img/s   Power (AP+DRAM)   Perf/W   Efficiency vs. i7-6700K
Intel i7-6700K   242     62.5 W            3.88     1x
Jetson TX1       258     5.7 W             45       11.5x

Kespry Designs, a Silicon Valley industrial drone developer, is using deep learning on Jetson TX1 to provide inference on construction sites for asset tracking of equipment and materials. This takes the tiresome, human-intensive work out of looking after assets and on-site logistical planning. Due to the low SWaP and computational capability of Jetson TX1, Kespry plans to migrate processing onboard Unmanned Aerial Vehicles instead of offline in the datacenter, shortening response times for tasks like inspection and triage.

http://devblogs.nvidia.com/parallel...ext-wave-of-autonomous-machines/#comment-1192
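
For context, the img/s numbers in that table are basically what you get from timing repeated forward passes; a rough pycaffe sketch of such a measurement (model/weight paths and batch size are placeholders, not NVIDIA's actual benchmark script):

Code:
# Rough sketch of timing AlexNet forward passes with pycaffe, in the spirit of
# the benchmark quoted above. 'deploy.prototxt' / 'bvlc_alexnet.caffemodel'
# are placeholder paths; cuDNN acceleration kicks in via the GPU mode.
import time
import numpy as np
import caffe

caffe.set_mode_gpu()      # CUDA/cuDNN path (use set_mode_cpu() on the i7)
caffe.set_device(0)

net = caffe.Net('deploy.prototxt', 'bvlc_alexnet.caffemodel', caffe.TEST)
net.blobs['data'].data[...] = np.random.rand(
    *net.blobs['data'].data.shape).astype(np.float32)

net.forward()             # warm-up
runs = 100
start = time.time()
for _ in range(runs):
    net.forward()
elapsed = time.time() - start
batch = net.blobs['data'].data.shape[0]
print('%.1f images/s' % (runs * batch / elapsed))

The perf/W column then just divides throughput by the measured AP+DRAM power: 258 / 5.7 ≈ 45 img/s per watt for the TX1 versus 242 / 62.5 ≈ 3.9 for the i7-6700K, which is where the ~11.5x figure comes from.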
 
I'd be surprised if it's more than an improved refresh of the original Denver core. I also figure that a core like Denver might be a better fit for automotive workloads than for anything else.

Very weird CPU combo for that DRIVE PX 2 thing.

If that's supposed to be "weird" what's the MTK Helio X30 then? :D
 
What if the Denver cores are actually the low-power cores too? I mean, apart from the code-morphing mumbo jumbo, Denver also had a weak hardware decoder, right? Couldn't it be possible to turn off all the hardware related to code morphing and leave the Denver cores operating very similarly to an A53 or whatever? Then for slightly more demanding or highly threaded workloads they'd use the A57s, and the Denver cores would take care of lightly threaded high-performance tasks once the code morphing gets turned on. The A57s would also hide any possible latency from this change of state, by being in the middle.

Maybe this is something so obvious that no one cared to mention, but I thought it was a novel idea and haven't seen it anywhere. What do you think guys?
 
They should have had 2*A57+4*Denver cores then.

Care to elaborate a little bit on that? Maybe it's completely impossible on the technical side, but my suggestion is that Denver acts as both the low-power core (code-morphing portion turned off, no relatively expensive memory accesses to the microcode cache, etc., just plain and slow in-order execution) and the highest-performance core. Usually there are more low-power cores than high-performance ones, and that's, I think, why logic dictates that there would be only 2 Denver cores, if such a technology were to be used. Yeah, you also only get 2 low-power cores, but historically NVIDIA has been going with just one low-power core, except for the X1. Compromises need to be made somewhere.

Denver cores are also probably bigger than A57? So it would probably be a waste to have that combo.
 
Perhaps the Denver cores are part of the Pascal GPU dies, leaving the 2 Tegras with 4xA57 (and possibly inactive 4xA53) each.
(Credit for this possibility to Exophase on the RWT forum)
 
Care to elaborate a little bit on that? Maybe it's completely impossible on the technical side, but my suggestion is that Denver acts as both the low-power core (code-morphing portion turned off, no relatively expensive memory accesses to the microcode cache, etc., just plain and slow in-order execution) and the highest-performance core. Usually there are more low-power cores than high-performance ones, and that's, I think, why logic dictates that there would be only 2 Denver cores, if such a technology were to be used. Yeah, you also only get 2 low-power cores, but historically NVIDIA has been going with just one low-power core, except for the X1. Compromises need to be made somewhere.

Denver cores are also probably bigger than A57? So it would probably be a waste to have that combo.

There's no need to complicate things that much; they could have also gone for a 2+2 combo. In reality, with their CEO stating that the Denver cores are meant for demanding single-threaded tasks, that's what they'll mostly be used for, with the majority of the work falling on the A57 cores either way. I doubt it's as much of an issue for the automotive market. When Parker makes it into mobile consumer devices we'll see if and how it actually works; I sure hope, though, that they aren't stuck with cluster migration again.

Perhaps the Denver cores are part of the Pascal GPU dies, leaving the 2 Tegras with 4xA57 (and possibly inactive 4xA53) each.
(Credit for this possibility to Exophase on the RWT forum)

Interesting theory; however, I would then like to read how that entire enchilada has been connected exactly. If they found a way to connect the GP10b GPUs on the Parker SoCs with the 2 "main" GPU cores then it could theoretically be easy. If not, it sounds too complicated and raises a few more important question marks:

What the heck do you need 2 CPU cores in a 3-4 TFLOPs FP32 Pascal GPU core for, for example... in all likelihood those will be mainstream GPU cores like today's GM206. What am I missing exactly?
 
What the heck do you need 2 CPU cores in a 3-4 TFLOPs FP32 Pascal GPU core for, for example... in all likelihood those will be mainstream GPU cores like today's GM206. What am I missing exactly?
GPUs are traditionally not good at control code, so having a CPU (working in the same memory space) might help keep the GPU fed. Just a hypothesis :)
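
CUDA's unified/managed memory already lets you play with that idea on today's hardware: host-side control code and device kernels touching the same allocation with no explicit copies. A toy illustration using Numba (nothing to do with whatever NVIDIA actually does on DRIVE PX 2):

Code:
# Toy illustration of CPU "control code" and a GPU kernel sharing one managed
# allocation via CUDA unified memory (Numba). The CPU fills/steers the buffer,
# the GPU crunches it, no explicit host<->device copies.
import numpy as np
from numba import cuda

@cuda.jit
def scale(buf, factor):
    i = cuda.grid(1)
    if i < buf.size:
        buf[i] *= factor

buf = cuda.managed_array(1 << 20, dtype=np.float32)  # visible to CPU and GPU
buf[:] = 1.0                                         # CPU-side setup

threads = 256
blocks = (buf.size + threads - 1) // threads
scale[blocks, threads](buf, 2.0)                     # GPU works on the same memory
cuda.synchronize()
print(buf[:4])                                       # CPU reads the results directly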
 