nVidia Denver discussion

Some (more) Project Denver benchmarks

http://www.realworldtech.com/forum/?threadid=147408&curpostid=147408

By: Gian-Carlo Pascutto

I couldn't really find the Project Denver / K1-64 benchmarks I was interested in. Most reviews seem to jump from Geekbench core tests to comparing outdated JS benchmarks (and not necessarily in the same browser build, etc.).

Now that I have one I've played a bit with it. JS benchmarks are all from the current Firefox ARMv7 32-bit Nightly. Dromaeo has a good mix of low-level loops and higher-level DOM manipulation, Octane 2.0 is a decent mix of "real" JS code, sjeng is the latest version of the engine in SPEC2006, 32-bit compile.

Of course I can't eliminate the influence of the SoC and memory on the core. But for example something like sjeng doesn't depend on memory bandwidth, only a little bit on latency.

Higher = better.

Cortex A8 1GHz (Galaxy Nexus)
sjeng NPS: 96713
Dromaeo: no RAM
Octane: no RAM

Cortex A9 1GHz (Galaxy Tab 10.1)
sjeng NPS: 103951
Dromaeo: no RAM
Octane: 1873

Krait 300 1.5GHz (Nexus 7)
sjeng NPS: 116366
Dromaeo: 132 runs/s
Octane: 2935

Denver K1-64 2.3GHz (Nexus 9)
sjeng NPS: 543403
Dromaeo: 489 runs/s
Octane: 8666

Haswell 3.8GHz (Dell Desktop)
sjeng NPS: 1142511
Dromaeo: 1327 runs/s
Octane: 33422

Interesting observations for me:

Krait scores are fairly low, hardly outperforming Cortex A9 clock-for-clock. We had some issues in another project with some DSP code that turned out to run faster in FP mode on Krait, which is very unusual for an ARM chip. We initially thought it was because the Krait had a great FPU, but this makes me doubt that a bit. I'm wondering if the chip is thermally throttled in the N7? Maybe I can check on an N4 someday.

The Denver core seems excellent, especially compared to the other ARM chips. Krait is more or less demolished. Compared to Haswell, the best result is only 22% slower clock for clock when the code is fairly static and small; on workloads with more, and more varied, code the gap can easily blow up to 200%, though. Too bad cross-compiling and running gcc as a benchmark is such a bother on an Android device.
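
The clock-for-clock comparison can be rechecked from the scores posted above. This is only a rough Python sketch: it assumes performance scales linearly with the nominal clock (ignoring turbo, throttling and memory effects), and the dictionary layout and function names are made up for illustration.

```python
# Scores and nominal clocks copied from the results listed above.
RESULTS = {
    "Denver": {"ghz": 2.3, "sjeng": 543403, "dromaeo": 489, "octane": 8666},
    "Haswell": {"ghz": 3.8, "sjeng": 1142511, "dromaeo": 1327, "octane": 33422},
}


def per_ghz(chip, bench):
    """Score divided by nominal clock in GHz (assumes linear scaling)."""
    entry = RESULTS[chip]
    return entry[bench] / entry["ghz"]


def deficit_percent(bench, chip="Denver", ref="Haswell"):
    """How much lower chip's per-GHz score is than ref's, in percent."""
    return (1.0 - per_ghz(chip, bench) / per_ghz(ref, bench)) * 100.0


if __name__ == "__main__":
    for bench in ("sjeng", "dromaeo", "octane"):
        print(f"{bench}: Denver is {deficit_percent(bench):.1f}% lower per GHz")
```

With these inputs sjeng comes out a little over 21% lower per GHz, in line with the roughly 22% quoted, while the two JS benchmarks show a noticeably larger gap.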
 
He updated his results since then:
Haswell 3.8GHz
sjeng NPS: 1137868 (gcc 4.9 32-bit)
sjeng NPS: 1634850 (gcc 4.9 64-bit)
Opus: 4.64s (gcc 4.9 32-bit)
Opus: 3.39s (gcc 4.9 64-bit)
Opus: 2.99s (gcc 4.9 float)

Denver 2.3GHz
sjeng NPS: 551775 (gcc 4.9 32-bit)
sjeng NPS: 735914 (gcc 4.9 64-bit)
Opus: 18.92s (gcc 4.9 32-bit)
Opus: 10.44s (gcc 4.9 float androideabi -mfpu=neon)
Opus: 8.54s (gcc 4.9 float AArch64)
Opus: 7.31s (gcc 4.9 64-bit)
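
To see how much each chip gains from the 64-bit build, the updated figures can be turned into speedup ratios. A small Python sketch, using only the 32-bit and 64-bit entries above; the table names and helper functions are illustrative, and note that sjeng is a throughput (higher is better) while Opus is a run time in seconds (lower is better), so the ratios are taken in opposite directions.

```python
# Figures copied from the updated results above (gcc 4.9 builds).
SJENG_NPS = {  # nodes per second, higher is better
    "Haswell": {"32-bit": 1137868, "64-bit": 1634850},
    "Denver": {"32-bit": 551775, "64-bit": 735914},
}
OPUS_SECONDS = {  # run time in seconds, lower is better
    "Haswell": {"32-bit": 4.64, "64-bit": 3.39},
    "Denver": {"32-bit": 18.92, "64-bit": 7.31},
}


def throughput_speedup(scores):
    """64-bit over 32-bit ratio for a higher-is-better score."""
    return scores["64-bit"] / scores["32-bit"]


def time_speedup(seconds):
    """32-bit over 64-bit ratio for a lower-is-better run time."""
    return seconds["32-bit"] / seconds["64-bit"]


if __name__ == "__main__":
    for chip in ("Haswell", "Denver"):
        print(f"{chip}: sjeng x{throughput_speedup(SJENG_NPS[chip]):.2f}, "
              f"Opus x{time_speedup(OPUS_SECONDS[chip]):.2f}")
```

On these numbers the two chips gain a similar amount on sjeng, while Denver gains far more than Haswell on Opus when going from the 32-bit to the 64-bit build.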
 
Thank you.
I also found a few things, it seems.

I think they are probably related to dynamic code optimization.

https://chromium-review.googlesource.com/#/c/210246/
https://chromium-review.googlesource.com/#/c/210247/
https://chromium-review.googlesource.com/#/c/210248/

rush: enable 128MiB MTS carveout below top of DRAM

The recommended settings for the size of the MTS region is 128MiB.
Therefore, provide this region 128MiB below the top of DRAM for
each configuration.

t132: kick off core complex after loading MTS microcode

Once the MTS microcode is loaded the core complex can be
directed to decode the MTS and start running. The cores,
however, won't start executing until instructed to do so.

t132: load MTS microcode

The armv8 cores need to have microcode loaded before they can
be taken out of reset. Locate and load the MTS microcode at the
fixed address of 0x82000000. The ccplex, once enabled, will
decode and transfer the microcode to the carveout region.

I think MTS is what they call the Denver microcode files that the boot loader loads. The first GitHub link you provided has mts in the name of the microcode file, and then there's this:

http://www.spinics.net/lists/linux-tegra/msg21009.html

And this:

http://www.spinics.net/lists/linux-tegra/msg18481.html

But I have no idea what MTS actually stands for.
 
MTS version on the Nexus 9 after the Android 5.1.1 update:

shell@flounder:/ $ cat /proc/cpuinfo
Processor : NVIDIA Denver 1.0 rev 0 (aarch64)
processor : 0
processor : 1
Features : fp asimd aes pmull sha1 sha2 crc32
CPU implementer : 0x4e
CPU architecture: AArch64
CPU variant : 0x0
CPU part : 0x000
CPU revision : 0

Hardware : Flounder
Revision : 0000
Serial : 0000000000000000
MTS version : 33903942

Previous versions:

https://github.com/NVIDIA/cpu-microcode/tree/master/t132
http://lists.lysator.liu.se/pipermail/nettle-bugs/2015/003267.html

Processor : NVIDIA Denver 1.0 rev 0 (aarch64)
processor : 0
processor : 1
Features : fp asimd aes pmull sha1 sha2 crc32
CPU implementer : 0x4e
CPU architecture: AArch64
CPU variant : 0x0
CPU part : 0x000
CPU revision : 0

Hardware : Flounder
Revision : 0000
Serial : 0000000000000000
MTS version : 33410787
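
For comparing dumps like the two above, the MTS version can be pulled out of the `/proc/cpuinfo` text with a few lines of Python. This is just a sketch: the `MTS version` field name is taken from the Nexus 9 output shown here and other kernels may not expose it, and the function name and sample string are made up for illustration.

```python
def mts_version(cpuinfo_text):
    """Return the 'MTS version' value from /proc/cpuinfo text, or None."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only accept lines that actually contain a "key : value" pair.
        if sep and key.strip() == "MTS version":
            return value.strip()
    return None


# Tail of the dump posted above, used as sample input.
SAMPLE = """\
Hardware : Flounder
Revision : 0000
Serial : 0000000000000000
MTS version : 33903942
"""

if __name__ == "__main__":
    print(mts_version(SAMPLE))
```

On a device the same function could be fed the contents of `/proc/cpuinfo` read from a file or from `adb shell` output.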
 
Is there any information or details on the Denver2 core that is mentioned as being in the Tegra that will be in the Drive PX2?
We are still waiting for official information about Tegra X2, which should also be detailed separately to the PX2.
I guess that is because we are not seeing them anytime soon *shrug*.
Cheers
 
We are still waiting for official information about Tegra X2, which should also be detailed separately to the PX2.
I guess that is because we are not seeing them anytime soon *shrug*.
Cheers

The DRIVE PX 2 development engine will be generally available in the fourth quarter of 2016. Availability to early access development partners will be in the second quarter.

http://nvidianews.nvidia.com/news/n...-in-car-artificial-intelligence-supercomputer

Since we are almost in Q3 seems like some real information should be available soon.
 
Since we are almost in Q3 seems like some real information should be available soon.
Yeah,
although some of the algorithm solutions (Driveworks) were more recently mentioned with a broader release date of Q1 2017, so it seems a bit of a confusing situation.
Fingers crossed they sell the hardware in Q3 and we get all the associated references.
Cheers
 