NVIDIA Tegra Architecture

Presumably, he's referring to the context of xpea's post mentioning "4 times the power efficiency at the same node".
 
https://blogs.nvidia.com/blog/2016/09/28/xavier
[...]
The power is a mystery. The GTX 1080 at 7B transistors is 180 watts; Xavier is the same number of transistors at 20 watts. I assume the latter uses an LP process, but can it make that much difference?

As for perf: there's no sane way to get to 20 TOPS based on the existing architecture. It would take 512 cores clocked at 5 GHz plus INT8 to get there, but that's obviously absurd. Best guess is the computer vision accelerator has some kind of programmable low-cost INT8 units that boost performance.
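The arithmetic behind that "512 cores at 5 GHz" claim can be sketched as follows. The 2 ops/clock FMA rate and the 4x packed-INT8 factor are assumptions about a hypothetical design here, not confirmed Xavier specs:

```python
# Hypothetical throughput estimate: how many TOPS would N cores deliver?
# Assumes 2 ops/clock per core (one FMA) and 4x packed-INT8 throughput,
# i.e. four 8-bit ops per 32-bit lane per clock -- both are assumptions.
def est_tops(cores, clock_ghz, ops_per_clock=2, int8_packing=4):
    return cores * clock_ghz * ops_per_clock * int8_packing / 1000.0

print(est_tops(512, 5.0))  # 20.48 -> hits 20 TOPS only at an absurd 5 GHz
print(est_tops(512, 1.4))  # ~5.7 TOPS at a realistic mobile clock
```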

Thoughts?
Worth remembering though that the Tesla P4 is a GP104 with 50/75W and 22 TOPs.

I agree the stats for Xavier seem quite incredible, and I'm not sure myself how they managed it (the devil will be in the details, just like it was for Drive PX2), but just pointing out that even Pascal can hit some interesting performance-per-watt numbers, and Xavier's Volta is a true next-generation development.
Cheers
 
Worth remembering though that the Tesla P4 is a GP104 with 50/75W and 22 TOPs.
Such a shame that cards like these don't get consumer versions.
How many people wouldn't give >$1200 for a card like this to be put inside a tiny and silent HTPC?
 
Such a shame that cards like these don't get consumer versions.
How many people wouldn't give >$1200 for a card like this to be put inside a tiny and silent HTPC?
Yeah,
I think Nvidia is missing a trick for the consumer market, and maybe to a certain extent mining although cost comes into that even more.

On the plus side, it is possible to undervolt the consumer products and still achieve base clocks; Tom's Hardware managed to take the 1060 FE down to 61W and still be stable at base clocks in games.
So that gives around 3.8 TFLOPS at 61W.
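That figure lines up with the usual CUDA-core arithmetic. A quick sketch, using the GTX 1060's published 1280 cores and 1506 MHz base clock:

```python
# GTX 1060 FP32 throughput at base clock: cores * 2 FLOPs/clock (FMA) * clock
cores = 1280
base_clock_ghz = 1.506
tflops = cores * 2 * base_clock_ghz / 1000.0
print(round(tflops, 2))              # 3.86 TFLOPS, matching the ~3.8 figure
print(round(tflops * 1000 / 61, 1))  # ~63.2 GFLOPS per watt at 61 W
```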
Shame they never did the same exhaustive test with the 1070, 1080 or Titan XP, but it must have taken an extreme amount of time to run the full test.
And as you say, at reduced voltage the 1070/1080 could make a great small and silent card if one were designed around that objective; I'm surprised none of the AIBs are considering it.
The smallest Pascal card so far is the single-fan EVGA, but that is a 1060 in the normal Pascal performance window.
Cheers
 
Such a shame that cards like these don't get consumer versions.
How many people wouldn't give >$1200 for a card like this to be put inside a tiny and silent HTPC?

Nobody.

For HTPC functions you're much better off with something like a Shield TV or an Intel NUC. You can still use NVIDIA GameStream or Steam In-Home Streaming if you want to game too.

Unless IQ is so important to you that you don't want to stream your games. But if that is the case, why buy anything but the fastest GPU, and take the slightly larger form factor and slightly more noise for granted? Or just run an HDMI cable from the room where you keep your PC to the room with your TV and have no noise or ugly box in sight at all.
 
Tegra 'Parker' GFXBench 4.0 Results

Car Chase Offscreen : 2385 Frames (40.4 Fps)
1440p Manhattan 3.1.1 Offscreen : 2324 Frames (37.5 Fps)
1080p Manhattan 3.1 Offscreen : 4094 Frames (66.0 Fps)
1080p Manhattan Offscreen : 5027 Frames (81.1 Fps)
1080p T-Rex Offscreen : 10265 Frames (183.3 Fps)

https://gfxbench.com/resultdetails.jsp?resultid=hqNQqQ6fR0yzfqdvNKT97w

Looks like 30-50% faster than the Shield TV, so that GPU is probably working in the 1.4GHz range?
Interesting to see Android installed on a system with a SoC that doesn't seem to want anything to do with Android. Could be a Shield TV successor.
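A quick sanity check on that clock guess, assuming Shield TV's Tegra X1 GPU runs at roughly 1.0 GHz (a commonly cited figure, not an official spec) and that Parker's GPU has the same unit count:

```python
# Map the observed 30-50% uplift over Shield TV onto an implied GPU clock.
shield_clock_ghz = 1.0  # assumed Tegra X1 GPU clock
for uplift_pct in (30, 40, 50):
    implied = shield_clock_ghz * (1 + uplift_pct / 100)
    print(f"{uplift_pct}% faster -> ~{implied:.1f} GHz")
```

So a 40-50% uplift is consistent with a clock in the 1.4-1.5 GHz range.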


Nobody.

For HTPC functions you're much better off with something like a Shield TV or an Intel NUC. You can still use NVIDIA GameStream or Steam In-Home Streaming if you want to game too.

You're assuming everyone wanting to play PC games in the living room has a high-performance desktop elsewhere in the house, with an Ethernet connection between that room and the living room, and is fine with In-Home Streaming's latency, H.264 compression artifacts, loss of performance and loss of sound quality.

Those are a lot of assumptions.

I'm a big fan of In-Home Streaming and have used it for many hours, but if money wasn't a problem I'd definitely prefer to have a high-performance HTPC in the living room and play the games locally.

There's a reason why the Nano was popular, even though it lacked important HTPC credentials like HDMI 2.0, HDCP 2.2 and HEVC decoding.
 
Does Volta have 128-bit cores?

p.2-(c) 4-Wide SIMT lane detail
http://research.nvidia.com/sites/default/files/publications/Gebhart_MICRO_2012.pdf

GM204 has 5.2B transistors and 2048 cores.
Xavier has 512 cores (128-bit).

512 × 4 (128/32) = 2048.
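Spelling out that arithmetic (this just restates the lane count; it doesn't confirm the 128-bit-core theory):

```python
# If each "core" were a 128-bit SIMD unit, it would contain four 32-bit
# lanes, so 512 such cores would match the 2048 lanes of a GM204-class GPU.
bits_per_core = 128
lane_width_bits = 32
lanes_per_core = bits_per_core // lane_width_bits  # 4
print(512 * lanes_per_core)                        # 2048
```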

That publication says that the simulated GPU is loosely modeled off of Fermi, although interestingly its execution unit arrangement ends up being similar to Maxwell's.

Nothing here talks about 128-bit "cores", it still refers to their SM as 32-wide SIMT. So each of the 4 ALU lanes in the cluster would correspond to a different thread.
 
Looks like 30-50% faster than the Shield TV, so that GPU is probably working in the 1.4GHz range?
Interesting to see Android installed on a system with a SoC that doesn't seem to want anything to do with Android. Could be a Shield TV successor.

1.5GHz afaik. Even Renesas has GFXBench results for its R-Car H3 SoC, which is also automotive-exclusive, and mass production for that one won't start before March 2018. With sampling for Parker projected for late 2017, if Samsung, QCom & Apple keep the same cadence then in late 2017 you're going to see that kind of performance in high-end smartphones of the time, and with all the ASIL certifications and the like they'd be extremely lucky to get Parker into mass production even in 2018.

Anyway, I recall NV claiming 50% higher performance compared to the Erista GPU, and the results are along that line. The most interesting aspect of the Parker GPU unfortunately isn't visible from those results, and that would be perf/mW. Any minor architectural refinements aside, going from 20SoC to 16FF+ should have made quite a difference.
 
I have more thrilling Tegra 4 / Tegra Note 7 observations to share. :D

I have been sparsely playing a port of Return to Castle Wolfenstein called RTCW Touch. Probably the best shooter on Android, but that is beside the point lol.

Most of the time it runs very well, but sometimes it struggles. Yeah, it's struggling with a Quake 3 powered game. The frame rate stutters and the audio even breaks up a bit. It seems to happen when you look in the direction where most of the map goes, so it seems like an occlusion culling thing. Need moar fillrate. Fascinating!

With this and KOTOR, I want to put Tegra 4 in the class of something like Geforce 3. Let's call it Geforce FX 5200. That might be a disservice to NV34 because NV34 can at least run FP32. Heh heh heh.
 
Tegra 4's GPU has more horsepower than GeForce 4 Ti... Why the hell would it have issues with RTCW?

It has to be either API/OS issues, driver issues or the port. The Cortex-A15 is decent enough as well. People ran RTCW on Pentium IIIs...
 
Tegra 4's GPU has more horsepower than GeForce 4 Ti... Why the hell would it have issues with RTCW?

It has to be either API/OS issues, driver issues or the port. The Cortex-A15 is decent enough as well. People ran RTCW on Pentium IIIs...
I know right! But yeah it stutters. Geforce 4 Ti would whip it at KOTOR too. Seems fillrate / bandwidth related to me.
 
Maybe the CPU doesn't keep up? (and the port could be a bit better)
I remember so well the forest scene in Medal of Honor: AA, and most of Soldier of Fortune II; those make the Quake 3 engine really tank. But RTCW had more traditional "corridor" setups or was careful with its outdoor environments. Well, I could never finish SoF II at all; I would always get shot in that Colombian forest or wherever it is.
When the first Far Cry came out, it was really amazing. The game was overall fairly demanding but kept up with the distances and zillions of detail, so it was likely faster than the Quake 3 engine at doing what it did.

The Quake 3 engine liked CPU memory bandwidth too.
 
Yeah, who knows. It usually runs at about 60fps, and vsync appears to be enabled. Here are some shots of a slowdown area.

[Screenshots of the slowdown area attached]
It is definitely running with 16-bit Z, btw. There's Z-fighting in the distance.
[Screenshot attached]
Resolution and color depth do nothing here.
 
That brings back fond memories, especially that courtyard. The only other difference is that those floor textures could use a healthy portion of AF, or better yet anything starting with Tegra K1.
 
Custom Tegra for the NX:

Nintendo Switch is powered by the performance of the custom Tegra processor. The high-efficiency scalable processor includes an NVIDIA GPU based on the same architecture as the world’s top-performing GeForce gaming graphics cards.

The Nintendo Switch’s gaming experience is also supported by fully custom software, including a revamped physics engine, new libraries, advanced game tools and libraries. NVIDIA additionally created new gaming APIs to fully harness this performance. The newest API, NVN, was built specifically to bring lightweight, fast gaming to the masses.

https://blogs.nvidia.com/blog/2016/10/20/nintendo-switch/

So still Tegra Maxwell?
 