NVIDIA Tegra Architecture

Well couldn't you also say the same about Qualcomm, if you choose a context that suits a narrative you want to weave? All of that money, all of the physical design chops, etc, and S805's 3D performance is slower than K1's. A day late and a dollar short. Pathetic.

Not really sure why you just took the hatchet to Intel there.
In my mind, the choice of Intel is obvious. Nobody has more resources, nobody has more money, nobody has better transistors, and still, for many generations, Intel GPUs have s*cked, both in terms of hardware and software. How do you explain it?

edit: just imagine for a few seconds if Nvidia or ATI had access to Intel's fabs. What a beast a 350mm² Maxwell on 14nm FinFET shipping for Xmas would be. How powerful would it be? How many SMXs in such a setup? How many TFLOPS?
 
Well couldn't you also say the same about Qualcomm, if you choose a context that suits a narrative you want to weave? All of that money, all of the physical design chops, etc, and S805's 3D performance is slower than K1's. A day late and a dollar short. Pathetic. Not really sure why you just took the hatchet to Intel there.
I think the difference is that they are currently in a 'nobody ever got fired for choosing Qualcomm' situation. They can afford to have a weaker offering for once.

I've been expecting Intel to knock one out of the ballpark for years, but they never deliver. It's disappointing for such an otherwise awesome company.
 
Was it an idiotic example? Yes.
Who gave that example? You.

My only point was that you were exaggerating too much with the comparisons. TK1 isn't that much faster than the newer Exynos SoCs (not an order of magnitude faster like you suggested) and both can find a place in the market.

Which doesn't answer the question of ALU utilisation under compute-only scenarios, does it? And no, I nowhere suggested or implied a magnitude of 10x, not even close to it.

I don't know and you don't know either.
What I do know is that the Exynos versions actually offer borderline identical daily-usage experiences, and which SoC you get varies according to region (mostly Asian countries without LTE). Do they move more Snapdragon than Exynos units? Many more? A little more? Do you have any idea of the proportions?
Plus, we might all be in for a surprise looking at how the next Exynos compares to S805.
It's fine that you're concluding what I know and don't know, but that doesn't mean that it reflects the truth either.

What kind of a surprise exactly? From the data supplied so far, the S805 doesn't seem to even come close to the K1 in terms of graphics performance, so the surprise for the Exynos parts must be something that's a bit more lukewarm than lukewarm.


I'm well aware of that discussion since I was the one who brought it up in that thread.
JohnH's position is that "it could be that the tessellation increases geometry input".
Regardless, I don't know if he was referring to the adaptive tessellation that can be done in modern hardware or to the X360's/R600's fixed-function units.
From what I gathered from his discussion with sebbi, it's more a question of developers learning how to use it efficiently.
Why don't you ask him yourself what he exactly meant?

I'll stick to the opinion of all major IHVs, which is the opposite.
You're entitled to it, at the very least.

As stated by more than one very well-educated person in this forum, this concern about the tessellation hardware occupying "too much area" seems to be yours alone.
The overhead is in the ALUs themselves, combined with the interconnect logic between pipelines, the buffering for geometry, etc., spread across all corners of the core in order to tessellate adaptively. Nobody denied that part anywhere.

I'm still waiting for any of you "educated folks" to teach this village idiot here (yes, me) why the Adreno 420 has barely 40% more arithmetic efficiency compared to an Adreno 330, and no, I don't expect any halfway educated reply to that one either.

From what I could tell, Microsoft wants to erase/blur the line between ARM and x86 experiences, so a Snapdragon device could very well become a "real windows machine".
That doesn't make a ULP SoC less of an ULP SoC in the end. IMHO to each his own; in that part of the market the power constraints are too tight for a proper Windows experience. You can easily get there with a Bay Trail-like SoC or even a Tegra K1 (if Microsoft eventually also allows ARM into Windows above RT). A more "smartphone-centric" SoC like the S805 sounds too weak for that, exactly as a K1 sounds too overpowered for a smartphone platform.
 
And even with ART you're still never going to get close to the performance you could get using NEON intrinsics or assembly.
That's what surprised me the most: I have never dabbled in Java, but I've heard arguments for over a decade that Java JIT and compilation is really, really good now. I thought that, other than a very specific class of games, even a 50% performance difference shouldn't outweigh the benefits of what Java allegedly brings.

But the game engine thing definitely makes the most sense.
 
The biggest argument is against garbage-collected languages in general (that includes JavaScript too): when the garbage collector kicks in, it chews up CPU resources, which for a game causes unpredictable, random and significant slowdowns.

A workaround could be, for example, being able to trigger the garbage collector manually, every frame or at chosen intervals.
I'm sure there are many developer discussions, blog posts and rants about that.
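
As a minimal sketch of that idea in Java (hypothetical class and method names, an Android-style game loop assumed, not any particular engine's API): keep allocations out of the per-frame path with a simple object pool, and only hint the collector at moments where a hitch would go unnoticed.

```java
import java.util.ArrayDeque;

// Hypothetical sketch: pool short-lived objects so the hot loop produces no garbage,
// and only *hint* the collector at a point where a pause is tolerable (e.g. a level load).
final class Bullet {
    float x, y, vx, vy;
    void reset() { x = y = vx = vy = 0f; }
}

final class BulletPool {
    private final ArrayDeque<Bullet> free = new ArrayDeque<>();

    Bullet obtain() {
        Bullet b = free.poll();
        return (b != null) ? b : new Bullet(); // allocate only when the pool runs dry
    }

    void release(Bullet b) {
        b.reset();
        free.push(b);                          // reuse instead of leaving garbage behind
    }
}

final class GameLoop {
    private final BulletPool bullets = new BulletPool();

    void frame(float dt) {
        // simulate/render using pooled objects; no 'new' in the per-frame path
    }

    void onLevelTransition() {
        // System.gc() is only a hint and the VM may ignore it; the point is to invite
        // a collection while the screen is static, not in the middle of gameplay.
        System.gc();
    }
}
```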
 
nVidia won the designs for the Xiaomi Mipad and Nexus 9. These two together will probably take a very large chunk of the Android tablet market.
Next-gen Asus Transformer and Toshiba models are most probably a given, too (three generations of Tegra in a row so far).
Almost nothing sounds like quite an understatement.

I would advise you to read my post again, since neither the Xiaomi Mipad nor the Nexus 9 is a MOBILE PHONE :rolleyes:
 
That's what surprised me the most: I have never dabbled in Java, but I've heard arguments for over a decade that Java JIT and compilation is really, really good now. I thought that, other than a very specific class of games, even a 50% performance difference shouldn't outweigh the benefits of what Java allegedly brings.

Not all Java is the same.. there are a lot of different VMs and they all have different performance. Android goes further beyond this, because Dalvik isn't even a JVM; it uses a new/custom bytecode. And that means they were starting completely from scratch with their JIT technology, although it was supposed to be easier to optimize than JVM bytecode. But it took years before Dalvik even had a JIT, let alone a good one.

I'm sure there's some real case out there where Java code, given the right JVM, was as fast or even faster than its C++ equivalent. But it's really hard to get a good test case for this, it'll almost always be dealing with microbenchmarks where someone converted Java to C++ or vice-versa, and probably not a case where the code was written from the start with its target language in mind. This can heavily skew the results - either because the C++ is written in a Java "style" and suffers for it, or the Java is written to avoid typical Java-idioms for performance reasons. I've seen Javascript code that wasn't that slow, but it had to be turned into a horrible mess first (asm.js is kind of an extension of this).
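
As a hedged, made-up illustration (not a real benchmark, the names are mine) of how much the "style" alone can matter within Java: summing values through an idiomatic boxed collection versus a plain primitive array gives the same answer with very different allocation and memory-access behaviour, which is exactly the kind of thing that gets lost when code is mechanically ported between Java and C++.

```java
import java.util.ArrayList;
import java.util.List;

// Made-up illustration, not a benchmark result: the "idiomatic" version boxes every
// int into an Integer object; the "tuned" version stays on primitives. A mechanical
// port to C++ would look like the tuned version by default, which skews naive comparisons.
public class StyleMatters {
    static long sumIdiomatic(int n) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < n; i++) values.add(i);   // autoboxing + object churn
        long sum = 0;
        for (Integer v : values) sum += v;           // unboxing on every access
        return sum;
    }

    static long sumTuned(int n) {
        int[] values = new int[n];                   // flat primitive array, no boxing
        for (int i = 0; i < n; i++) values[i] = i;
        long sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        // Same result either way; the work done by the VM and the GC is very different.
        System.out.println(sumIdiomatic(n) == sumTuned(n));
    }
}
```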

Personally I'm kind of amazed that people treat big performance differences due to programming languages as no big deal on devices with constrained power budgets. 50% slower could mean 75% worse power consumption, while SoC manufacturers have been fighting over much smaller incremental improvements. I think the problem is that app developers know that people will generally use their app even if it kills their battery, because there's no exact alternative that doesn't. There are some rare cases where this doesn't hold though, for the app I sell performance is everything.

But the game engine thing definitely makes the most sense.

That's actually sort of a separate but good point - a lot of middleware needs NDK. But it's not just going to be that, a lot of the app-specific code will benefit from having the same portability (ironic that going Java makes it less portable..)
 
I would advise you to read my post again, since neither the Xiaomi Mipad nor the Nexus 9 is a MOBILE PHONE :rolleyes:

Mobile phones take longer to develop than WiFi tablets (and Tegra K1 WiFi tablets are only just now nearing launch). Tegra K1 should make its way into some high-end smartphones before the end of the year. But again, until VoLTE becomes commonplace throughout the industry (which will be some years in the future), most OEMs will have to rely on Qualcomm's modem tech for the USA market. So penetrating the mobile phone market is easier said than done.
 
I'm well aware of that discussion since I was the one who brought it up in that thread.
JohnH's position is that "it could be that the tessellation increases geometry input".
Regardless, I don't know if he was referring to the adaptive tessellation that can be done in modern hardware or to the X360's/R600's fixed-function units.
From what I gathered from his discussion with sebbi, it's more a question of developers learning how to use it efficiently.

Actually I've yet to see a non contrived case of tessellation use that didn't significantly expand the amount of input geometry. This isn't a function of developers not knowing what they're doing, it's a function of current HW and API design, not insurmountable, but that's where we are at the moment.
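
As a back-of-envelope sketch of the amplification involved (my own numbers, assuming a uniform integer tessellation factor per triangular patch and ignoring partial factors and culling): the output triangle count grows roughly with the square of the factor, so even modest factors multiply the geometry flowing through the pipeline many times over.

```latex
% Rough sketch, not from the thread: uniform edge/inside factor f on a triangle patch
% yields on the order of f^2 output triangles.
\[
  T_{\text{out}} \approx f^{2}\, T_{\text{in}}
  \qquad\text{e.g. } f = 8:\; 1\ \text{patch} \;\rightarrow\; \sim\!64\ \text{triangles}
\]
```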
 
Well couldn't you also say the same about Qualcomm, if you choose a context that suits a narrative you want to weave? All of that money, all of the physical design chops, etc, and S805's 3D performance is slower than K1's. A day late and a dollar short. Pathetic.

Not really sure why you just took the hatchet to Intel there.

Qualcomm is the king of the hill in mobile. Intel has to pay others to take its chips. The more desperate IHV is expected to go for broke. Qualcomm can stay neck and neck and win sockets based on deep customer relations, engineering support, i.e., non-perf/W reasons.

Intel is the one that needs to decimate everyone to start selling its chips.

Intel is a node ahead of everyone. Next year Broxton will be 14nm while the S810 is 20nm. That's about the best case for a process lead. Its design must have begun around 2011, when Intel truly realized the trouble they were in. If Intel can't bring its A game to Broxton, they never will.
 
Moorefield runs the G6430 faster than the A7 did, and the A7 has sat at the top of its class for graphics performance as far as the results I've seen.

And, even against the S805, I'm not so quick to trust the relevance of synthetic benchmark scores: a prior PowerVR versus Adreno head-to-head with the variants of the Galaxy S4, whose synthetic benchmark scores suggested a performance advantage for the S600 over the Exynos 5410, didn't agree with many of the real-world games that ran with better frame rates on the Exynos version.

I remember seeing Moorefield losing in the T-Rex/Manhattan benchmarks, not in microbenchmarks. But that's what I recall, could be wrong though.
 
Qualcomm is the king of the hill in mobile. Intel has to pay others to take its chips. The more desperate IHV is expected to go for broke. Qualcomm can stay neck and neck and win sockets based on deep customer relations, engineering support, i.e., non-perf/W reasons.

Intel is the one that needs to decimate everyone to start selling its chips.

Intel is a node ahead of everyone. Next year Broxton will be 14nm while the S810 is 20nm. That's about the best case for a process lead. Its design must have begun around 2011, when Intel truly realized the trouble they were in. If Intel can't bring its A game to Broxton, they never will.

That is, if Broxton (Morganfield) isn't delayed; it is aimed at November 2015.
 
I remember seeing Moorefield losing in the T-Rex/Manhattan benchmarks, not in microbenchmarks. But that's what I recall, could be wrong though.

I don't recall even seeing any GFXBench 3.0 results; I'd estimate the Moorefield GPU to land around 14-15 fps in Manhattan offscreen. The developer platform Anand tested with an Adreno 420/S805 (MDP) is at 17.7 fps.

I assume the following, but would be happy to stand corrected:

G6430@533MHz/Moorefield = 136.45 GFLOPs FP32 (15.0 fps peak?)
Adreno420@520MHz (?)/S805 = 200.00 GFLOPs FP32 (17.7 fps)
MaliT628MP6@650MHz = 124.80 GFLOPs FP32 (10.1 fps)
GK20A@700MHz/Tegra K1 = 268.80 GFLOPs FP32 (30.0 fps)
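
For what it's worth, those peak figures back out of the usual throughput formula; here is my reconstruction for the two parts whose ALU lane counts are commonly quoted (counting one FMA, i.e. 2 FLOPs, per lane per clock):

```latex
% Reconstruction, not from the thread: peak FP32 = lanes x 2 FLOPs (FMA) x clock in GHz.
\[
  \mathrm{GFLOPS_{FP32}} = N_{\mathrm{lanes}} \times 2 \times f_{\mathrm{clk}}\ [\mathrm{GHz}]
\]
\[
  \mathrm{GK20A}:\ 192 \times 2 \times 0.700 = 268.8
  \qquad
  \mathrm{G6430}:\ 128 \times 2 \times 0.533 \approx 136.4
\]
```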

There are 3DMark results for Moorefield and Merrifield, though, that were published by Intel.
 
Personally I'm kind of amazed that people treat big performance differences due to programming languages as no big deal on devices with constrained power budgets. 50% slower could mean 75% worse power consumption, while SoC manufacturers have been fighting over much smaller incremental improvements. I think the problem is that app developers know that people will generally use their app even if it kills their battery, because there's no exact alternative that doesn't. There are some rare cases where this doesn't hold though, for the app I sell performance is everything.
For a wide range of apps screen and network power draw dwarf whatever the app throws at the CPU, so you don't save that much by optimising for performance. Also, given a certain development budget, using a "slower" language might result in faster code.
 
Looks like the tessellation hardware on the K1 & Adreno 420 won't be wasted on Android after all, if running Android L.

[screenshot attached]
 
For a wide range of apps screen and network power draw dwarf whatever the app throws at the CPU, so you don't save that much by optimising for performance. Also, given a certain development budget, using a "slower" language might result in faster code.

Sure, for a wide range of apps, no one would dispute that. A wide range of apps also barely utilize the GPU, or don't utilize it at all, but a lot of emphasis is still put on GPU performance and perf/W. On that note, I think you'd similarly agree that if there was a 50% overhead due to language for GPU drivers, it'd be a big deal.

I'm not saying optimize everything blindly here, or that less efficient languages aren't fine for plenty of tasks, just that blanket statements about 50% performance not mattering make as much sense as saying that CPU perf dropping 50% doesn't matter. Good programmers will be able to determine when optimization is and isn't suitable. The thing is, a lot of programmers today don't have the faintest idea of the performance implications of what they write, and are neither aware of the performance issues of the language they use nor of how to optimize at all.

As for your claim about writing faster code with slower languages because it saves budget time that you can spend on better optimization, I could see that being true if optimization in the "faster" languages meant more platform-specific branches. Otherwise I'm skeptical, especially if it comes down to Java vs C++.
 