NVIDIA Tegra Architecture

mboeller · Jan 7, 2013

OlegSH said:
How did you come to such a conclusion?
Maybe you know something we don't?
These 72 FPUs should be clocked at 650-700 MHz, considering the 20x FP performance over Tegra2 from the leaked slide. That's roughly 100 GFLOPS, which should be enough to keep up with something like the SGX554MP4 in the iPad4 in GLB2.5.

What? 100GFlops?

quoting Anandtech:

NEW: the shaders aren't unified, the majority are 20-bit pixel shader cores though. No idea on the ratio yet.

Disappointing!

No wonder they showed no GLB2.5 scores.

Ailuros · Jan 7, 2013

mboeller said:
What? 100GFlops?

quoting Anandtech:

Disappointing!

No wonder they showed no GLB2.5 scores.

Theoretical peak FLOPs have what exactly to do with pixel shader precision? I would have bet on at least USC ALUs; as it stands I'm betting 4*Vec4 FP20 PS ALUs and 2*Vec4 VS ALUs, 1 TMU/Vec4 ALU and of course coverage sampling AA.

Peak theoretical GFLOPs might even be that high IF the frequency is as high as he claims; if performance now fails to break even with an iPad4, I for one won't be the one who expected too much.
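
As a rough check of the "72 FPUs at 650-700 MHz is roughly 100 GFLOPS" figure discussed above, here is a minimal sketch of the usual peak-throughput arithmetic. It assumes each FPU lane retires one multiply-add (counted as 2 FLOPs) per clock; that issue rate is an assumption, not something confirmed in the thread, and the function name is made up for illustration.

```python
def peak_gflops(fpu_lanes: int, clock_mhz: float, flops_per_lane_per_clock: int = 2) -> float:
    """Theoretical peak GFLOPS = lanes * FLOPs per lane per clock * clock (GHz).

    flops_per_lane_per_clock=2 assumes one multiply-add per lane per clock
    (an assumption, not a confirmed Tegra 4 figure).
    """
    return fpu_lanes * flops_per_lane_per_clock * clock_mhz / 1000.0


# The two ends of the clock range guessed in the thread.
for mhz in (650, 700):
    print(f"72 lanes @ {mhz} MHz -> {peak_gflops(72, mhz):.1f} GFLOPS")
```

Under that assumption the 650-700 MHz range brackets the quoted ~100 GFLOPS. The figure is independent of shader precision, which is the point being made here: peak FLOPs only count operations per second.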

mboeller · Jan 7, 2013

Ailuros said:
Theoretical peak FLOPs have what exactly to do with pixel shader precision? I would have bet on at least USC ALUs; as it stands I'm betting 4*Vec4 FP20 PS ALUs and 2*Vec4 VS ALUs, 1 TMU/Vec4 ALU and of course coverage sampling AA.

16bit Flops then

Ailuros · Jan 7, 2013

mboeller said:
16bit Flops then

A large portion is FP20 and a smaller one is FP32 for the VS ALUs; however, since applications don't strictly require FP32 everywhere, floating point is still floating point irrespective of precision.

OGL_ES is quite specific where lowp, mediump and highp should be used; as long as their vertex shaders are inevitably FP32 they've got the majority of highp recommendations covered. The question now is what the OGL_ES3.0 requirements exactly will be; by the sound of Tegra4 I'm willing to bet that the minimum won't be FP32

Xmas · Jan 7, 2013

Ailuros said:
The question now is what the OGL_ES3.0 requirements exactly will be; by the sound of Tegra4 I'm willing to bet that the minimum won't be FP32

The spec is public. In GLSL ES 3.00 highp support is required in both vertex and fragment shaders and must be FP32, while mediump must be at least FP16.
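
To give a feel for what the FP16 versus FP32 gap means in practice, here is a small sketch that rounds a value through IEEE 754 half and single precision using Python's standard `struct` module. IEEE half and single are used as stand-ins for "at least FP16" mediump and FP32 highp; Tegra's FP20 format is not modeled because its bit layout is not given in this thread. The helper names are made up for illustration.

```python
import struct


def round_to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (10 stored mantissa bits)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision value (23 stored mantissa bits)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def spacing_at_one(stored_mantissa_bits: int) -> float:
    """Gap between 1.0 and the next representable value for a given mantissa width."""
    return 2.0 ** -stored_mantissa_bits


# A coordinate-like value that FP16 cannot hold precisely.
value = 1000.3
print("fp16:", round_to_fp16(value))  # 1000.5, since fp16 steps by 0.5 in [512, 1024)
print("fp32:", round_to_fp32(value))
print("step near 1.0, fp16:", spacing_at_one(10))
print("step near 1.0, fp32:", spacing_at_one(23))
```

The half-precision result lands on a 0.5 grid at this magnitude, which is why lower-precision pixel pipelines tend to struggle with things like large texture coordinates, while single precision stays within a tiny fraction of the input.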

Ailuros · Jan 7, 2013

Xmas said:
The spec is public. In GLSL ES 3.00 highp support is required in both vertex and fragment shaders and must be FP32, while mediump must be at least FP16.

Meaning that the ULP GF in Tegra4 won't reach OGL_ES3.0?

Nebuchadnezzar · Jan 7, 2013

Ailuros said:
Meaning that the ULP GF in Tegra4 won't reach OGL_ES3.0?

That already seems out of the question due to the lack of unified shaders.

Ailuros · Jan 7, 2013

Nebuchadnezzar said:
That already seems out of the question due to the lack of unified shaders.

Great

Well is there anything really interesting besides the i500 to the whole Tegra4 enchilada or am I the only one here that seems bored to death? Oh wait there's project shield...

arggghhh

Xmas · Jan 7, 2013

Nebuchadnezzar said:
That already seems out of the question due to the lack of unified shaders.

How so? No API feature requires unified shaders.

Deleted member 13524 · Jan 7, 2013

Xmas said:
How so? No API feature requires unified shaders.

Can OpenCL be done without unified shaders?

fellix · Jan 7, 2013

OCL can run on a large variety of hardware, as long as there's proper vendor run-time support.

ams · Jan 7, 2013

mboeller said:
What? 100GFlops?

quoting Anandtech:

Disappointing!

No wonder they showed no GLB2.5 scores.

Yes, this is surprising (and one has to wonder if NVIDIA is saving their more modern and more powerful GPU architecture to go up against next gen Rogue/Mali/Adreno/Radeon, etc.), but at the end of the day it now makes sense where the 6x GPU performance improvement comes from when comparing Tegra 4 vs. Tegra 3. Compared to Tegra 3, it appears (but has not yet been confirmed) that Tegra 4 has 6x more pixel shader execution units (ie. 48 pixel shader execution units vs. 8 pixel shader execution units) and 6x more vertex shader execution units (ie. 24 vertex shader execution units vs. 4 vertex shader execution units), for a grand total of 72 pixel/vertex shader execution units in Tegra 4 vs. a grand total of 12 pixel/vertex shader execution units in Tegra 3.

Up to 6x GPU performance improvement in Tegra 4 vs. Tegra 3 is quite significant, and should be good enough to put NVIDIA at or near the top of the heap with respect to GPU performance compared to the highest performance mobile/handheld SoCs currently on the market. Top that off with what is arguably the highest CPU performance in a mobile/handheld SoC, in addition to some of the new features introduced on Tegra 4, and NVIDIA can legitimately claim that Tegra 4 is the world's fastest mobile processor.
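
The unit counts quoted in this post can be tallied in a few lines. This is only a sketch of the arithmetic, using the unconfirmed figures from the post itself (48/24 units for Tegra 4, 8/4 for Tegra 3); the dictionary and function names are made up for illustration.

```python
# Unconfirmed execution-unit counts as quoted in the post above.
tegra3_units = {"pixel": 8, "vertex": 4}
tegra4_units = {"pixel": 48, "vertex": 24}


def unit_scaling(old: dict, new: dict) -> dict:
    """Per-shader-type ratio of execution units, new over old."""
    return {kind: new[kind] / old[kind] for kind in old}


print("Tegra 3 total:", sum(tegra3_units.values()))
print("Tegra 4 total:", sum(tegra4_units.values()))
print("Scaling:", unit_scaling(tegra3_units, tegra4_units))
```

Note that this only scales unit counts. Any real-world 6x claim would also depend on clocks, bandwidth, and how well a workload splits between the separate pixel and vertex units.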

OlegSH · Jan 7, 2013

First tablet - http://www.theverge.com/2013/1/7/38...combines-tegra-4-android-thin-body/in/3610341
Resolution clearly hints that GPU perf is good :)

ltcommander.data · Jan 7, 2013

OlegSH said:
First tablet - http://www.theverge.com/2013/1/7/38...combines-tegra-4-android-thin-body/in/3610341
Resolution clearly hints that GPU perf is good :)

I'm not sure a manufacturer's choice of resolution is indicative of performance. Tegra 3 was weaker than the SGX543MP2 in the iPad 2, yet that didn't stop them from combining it with higher resolution 1080p displays.

wishiknew · Jan 8, 2013

Is this going to work power-wise? I thought Anandtech's last article had the A15 in Exynos sucking up a lot of juice.

silent_guy · Jan 8, 2013

wishiknew said:
Is this going to work power-wise? I thought Anandtech's last article had the A15 in Exynos sucking up a lot of juice.

It's also a question what clock you're going to run this thing at, right? Is it a given that an A15 does worse in terms of perf/W and perf/mm2? I imagine that, with very high maximum clock speeds, you can save quite a bit of power by clocking down and lowering VDD as well.

As long as the possibility is there to reasonably trade off performance vs. power consumption, it doesn't have to be a problem.
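
The clock-down-and-lower-VDD argument can be sketched with the standard first-order CMOS dynamic power model, P proportional to C * V^2 * f. This ignores leakage and assumes performance scales linearly with clock; the scale factors below are hypothetical illustration values, not measured A15 or Tegra 4 numbers.

```python
def relative_dynamic_power(freq_scale: float, vdd_scale: float) -> float:
    """Dynamic power relative to baseline, using P ~ C * V^2 * f (leakage ignored)."""
    return freq_scale * vdd_scale ** 2


def relative_perf_per_watt(freq_scale: float, vdd_scale: float) -> float:
    """Perf/W relative to baseline, assuming performance scales linearly with clock."""
    return freq_scale / relative_dynamic_power(freq_scale, vdd_scale)


# Hypothetical operating point: 70% of max clock at 85% of max voltage.
f, v = 0.70, 0.85
print(f"power: {relative_dynamic_power(f, v):.2f}x of baseline")
print(f"perf/W: {relative_perf_per_watt(f, v):.2f}x of baseline")
```

In this model, giving up 30% of the clock roughly halves dynamic power, because the voltage term enters squared. That is the trade-off described above: a core with a high maximum clock leaves room to run lower on the voltage/frequency curve.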

3dcgi · Jan 8, 2013

ToTTenTranz said:
Can OpenCL be done without unified shaders?

Yes, though you likely won't be able to use all of your FLOPS.

Lazy8s · Jan 8, 2013

nVidia boasted that Tegra 3 enabled new-era PCs, equipped with Windows RT, that would never use a fan/heatsink like conventional PCs.

Although Project Shield is a different product with different priorities and even though the fan/heatsink is mostly a non-issue, it's at least a little amusing that they'd be the company to release a mobile device needing one.

NathansFortune · Jan 8, 2013

Late to the discussion, but Tegra 4 doesn't have an integrated modem and it doesn't have an SM4 GPU. What the fuck are they playing at?!? What were they doing for the past year? How hard can it be to stuff 80 or so Kepler ALUs into this and integrate the baseband!

Now comes the long wait for Tegra 5 where we just have to hope Nvidia become a real competitor to Qualcomm, because right now they aren't even turning up to the smartphone race with quad A15 and no integrated baseband.

Laurent06 · Jan 8, 2013

NathansFortune said:
How hard can it be to stuff 80 or so Kepler ALUs into this and integrate the baseband!

Do you think it's easy to integrate IP that comes from another company you just bought? Just ask Intel about integrating the Infineon baseband into an Atom SoC.
