NVIDIA Tegra Architecture

ninelven · Jul 13, 2013

Well, he either needs to say,

1) I expect Logan's gpu to be functionally greater than or equal to Kepler.

or

2) Logan's gpu will be functionally inferior to Kepler in these specific ways.

Saying an ultramobile chip is going to be optimized for power consumption is meaningless.

Observe:

I hear Qualcomm's next chip will trade functionality for power efficiency.

I hear Apple's next chip will trade functionality for power efficiency.

I hear Samsung's next chip will trade functionality for power efficiency.

All will certainly be true in some respect. All are equally meaningless.

AlexV · Jul 13, 2013

Take it to PM, guys.

Alexko · Jul 13, 2013

Helmore said:
Ailuros is only saying that the GPU in Tegra 5 won't be a 1:1 copy of the Kepler architecture. That's what the original argument was about. I don't remember who said it and I'm too lazy to reread this thread, but someone implied that the GPU in Tegra 5 would be 1 or 2 Kepler SMX units with hardly any alterations. At least no alterations that would affect functionality. That's what started the discussion, with Ailuros saying that that would be absurd.

Can we move on now?

As it is, GK208 already has half the TMUs and half the ROPs that GK107 has, for an identical number of shaders.

Logan might not differ very much from GK208, but as such I doubt it would have the typical unit mix we might expect from "Kepler".

mczak · Jul 14, 2013

Alexko said:
As it is, GK208 already has half the TMUs and half the ROPs that GK107 has, for an identical number of shaders.

Logan might not differ very much from GK208, but as such I doubt it would have the typical unit mix we might expect from "Kepler".

I think you're wrong about the TMUs. As far as I can tell all the basic building blocks are identical, the SMX (including the TMUs) is just about the same. Yes it has CUDA compute capability 3.5 but that's a pretty minimal change (and gk110 has got that already too).
Number of ROPs being different also isn't an architecture change at all, quite the contrary, ROPs were always linked to memory partitions for nvidia chips ever since at least G80 (I suspect since forever actually) and hence gk208 having only one 64bit MC instead of two naturally only gets half the ROPs. This is just part of ordinary Kepler family scaling.
(Ok so that single ROP/MC partition has 4 times the cache as a gk107 ROP/MC but that again is a pretty trivial change.)
To be honest though I have no idea if Logan will look a lot like gk208 or more like a really distant relative of Kepler. AMD managed to get GCN (1.1) down to the required power levels without really changing the architecture (though at least the frontend is simpler with only 1/4 prim/clock throughput so there are indeed some changes but overall it's still very very similar to other GCN 1.1 chips), so there's no reason nvidia couldn't do it. It was a bit of a stretch though for Kabini (the clocks are really low for the 3.9W part but as said nvidia might want to do higher TDP parts too), and the most simple Kepler part (with one SMX) would still be bigger so something with more changes might be more efficient but obviously there are benefits for being able to use the same arch too. (Intel did that too in fact but I have no idea how it fares yet.)
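The ROP scaling argument above can be put into a few lines of Python. This is only a sketch: the 8 ROPs per 64-bit partition and the 128-bit / 64-bit bus widths for gk107 / gk208 are assumptions used for illustration, not figures quoted in this thread, and `rop_count` is a made-up helper name.

```python
# Sketch: ROP count tied to the number of 64-bit memory partitions.
# ASSUMPTIONS (not stated in the thread): 8 ROPs per partition,
# gk107 on a 128-bit bus, gk208 on a 64-bit bus.
PARTITION_WIDTH_BITS = 64
ROPS_PER_PARTITION = 8


def rop_count(bus_width_bits: int) -> int:
    """Return the ROP count implied by the memory bus width."""
    if bus_width_bits <= 0 or bus_width_bits % PARTITION_WIDTH_BITS != 0:
        raise ValueError("bus width must be a positive multiple of 64 bits")
    partitions = bus_width_bits // PARTITION_WIDTH_BITS
    return partitions * ROPS_PER_PARTITION


gk107_rops = rop_count(128)  # two ROP/MC partitions
gk208_rops = rop_count(64)   # one ROP/MC partition
print(gk107_rops, gk208_rops)
```

Under those assumptions, halving the bus halves the ROPs without touching the architecture, which is the point being made about ordinary family scaling.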

Alexko · Jul 14, 2013

mczak said:
I think you're wrong about the TMUs. As far as I can tell all the basic building blocks are identical, the SMX (including the TMUs) is just about the same. Yes it has CUDA compute capability 3.5 but that's a pretty minimal change (and gk110 has got that already too).
Number of ROPs being different also isn't an architecture change at all, quite the contrary, ROPs were always linked to memory partitions for nvidia chips ever since at least G80 (I suspect since forever actually) and hence gk208 having only one 64bit MC instead of two naturally only gets half the ROPs. This is just part of ordinary Kepler family scaling.
(Ok so that single ROP/MC partition has 4 times the cache as a gk107 ROP/MC but that again is a pretty trivial change.)
To be honest though I have no idea if Logan will look a lot like gk208 or more like a really distant relative of Kepler. AMD managed to get GCN (1.1) down to the required power levels without really changing the architecture (though at least the frontend is simpler with only 1/4 prim/clock throughput so there are indeed some changes but overall it's still very very similar to other GCN 1.1 chips), so there's no reason nvidia couldn't do it. It was a bit of a stretch though for Kabini (the clocks are really low for the 3.9W part but as said nvidia might want to do higher TDP parts too), and the most simple Kepler part (with one SMX) would still be bigger so something with more changes might be more efficient but obviously there are benefits for being able to use the same arch too. (Intel did that too in fact but I have no idea how it fares yet.)

The TMU thing comes from Damien's short article [French]: http://www.hardware.fr/news/13222/nvidia-gpu-gk208-gt-640-630.html
It claims 16 TMUs for GK208, vs. GK107's 32. But yes, the ROP count is consistent with the memory bus.

So yeah, I'd expect NVIDIA to take GK208, maybe remove some cache and reduce primitive throughput as AMD did, but more or less keep everything else the same. I think it's worth remembering that although AMD did get GCN 1.1 into tablets, that's only with 128 shaders, and not low-power enough for phones. Kabini/Temash is a tablet~ultrathin notebook design.

Logan is a phone~tablet design. Then again, if Logan is a 20nm chip, that changes things.

silent_guy · Jul 14, 2013

Going back to the differences between GF100 and GF104, I remember a different SM architecture and different throughput of the tex units. And that's off the top of my head without going back to read up on the subject.

Is there anyone who considers them to be of a different architecture family?

If not, I have a hard time seeing what this whole discussion is about.

Alexko · Jul 14, 2013

silent_guy said:
Going back to the differences between GF100 and GF104, I remember a different SM architecture and different throughput of the tex units. And that's off the top of my head without going back to read up on the subject.

Is there anyone who considers them to be of a different architecture family?

If not, I have a hard time seeing what this whole discussion is about.

That's semantics, indeed. I think the interesting question is just what they will change.

Ailuros · Jul 14, 2013

ninelven said:
Well, he either needs to say,

1) I expect Logan's gpu to be functionally greater than or equal to Kepler.

Again functionality will be roughly the same, while it'll still be an architectural derivative.

or

2) Logan's gpu will be functionally inferior to Kepler in these specific ways.

I haven't seen ONE SFF mobile GPU yet where TMUs aren't one way or another connected to the ALUs; in other words you're typically not going to get the peak FLOP amount out of them once you're texturing. It's been mentioned in one of the links above and it's not exactly hairsplitting either. If the ULP GF in Logan didn't follow that trend it would be quite a big surprise.

Saying an ultramobile chip is going to be optimized for power consumption is meaningless.

I didn't say that; it's just highly convenient to minimize it to that level, and yes, at this point it's also pretty clear that you have no intention of carrying out a civilized debate ON TOPIC.

mczak · Jul 14, 2013

Alexko said:
The TMU thing comes from Damien's short article [French]: http://www.hardware.fr/news/13222/nvidia-gpu-gk208-gt-640-630.html
It claims 16 TMUs for GK208, vs. GK107's 32. But yes, the ROP count is consistent with the memory bus.

Oh you're right I totally missed that. I suspect Damien got that simply from the official texture fillrate specs on nvidia's site. That also solves the "mystery" that gk208 is noticeably smaller than gk107 despite having all the same units (except one ROP/MC). Looks like half the TMUs hits a better balance even on the desktop FWIW judging by the benchmarks I've seen so far (I think that's not really surprising as nvidia got quite a bit higher TMU:ALU ratio compared to amd with gk1xx, though granted with gk208 it would be quite a bit less now, except for fp16).
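For what it's worth, deriving a TMU count from a published texture fillrate is simple arithmetic. A minimal sketch, assuming one bilinear texel per TMU per clock; the fillrate and clock values below are round placeholder numbers, not nvidia's official specs, and `implied_tmus` is a hypothetical helper:

```python
# Sketch: infer the TMU count from texture fillrate and core clock.
# ASSUMPTION: each TMU delivers one texel per clock, so
#   fillrate (GTexel/s) = TMUs * clock (GHz).
# The inputs used below are placeholders, NOT official spec values.

def implied_tmus(fillrate_gtexels_per_s: float, core_clock_ghz: float) -> int:
    """Return the TMU count implied by a fillrate at a given clock."""
    if core_clock_ghz <= 0:
        raise ValueError("clock must be positive")
    return round(fillrate_gtexels_per_s / core_clock_ghz)


# Same clock, half the fillrate implies half the TMUs.
full_part = implied_tmus(32.0, 1.0)
half_part = implied_tmus(16.0, 1.0)
print(full_part, half_part)
```

So if the spec page lists roughly half the texture fillrate at a similar clock, half the TMUs is the natural reading.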

So yeah, I'd expect NVIDIA to take GK208, maybe remove some cache and reduce primitive throughput as AMD did, but more or less keep everything else the same. I think it's worth remembering that although AMD did get GCN 1.1 into tablets, that's only with 128 shaders, and not low-power enough for phones. Kabini/Temash is a tablet~ultrathin notebook design.

Logan is a phone~tablet design. Then again, if Logan is a 20nm chip, that changes things.

Well, Tegra 4 didn't make it into phones either, so are you sure Logan will?
If nvidia really keeps Kepler the same in Logan, there's imho no way they are going to have more than 1 SMX. And even with just one clocks would need to be low, and even then the viability of that in a phone seems questionable to me (on 28nm at least).

Ailuros · Jul 14, 2013

http://gfxbench.com/device.jsp?benchmark=gfx27&D=Xiaomi+MI+3

The Xiaomi M3 has a T4 (CPU1800/GPU605), and so does the ZTE U988S.

mczak · Jul 14, 2013

I would classify the Xiaomi M3 though with a 5.5 inch screen as a phablet not a phone. The ZTE is supposed to be smaller though (not small!). I guess reviews will tell how feasible T4 in these form factors really is.

Ailuros · Jul 14, 2013

mczak said:
I would classify the Xiaomi M3 though with a 5.5 inch screen as a phablet not a phone. The ZTE is supposed to be smaller though (not small!). I guess reviews will tell how feasible T4 in these form factors really is.

I'd say that how large a device's battery is matters more than what we want to call it; my recently acquired 5" thingy is a budget product and carries a 2000mAh battery. In the case of T4 smartphones I assume batteries should be in the >3000mAh league (like past HTC One and HTC One + smartphones); the result should be feasible in a smartphone or superphone or phablet, however I don't think anyone would expect that you could spend several days without charging it. Worst case scenario with quite heavy usage is usually once a day.

ninelven · Jul 15, 2013

Ailuros said:
...debate ON TOPIC

It is rather impossible to have a debate over non-falsifiable statements.

DSC · Jul 24, 2013

http://www.anandtech.com/show/7169/nvidia-demonstrates-logan-soc-mobile-kepler

NVIDIA took its Ira demo, originally run on a Titan at GTC 2013, and got it up and running on a Logan development board. Ira did need some work to make the transition to mobile. The skin shaders were simplified, smaller textures are used and the rendering resolution is dropped to 1080p. NVIDIA claims this demo was done in a 2 - 3W power envelope.

If anyone still thinks this isn't a Kepler GPU.......

lanek · Jul 24, 2013

I always like the Nvidia presentations for Tegra / Nvidia GPUs, especially the graphs, lol.

Other than that, I find it really promising. (But others looked promising too, before we got to see them running.)

mczak · Jul 24, 2013

DSC said:
http://www.anandtech.com/show/7169/nvidia-demonstrates-logan-soc-mobile-kepler
If anyone still thinks this isn't a Kepler GPU.......

The diagram though is most probably just taken from some ordinary desktop Kepler. Because it still shows the 16 TMUs, even though it is most likely just 8 per SMX like gk208. Well, if it really is that close to gk208, that is. Could be, but I really don't see them achieving those 1GHz clocks they are claiming at 2W (FWIW gt630 gk208 is 25W TDP, so ok that's 2 SMX, but still, without any sacrifice in clocks I just don't see them hitting useful power levels if that's a Kepler SMX, even if the non-SMX parts are cut down).
Edit: actually they don't claim the 2W at 1GHz. There's a comparison to 8800GTX flops which it can exceed, which would require the 1GHz; the 2W claim is separate. So getting the 8800GTX flops is more than likely closer to 10W than 2W, and the actual achievable flops (in tablets/smartphones) are quite a bit lower (unless of course you'd have a device like Shield where you could indeed have 10W TDP).
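The "exceeding 8800GTX flops needs about 1GHz" reasoning can be checked with a rough calculation. This is a sketch under stated assumptions: Logan is taken to have a single 192-ALU Kepler SMX (speculation in this thread, not a confirmed spec), the 8800GTX is taken as 128 ALUs at a 1.35GHz shader clock, and a multiply-add is counted as 2 flops per ALU per clock. The function names are made up for the example.

```python
# Sketch: peak-FLOPS comparison behind the "needs ~1GHz" estimate.
# ASSUMPTIONS: one Kepler SMX = 192 ALUs (Logan with a single SMX is
# speculation), 8800GTX = 128 ALUs at 1.35 GHz shader clock,
# multiply-add counted as 2 flops per ALU per clock.

def peak_gflops(alus: int, clock_ghz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak throughput in GFLOPS."""
    return alus * clock_ghz * flops_per_clock


def clock_needed_ghz(target_gflops: float, alus: int, flops_per_clock: int = 2) -> float:
    """Clock at which `alus` ALUs reach `target_gflops`."""
    return target_gflops / (alus * flops_per_clock)


g80_gflops = peak_gflops(128, 1.35)
smx_clock = clock_needed_ghz(g80_gflops, 192)
print(f"8800GTX peak: {g80_gflops:.1f} GFLOPS")
print(f"one SMX needs about {smx_clock:.2f} GHz to match")
```

With these inputs a single SMX has to run at about 0.9GHz just to match, so exceeding it means roughly 1GHz. Counting the G80's extra MUL (3 flops per clock) would push the required clock higher still.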

Does anyone know what that "closest mobile competitor" is? Bay Trail should already offer all the same features (minus CUDA) much earlier (though it is going to be slower), and I thought Rogue should as well, though I'm not sure if anyone is going to sell the dx11 versions. No idea about the feature set of Adreno 4xx or Mali.

Alexko · Jul 24, 2013

Toshiba's latest Tegra 4-powered tablet is available from Amazon, but 4 out of 7 (at this time) reviews mention overheating problems: http://www.amazon.com/Toshiba-Excit...iewpoints=1&sortBy=bySubmissionDateDescending

I guess this explains why we're not seeing many Tegra 4s in phones.

Blazkowicz · Jul 24, 2013

Does the 2560x1600 display heat up much? And then there are the Cortex A15 CPUs.
Tegra or not, that's maybe what you get when using hardware that is too high end. Maybe they should make it a bit thicker, like the Surface Pro.
Beware of what you buy and what you wish for, lol.

Laurent06 · Jul 24, 2013

Alexko said:
Toshiba's latest Tegra 4-powered tablet is available from Amazon, but 4 out of 7 (at this time) reviews mention overheating problems: http://www.amazon.com/Toshiba-Excit...iewpoints=1&sortBy=bySubmissionDateDescending

I guess this explains why we're not seeing many Tegra 4s in phones.

Or perhaps it explains why Toshiba isn't successful at making tablets? IMHO before drawing conclusions you'd need a larger sample of reviews, and reviews from other designs based on Tegra 4.

OlegSH · Jul 24, 2013

http://www.youtube.com/watch?v=TPtgsrv5xqM
GLB2.7 - 18 AVG FPS at 1080p offscreen with 1W power

http://blogs.nvidia.com/wp-content/uploads/2013/07/NVIDIA_Siggraph_Mobile_HR_1.jpg
^High res spoiler - http://gfxbench.com/device.jsp?benchmark=gfx27&D=NVidia+Tegra+Note+Premium
+
Some nice demos:
https://www.youtube.com/watch?v=Vx0t-WJFXzo
https://www.youtube.com/watch?v=fpvfTuaO75k
