NVIDIA Tegra Architecture

More advanced than desktop Kepler too, I guess Maxwell will have ASTC support in the texture units?

K1 is DX11.2. What would any Maxwell standalone GPU (outside the future Parker ULP GPU) have ASTC for?
 
It's obvious Nvidia will make announcements at Mobile World Congress in February about T4i, wait until then.

Though I wish Nvidia would replace the A9 with the A53; if this table is correct, it's faster than the A9 and has 64-bit ARMv8 support too.

http://www.anandtech.com/show/7573/...ed-on-64bit-arm-cortex-a53-and-adreno-306-gpu


It's only faster in 64-bit mode. When executing old 32-bit code, it's slower.
 
More advanced than desktop Kepler too, I guess Maxwell will have ASTC support in the texture units?
In SoCs for smartphones/tablets, yes; for the others, it remains to be seen.
(For the record, just about the only visible feature change from Ivy Bridge Graphics to Bay Trail graphics is the latter supports ETC, something Haswell graphics still does not.)
 
OpenGL gaming on Linux/Android/SteamOS?

And then there are technical merits...

Of course there are a lot of advantages, but OGL gaming in niche use cases (at least until the landscape changes) doesn't sound all too convincing as a reason for NV to implement ASTC in something like a desktop GPU.
 
I'm afraid most of you are missing valid points when it comes to GLB2.7. That specific benchmark has a ton of alpha-test-based foliage.
Well I'm not entirely sure it's really completely bottlenecked by that, but regardless it's the only datapoint we have right now. And even if z tests got a proportionally larger increase compared to Tegra 4 than some other parts of the chips (which I don't know as it's very difficult to keep track of these usually undisclosed things), shader gflops increased big time too, so I don't think it's unreasonable to expect it to be a lot faster in general. I agree though certainly other benchmarks need to be used too to judge the worthiness of the competitors.

Even though nobody missed me (which is honestly nice), and since I am a man of my word, I must admit in public that I have to eat my words: they truly seem to have integrated a full Kepler cluster into GK20A. I don't care if it's only almost "full" either, since I'm generous enough to stand by my mistakes.
Well the SMX seems to be a really full Kepler SMX (as in GK208, so half the TMUs of older ones), though we don't know yet if GPC etc. survived more or less the same. They might be redesigned though maybe just dropping all the stuff needed for scaling things up to multiple GPCs (which we know nvidia did) is all that's really changed.
 
Well I'm not entirely sure it's really completely bottlenecked by that, but regardless it's the only datapoint we have right now. And even if z tests got a proportionally larger increase compared to Tegra 4 than some other parts of the chips (which I don't know as it's very difficult to keep track of these usually undisclosed things), shader gflops increased big time too, so I don't think it's unreasonable to expect it to be a lot faster in general. I agree though certainly other benchmarks need to be used too to judge the worthiness of the competitors.
It doesn't take much to see what the bottleneck is if you look at it. On a side note their GPU slide might reveal early GLB3.0 results. G6430 seems to be at 11fps and Adreno330 at 9fps. GFXbench 3.0 seems to be close & the public will forget about the 2.x benchmarks fairly quickly.
Well the SMX seems to be a really full Kepler SMX (as in GK208, so half the TMUs of older ones), though we don't know yet if GPC etc. survived more or less the same. They might be redesigned though maybe just dropping all the stuff needed for scaling things up to multiple GPCs (which we know nvidia did) is all that's really changed.
It's DX11.2 & supports ASTC. If you're asking whether the raster supports 8 pixels/clock, I'm not sure it really matters.
 
It doesn't take much to see what the bottleneck is if you look at it. On a side note their GPU slide might reveal early GLB3.0 results. G6430 seems to be at 11fps and Adreno330 at 9fps. GFXbench 3.0 seems to be close & the public will forget about the 2.x benchmarks fairly quickly.

You could look at the slide in the NV keynote and try to guess a figure for the K1 GFXBench 3.0 Manhattan score .. go on, you know you want to :)
 
You could look at the slide in the NV keynote and try to guess a figure for the K1 GFXBench 3.0 Manhattan score .. go on, you know you want to :)

I'm patient enough to wait for final device performance and, more importantly, perf/mW ratios; then I'll see how it compares to the high-end smartphones of the time (since that's what the specific slide is aiming to compare against), if K1 ever makes it into one, that is.
 
Agreed. I think it's quite clear that the high-end performance demonstrated for T124 definitely requires active cooling and sits outside the design limits for modern smartphones and tablets by a fairly big margin. What it's capable of at ~1.5W and ~5W will be much more interesting.
 
Even though nobody missed me (which is honestly nice), and since I am a man of my word, I must admit in public that I have to eat my words: they truly seem to have integrated a full Kepler cluster into GK20A. I don't care if it's only almost "full" either, since I'm generous enough to stand by my mistakes.

As I said at the time, I don't care about apologies but I'm proudly wearing my party hat.
:D

Even though it's not a 2-SMX GPU running at ~350MHz but a 1-SMX part running at ~900MHz, the performance figures are even better than I could ever anticipate.
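The two configurations compared above can be sketched with simple peak-throughput arithmetic. This assumes the standard Kepler figures of 192 CUDA cores per SMX and 2 FLOPs (one FMA) per core per clock; the ~350 MHz and ~900 MHz clocks are the figures from the post, not confirmed specs.

```python
# Hedged sketch: peak FP32 throughput of the two hypothetical GK20A configs.
# A Kepler SMX has 192 CUDA cores; each retires 1 FMA (2 FLOPs) per clock.
# The clock figures (~350 MHz, ~900 MHz) are taken from the post above.

def peak_gflops(smx_count: int, clock_mhz: float, cores_per_smx: int = 192) -> float:
    """Peak single-precision GFLOPS: SMXs * cores * 2 FLOPs/clock * clock."""
    return smx_count * cores_per_smx * 2 * clock_mhz / 1000.0

two_smx_low_clock = peak_gflops(2, 350)   # 268.8 GFLOPS
one_smx_high_clock = peak_gflops(1, 900)  # 345.6 GFLOPS
print(two_smx_low_clock, one_smx_high_clock)
```

So the single high-clocked SMX actually comes out ahead on raw ALU throughput, which is consistent with the performance figures being better than anticipated.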



Agreed. I think it's quite clear that the high-end performance demonstrated for T124 definitely requires active cooling and sits outside the design limits for modern smartphones and tablets by a fairly big margin.

Why not tablets?

AFAIK, there were 7" tablets with K1 on the show floor for hands-on from the press:
We're seeing it working on 7-inch tablets that are relatively thin and compact, but would the same 2GHz+ power be available inside your next Android smartphone?
 
As I said at the time, I don't care about apologies but I'm proudly wearing my party hat.
:D

Even though it's not a 2-SMX GPU running at ~350MHz but a 1-SMX part running at ~900MHz, the performance figures are even better than I could ever anticipate.

Which performance figures? You're sure you want to keep that hat on your head? :LOL:

For the record, it was a looooong time ago that Exophase agreed with me here that Tegra5 at its upper boundaries would be an excellent candidate to fight against ULP Haswell configurations. Food for thought, perhaps? Unless of course you're naive enough to believe that you can cram a 951MHz GPU / 2.3GHz 4+1 A15 into a smartphone; well, you can, but it won't be mobile anymore :)
 
Rys, TK1 was closer to 2.6x faster than the A7 GPU (rather than 3x).

The perf. per watt seems to improve as GPU power consumption goes above 1.5w.

Assuming that perf. per watt increases at a linear rate when moving from 1.5W to 3.5W of GPU power consumed (which is unrealistic, but easy for illustrative purposes), then for TK1 to fit into a phone at power consumption levels similar to the A7 in the iPhone 5s, it would have just slightly below 2x more performance in GFXBench 3.0.

It would only need ~ 1w more GPU power consumption beyond that to match the claimed 2.6x improvement.

If that is truly the case, then I don't think that TK1 should have any problem meeting these performance claims in a thin fanless tablet (and will work in a high end smartphone too at slightly reduced frequencies). An actively cooled variant in Shield 2 would likely have even greater performance than what was claimed in the slide.
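The back-of-envelope extrapolation above can be sketched as a simple linear interpolation. Both anchor points are assumptions stated in the post, not measurements: TK1 at a ~1.5W GPU budget (iPhone 5s class) giving just under 2x the A7 GPU, and roughly 1W more reaching the claimed 2.6x.

```python
# Hedged sketch of the post's extrapolation: relative GFXBench 3.0 speedup
# over the A7 GPU as a linear function of GPU power. Anchor points (1.5W ->
# ~1.95x, 2.5W -> 2.6x) are the poster's assumptions, not measured data.

def tk1_speedup(power_w: float, p0: float = 1.5, s0: float = 1.95,
                p1: float = 2.5, s1: float = 2.6) -> float:
    """Linear interpolation of relative speedup vs GPU power draw."""
    return s0 + (s1 - s0) * (power_w - p0) / (p1 - p0)

print(round(tk1_speedup(1.5), 2))  # 1.95 -> phone-level power budget
print(round(tk1_speedup(2.5), 2))  # 2.6  -> matches the claimed figure
```

Of course, as the later replies point out, real perf/W curves are not linear in this range, so this is only the post's own illustrative model.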
 
I bet we'll see the TK1 in a 6" phablet, at least.

In fact, using an aluminum body that is thermally connected to the SoC (like the HTC One), I bet we could see a slightly underclocked TK1 in a 5" smartphone and get a nice battery life using a 3200mAh battery to boot.
 
The perf. per watt seems to improve as GPU power consumption goes above 1.5w.

What is this comment based on? At 1.5W they're probably already vastly dominated by dynamic power over static, I don't think they'd have such strong performance vs their competitors otherwise. From there, getting more performance means increasing clock speed (linear) necessitating increasing voltage (quadratic) which results in a superlinear expansion. Not sublinear.
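The dynamic-power argument above can be illustrated with the classic CMOS model P = C·V²·f. If hitting a higher clock requires raising the voltage roughly in proportion (a deliberately extreme, illustrative assumption; real DVFS curves are shallower), power grows with roughly the cube of frequency, so linear performance gains cost superlinear power, as the post argues.

```python
# Hedged sketch of the dynamic-power argument: P = C * V^2 * f.
# The specific clocks/voltages below are illustrative, not Tegra K1 specs,
# and scaling V 1:1 with f is an extreme simplification for illustration.

def dynamic_power(freq_ghz: float, volts: float, cap: float = 1.0) -> float:
    """Classic CMOS dynamic-power model, in arbitrary units."""
    return cap * volts ** 2 * freq_ghz

base = dynamic_power(0.9, 0.9)     # baseline clock and voltage
doubled = dynamic_power(1.8, 1.8)  # 2x clock with naively 2x voltage
print(doubled / base)              # 8.0: ~2x performance for ~8x power
```

Even with a gentler voltage ramp the exponent stays above 1, which is why perf/W degrades, rather than improves, as the operating point is pushed up.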
 