NVIDIA Tegra Architecture

The Tegra 4 GLBenchmark results list a test I haven't seen before, C24Z16 Offscreen ETC1. How does that compare with the regular C24Z16 Offscreen in terms of performance?
 
Last but not least, the Tegra 4i "Phoenix" reference platform game demos: http://www.youtube.com/watch?v=HpKiJsbT1EM and http://www.youtube.com/watch?v=UUvRpVF7eHk&list=UUddiUEpeqJcYeBxX1IVBKvQ&index=2 . I'd say that Real Boxing looks smooth, but Riptide GP 2 [second-gen version] could use some work. Based on The Verge's commentary, this demonstration ran on first silicon for Tegra 4i, so hopefully hardware and software improvements will result in a smooth gameplay experience when commercial Tegra 4i devices start to ship at the end of this year and early next year.
 
No power consumption figures under any kind of CPU load? That's what people really want to know right now.

I haven't seen anything yet for Tegra 4, but surely with all four A15 CPU cores absolutely pegged I would imagine at least 4W peak power consumption.

Edit: Here are a few power consumption comparisons against the S4 Pro, covering 1080p video recording, 1080p video playback, web/book reading, audio playback, and standby: http://cdn.androidpolice.com/wp-content/uploads/2013/02/nexusae0_tt4_thumb.png
 
I haven't seen anything yet for Tegra 4, but surely with all four A15 CPU cores absolutely pegged I would imagine at least 4W peak power consumption.

I expect > 6W personally, but I'm not really that interested in this scenario either. I'd like to see one or two cores pegged, and something with a moderate load, on and off the companion core (you'd think they'd at least want to show that).
 
I expect > 6W personally, but I'm not really that interested in this scenario either. I'd like to see one or two cores pegged, and something with a moderate load, on and off the companion core (you'd think they'd at least want to show that).

Hopefully we will see this when people such as Anand get their hands on a Tegra 4 device (such as Shield) a few months from now.
 

Looks like they doubled the TMUs and pixel ROPs, while keeping the depth ROPs (ZOPs?) and triangle setup engine the same.

I've heard that the triangle setup capabilities of the GeForce ULP weren't stellar, so I wonder if this will be a bottleneck anywhere. TMUs will probably limit them at least some of the time too: with the A6X, Apple doubled the ALU:TMU ratio, but here they're tripling it. nVidia has also confirmed that Tegra 3 had 2 TMUs (and Tegra 4 has 4 TMUs).

Tegra 4i looks even crazier, with the ALU:TMU ratio increasing a staggering 6x (for fragment shading anyway; I doubt vertex shading increasing half as much as fragment shading is going to be much of a bottleneck, especially with the triangle rate unchanged).
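
For what it's worth, here's a back-of-the-envelope check of those ratios. Only the Tegra 3 and Tegra 4 TMU counts are confirmed above; the fragment/vertex ALU splits are the commonly cited figures, and the Tegra 4i split is an assumption:

```python
# Rough ALU:TMU ratio check. Only the Tegra 3 (2 TMU) and Tegra 4
# (4 TMU) counts are confirmed in the thread; the ALU splits are the
# commonly cited figures, and the Tegra 4i split is an assumption.
chips = {
    # name: (fragment ALUs, vertex ALUs, TMUs)
    "Tegra 3":  (8, 4, 2),
    "Tegra 4":  (48, 24, 4),
    "Tegra 4i": (48, 12, 2),  # assumed split of its 60 cores
}

base_ratio = chips["Tegra 3"][0] / chips["Tegra 3"][2]  # 4 fragment ALUs/TMU

for name, (frag, _vert, tmu) in chips.items():
    ratio = frag / tmu
    print(f"{name}: {ratio:.0f} fragment ALUs per TMU "
          f"({ratio / base_ratio:.0f}x Tegra 3)")
# Tegra 3:   4 per TMU (1x)
# Tegra 4:  12 per TMU (3x)  -> "tripling"
# Tegra 4i: 24 per TMU (6x)  -> "a staggering 6x"
```

Under those assumptions, the 6x figure for Tegra 4i drops straight out of keeping 2 TMUs while going to 48 fragment ALUs.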
 
This slide summarizes the Tegra 4 GPU architecture: http://pc.watch.impress.co.jp/img/pcw/docs/589/158/html/10.jpg.html , and this one compares Tegra 4 to Tegra 3: http://pc.watch.impress.co.jp/img/pcw/docs/589/158/html/11.jpg.html . According to this slide, the Tegra 4 GPU clock frequency is 672MHz.

This slide summarizes the Tegra 4i GPU architecture: http://pc.watch.impress.co.jp/img/pcw/docs/589/158/html/16.jpg.html , and this one compares Tegra 4i to Tegra 3: http://pc.watch.impress.co.jp/img/pcw/docs/589/158/html/17.jpg.html . According to this slide, the Tegra 4i GPU clock frequency is 660MHz.
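
Those clocks make the peak shader throughput easy to sanity-check; a quick sketch, assuming the published 72- and 60-core counts and one multiply-add (two FLOPs) per core per cycle, since the slides only confirm the frequencies:

```python
# Peak shader throughput = ALUs x clock x FLOPs/cycle. Assumes one
# multiply-add (2 FLOPs) per ALU per cycle and the published 72/60
# core counts; the slides above only confirm the clock frequencies.
def peak_gflops(alus, clock_mhz, flops_per_cycle=2):
    return alus * clock_mhz * flops_per_cycle / 1000.0

print(f"Tegra 4:  {peak_gflops(72, 672):.1f} GFLOPS")  # ~96.8
print(f"Tegra 4i: {peak_gflops(60, 660):.1f} GFLOPS")  # ~79.2
```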
 
Last edited by a moderator:
The power performance test is pretty slanted, because very rarely are you going to get nearly 2x the IPC from a Cortex-A15 vs. a Cortex-A9, even if that's happening for them in SPECint2k. So while the Cortex-A15s at 825MHz may use less power than the Cortex-A9s at 1.6GHz (which used quite a bit, mind you), they'll probably tend to stay below the same performance. It's even more slanted when you consider that 825MHz is the limit of what the companion core could run at, and therefore the test probably ran off the companion core. I bet available memory bandwidth has something to do with these results.
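
To put rough numbers on that, treating performance as IPC × clock: the ~1.94x IPC ratio is what matching A15@825MHz to A9@1.6GHz in SPECint2k would imply, while the 1.4x "typical" ratio below is purely an illustrative assumption:

```python
# perf ~ IPC x clock. The ~1.94x IPC ratio is what matching
# A15@825MHz to A9@1.6GHz in SPECint2k implies; the 1.4x ratio is
# an illustrative assumption for more typical workloads.
a9_clock, a15_clock = 1600, 825  # MHz

for label, ipc_ratio in [("SPECint2k-like", 1.94),
                         ("typical code (assumed)", 1.40)]:
    rel_perf = ipc_ratio * a15_clock / a9_clock
    print(f"{label}: A15@825MHz ~ {rel_perf:.2f}x A9@1.6GHz")
# SPECint2k-like:         ~1.00x -- roughly a wash
# typical code (assumed): ~0.72x -- well below the A9 at this clock
```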

Still, I hope this encourages sites like AnandTech to compare power consumption at more than just peak clock speeds.
 
Looks like they doubled the TMUs and pixel ROPs, while keeping the depth ROPs (ZOPs?) and triangle setup engine the same.

I've heard that the triangle setup capabilities of the GeForce ULP weren't stellar, so I wonder if this will be a bottleneck anywhere. TMUs will probably limit them at least some of the time too: with the A6X, Apple doubled the ALU:TMU ratio, but here they're tripling it. nVidia has also confirmed that Tegra 3 had 2 TMUs (and Tegra 4 has 4 TMUs).

Tegra 4i looks even crazier, with the ALU:TMU ratio increasing a staggering 6x (for fragment shading anyway; I doubt vertex shading increasing half as much as fragment shading is going to be much of a bottleneck, especially with the triangle rate unchanged).

Yeah, no kidding. The Tegra 4i GPU is a lot different from what most of us expected based on what we knew about the Tegra 4 GPU, with a very big focus on performance per mm^2. The Tegra 4 GPU already appears to be extremely small in die area vs. most other performant mobile GPUs (http://pc.watch.impress.co.jp/img/pcw/docs/589/158/html/20.jpg.html), so the Tegra 4i GPU should be ridiculously small. On a side note, it appears that the Tegra 4i 32-bit memory controller will operate at a 50% higher frequency than Tegra 3's 32-bit memory controller, so memory bandwidth will be significantly improved in comparison and will hopefully be adequate for use in a smartphone.
 
That 10mm^2 figure doesn't gel at all with the die shots they've shown, but I guess we've always suspected those to be fake.

nVidia is emphasizing how great their perf/mm^2 is, when it's obvious Apple (to name one) is heavily investing in perf/W at the expense of perf/mm^2. I mean, the A6X could probably have had double the GPU clock and half the cores, to name the most obvious option - but I'm sure the emphasis goes a lot deeper than that and partially has to do with how IMG designed the cores. nVidia isn't even trying to make a perf/W comparison here, but I expect it's going to be substantially worse.
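
That wide-and-slow trade-off drops straight out of the usual dynamic power relation P ∝ C·V²·f: doubling the clock on half the cores generally requires a voltage bump, so the same throughput costs more power. A toy sketch with invented voltage numbers, purely to illustrate the scaling:

```python
# Dynamic power ~ cores x V^2 x f. The voltages here are invented
# purely to illustrate scaling: hitting a higher clock on the same
# process generally requires a voltage bump.
def relative_power(cores, freq_ghz, volts):
    return cores * volts**2 * freq_ghz

wide_slow   = relative_power(cores=8, freq_ghz=0.5, volts=0.9)  # assumed V
narrow_fast = relative_power(cores=4, freq_ghz=1.0, volts=1.1)  # assumed V

# Same theoretical throughput (cores x freq = 4 GHz-cores) either way:
print(f"wide/slow power:   {wide_slow:.2f}")    # 3.24
print(f"narrow/fast power: {narrow_fast:.2f}")  # 4.84, ~1.5x more
```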
 
No doubt, the Tegra 4 die shots shown in their slides are definitely not a true depiction in any way; they're a high-level overview at best. I agree that performance per watt would be the most important metric.
 
That 10mm^2 figure doesn't gel at all with the die shots they've shown, but I guess we've always suspected those to be fake.

nVidia is emphasizing how great their perf/mm^2 is, when it's obvious Apple (to name one) is heavily investing in perf/W at the expense of perf/mm^2. I mean, the A6X could probably have had double the GPU clock and half the cores, to name the most obvious option - but I'm sure the emphasis goes a lot deeper than that and partially has to do with how IMG designed the cores. nVidia isn't even trying to make a perf/W comparison here, but I expect it's going to be substantially worse.

Which is a bit surprising. Emphasizing perf/mm² at the expense of perf/W would make sense for a chip aimed at cheap tablets like the Nexus 7 (tight budget, big battery), but for a phone SoC?

How long will Tegra 4(i)-based phones last while gaming compared to the competition at similar performance levels?
 
So the Tegra 4i has a single-channel memory controller.
It's definitely a smartphone design, but with all top-end smartphones moving to 1080p this year, I wonder if the ~6.5GB/s of bandwidth will be enough to compete with all the dual-channel SoCs.

Then again, my Transformer Infinity has a Tegra 3 with 6.4GB/s of bandwidth (single-channel DDR3-1600) and it performs quite well... when the slow I/O doesn't halt the damned thing.
 
If I am reading NVIDIA's slide correctly, then Tegra 4i (intended for smartphones) will have 1.5x higher memory bandwidth than the Nexus 7 (due to higher-frequency memory), while Tegra 4 (intended for tablets and very high-end smartphones) will have 2.3x higher memory bandwidth than the Transformer Pad Infinity (due to higher-frequency memory and a dual-channel memory controller).
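
Those multipliers are easy to sanity-check with the standard bandwidth formula. A quick sketch; the Tegra 4 and 4i memory speeds below are assumptions chosen to reproduce the slide's 1.5x and 2.3x figures, not confirmed specs:

```python
# bandwidth (GB/s) = channels x (bus bits / 8) x transfer rate (MT/s) / 1000
def bandwidth_gbs(channels, bus_bits, mt_per_s):
    return channels * (bus_bits / 8) * mt_per_s / 1000.0

nexus7   = bandwidth_gbs(1, 32, 1333)  # Tegra 3, DDR3L-1333: ~5.3 GB/s
infinity = bandwidth_gbs(1, 32, 1600)  # Tegra 3, DDR3-1600: 6.4 GB/s

# Assumed memory speeds, picked to reproduce the slide's multipliers:
tegra4i = bandwidth_gbs(1, 32, 2000)   # single channel: 8.0 GB/s
tegra4  = bandwidth_gbs(2, 32, 1866)   # dual channel: ~14.9 GB/s

print(f"Tegra 4i vs Nexus 7: {tegra4i / nexus7:.1f}x")   # ~1.5x
print(f"Tegra 4 vs Infinity: {tegra4 / infinity:.1f}x")  # ~2.3x
```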
 