NVIDIA Tegra Architecture

I must admit that I missed the 4K onto 100m² screen use case. Is it as common as using a flash card? ;)

I said video wall; it could be just a white wall or any other type of material to project onto. I'm sure at conferences they still use the ancient slide machines with the wire and the knob at the end in order to click through to the next slide... Sarcasm aside, you let me know what you'd intend to use to display a presentation in a hall that seats, say, 1000 participants, and whether 100 sqm is really a lot as a display size.
 
A little tidbit to talk over: Tegra 4 vs. what appears to be a Galaxy S4 running Snapdragon 600. The Snapdragon equals the BrowserMark score of the Tegra. Of course, take it with a pinch of salt, considering this is pre-release hardware on early software, not to mention it is a browser benchmark...

Nevertheless it makes an interesting read... but with the Optimus G getting 2500 on the same test, perhaps it's completely meaningless. In real-world terms, Tegra 4 in a tablet would be several times more powerful, at least in CPU terms.
http://m.gsmarena.com/samsung_gti9500_captures_the_browsermark_20_top_spot-news-5622.php
 
NVIDIA showed some slides at MWC 2013 comparing Tegra 4 to quad-core S4 Pro (http://forums.anandtech.com/showpost.php?p=34689618&postcount=230). Tegra 4 apparently is more than 2x faster than S4 Pro in SPECint2000, Sunspider, Web page load, WebGL Aquarium (50 fish), Quadrant Pro 2.0, Vellamo Metal, and is between 1.5-2x faster than S4 Pro in Geekbench, AndEBench Native, CFBench Native, Linpack MT (4T-Market), Antutu, DMIPS, Vellamo HTML5, while being only slightly faster than S4 Pro in Coremark. Strangely enough, NVIDIA somehow estimated the performance of Snapdragon 800 on a case by case basis. Now, take this with a huge grain of salt, but according to NVIDIA they expect Tegra 4 to have a significant performance advantage vs. S800 in SPECint2000, Sunspider, Web page load, WebGL Aquarium (50 fish), Quadrant Pro 2.0, Geekbench, CFBench Native, Linpack MT (4T-Market), Antutu, Vellamo HTML5, and Vellamo Metal, while Snapdragon 800 will have a slight performance advantage vs. Tegra 4 in AndEBench Native, and a significant performance advantage in Coremark and DMIPS. Tegra 4i is supposed to have 80% of the CPU performance of Tegra 4, but we will have to wait and see if that is truly the case across the board.
 

Thanks... however, the S4 Pro will be significantly slower than a fully clocked Snapdragon 800. Also, Snapdragon 800 will go into smartphones quite comfortably; whether we see 2.3 GHz clocks in a smartphone remains to be seen, but the GPU should stay at the same clocks.

Snapdragon 800 has fully integrated LTE+, which will surely help with lower power consumption.

As a tablet chip Tegra sure is a beast, at least on the CPU side; on the GPU side it is adequate.
 
while Snapdragon 800 will have a significant performance advantage in Coremark and DMIPS
That's because those benchmarks are so old (relics from 198X) that they can fit in L1 cache, so they don't rely on advanced OoO execution, memory prefetching and so on. Those benchmarks have nothing to do with real workloads.
 
Coremark is much more recent; it was released in 2009. But it indeed mostly fits in L1. IMHO it's a poor benchmark, but I also consider SPEC 2006 not that good given its low pressure on the L1 I-cache ;)
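As a toy illustration of that point (all numbers below are made-up assumptions, not measurements of Coremark or Dhrystone): once a benchmark's hot code and data sit entirely in L1, the memory hierarchy essentially drops out of the equation and you are mostly measuring the integer pipeline.

# Toy CPI model: a benchmark that fits in L1 is almost insensitive to the
# memory system, so it says little about OoO latency hiding or prefetching.
# All numbers are illustrative assumptions only.

def effective_cpi(base_cpi, l1_misses_per_insn, miss_penalty_cycles):
    # Very crude: CPI = pipeline CPI + memory stall cycles per instruction.
    return base_cpi + l1_misses_per_insn * miss_penalty_cycles

# Dhrystone/Coremark-style hot loop: fits in a 32 KB L1, essentially no misses.
tiny_kernel = effective_cpi(0.8, 0.0005, 20)

# Browser/real-application-style workload: far more L1 misses per instruction.
real_app = effective_cpi(0.8, 0.03, 20)

print(f"L1-resident benchmark CPI: {tiny_kernel:.2f}")  # dominated by the pipeline
print(f"cache-hungry workload CPI: {real_app:.2f}")     # heavily shaped by the memory system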
 
http://phx.corporate-ir.net/External.File?item=UGFyZW50SUQ9MTczOTMxfENoaWxkSUQ9LTF8VHlwZT0z&t=1

Slide 6 clearly shows the T4 besting the S800 by quite a comfortable margin in all tests except Coremark and DMIPS, with basically a tie in AndEBench Native.

Thanks... but those are surely just guesstimates, are they not? To my knowledge no S800 has been benchmarked.

Also, the battery tests are low-level, fixed-function stuff... nothing strenuous or even mildly taxing that gets those A15s into play...

Maybe in a tablet I could believe that Tegra could put those dreadnought Eagles to good use and best a Krait 400 in some tests... but I'm guessing Snapdragon 800 will run closer to its reference clocks in a smartphone than NVIDIA's Tegra 4 will... just a hunch.

Am I right in thinking Tegra 4 operates similarly to Tegra 3, meaning it can't independently clock its Eagle cores like Krait can? If so, that will surely affect efficiency when heavy multitasking is going on, especially as most people seem to think the A15 consumes more power than Krait anyway.
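To illustrate why a single shared clock can hurt under mixed loads, here is a rough sketch (the frequencies and the cubic power model are illustrative assumptions, not Tegra or Krait figures, and real SoCs can clock-gate idle cores, which narrows the gap):

# Rough sketch: shared-clock cluster vs. per-core DVFS under mixed load.
# Assumes dynamic power ~ f * V^2 with V roughly proportional to f,
# i.e. power ~ f^3 -- a common back-of-envelope approximation.

def cluster_power(freqs_ghz):
    # Crude relative dynamic power for a set of active cores.
    return sum(f ** 3 for f in freqs_ghz)

# Hypothetical per-core frequency demands during "heavy multitasking":
demands = [1.9, 0.5, 0.5, 0.3]  # GHz: one busy core plus light background work

per_core = cluster_power(demands)                        # each core at its own clock
shared   = cluster_power([max(demands)] * len(demands))  # whole cluster at the max demand

print(f"per-core DVFS: {per_core:.2f} (relative units)")
print(f"shared clock : {shared:.2f} (relative units)")
print(f"shared / per-core: {shared / per_core:.1f}x")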

AnandTech's power consumption article threw light on this, and Krait's idle power was very efficient... I don't think S800 on the new 28nm HKMG TSMC process will be worse than Tegra 4; remember the Tegra 4 shadow core is also a moderately clocked Eagle...

Power consumption of Adreno 330: Qualcomm itself states that Adreno 330 consumes HALF the power of its previous GPU uarch, by which I must presume it means Adreno 225. That is an ambitious boast from Qualcomm, a company that so far has been the most accurate and truthful about its performance projections... even so, this is the only thing I would raise an eyebrow at.

An interesting point on those slides was NVIDIA boasting that it can outperform an Ivy Bridge ULV i5... this is some boast. NVIDIA has been known to be knocking back the Kool-Aid on more than one occasion before, so I wouldn't stake my bet on this.

We await proper shipping products to test on... but I will say this: NVIDIA's questionable assertions in that investor slide deck, that Tegra 4 is that powerful compared to S800 and will also fit quite nicely into smartphones, will not turn out the way they put them across.

Those NVIDIA performance metrics are from a Tegra 4 tablet prototype clocked through the roof, probably drawing 8 W under load... they make no mention of underclocking the part to fit into smartphones, which will surely happen.
 
Also, Snapdragon 800 will go into smartphones quite comfortably; whether we see 2.3 GHz clocks in a smartphone remains to be seen, but the GPU should stay at the same clocks.

I'm not so sure that any relatively powerful quad-core SoC will go into a smartphone "quite comfortably" at the specified clock operating frequencies with CPU- and/or GPU-intensive tasks, due to thermal throttling. Just to illustrate the point with the quad-core S4 Pro SoC inside the Nexus 4 smartphone: when the internal temperature is below 36 degrees Celsius, the Krait CPU is able to maintain a 1.512 GHz clock operating frequency while the Adreno 320 GPU is able to maintain a 400 MHz clock operating frequency. But when the internal temperature rises, the CPU and GPU clock operating frequencies go down due to thermal throttling:

@ 36-38 degrees Celsius: the Krait CPU drops down to a 1.296 GHz clock operating frequency while the Adreno 320 GPU drops down to a 325 MHz clock operating frequency
@ 38+ degrees Celsius: the Krait CPU drops down to a 1.188 GHz clock operating frequency while the Adreno 320 GPU drops down to a 200 MHz clock operating frequency

See the following Nexus 4 thermal throttling video starting at around the 4 min 30 second mark: http://www.youtube.com/watch?v=abf7nPiUUE8&list=UUB2527zGV3A0Km_quJiUaeQ&index=4

Granted, these thermal throttling settings may be quite conservative on Google's part, but all smartphones will need to throttle frequencies to some extent.
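For illustration only, here is a minimal sketch of that kind of threshold-based policy, using the Nexus 4 numbers quoted above (the structure and names are hypothetical, not the actual kernel thermal code):

# Minimal sketch of threshold-based thermal throttling, using the
# Nexus 4 (quad-core S4 Pro / Adreno 320) numbers quoted above.
# Structure and names are hypothetical, not the real governor.

# (temperature threshold in deg C, CPU cap in MHz, GPU cap in MHz)
THROTTLE_TABLE = [
    (38, 1188, 200),  # 38 C and above
    (36, 1296, 325),  # 36-38 C
    (0,  1512, 400),  # below 36 C: full clocks
]

def clock_caps(temp_c):
    # Return the (CPU, GPU) frequency caps for a given internal temperature.
    for threshold, cpu_mhz, gpu_mhz in THROTTLE_TABLE:
        if temp_c >= threshold:
            return cpu_mhz, gpu_mhz
    return THROTTLE_TABLE[-1][1], THROTTLE_TABLE[-1][2]

for t in (30, 37, 40):
    cpu, gpu = clock_caps(t)
    print(f"{t} C -> CPU capped at {cpu} MHz, GPU capped at {gpu} MHz")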
 
I said video wall; it could be just a white wall or any other type of material to project onto. I'm sure at conferences they still use the ancient slide machines with the wire and the knob at the end in order to click through to the next slide... Sarcasm aside, you let me know what you'd intend to use to display a presentation in a hall that seats, say, 1000 participants, and whether 100 sqm is really a lot as a display size.

The question I have is: how are you going to get the 4K video/stream onto that mobile device?

Sony, who are heavily invested in 4K and will be pushing it hard, have already stated that 4K movies will be 100+ GB for average-length digitally downloaded movies. I'm assuming average length is your typical 80-90 minute movie. And I'm assuming digitally downloaded implies heavy compression.

Streaming is out of the question. Just imagine the cellular network outages that will result if multiple people attempt to stream 4k video. And I can't think of a single mobile phone that has 100+ GB of contiguous storage space. Even for tablets that'll be pushing it just to store 1 average length movie. Forget about extended (2-3 hours) movies. :p

Regards,
SB
 

100GB is about 4 times the size of a BluRay, and 4K is 4 times as many pixels as 1080p, so I think we're talking about very high quality here (or minimal compression).

In practice, 1080p movies can fit quite nicely into about 8 GB with more aggressive H.264 compression, so 32 GB ought to do for 4K. Of course that's still quite big for a handheld device, but manageable.
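A quick back-of-the-envelope check of those numbers, assuming a ~90 minute runtime and a file size that scales roughly with pixel count:

# Back-of-the-envelope check of the file-size estimates above.
# Assumptions: ~90 minute movie, ~8 GB for a well-compressed 1080p H.264 encode,
# and 4K scaling roughly linearly with pixel count (4x 1080p) at similar quality.

GB = 1024 ** 3
runtime_s = 90 * 60

size_1080p = 8 * GB
bitrate_1080p = size_1080p * 8 / runtime_s / 1e6  # megabits per second
print(f"1080p: ~{bitrate_1080p:.1f} Mbps average bitrate")

size_4k = size_1080p * 4                          # 4x the pixels, same codec efficiency
bitrate_4k = size_4k * 8 / runtime_s / 1e6
print(f"4K estimate: ~{size_4k / GB:.0f} GB, ~{bitrate_4k:.1f} Mbps")

# Sony's quoted 100+ GB figure, for comparison:
bitrate_100gb = 100 * GB * 8 / runtime_s / 1e6
print(f"100 GB over 90 min: ~{bitrate_100gb:.0f} Mbps average bitrate")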
 

If you re-read the quote, the point wasn't about movies; when I specifically mention conferences and presentations, it doesn't take a wizard to see where I'm pointing. I'm sure conference participants would love to attend a conference and fall asleep watching movies, but that isn't usually the case. In the future (yes, it'll take quite a few years), when 4K becomes mainstream on display media, there will also be 4K projectors, and high-end devices from every category will support the format; I don't see why tablets of the future should take a backseat in that regard.

We host quite a few conferences at work, and once you start dealing with those details, it doesn't take long to see where technology has improved vastly in the past few years and where and how it could further improve down the line. Movie theaters don't have that kind of issue because many already stream in 4K formats, but on the other hand the equipment also costs several million, which is not an investment just any company can carry easily.

Storage will always be a headache, but I don't expect tablets half a decade down the line to still top out at "only" 64 or 128GB.
 
Yeah :D

This article compares SPEC2006 vs other benchmarks. I think it shows how bad SPEC2006 is at stressing Icache.

Thanks for the link, this was an interesting paper. It does show the SPEC2006 selection isn't stressing icache as hard as the other selection, but I don't think that's enough to say that it's on the outlying side. Those SPEC tests are a pretty diverse collection of mostly real world applications that don't really have anything to do with each other. bbench doesn't surprise me that much (web pages can touch a lot of code, especially if it's running JITed javascript) but I wouldn't have expected to see ServeStream or Rockbox to be much more icache heavy than SPEC2006.

I wonder if there's a strong connection between the use of system/library code, where on SPEC it's close to zero for all but one but for three of the others it's very substantial. Heavily used shared object functions would naturally decrease locality of reference and inflate the amount of hot code, and would probably cause more capacity misses. But it could also be worse because the system/library code is just, on average, worse.

I wonder if numbers > 40% in system/library code are typical for important real world software (I wouldn't be caught dead relying on that much system code for anything performance critical)

Another informative thing I got out of this is how, for at least the tested Cortex-A9s, an icache miss is a lot less expensive than a dcache miss. This could be down to a mix of the data being more likely in L2 cache and hardware prefetching working more effectively for instructions (not that surprising, instruction access patterns are pretty linear and can be detected early in the fetch unit). This is something that definitely needs to be considered when comparing icache and dcache miss rates, but I rarely see it brought up in those discussions where people say icache misses are more prevalent. Even the programs tested showing far more icache than dcache misses still generally spent at least as many cycles in dcache stalls, if not substantially more.
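To make that last point concrete, here's a toy stall-cycle model (the miss rates and penalties are invented for illustration, not taken from the paper):

# Toy model: a higher I-cache miss rate can still cost fewer total cycles
# than a lower D-cache miss rate, if I-cache misses are cheaper on average
# (better prefetching, more L2 hits for code). Numbers are invented.

def stall_cycles(accesses, miss_rate, avg_miss_penalty):
    return accesses * miss_rate * avg_miss_penalty

instructions = 1_000_000_000
icache_accesses = instructions             # roughly one fetch per instruction
dcache_accesses = int(instructions * 0.3)  # assume ~30% of instructions are loads/stores

icache_stalls = stall_cycles(icache_accesses, 0.02, 10)  # 2% miss rate, cheap misses
dcache_stalls = stall_cycles(dcache_accesses, 0.01, 80)  # 1% miss rate, expensive misses

print(f"I-cache stall cycles: {icache_stalls / 1e6:.0f} M")
print(f"D-cache stall cycles: {dcache_stalls / 1e6:.0f} M")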
 