NVIDIA Tegra Architecture

extrajudicial, I think you have the wrong forum. You need to improve your contributions massively, otherwise I fear that your time in these lands will be brief.
 
Here's one result I came up with Googling around for the register reuse cache: A compile-time managed multi-level register file hierarchy, by Nvidia's Bill Dally. It's only a citation, unfortunately, but it describes exactly the kind of configuration that the Maxas guy uncovered, and it was published in 2012, which should be around the time Maxwell was in its architecture definition stage.

The paper claims register file energy usage is reduced by up to 54%.
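To make the idea concrete, here is a toy energy model of a two-level register file hierarchy (a large main register file plus a small operand reuse cache, the configuration the paper describes). The per-read energy numbers and the reuse fraction are made-up assumptions for illustration, not figures from the paper:

```python
# Toy model: reads that hit a small reuse cache cost far less energy
# than reads from the main register file. All numbers are hypothetical.
MRF_READ_PJ = 8.0   # assumed energy per main-register-file read (pJ)
ORC_READ_PJ = 1.5   # assumed energy per operand-reuse-cache read (pJ)

def rf_energy(total_reads, reuse_fraction, use_cache=True):
    """Total read energy when reuse_fraction of reads hit the cache."""
    if not use_cache:
        return total_reads * MRF_READ_PJ
    hits = total_reads * reuse_fraction
    misses = total_reads - hits
    return hits * ORC_READ_PJ + misses * MRF_READ_PJ

baseline = rf_energy(1_000_000, 0.0, use_cache=False)
with_cache = rf_energy(1_000_000, 0.7)  # assume 70% of reads are reuses
print(f"energy saving: {1 - with_cache / baseline:.0%}")
```

With these invented numbers the model lands in the same ballpark as the paper's headline figure, but the point is only the mechanism: the compiler decides at compile time which operands live in the small, cheap level of the hierarchy.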
 
The link you provided just times out.

Here is a link to Nvidia's paper on the subject:

https://research.nvidia.com/publication/compile-time-managed-multi-level-register-file-hierarchy
 
For no credible reason (you have yet to provide one anyway).



Indeed, it is not magic; it is quality engineering.



You have provided zero actual evidence that this is the case.



You have also provided zero evidence that Maxwell is less efficient for compute workloads.


I meant to say I'm NOT discounting its efficiency. Why can't I edit my posts?


And I did provide evidence, which you ignored. Please explain why Maxwell is only marginally (>10%) more efficient in compute workloads. If it's not power gating, as AnandTech said, then what? You criticize my explanation and provide none of your own besides saying "it's in the architecture".


I'm appalled to see people letting their personal favorites trump evidence and common sense. If Maxwell is so power efficient, why can't the Nexus 9 manage 10 hours of battery life? That's hardly impressive.
 
extrajudicial said:
Please explain why Maxwell is only marginally (>10%) more efficient in compute workloads
Again, you have not yet provided any evidence that this is the case. FurMark power consumption is not evidence. If you don't understand why it is not evidence, I am not sure that I or anyone else here can help you. Indeed, the only actual compute benchmarks in this thread that I am aware of show Maxwell to be considerably more efficient than Kepler in compute workloads, sometimes by over 100%.

extrajudicial said:
I'm appalled to see people letting their personal favorites trump evidence and common sense.
The only person here I see ignoring evidence is you.

extrajudicial said:
If Maxwell is so power efficient, why can't the Nexus 9 manage 10 hours of battery life?
The Tegra K1 in the Nexus 9 is based on Kepler, not Maxwell.
 
If it's not power gating, as AnandTech said, then what? You criticize my explanation and provide none of your own besides saying "it's in the architecture"
And again you ignore my post completely.

If Maxwell is so power efficient, why can't the Nexus 9 manage 10 hours of battery life? That's hardly impressive.
Because Nexus 9 uses Kepler, not Maxwell?
 
If Maxwell is so power efficient, why can't the Nexus 9 manage 10 hours of battery life? That's hardly impressive.


The battery life tests that show ~10 hours aren't using the GPU anyway. Those tests are measuring the efficiency of the screen and the ability of the SoC to shut itself off when nothing is happening.
 
Please explain why Maxwell is only marginally (>10%) more efficient in compute workloads. If it's not power gating, as AnandTech said, then what?

Do you know what efficiency means?

How is it only "10% more efficient" if it's doing almost twice as much work?
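The point about efficiency is worth spelling out: perf/W is work divided by power, so a chip doing twice the work at slightly higher power is far more than "10% more efficient". A minimal sketch, with the 2x-work and 1.1x-power ratios picked purely for illustration:

```python
def efficiency_gain(work_ratio, power_ratio):
    """Relative perf/W improvement of a new chip over a baseline.

    work_ratio:  work done by the new chip / work done by the baseline
    power_ratio: power drawn by the new chip / power drawn by the baseline
    """
    return work_ratio / power_ratio - 1.0

# A chip doing ~2x the work while drawing ~1.1x the power is roughly
# 82% more efficient, not 10%. (Ratios are illustrative, not measured.)
print(f"{efficiency_gain(2.0, 1.1):.0%}")
```

This is why raw power draw under FurMark, with no measure of work completed, says nothing by itself about efficiency.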
 
I wonder if that thing is benchmarking the hardware encryption accelerator?

At this point I think I'm tired of micro-benchmarks, even "improved" ones.

Year 2000: Dhrystones, raw MIPS and the like are dead, let's switch to application benchmarks instead!
Year 2014: here's a multi-platform collection of ~40 synthetic benchmarks with autogenerated web pages.

It leaves me wondering whether the benchmark is really measuring what fits in L1, or in L2.
 
I wonder if that thing is benchmarking the hardware encryption accelerator?
There are indeed too many benchmarks that are skewed by dedicated instructions. But having studied the Geekbench code, it's not that bad, as long as you don't forget it's a smallish benchmark (though not really a micro-benchmark like Dhrystone or CoreMark).

At this point I think I'm tired of micro-benchmarks, even "improved" ones.

Year 2000: Dhrystones, raw MIPS and the like are dead, let's switch to application benchmarks instead!
Year 2014: here's a multi-platform collection of ~40 synthetic benchmarks with autogenerated web pages.

It leaves me wondering whether the benchmark is really measuring what fits in L1, or in L2.
What is the alternative? SPEC has been mostly broken by compilers and auto-parallelization, and in any case it can't be run on most smaller devices (it requires 2 GB). JavaScript benchmarks are somewhat interesting, but results vary a lot depending on the browser. So what non-micro-benchmark do you propose that is available on many platforms?
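The "fits in L1 or L2" question above can at least be sanity-checked on paper: compare a benchmark's working set against the cache capacities of the core under test. A minimal sketch, using cache sizes that are typical of 2014-era mobile cores (assumptions, not measurements of any specific SoC):

```python
# Assumed cache capacities; real values vary per core (check the SoC docs).
L1_BYTES = 32 * 1024          # typical L1 data cache
L2_BYTES = 2 * 1024 * 1024    # typical shared L2

def cache_level(working_set_bytes):
    """Which level of the memory hierarchy a working set fits in."""
    if working_set_bytes <= L1_BYTES:
        return "L1"
    if working_set_bytes <= L2_BYTES:
        return "L2"
    return "DRAM"

# A benchmark streaming over a 24 KB buffer never leaves L1, a 512 KB
# one exercises L2, and a 16 MB one actually stresses the memory system.
for size in (24 * 1024, 512 * 1024, 16 * 1024 * 1024):
    print(size, cache_level(size))
```

Two "small benchmarks" with working sets on opposite sides of the L1 boundary can rank the same hardware very differently, which is exactly why knowing the working set matters more than the benchmark's name.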
 
Indeed, I was venting a bit about the benchmarks. At least it's nice to have all these individual results, though it's hard to know which ones are the most useful or balanced.

Running something under desktop Linux (Debian, Ubuntu, etc.), or maybe a mobile Linux like Mer, ought to be a solution, at least for comparing Tegra K1 and Atom. Mullins, too.
Sure, it shifts the problem: you won't be able to run an arbitrary OS on that many devices, or support may only come later while the device is "Android first".

I'm curious about Android L: will it make it easy to run a Linux container on arbitrary (but rooted) phone/tablet hardware, with some simple Linux distro inside? (Even text mode is enough to crunch numbers and run various things.) That might become possible at some point, but I wonder what performance and features would be available on arbitrary or vanilla Android L. (Going off-topic here; that might be desirable on Denver and x86 tablets, say.)
 