We all like to claim that Apple is form without substance, and it's easy to troll them about it, but now we're forced to remember that they are a hardware company after all.
Two reasons off the top of my head:
1. ARM did the work to make the A57 and A53, so NVIDIA doesn't have to spend any time or money making a leakage optimized A57.
2. Marketing droids prefer octacore.
Apple CPUs, on the other hand, seem to constantly surprise - and AFAIK they don't use big.LITTLE or asynchronous DVFS.
Sure. So if you have a workload with one heavy thread and one light thread, how do you save {power, time} by turning on a little core? You've already burned the power to turn on the big core, so the little core is roundoff error in terms of power, right? Turning on the little core is also roundoff error in terms of performance: you could just keep the big core on and multiplex the heavy thread and the light thread on it without any performance penalty (otherwise, we have two heavy threads, right?)
Doesn't make much sense to me.
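A minimal sketch with made-up numbers, just to put that "roundoff error" argument in concrete terms; the per-core power and load figures below are assumptions for illustration, not measurements of any real A57/A53 implementation:

```python
# Back-of-the-envelope numbers (invented, not measured) for one heavy thread
# plus one light thread on a big.LITTLE pair.
BIG_CORE_POWER_W = 1.5      # assumed active power of one big (A57-class) core
LITTLE_CORE_POWER_W = 0.15  # assumed active power of one little (A53-class) core
HEAVY_LOAD = 1.00           # heavy thread saturates a big core
LIGHT_LOAD = 0.05           # light thread needs ~5% of a big core

# Option A: multiplex both threads on the big core.
power_a = BIG_CORE_POWER_W
big_util_a = HEAVY_LOAD + LIGHT_LOAD   # 1.05 -> roughly a 5% slowdown

# Option B: wake a little core for the light thread.
power_b = BIG_CORE_POWER_W + LITTLE_CORE_POWER_W
big_util_b = HEAVY_LOAD

print(f"A: {power_a:.2f} W, big-core demand {big_util_a:.2f}")
print(f"B: {power_b:.2f} W, big-core demand {big_util_b:.2f}")
# With these numbers, B spends ~10% more power to avoid a ~5% slowdown,
# which is the "roundoff error" point above.
```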
Saw a press release where the X1 was touted as advanced car tech. How it would recognize signs and other road objects.
But Nvidia isn't saying they're introducing a self-driving car platform, are they?
So nVidia says that Tegra X1 is using Cortex-A57 + A53 instead of Denver because this was faster and simpler to implement in 20nm. But Anandtech says that they have a completely custom physical implementation. In that case, how would "hardening" A57 and A53 - two CPUs they've never used before, especially the latter - be faster or simpler than using Denver, which they already have a custom implementation of and which would require a more straightforward shrink? The only way this makes sense is if A57 was ready long before Denver, but the release timescale of devices using each respective CPU makes this seem unlikely. So I'm skeptical that both of these claims are completely accurate.
Doing it their way instead of ARM's way has benefits:
ARM's way is HMP, not old style cluster migration as on the 5410/5420. I really doubt Nvidia's claims on any kind of benefit of their own CM.
That said, I think that NVIDIA is finally on to something. They seem to have a good implementation of a 4+4 A57/A53 setup (i.e. pretty much the best available IP at the moment) plus a big Maxwell GPU, which we know to be very efficient, plus support for fast FP16. There's no reason for TX1 to have power-efficiency issues anymore.
The problem is that I'm not sure there are many use cases where TX1 would be all that preferable to, say, a Snapdragon 810 with integrated LTE, or a cheaper Mediatek design.
Cache coherency, as per how we questioned and got a response from Nvidia, is handled such that a cluster migration is done without any DRAM intervention. Again, I fail to see how this could be more efficient than just migrating via ARM's CCI, even if it's just limited to cluster migration.

I think that goes without saying.. it had to be without DRAM intervention.. if not, it would be extremely power inefficient, wouldn't it? Btw, do we have any idea if there is a large L3 cache like Apple has?
I really think most of their power efficiency claims just come from the process advantage and probably better physical libraries compared to the 5433 (I have an article on that one with very extensive power measurements... the 5433 is a comparatively bad A57 implementation compared to what Samsung has now achieved with the A15s on the 5430; many would be surprised).
I think their interconnect just can't do HMP and this is PR spinning.
Qualcomm doesn't have a great reputation for CPU design. Arguments about their DVFS are unconvincing.
Apple CPUs, on the other hand, seem to constantly surprise - and AFAIK they don't use big.LITTLE or asynchronous DVFS.
1. They obviously didn't mind several times so far.
2. It's only a true octacore when global task scheduling works even for marketing droids.
3. Since Parker will most likely bounce back to Denver, how exactly will they explain the loss of 6 cores to the marketing droids?
Having an integrated modem/LTE is not the holy grail. I think people overestimate the importance of it. Look at Samsung and the international versions of their Galaxy S and Galaxy Note lines over the years. And comparing it to Mediatek is apples and oranges..they simply do not compete with this class of chip. AFAIK Mediatek do have an A57 design in the works but it is on 28nm. And Mediatek's graphics implementations are woefully underpowered.
When Nvidia announced the Tegra K1, the company pointed out that it was based on the same architecture as the company’s PC-based Kepler GPU. Going forward, all Nvidia’s mobile devices will be based on the same GPU core as those in the higher-performance PC parts.
As a result of the new process and the move to its Maxwell GPU core, Nvidia was able to double the performance from the prior-generation Tegra K1, delivering more than a teraflop while holding the TDP at the K1 level, and enlarging the die size only slightly.
The X1 has two streaming multiprocessor (SM) blocks, giving the chip 256 CUDA cores, 16 texture units, 16 ROP units, and twice the performance per watt of the K1. Since the GPU is based on the Maxwell architecture, it is compatible with all the popular mobile APIs such as DirectX 12, AEP, OpenGL 4.5, and OpenGL ES 3.1.
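For scale, the teraflop figure falls out of the core count above, assuming a roughly 1 GHz GPU clock and packed 2-wide FP16 FMAs; the clock value here is an assumption for the arithmetic, not a quoted spec:

```python
# Rough check of the "more than a teraflop" claim from the core count above.
# The ~1 GHz GPU clock is an assumption for the arithmetic, not a quoted spec.
cuda_cores = 256
gpu_clock_hz = 1.0e9      # assumed
flops_per_fma = 2         # a fused multiply-add counts as two FLOPs
fp16_per_lane = 2         # packed 2-wide FP16 per FP32 lane

fp32_gflops = cuda_cores * flops_per_fma * gpu_clock_hz / 1e9
fp16_gflops = fp32_gflops * fp16_per_lane
print(fp32_gflops, fp16_gflops)   # ~512 GFLOPS FP32, ~1024 GFLOPS FP16
```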
The company also emphasized the chip's ability to handle 4K video capture and display. It can decode up to 500 megapixels per second with its hardware H.265 codec (500 Mpixels/s is 4K at 60 frames/s), and stream 4K out via an HDMI 2.0 interface at 60 frames/s.
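The 500 Mpixel/s figure checks out against the 4K60 claim:

```python
# Sanity check of the 500 Mpixel/s decode figure against 4K at 60 frames/s.
width, height, fps = 3840, 2160, 60
print(width * height * fps / 1e6)   # ~497.7 Mpixel/s, i.e. the quoted ~500
```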
The Maxwell SM is partitioned into four processing blocks, each with its own dedicated resources for scheduling and instruction buffering. This new configuration along with scheduler and data-path changes saves power and delivers more performance per core. Also, the SM shared memory is now dedicated instead of being shared with the L1 cache.
The chip’s memory bus is 64 bits wide and can run LPDDR4 at 3,200 Mtransfers/s. It uses Maxwell’s lossless color compression that runs end-to-end. The company says that all the advanced architectural features available on desktop GeForce GTX 980 will be available on mobile Tegra X1 as well.
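The peak bandwidth that bus works out to is easy to compute from the stated width and transfer rate:

```python
# Peak theoretical bandwidth of a 64-bit LPDDR4 interface at 3200 MT/s.
bus_width_bits = 64
transfers_per_s = 3200e6
print(transfers_per_s * (bus_width_bits / 8) / 1e9)   # 25.6 GB/s
```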
It may not be the Holy Grail but it's Qualcomm's main competitive advantage, and they do dominate the smartphone market. They do pretty well in tablets too. They also have Krait and Adreno, but neither seems particularly better than Cortex or Mali/PowerVR respectively. The market performance of their Cortex-powered S810 should bring us more controlled information about the competitive value of their modems, unless Adreno 430 turns out to be spectacular.
As for Mediatek, I don't know what they're working on exactly for 2015, but I imagine they must have some sort of 4+4 A57/A53 setup with decent graphics. I'm not trying to say that they compete with NVIDIA on graphics performance.
Rather, I'm arguing that there's really not much you can do on a Tegra device that you can't do on a (cheaper) Mediatek one. Whatever it is that you can do on Tegra and not on an MT chip, I doubt it's enough for Tegra to be viable as a tablet product. Since (from what I've read) JHH spent far more time talking about cars than tablets when presenting Erista, he just might agree with me.
If I was NVIDIA, I would do the following:
- 16FF process
- 2 x Denver (2.5GHz+ with revisions)
- 4 x A53 (2GHz+ - optimised for speed)
- 4 x A53 (1.2GHz+ - optimised for power)
- All cores can be used at the same time...
What are the 2GHz A53s doing in there? Why not just 2x Denver + 4x A53?

It's similar to the Snapdragon 615 using A53 @ 1.7GHz + A53 @ 1.0GHz. There is quite a large power/area efficiency difference between a ~2GHz and a ~1GHz core.
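A rough illustration of why that gap exists: dynamic power scales roughly with C * V^2 * f, so a cluster synthesized for ~2GHz pays a voltage and capacitance penalty even when the speed isn't needed. The capacitance and voltage numbers below are invented for the example, not vendor data:

```python
# Illustration only: why a power-optimised ~1GHz A53 cluster can be much
# cheaper than a speed-optimised ~2GHz one. Dynamic power scales roughly
# with C * V^2 * f; these capacitance/voltage values are assumptions.
def dynamic_power(c_eff, voltage_v, freq_ghz):
    return c_eff * voltage_v ** 2 * freq_ghz   # arbitrary consistent units

fast_a53 = dynamic_power(c_eff=1.2, voltage_v=1.10, freq_ghz=2.0)  # speed-optimised
slow_a53 = dynamic_power(c_eff=1.0, voltage_v=0.80, freq_ghz=1.0)  # power-optimised
print(fast_a53 / slow_a53)   # ~4.5x more dynamic power for the fast cluster
```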
The chip in general does not behave like an ARM chip, although it presents the illusion of it to software.

And NVIDIA apparently can't do this even with a standard A57+A53 config; with Denver it becomes slightly harder still because of the different internal ISA (probably not too bad as long as Denver has similar IPC to the A53 on ARM ISA code pre-transcoding).
One possible advantage of simultaneous Denver+A53 is that you could keep more stuff on the A53s and reduce instruction cache pressure on the Denver.

Perhaps instruction footprint would be less of a problem if the ISA wasn't allegedly capable of emulating arbitrary architectures like x86. An ARM-specific emulation ISA with all the extra bells and whistles could probably find economies in areas of functionality that ARM may not need, but which x86 would. Having the expanded uop format in memory seems to imply safeguarding against decode costs that ARM is not noted for, as an example.
Last but not least: did I misread something, or did Jensen really claim that X1 has the performance of the Xbox One console?

Eh, it is not that far off. The CPU is probably as good or better, while the GPU is roughly half the performance (or much closer if using fp16). Bandwidth is the only real killer, but if one is willing to accept 720p instead of 1080p, I imagine Xbox One games could run just fine on it with minimal adjustment. It is certainly much better than the Wii U at any rate.
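To put rough numbers on the 720p-vs-1080p point: the TX1 figure is the 25.6 GB/s worked out above, and the Xbox One figure is the commonly quoted 68 GB/s for its main DDR3 pool (ignoring ESRAM), used here only for scale:

```python
# Rough numbers behind the 720p-vs-1080p argument.
print(1920 * 1080 / (1280 * 720))   # 2.25x fewer pixels at 720p
print(68.0 / 25.6)                  # ~2.7x more main-memory bandwidth on Xbox One
```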
And if you gave the chip the X1's heatsink and power supply?

That, and I think there will be few if any actual devices that will allow all four A57s to be run at peak clock simultaneously, at least for any meaningful length of time. Even the original Shield handheld, which had a fan, had a hard cap of 1.4GHz for the four A15 cores if they were all active.