These days Nvidia die shots are highly implausible; pasting 192 little squares next to each other is really not how a Kepler gets made. It's like an artist's vision of an exoplanet.
Erinyes is usually pretty spot-on about these things, but it seems it's only you and I who notice.
Erista is on TSMC 20SoC from what I've heard, and I haven't seen anything planned for 16FF in 2015, no matter what. As for the rather interesting power measurements they have in that link: is that with or without throttling?
I need time to study that article, since the CPU integration in particular sounds quite interesting. As for FP16-related optimisations, let's hear what the usual suspects have to say NOW about it.
Eventually someone will have a die shot and we'll find out. Without knowing the exact transistor density, though, die area is only half the useful information. If they've used a density comparable to the A8X (estimated 24M transistors/mm²) and your estimate is true, it could be a healthy bit over 2B transistors.
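For anyone who wants to sanity-check that back-of-envelope number, here's a trivial sketch. The ~24M/mm² density is the A8X estimate from above; the ~85 mm² die size is a made-up placeholder, not a confirmed figure:

```c
#include <stdio.h>

int main(void) {
    /* Assumed numbers, not confirmed specs: A8X-like density and a
       hypothetical ~85 mm^2 die size, purely for illustration. */
    double density_mtr_per_mm2 = 24.0; /* ~24 million transistors per mm^2 */
    double die_area_mm2        = 85.0; /* placeholder die size estimate    */

    double transistors_millions = density_mtr_per_mm2 * die_area_mm2;
    printf("~%.2f billion transistors\n", transistors_millions / 1000.0);
    return 0;
}
```

At those assumptions it works out to roughly 2.04B, i.e. "a healthy bit over 2B" as said above.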
You could drink shots of coloured paint, then hover over a sheet of paper and wait for nature to take its course, and you'd end up with a more accurate die shot than what's in NV Tegra marketing.
> Nvidia always seems to have done things differently with regard to their CPU configuration, though I don't know if that's always been to their benefit. How much of an effort is it to design your own interconnect as opposed to using ARM's?

TK1-32 and TK1-64 also have a custom interconnect. I imagine there is quite a sizeable engineering investment, especially since they claim it's "cache coherent". The fact that they say it's cache coherent but it still remains a cluster-migration SoC is extremely eyebrow-raising. I hope we'll find out more in the coming months.
> I am glad to see that first AMD and now Nvidia have joined the ranks. Now everybody has FP16 ALU support in their forthcoming chips.

I haven't seen much detail on what AMD is doing about FP16, unfortunately.
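For what FP16 ALU support actually buys (and costs), here's a minimal sketch of the format's precision limit, assuming a compiler with the _Float16 type (e.g. recent Clang targeting AArch64). It says nothing about any vendor's actual datapath; it just shows why FP16 is a throughput-versus-precision trade-off:

```c
#include <stdio.h>

/* Minimal sketch only: FP16 has an 11-bit significand, so above 2048
   the representable step is 2 and adding 1 is lost to rounding. This
   is why FP16 rate doubling suits pixel work but not everything. */
int main(void) {
    _Float16 a   = (_Float16)2048.0f;
    _Float16 one = (_Float16)1.0f;
    _Float16 b   = a + one;              /* rounds back to 2048 in FP16 */
    printf("FP16: 2048 + 1 = %f\n", (double)b);

    float c = 2048.0f + 1.0f;            /* FP32 resolves it fine */
    printf("FP32: 2048 + 1 = %f\n", (double)c);
    return 0;
}
```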
> TK1-32 and TK1-64 also have a custom interconnect. I imagine there is quite a sizeable engineering investment, especially since they claim it's "cache coherent". The fact that they say it's cache coherent but it still remains a cluster-migration SoC is extremely eyebrow-raising. I hope we'll find out more in the coming months.

It looks to be a major undertaking, going by how infrequently major shifts happen for coherent interconnects.
> However, rather than a somewhat standard big.LITTLE configuration as one might expect, NVIDIA continues to use their own unique system. This includes a custom interconnect rather than ARM’s CCI-400, and cluster migration rather than global task scheduling which exposes all eight cores to userspace applications. It’s important to note that NVIDIA’s solution is cache coherent, so this system won't suffer from the power/performance penalties that one might expect given experience with previous SoCs that use cluster migration.
http://www.anandtech.com/show/8811/nvidia-tegra-x1-preview
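To make the quoted distinction concrete, here's a toy model of the two approaches. It's purely illustrative, has nothing to do with NVIDIA's or ARM's actual scheduler code, and the 0.6 load threshold is an arbitrary assumption:

```c
#include <stdio.h>

/* Toy model only: illustrates the scheduling difference between
   cluster migration and global task scheduling, not any real kernel
   or NVIDIA implementation. */

enum cluster { LITTLE_A53, BIG_A57 };

/* Cluster migration: the OS only ever sees 4 cores; which physical
   cluster backs them is switched wholesale based on overall load. */
static enum cluster pick_cluster(double load) {
    return load > 0.6 ? BIG_A57 : LITTLE_A53; /* arbitrary threshold */
}

int main(void) {
    double loads[] = { 0.2, 0.5, 0.9, 0.3 };
    for (int i = 0; i < 4; i++) {
        enum cluster c = pick_cluster(loads[i]);
        printf("load %.1f -> 4 visible cores, all on the %s cluster\n",
               loads[i], c == BIG_A57 ? "A57" : "A53");
    }
    /* Under global task scheduling (ARM's HMP) there is no wholesale
       switch: all 8 cores are exposed to userspace and the scheduler
       places each task on a big or little core individually. */
    printf("GTS: 8 cores visible, per-task placement\n");
    return 0;
}
```

The point is that under cluster migration the big/little decision is all-or-nothing, whereas global task scheduling makes it per task.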
> Doing it their way instead of ARM's way has benefits:

ARM's way is HMP (heterogeneous multi-processing, i.e. global task scheduling with all cores visible), not old-style cluster migration as on the 5410/5420. I really doubt Nvidia's claims of any kind of benefit from their own cluster migration.
So Nvidia says that Tegra X1 is using Cortex-A57 + A53 instead of Denver because this was faster and simpler to implement on 20nm. But AnandTech says they have a completely custom physical implementation. In that case, how would "hardening" the A57 and A53 - two CPUs they've never used before, especially the latter - be faster or simpler than using Denver, which they already have a custom implementation of and which would require a more straightforward shrink? The only way this makes sense is if the A57 was ready long before Denver, but the release timescales of devices using each respective CPU make this seem unlikely. So I'm skeptical that both of these claims are completely accurate.
> Cache coherency between the two clusters only means anything if they're both running at the same time. Which should mean that they're capable of HMP, even if the normal use model is to only keep one on to let it wind down for a while during migration. Or maybe only the cache is kept on.

Cache coherency, as per how we questioned Nvidia and got a response, means that a cluster migration is done without any DRAM intervention. Again, I fail to see how this could be more efficient than just migrating via ARM's CCI, even if it's limited to cluster migration. I really think most of their power-efficiency claims just come from the process advantage and probably better physical libraries compared to the 5433 (I have an article on that one with very extensive power measurements... the 5433 is a comparatively bad A57 implementation compared to what Samsung has now achieved with the A15s on the 5430; many would be surprised).
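To put the "no DRAM intervention" claim in perspective, here's a back-of-envelope toy in C. Every number in it (dirty line count, per-line latencies) is an invented assumption, not a measurement, so it only shows why skipping the DRAM round-trip on a cluster switch could matter at all, not whether Nvidia's way beats migrating over CCI:

```c
#include <stdio.h>

/* Toy comparison, illustrative assumptions only: cost of a cluster
   switch when dirty cache lines must be flushed to DRAM versus being
   transferred cache-to-cache over a coherent interconnect. */
int main(void) {
    const int dirty_lines       = 16384; /* assumed dirty L2 lines at switch  */
    const int dram_ns_per_line  = 100;   /* assumed DRAM round-trip per line  */
    const int snoop_ns_per_line = 30;    /* assumed cache-to-cache transfer   */

    long flush_ns = (long)dirty_lines * dram_ns_per_line;
    long snoop_ns = (long)dirty_lines * snoop_ns_per_line;

    printf("flush-to-DRAM migration:  ~%ld us\n", flush_ns / 1000);
    printf("coherent snoop migration: ~%ld us\n", snoop_ns / 1000);
    return 0;
}
```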
> An advantage to having only one cluster on at a time could be that it can mux the same voltage and clock domains between both clusters, assuming the former has enough dynamic range. But this again doesn't make sense if they're cache coherent.

That's very far-fetched.
I think their interconnect just can't do HMP and this is PR spin.
> But then why wouldn't they just use ARM's CCI?

¯\_(ツ)_/¯
Maybe they wanted to continue using what they had instead of switching over to new IP. There's no real use for ARM's CCI in the context of Denver, and keeping vanilla and custom cores on the same interconnect would pose less effort, I imagine. Both TK1 variants use the same SoC architecture, for example.
> ARM's way is HMP (heterogeneous multi-processing, i.e. global task scheduling with all cores visible), not old-style cluster migration as on the 5410/5420. I really doubt Nvidia's claims of any kind of benefit from their own cluster migration.

What is HMP? Running both A57 and A53 cores simultaneously? If so, I doubt ARM's claims of any kind of benefit to HMP. If not, enlighten me. =)
> What is HMP? Running both A57 and A53 cores simultaneously? If so, I doubt ARM's claims of any kind of benefit to HMP. If not, enlighten me. =)

Let me twist that one: if there aren't any benefits, why not stick with a 4+1 config in the first place?