As a personal speculation, I was always a bit doubtful about the claims that the x86 tax was some 10% or so. The claim is based on the idea that the x86 ISA is translated to lower-level code that, once the translation is done, runs optimally, and that the conversion carries only a small cost (and perhaps a somewhat higher cost in terms of complexity). But this has never really been put to the test.

My impression is that the tax is also considered to be due to the fact that the ISA has been carried around for many, many years, and that compatibility forces compromises on the decoder implementations.
The ISA could be considered bloated by some. I certainly missed any objective ISA analysis / comparison, if there ever was one recently.

The last official context I recall it being tangentially referred to was the 64-bit x86 introduction, with Intel invoking the need for a clean slate when Itanium was spawned. And then in reply, AMD came out with amd64, a port which according to them incurred a much lower performance tax than Intel had anticipated (something in the vein of ~5%, of course pulling this number from ancient and unreliable memory).

Micro-ops' existence, their 'fission' / fusion and whatnot are implementation decisions. They could be abandoned if they were not worth it. So that shouldn't be a strong reason to support the 'tax'.
 
I certainly missed any objective ISA analysis / comparison, if there ever was one recently.
Most of such an analysis is pointless without in-depth knowledge of the electrical engineering considerations that need to go into the RTL. You could say the ISA is far too high level in this regard.
 
Those 1TB iPads with 6 gigs, I'm wondering if that extra RAM is due to the flash mapping table being larger on 1TB vs the smaller storage sizes.
 
Those 1TB iPads with 6 gigs, I'm wondering if that extra RAM is due to the flash mapping table being larger on 1TB vs the smaller storage sizes.

Since you'll want to store the mapping table in some kind of NVM, it's very unlikely that main memory is used for storing a significant part of the mapping table (some kind of caching is possible, of course, but you only need about 2GB for the whole table for a 1TB SSD with page-level mapping).
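For scale, a quick back-of-envelope sketch; the 4 KiB mapping granularity and 8-byte entries are my assumptions, picked to match that ~2GB figure:

# Back-of-envelope FTL mapping table size. The 4 KiB mapping
# granularity and 8-byte entries are assumptions chosen to match
# the ~2GB figure above.
capacity   = 2**40        # 1 TiB of flash
page_size  = 4 * 2**10    # 4 KiB per mapped page (assumed)
entry_size = 8            # bytes per table entry (assumed)

entries = capacity // page_size
print(f"{entries:,} entries -> {entries * entry_size / 2**30:.1f} GiB")
# 268,435,456 entries -> 2.0 GiB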
 
Both Skylake 6700K and A12X are about 120 mm2 die area.
The Skylake is 14nm and 1.75B transistors, the A12X at 7nm is 10B transistors.
(https://en.wikipedia.org/wiki/Transistor_count)
Clock frequency for the A12X cores is currently not known AFAIK; probably something like 2.5GHz.
At 4GHz the 6700K is likely faster.
In any case it shows how far Intel has fallen behind, both in silicon process and in the architectural efficiency of x86.
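A quick sanity check on the density gap those numbers imply:

# Rough transistor-density comparison from the figures above.
skylake = 1.75e9 / 120   # 6700K: ~1.75B transistors, ~120 mm^2
a12x    = 10e9 / 120     # A12X:  ~10B transistors, ~120 mm^2

print(f"6700K: {skylake / 1e6:.1f} MTr/mm^2")   # ~14.6
print(f"A12X:  {a12x / 1e6:.1f} MTr/mm^2")      # ~83.3
print(f"ratio: {a12x / skylake:.1f}x")          # ~5.7x

(Transistor counts are reported with inconsistent methodology between vendors, so treat the ratio as rough.)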


I feel like I am reading this wrong, are you saying the A12X's transistors are over 5x denser than the 6700K's?

1.75B vs 10B for roughly the same size piece of silicon?

If so, how is that possible? I thought Intel's 14nm and TSMC 7nm were only about one gen apart? TSMC 10nm being equivalent to Intel's 14nm?

I feel like the 6700K transistor count must be too low? The wiki article does not seem to have a source for the number.

Still, it's bonkers what Apple has done here.
 
Since you'll want to store the mapping table in some kind of NVM, it's very unlikely that main memory is used for storing a significant part of the mapping table (some kind of caching is possible, of course, but you only need about 2GB for the whole table for a 1TB SSD with page-level mapping).

Those 1TB iPads with 6 gigs, I'm wondering if that extra RAM is due to the flash mapping table being larger on 1TB vs the smaller storage sizes.


Lot of speculation that Apple will switch Macs to ARM. Or that they will make iOS more Mac-like, with cursors, windowing, access to the file system, maybe even installs of apps bypassing the App Store.

But to support Mac OS or to support iOS with more desktop features, you'd think they'd have to put a lot more RAM in the Apple SoCs, at least 8 GB with options to go to 16, 32 or 64 GB? Will the Apple SoC architecture scale in performance and efficiency as you run more demanding OSes on it?
 
But to support Mac OS or to support iOS with more desktop features, you'd think they'd have to put a lot more RAM in the Apple SoCs, at least 8 GB with options to go to 16, 32 or 64 GB?
What makes you think the SoCs don't already support more than current devices have? And given that the largest iPad has 6 GB of RAM, it very likely supports 8 GB.

Will the Apple SoC architecture scale in performance and efficiency as you run more demanding OSes on it?
More than the OS itself, I wonder how Apple SoCs will scale with heavier multitasking, and with apps requiring gigabytes of data (SPEC results look promising here).
 
This is probably a fool's errand, but has anyone been able to work out the number of ALUs in the A12? I'm trying to calculate some of this information for a single GPU core, but I'm lacking a good point of reference.
 
This is probably a fool's errand, but has anyone been able to work out the number of ALUs in the A12?
I'm trying to calculate some of this information for a single GPU core, but I'm lacking a good point of reference.
Mobile is seeing a 10X modem speed increase from 1 to 10Gbps. AI engines are increasing by 3X. CPU is increasing by 1.5X from SPECint 25 to 38. GPU is increasing by 2X from 650 GFLOPS to 1300 GFLOPS. Transistor counts from 5B to 10B... but power budgets are unchanged.
source
From those blogs, I believe it was pointing to the Apple A12X GPU, because of the mobile space, the transistor counts, and the Apple keynote announcement (Xbox One S GPU performance).

I guess every core has 96 FP32 ALUs and 8 texture units. That gives 729.6 GFLOPS for the 4-core A12 GPU and 1300 GFLOPS for the 7-core A12X GPU.

650 GFLOPS for the A10X GPU.
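For what it's worth, here is the clock those figures imply if the 96 ALUs/core guess is right (peak FP32 = ALUs/core x cores x 2 ops per FMA x clock):

# Peak FP32 GFLOPS = ALUs/core * cores * 2 (ops per FMA) * clock (GHz).
# Solving for the clock the quoted figures imply:
print(f"A12  (4 cores): {729.6 / (96 * 4 * 2):.2f} GHz")   # 0.95 GHz
print(f"A12X (7 cores): {1300 / (96 * 7 * 2):.2f} GHz")    # 0.97 GHz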
 
This is probably a fool's errand, but has anyone been able to work out the number of ALUs in the A12? I'm trying to calculate some of this information for a single GPU core, but I'm lacking a good point of reference.
It's 64 FP32 FMAs per core (or 64 Vec2 FP16 FMAs). I guess you could argue it's more "flops" than that with special function co-issue etc... but then the same would be true for many other architectures and we only count FMAs for those.

Ryan, I wrote a couple of Metal microbenchmarks a few weeks ago that had some interesting results - been meaning to clean up the code, open source it, then write a blog post about their ALU design - but I've been getting distracted by other things. I'll PM you later today or tomorrow when I have a chance :)
 
It does, though at minimum cTDP the consumption goes up 33%.

Technically you are correct in that it's classified as a U, but as you state, even on cTDP-down it uses 33% more power, and I've never seen a laptop that uses cTDP-down exclusively.

But yea, I meant 15W when talking about the Us. Outside of Apple, and maybe Sony, no-one else uses 28W U chips.
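For reference, where the 33% comes from, assuming the 28W parts drop to a 20W cTDP-down (my assumption) versus the standard 15W:

# Where the 33% comes from: 20 W cTDP-down (my assumption for the
# 28 W parts) versus the standard 15 W U-series TDP.
print(f"{(20 / 15 - 1) * 100:.0f}% more power")   # 33%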
 
I've never seen a laptop that uses cTDP-down exclusively.
Don't the fanless Surface Pro models with the Core i5 use cTDP-down?

But yea, I meant 15W when talking about the Us. Outside of Apple, and maybe Sony, no-one else uses 28W U chips.
Which is a damn shame. So many Core i5/i7 GT2 parts being sold in laptops with a GeForce MX110/130 that would be better off with the GT3e version instead.
 
Also, note that Apple cannot bin these processors at all. Rather, they set clock/power limits that ensure that they don't have to discard otherwise functional dies. That limitation does not apply to the x86 products.
Roughly how much performance/power is being left on the table due to the lack of binning?

For example, suppose that Apple hypothetically took the highest performing 10% (or so) of the A12X chips, clocked them as high as possible, and put them into the 13" MacBook Pro. The remaining A12X chips are distributed among the iPad Pro, MacBook, and MacBook Air. Would the difference between these two variants be large enough to justify using the same chip in Apple's entire ≤ 13" laptop line?
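Purely as an illustration of the scale involved, here's the top-decile gain if Fmax were normally distributed; the mean and spread are invented numbers, not Apple data:

# Hypothetical top-decile bin: how much faster is the fastest 10% if
# Fmax is normally distributed? Mean and sigma are invented numbers.
from statistics import NormalDist

mean_ghz, sigma_ghz = 2.5, 0.1
cutoff = NormalDist(mean_ghz, sigma_ghz).inv_cdf(0.90)
print(f"top 10% cutoff: {cutoff:.2f} GHz, "
      f"{(cutoff / mean_ghz - 1) * 100:.0f}% above the mean")
# ~2.63 GHz, ~5% above the mean for these assumed parameters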

It's a bit odd to me that Apple SoCs are usually not binned, especially when Apple has products that could benefit from a slightly faster chip (the iPhone Plus and the 12.9" iPad Pro), and Apple already has performance numbers on the Compare iPad models page (not for the iPhones, though), which suggests that even a small performance difference can be advertised.
 
It's a bit odd to me that Apple SoCs are usually not binned,
Every single mobile SoC out there is binned via voltage binning. The difference is that the mobile silicon vendors are binning for power rather than for performance. In the PC industry you'll not see major power differences within a certain SKU, simply because the dies that perform worse will be binned into a different SKU. In mobile, that bad die will simply have 20-25% worse power consumption, and the average user won't be able to tell, as that difference is further diluted among the power consumption of the other components in a device. Also, mobile vendors design for volume and high yield, while PC vendors will possibly design for higher performance SKUs precisely because they can sell lower-yielding dies as lower-tier SKUs.

Also, I'm not up to date on how this works in the PC space, but mobile SoCs in this regard are also highly voltage binned, with 7-10 power planes (imagine splitting your die into 10 sections), each individually binned within a single die to increase overall yield. This isn't something that your average PC CPU or GPU can do.
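A toy sketch of why per-plane binning helps (all numbers invented): with a single rail, the worst plane sets the voltage for the whole die, while per-plane binning lets each section run at its own minimum:

# Toy model of per-power-plane voltage binning (all numbers invented).
# Each of 10 planes needs its own minimum voltage to hit target clock;
# with one shared rail, the worst plane sets the voltage for everyone.
import random

random.seed(0)
vmin = [random.gauss(0.75, 0.03) for _ in range(10)]  # per-plane Vmin

# Dynamic power scales roughly with V^2 at a fixed frequency.
shared_rail = 10 * max(vmin) ** 2
per_plane   = sum(v * v for v in vmin)
print(f"shared rail: {shared_rail:.3f} (arbitrary units)")
print(f"per-plane:   {per_plane:.3f}")
print(f"saving:      {(1 - per_plane / shared_rail) * 100:.0f}%")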
 
A12 GPU iPhone XR Floating Point Benchmarks with CPUdasherx
FP32: [benchmark screenshot]
FP16: [benchmark screenshot]


@Arun @Nebuchadnezzar @Ryan Smith How good is the A12 GPU compared to the Tegra X1 GPU in terms of power, performance, and area? From AnandTech I only know the PPA of the A12: maximum power 6.1W (SoC-level, I think), performance over 2x the Tegra X1 GPU on GFXBench High Aztec Ruins, and an A12 GPU area of 14.88mm2.
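Normalizing the numbers you quoted very roughly (using the 729.6 GFLOPS figure from earlier in the thread; since the 6.1W is probably SoC-level, the GFLOPS/W is only a floor):

# Very rough A12 GPU efficiency from the quoted numbers. The 6.1 W is
# probably SoC-level, so the GFLOPS/W figure is only a lower bound.
gflops   = 729.6    # 4-core A12 GPU peak FP32 (from the post above)
power_w  = 6.1      # peak power, likely whole SoC
area_mm2 = 14.88    # A12 GPU area

print(f"{gflops / power_w:.0f} GFLOPS/W (floor)")   # ~120
print(f"{gflops / area_mm2:.0f} GFLOPS/mm^2")       # ~49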
 
Depends on the frequency. If I remember correctly, the GPU alone can go to 9W, but that's at a peak platform power of 13-14W. It's really apples and oranges because of the process node differences. I don't remember the size, but it was on the larger side. The X2 was much more efficient.
 
Mark Gurman and Debby Wu of Bloomberg claimed a few weeks ago that the iPad Pro will get an update later this year:
Mark Gurman and Debby Wu said:
The 11-inch and 12.9-inch iPad Pros will get similar upgrades to the iPhones, gaining upgraded cameras and faster processors. Otherwise, the new iPads will look like the current versions.
In an unusual move for Apple, these iPad Pros may use an A12X instead of the often speculated "A13X."
@never_released said:
There is an Apple A12X refresh called Tinos (t8028/T0), with 4 performance cores and 4 efficiency ones. It might be the part for this year's iPads.
It may be interesting to see how Apple deals with an A12X refresh in this year's iPad Pro instead of an A13 or "A13X."

If Apple updates the iPad Pro lineup at an event (e.g. the one on September 10th), then I don't think Apple will mention the A12X refresh as it may not look good for marketing when compared to the A13. Instead, I think Apple will focus on the cameras.

The higher-end iPhones this year, rumored to be called "iPhone 11 Pro," are expected to have three rear cameras. The iPad Pro is also rumored to have three rear cameras, so Apple may just spend one slide talking about how the magical new cameras on the "iPhone 11 Pro" and "iPhone 11 Pro Max" are also coming to the iPad Pro.
 