I think they want to say it's 6 clusters because that was AnandTech's assumption on the day of the iPhone 6 announcement. The fact that they obviously could not identify 6 similar-looking areas does not seem to have dissuaded them from marking 6 equal-sized blocks and calling it a GX6650!
Rys, are you allowed to confirm what GPU model and what GPU cluster/core config is used here?
I can do some die shot analysis like everyone else.
Just from looking at the zoomed shot, there are 4 repeated large blocks of digital logic with a layout that definitely says "processor" to me (because of the SRAM layouts) and that they appear to operate in pairs.
Jason's analysis of the rest of the chip also needs some work.
Is it possible that A8's GPU is actually 6 clusters oriented horizontally in the diagram (with 3 clusters next to each other on the top portion of the GPU and 3 clusters next to each other on the bottom portion of the GPU)? To be honest, the die shot of the GPU area is not particularly clear, especially compared to the CPU area.
G6430 = 192 FP16 SPs
GX6450 = 256 FP16 SPs
---------------------------
Increase = 25%
You are correct, although that is technically a 33.3% increase. So the "up to 50% GPU perf. improvement" per cluster and per clock for Series 6XT (GX6450) vs. Series 6 (G6430) comes from a 33.3% increase in the number of FP16 ALUs and a > 10% improvement in throughput efficiency.
So it is pretty safe to say that A8 is using the 4 cluster Series 6XT GX6450 with a GPU clock operating frequency similar to what was used in A7.
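As a rough check on the arithmetic above, here is a minimal Python sketch. The ALU counts are the ones quoted in this thread. Treating the "> 10%" efficiency gain as exactly 10% and assuming the two gains multiply are assumptions made purely for illustration, and the function name is made up for this example.

```python
def percent_increase(old, new):
    """Relative increase from old to new, in percent of the old value."""
    return (new - old) / old * 100.0

g6430_fp16 = 192   # FP16 SPs, as quoted in the thread
gx6450_fp16 = 256  # FP16 SPs, as quoted in the thread

alu_gain = percent_increase(g6430_fp16, gx6450_fp16)

# Dividing the 64 extra ALUs by the NEW total instead of the old one
# gives 25%, which is presumably where that figure came from.
share_of_new_total = (gx6450_fp16 - g6430_fp16) / gx6450_fp16 * 100.0

# Assumption: take the "> 10%" efficiency improvement as exactly 10%
# and assume it compounds with the ALU increase.
efficiency_gain = 10.0
combined = ((1 + alu_gain / 100) * (1 + efficiency_gain / 100) - 1) * 100

print(f"ALU increase:        {alu_gain:.1f}%")
print(f"Share of new total:  {share_of_new_total:.1f}%")
print(f"Combined estimate:   {combined:.1f}%")
```

With these inputs the combined figure lands a little under 50%, which is at least consistent with an "up to 50%" claim.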
http://arstechnica.com/apple/2014/09/iphone-6-and-6-plus-in-deep-with-apples-thinnest-phones/3/
Ars Technica has throttling results from a beta GeekBench build. The A8 can still only sustain max clock speeds for a minute, which is about 20 seconds better than the A7. The A8 then drops to 1.2 GHz on the 6 and 1.15 GHz on the 6 Plus, compared to 1 GHz for the A7, for 10 minutes. Finally, after 30 minutes the A8 in the 6 is at 950 MHz, the 6 Plus is at 1.1 GHz, and the A7 is at 750 MHz. There's some weird recovery behaviour in the iPhone 6 too. So A8 throttling is better than the A7, but not the straight line Apple showed in their keynote.
We don’t know what kind of workload this slide represents, but the implied message is that the A8 didn’t need to throttle down at all.
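To put the throttled clocks side by side, here is a small Python sketch that expresses each A8 figure as a multiple of the A7 clock at the same stage. It uses only the numbers quoted in the post above; the stage labels and the helper name are made up for this example, and these are clock ratios only, not performance ratios.

```python
# Throttled clocks in MHz, as quoted in the post above.
clocks_mhz = {
    "10-minute stage": {"A8 (iPhone 6)": 1200, "A8 (iPhone 6 Plus)": 1150, "A7": 1000},
    "after 30 minutes": {"A8 (iPhone 6)": 950, "A8 (iPhone 6 Plus)": 1100, "A7": 750},
}

def ratio_vs_a7(stage):
    """Each non-A7 clock in the stage divided by the A7 clock."""
    a7 = stage["A7"]
    return {chip: mhz / a7 for chip, mhz in stage.items() if chip != "A7"}

for label, stage in clocks_mhz.items():
    for chip, ratio in ratio_vs_a7(stage).items():
        print(f"{label}: {chip} runs at {ratio:.2f}x the A7 clock")
```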
Going back to the CPU - I really don't understand why it's so hard to figure out the frequency on SoCs of *any* vendor?
Every modern CPU has a 1-cycle INT32 ADD latency. So just run a loop of, say, 1000 dependent ADDs (or whatever fits in the instruction cache) and there's no way for multi-issue or OoOE to optimise that. So that should give you the exact frequency very easily. Am I missing something? (yes, you might need a debug build to prevent optimisation, or just write the assembly manually - but again that doesn't seem like a huge deal?)
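The arithmetic behind that idea can be sketched as follows. This is only the calculation step: the timed chain of dependent ADDs itself would have to be hand-written assembly or carefully compiled code, not Python. The function name and the measurement values are hypothetical, and the sketch assumes a 1-cycle ADD latency and a clock that stays constant during the measurement.

```python
def estimate_frequency_hz(dependent_adds, elapsed_seconds, add_latency_cycles=1):
    """Estimate the core clock from a timed chain of dependent ADDs.

    Each ADD must wait for the previous result, so the chain takes
    dependent_adds * add_latency_cycles cycles regardless of issue width.
    """
    cycles = dependent_adds * add_latency_cycles
    return cycles / elapsed_seconds

# Hypothetical measurement: a 1000-ADD chain executed 1,000,000 times
# in 0.8 seconds (made-up numbers, not a real A8 result).
adds = 1000 * 1_000_000
freq = estimate_frequency_hz(adds, 0.8)
print(f"Estimated clock: {freq / 1e9:.2f} GHz")
```

Loop overhead (the branch and counter update) is ignored here; with a long enough unrolled chain it should be small relative to the ADDs themselves.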
What the increase in performance might be is open to debate. However, adding 64 ALUs to 192 ALUs is definitely increasing the ALUs by 33%.
Interesting that over 50% of the chip is not graphics/CPU/identified cache. That's a lot of real estate taken up by what?
eMMC controller, USB controllers, PCIe controllers, image signal processor(s), hardware video decoders/encoders, scalers, jpeg units (maybe), PLLs, chip network interconnect, memory interfaces, low power audio blocks, ....
Let me know if I miss something that hasn't been integrated in a SoC. And FYI, that's just an example of what you find in various current SoCs, not what I'm saying is in the A8.
http://forum.beyond3d.com/showpost.php?p=1875389&postcount=136
I'm sure there are many things you missed in that list, but my indirect point is that the non-GPU and non-CPU stuff is taking up an increasingly large proportion of the die.
Most of what you mentioned (obviously not the PCIe controller) would be in the A6X in some form, as an example. And yet the GPU and CPU took up about 60% of that die.
http://cdn.iphonehacks.com/wp-content/uploads/2012/11/a6x-899x1024.jpg
It looks to me like Apple has only given 40% of the total die area to the GPU and CPU in the A8.
I was estimating a 60% shrink in the L3 cache in line with the 59% shrink in the CPU. The GPU only shrank by 70%, so there's clearly new functionality there.
As an aside, the 4MB cache doesn't seem to have shrunk much in absolute measurements.