Apple is an existential threat to the PC

Geekbench 6 benchmarks are starting to appear.

This one shows us the M4 (base frequency 4.4 GHz) in an iPad solidly trouncing the M3 series in single-core and matching the M3 Pro in multi-core.

Single-Core score: 3810 (20% increase over M3)
Multi-Core score: 14541 (25% increase over M3)

It seems that Geekbench 6.3 introduced support for the Arm Scalable Matrix Extension (SME), and the M4 finally exposes it to applications.
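If you want to probe for SME from your own code, here's a minimal sketch. It assumes the OS publishes the feature under the hw.optional.arm.FEAT_* sysctl convention used for other Arm features; the exact flag name is an assumption, not something confirmed in this thread:

```c
// Minimal sketch: probe for SME the way other Arm features are exposed
// on Apple platforms. "hw.optional.arm.FEAT_SME" is an assumed flag name
// following the hw.optional.arm.FEAT_* convention; if the sysctl doesn't
// exist, treat the feature as unsupported.
#include <stdio.h>
#include <sys/sysctl.h>

int main(void) {
    int has_sme = 0;
    size_t len = sizeof(has_sme);
    if (sysctlbyname("hw.optional.arm.FEAT_SME", &has_sme, &len, NULL, 0) != 0)
        has_sme = 0;  // flag absent: no SME
    printf("SME: %s\n", has_sme ? "supported" : "not supported");
    return 0;
}
```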


It's the fastest single-core performance of any hardware available in any form factor today, be it a server rack, a desktop computer, or anything else for that matter. And this is in a 5.1 mm thin tablet 😅

The performance per watt is really great.
Extremely impressive -- really goes to show that the M3 and A17 were stop-gap solutions.
I fully expect a transition of the entire Mac lineup to the M4 -- though I still have a lot of questions about how they're going to handle the Mac Pro. If that thing doesn't support PCIe GPUs or swappable ECC RAM, then it's a waste of space.

It looks like quite a lot of the gain comes from Geekbench 6.3 now supporting SME and the M4 apparently supporting it too.
If we look at the "ordinary" subtests, the advantage over an M3 is between 10% and 20% in single-threaded tests. Furthermore, most of the M4 results seem to fall between 3600 and 3700; only two of them are above 3800.
That's still quite nice, as it suggests around 5% IPC improvement (the M4 is clocked ~10% higher than the M3), but probably not as dramatic as the initial numbers might suggest.
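As a sanity check on that ~5% figure, here's the arithmetic spelled out; the 15% score ratio is just the midpoint of the 10-20% range above, so treat it as a rough assumption:

```c
// Back-of-the-envelope: per-clock (IPC) gain = score ratio / clock ratio.
// Uses the ~15% midpoint of the 10-20% subtest advantage and the ~10%
// clock bump quoted above.
#include <stdio.h>

int main(void) {
    double score_ratio = 1.15;  // ~10-20% single-thread advantage, midpoint
    double clock_ratio = 1.10;  // M4 clocked ~10% higher than M3
    printf("per-clock gain: ~%.1f%%\n",
           (score_ratio / clock_ratio - 1.0) * 100.0);  // ~4.5%
    return 0;
}
```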
 
Those are great increases for just seven months. Remember, some of the scores are probably from the 3-performance-core variant as well.

Memory bandwidth also increased by 20%, from 100 GB/s to 120 GB/s.
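That 20% lines up with the widely reported memory configurations -- LPDDR5-6400 on the M3 and LPDDR5X-7500 on the M4, both on a 128-bit bus. Treating those configs as assumptions, the arithmetic checks out:

```c
// Sanity check: bandwidth = transfer rate x bus width in bytes.
// Assumes M3 = LPDDR5-6400 and M4 = LPDDR5X-7500, both on a 128-bit bus.
#include <stdio.h>

int main(void) {
    double bus_bytes = 128 / 8.0;                         // 16 bytes per transfer
    printf("M3: %.1f GB/s\n", 6400e6 * bus_bytes / 1e9);  // ~102.4, marketed 100
    printf("M4: %.1f GB/s\n", 7500e6 * bus_bytes / 1e9);  // 120
    return 0;
}
```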

I'm guessing the less dense N3E process really helps with clock speeds at the expense of a slightly larger die.

All results I've seen in the Geekbench 6 browser are from the 10-core variant.
Of course, the comparison is not entirely fair, even iPad Pro vs. MacBook Air: the MacBook Air probably has more thermal headroom. On the other hand, Geekbench's tests are generally not long enough to become thermally limited.
Also, I'm not sure whether Geekbench still uses different workload sizes for the "desktop" and "mobile" versions. It used to, but I didn't see that mentioned in the latest whitepaper, so maybe it no longer does.
Anyway, this is quite interesting, because just recently some people seemed to be claiming that Apple had "hit a wall" and that Intel/AMD would soon catch up. Apparently the rumors are exaggerated. ;)
 
I guess Apple only sent the 10-core variant to reviewers ;)

Since the M1, the IPC increases have been rather small. I'm still surprised they could push the frequency that high given how wide their microarchitecture is, and while keeping power under control. They definitely have a very good implementation team.
 
The problem with the PC is that the entire architecture is bottlenecked: slow buses force huge separate memory pools, and the GPU and CPU are separate parts that cannot really work together.
 
There is not really a lot of back and forth in most workloads. There's redundancy, and it hurts efficiency a bit, but above a couple of tens of watts it hardly matters any more.

Meanwhile it allowed a lot more innovation ... not all companies can just buy all their know-how :p
 
One of the reasons GPUs and CPUs use separate memory pools, besides history and the benefits MfA mentioned above, is that they have different memory requirements. A CPU generally loves low latency but isn't that sensitive to bandwidth, and a GPU is the opposite. If you look at current consoles, they basically sacrificed the CPU in favor of the GPU by using higher-latency (but higher-bandwidth) memory.

Some technologies allow the best of both worlds, such as what Apple is doing with the M-series chips. Basically, they use memory designed for CPUs but gang a lot of channels together to provide better bandwidth (see the sketch below). However, it's not cheap, and the bandwidth is still not that great. For example, a full M3 Max has 400 GB/s of memory bandwidth, lower than a desktop GeForce 4070 GPU, which has 500 GB/s. That's not bad for a laptop, but you can't do much better even when power consumption isn't a hard limit. The best you can get is an M2 Ultra, which doubles the bandwidth -- still less than a GeForce 4090. There is also HBM, of course, but that's currently just too expensive for the consumer market.
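Here's that sketch. Bandwidth is just transfer rate times bus width, so "stacking" channels scales it linearly; the bus widths and data rates below are the commonly cited configurations -- assumptions, not official specs:

```c
// Bandwidth scales linearly with bus width: GB/s = MT/s x bus bytes.
// Assumed configs: M3 Max = 512-bit LPDDR5-6400, M2 Ultra = 1024-bit
// LPDDR5-6400, desktop RTX 4070 = 192-bit GDDR6X at 21 GT/s.
#include <stdio.h>

static double gb_per_s(double mt_per_s, int bus_bits) {
    return mt_per_s * (bus_bits / 8.0) / 1e3;  // MT/s x bytes -> GB/s
}

int main(void) {
    printf("M3 Max:   ~%.0f GB/s\n", gb_per_s(6400, 512));   // marketed 400
    printf("M2 Ultra: ~%.0f GB/s\n", gb_per_s(6400, 1024));  // marketed 800
    printf("RTX 4070: ~%.0f GB/s\n", gb_per_s(21000, 192));  // ~504
    return 0;
}
```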

With CPUs going massively multicore, it could be that future CPUs will become more GPU-like and prefer bandwidth over latency. However, as long as single-thread performance is still important, latency remains very important for the CPU, and the best we can do is add a lot of cache to hide it. Technologies like AMD's 3D V-Cache could be a solution, but they aren't cheap either. It'd be interesting to see how a CPU performs in some sort of SoC with 3D V-Cache and GPU-style memory.
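To make the latency-vs-bandwidth distinction concrete, here's a small illustrative sketch (a toy, not a rigorous benchmark -- clock() granularity is coarse and the size is arbitrary): a dependent pointer chase models the latency-bound single-thread case, while a linear pass over the same array models the bandwidth-bound streaming case.

```c
// Illustrative sketch: the same array, touched two ways.
// A dependent pointer chase stalls on every load (memory latency bound),
// while a linear pass lets the prefetcher run ahead (bandwidth bound).
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)  // 16M elements, far larger than any cache

int main(void) {
    size_t *next = malloc(N * sizeof *next);
    if (!next) return 1;
    for (size_t i = 0; i < N; i++) next[i] = i;
    // Sattolo's algorithm: a random single-cycle permutation, so the
    // chase visits all N elements in cache-hostile order.
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }

    clock_t t0 = clock();
    size_t p = 0;
    for (size_t i = 0; i < N; i++) p = next[p];     // latency-bound
    double chase = (double)(clock() - t0) / CLOCKS_PER_SEC;

    t0 = clock();
    size_t sum = 0;
    for (size_t i = 0; i < N; i++) sum += next[i];  // bandwidth-bound
    double stream = (double)(clock() - t0) / CLOCKS_PER_SEC;

    printf("chase %.2fs vs stream %.2fs (p=%zu, sum=%zu)\n",
           chase, stream, p, sum);
    free(next);
    return 0;
}
```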
 
They (meaning the GeForce 4070, or the 4090 for that matter, and the Apple GPU IP) don't have the same memory bandwidth requirements. At least acknowledge that 😅

That may change if Apple intends to build a larger GPU in the future.

The 32-core GPU in the M1 Max (432 mm² total) is only 114.5 mm², or 26.5% of the SoC's total size sans SLC blocks.

The GPU may take up more than that in the M3 Max, but I don't have a die shot or die-area figures to go by right now.
 

Of course, I'm not saying the M3 Max is designed to be as fast as a 4090. However, if you want a 4090-ish system (e.g., an SoC with a 7950X3D-ish CPU and a 4090-ish GPU), it'll be a lot more expensive to build with a lot of LPDDR5. :)