Apple is an existential threat to the PC

Geekbench 6 benchmarks are starting to appear.

This one shows the M4 (clocked at 4.4GHz) in an iPad solidly trouncing the M3 series in single-core and matching the M3 Pro in multi-core.

Single-Core score: 3810 (20% increase over M3)
Multi-Core score: 14541 (25% increase over M3)

It seems that Geekbench 6.3 introduced support for the Arm Scalable Matrix Extension (SME), and the M4 finally exposes it to applications.
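If you want to check for SME yourself, macOS exposes Arm feature flags through sysctl. A minimal sketch, assuming the flag follows Apple's usual hw.optional.arm.FEAT_* naming (treat the exact key as an assumption; it should print 1 on SME-capable hardware):

```python
import subprocess

# Query the Arm SME feature flag; on SME-capable hardware this should print
# "hw.optional.arm.FEAT_SME: 1", and the key won't exist on older chips.
subprocess.run(["sysctl", "hw.optional.arm.FEAT_SME"])
```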

It's the fastest single-core performance of any hardware available in any form factor today, be it a server rack, a desktop computer, or anything else for that matter. And this is in a 5.1mm-thin tablet 😅

The performance per watt is really great.
Extremely impressive -- really goes to show that the M3 and A17 were stop-gap solutions.
I fully expect a transition of the entire Mac lineup to the M4 -- though I still have a lot of questions about how they're going to handle the Mac Pro. If that thing doesn't support PCIe GPUs or swappable ECC RAM, then it's a waste of space.
 
It looks like quite a lot of the performance comes from SME: Geekbench 6.3 now supports it, and the M4 seems to support it too.
If we look at the "ordinary" subtests, the advantage over an M3 is between 10% and 20% in single-threaded tests. Furthermore, most of the M4 results seem to be between 3600 and 3700; only two of them are above 3800.
It's still quite nice, as that suggests around a 5% IPC improvement (the M4 is clocked ~10% higher than the M3), but probably not as dramatic as the initial numbers might suggest.
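A quick back-of-envelope check of that IPC estimate (a sketch using the rough figures from this post, not official specs):

```python
# Performance = IPC x clock, so the implied IPC ratio is the
# performance ratio divided by the clock ratio.
perf_ratio = 1.15   # midpoint of the ~10-20% single-thread advantage
clock_ratio = 1.10  # M4 clocked ~10% higher than M3
ipc_ratio = perf_ratio / clock_ratio
print(f"implied IPC gain: +{(ipc_ratio - 1) * 100:.1f}%")  # ~ +4.5%
```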
 
These are great increases for just 7 months. Remember, some of the scores are probably from the 3-performance-core variant as well.

Memory bandwidth also increased by 20% from 100GB/s to 120GB/s.

I'm guessing the less dense N3E process really helps with clock speeds at the expense of a slightly larger die.
 
All the results I've seen on Geekbench 6's browser are from the 10-core variant.
Of course, the comparison is not entirely fair, even iPad Pro vs MacBook Air: the MacBook Air probably has more thermal headroom. On the other hand, Geekbench's tests generally aren't long enough to be thermally limited.
Also, I'm not sure if it's still the case that Geekbench uses different test sizes for the "desktop" and "mobile" versions. It used to, but I didn't see it mentioned in the latest whitepaper, so maybe it's no longer the case.
Anyway, this is quite interesting, because just recently some people seemed to be claiming that Apple "hit a wall" and that Intel/AMD will soon catch up. Apparently the rumors are exaggerated. ;)
 
I guess Apple only sent the 10-core variant to reviewers ;)

Since the M1, the IPC increases have been rather small. I'm still surprised they could push the frequency that high given how wide their microarchitecture is -- and while keeping power under control. They definitely have a very good implementation team.
 
The problem with the PC is that the entire architecture is bottlenecked: you have slow busses requiring huge memory pools, and the GPU and CPU are separate and cannot really work together.
 
There is not really a lot of back and forth on most workloads. There's redundancy, and it hurts efficiency a bit, but above a couple of tens of watts it hardly matters any more.

Meanwhile it allowed a lot more innovation ... not all companies can just buy all their know-how :p
 
One of the reasons why the GPU and CPU use separate memory pools, other than historical reasons and the benefits mentioned above by MfA, is that CPUs and GPUs have different memory requirements. A CPU generally loves lower latency but is not that sensitive to bandwidth, and GPUs are the opposite. If you look at current consoles, they basically sacrificed the CPU in favor of the GPU by using higher-latency (but higher-bandwidth) memory.

Some technologies allow the best of both worlds, such as what Apple is doing with the M-series chips. Basically, they use memory designed for CPUs but stack a lot of channels to provide better bandwidth. However, it's not cheap, and the bandwidth is still not that great. For example, a full M3 Max has 400GB/s of memory bandwidth, lower than a desktop GeForce 4070 GPU, which has 500GB/s. That's not bad for a laptop, but you can't do much better even when power consumption is not the limit. The best you can get is an M2 Ultra, which doubles the bandwidth -- still less than a GeForce 4090. There is also HBM, of course, but that's currently just too expensive for the consumer market.

With CPUs going massively multicore today, it could be that future CPUs will become more like GPUs and prefer bandwidth over latency. However, as long as single-thread performance is important, latency remains very important for the CPU, and the best we can do is add a lot of cache to reduce it. Tech like AMD's 3D V-Cache could be a solution, but it's also not cheap. It'd be interesting to see how a CPU performs in some sort of SoC with 3D V-Cache and GPU-style memory.
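For reference, the peak-bandwidth arithmetic behind those numbers is just bus width times transfer rate. A sketch using the commonly cited configurations (512-bit LPDDR5-6400 for the full M3 Max, 192-bit 21000MT/s GDDR6X for the 4070 -- treat these as assumptions):

```python
def peak_bw_gbs(bus_bits: int, mtps: int) -> float:
    """Peak bandwidth in GB/s: (bus width in bits / 8 bits per byte) x MT/s."""
    return bus_bits / 8 * mtps / 1000

print(peak_bw_gbs(512, 6400))    # M3 Max: ~409.6 GB/s ("400GB/s")
print(peak_bw_gbs(192, 21000))   # RTX 4070: ~504 GB/s ("500GB/s")
```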
 
They (meaning the GeForce 4070, or the 4090 for that matter, and the Apple GPU IP) don't have the same memory bandwidth requirements. At least acknowledge that 😅

That may change if Apple intends on building a larger GPU in the future.

The 32-core GPU in the M1 Max (432mm²) is only 114.5mm², or 26.5% of the total size of the SoC, sans SLC blocks.

The GPU may take up more than that in the M3 Max, but I don't have a die shot or die-area figures to go by right now.
 
Of course, I'm not saying the M3 Max is designed to be as fast as a 4090. However, if you want a 4090-ish system (e.g. an SoC with a 7950X3D-ish CPU and a 4090-ish GPU), it'll be a lot more expensive if you do it with a lot of LPDDR5. :)
 
The 4090 desktop is 2.2x faster than the M3 Max in Adobe Premiere Pro GPU Effects.
The 4090 desktop is also 2.5x faster in the DaVinci Resolve Studio GPU score, 2.9x faster in the AI score, and 2.27x faster in the RAW score.
The 4090 desktop is also 2.55x faster in the Cinebench GPU score and 3.0x faster in the Blender GPU score.

Overall, the 4090 is anywhere between 2.2x and 3.0x faster in GPU-limited tests.

This is the first comprehensive GPU benchmark set comparing the best of Apple (M3 Max) against the best of NVIDIA (RTX 4090) -- though NVIDIA's actual best is the RTX 6000 Ada professional GPU.
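For a single summary number, the geometric mean of those six ratios (just arithmetic on the figures above):

```python
import math

# Speedup ratios quoted above, summarized with a geometric mean.
ratios = [2.2, 2.5, 2.9, 2.27, 2.55, 3.0]
geomean = math.exp(sum(map(math.log, ratios)) / len(ratios))
print(f"{geomean:.2f}x")  # ~2.55x overall
```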

 
It's actually not too surprising, as the M3 Max has 40% of the 4090's memory bandwidth and maybe a quarter of its FLOPS. It's actually pretty competitive considering those differences.
I have tested some LLM inference on both a 3090 on Linux and a 40-core M3 Max, and the 3090 is about twice as fast as the M3 Max.
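That ~2x lines up with the bandwidth ratio, since single-stream LLM token generation mostly streams the model weights and is therefore memory-bandwidth bound. A sketch using published peak figures (sustained bandwidth will be lower on both):

```python
# Token generation reads roughly all the weights once per token, so
# tokens/s scales with memory bandwidth when the model doesn't fit in cache.
bw_3090 = 936     # GB/s, RTX 3090 peak memory bandwidth
bw_m3_max = 400   # GB/s, full M3 Max peak memory bandwidth
print(f"expected speedup: ~{bw_3090 / bw_m3_max:.1f}x")  # ~2.3x vs the observed ~2x
```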
 
A more reasonable comparison, for price-conscious consumers, is the 4080 SUPER.
Comparing Mac to PC is never as simple as raw benchmark results. The PCs we used for the comparison range from somewhat cheaper than the M3 Max MacBook Pro to 4 times the cost and are locked to your desk rather than being portable (except for the Puget Mobile 17″, of course). In addition, the operating systems and available applications are different. However, one part of deciding on whether to buy a Mac or PC is the performance, both at comparable prices and at the top end of what is possible.
...
On the GPU rendering side, Apple’s current best is about half the performance of an NVIDIA RTX 4080 SUPER or 4090 mobile. Rendering is also an area where, on the PC side, money can translate directly to performance. Upgrading a PC to the AMD Ryzen Threadripper PRO 7995WX (although staggeringly expensive) or multiple NVIDIA RTX 4090 GPUs can result in scores up to an order of magnitude larger than anything Apple can do. Although most users don’t need (or have the budget for) that type of setup, a PC is the only option for those who do.
 
If the PC is 3 times faster, then it will use 6 times the energy to accomplish that.
 
Could you please start using basic math concepts correctly? When you say "2.2 times faster" you actually mean "1.2 times faster" or "2.2 times as fast"; it's not 2.2 times faster.
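In other words (a trivial sketch of the distinction):

```python
old, new = 100, 220           # arbitrary scores
as_fast = new / old           # 2.2x "as fast"
faster = (new - old) / old    # 1.2x, i.e. 120% "faster"
print(as_fast, faster)        # 2.2 1.2
```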

And if you slow down the PC to match the scores, the power consumption plummets a lot faster than the performance. It could be a close match, or a win for either; we don't know till someone tests it.
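The reason power falls faster than performance is the usual DVFS relationship: dynamic power scales roughly with frequency times voltage squared, and voltage drops along with frequency. A crude sketch under a P ∝ f³ assumption (real curves differ per chip, and the exponent is an approximation):

```python
# Crude DVFS model: dynamic power ~ f * V^2, and V scales roughly with f,
# so P ~ f^3. Slowing a chip to 2/3 clock to match a slower rival:
f_ratio = 2 / 3
p_ratio = f_ratio ** 3
print(f"perf: {f_ratio:.2f}x, power: {p_ratio:.2f}x")  # perf: 0.67x, power: 0.30x
```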
 