Apple is an existential threat to the PC

Even with those "low level" APIs, driver optimizations are still important. Driver writers need to know which use cases are most common and what the major bottlenecks could be. For example, if a commonly used operation is slow in some way, the hardware won't perform to its full potential. That's not necessarily a hardware limitation, but knowing which operations are the most important ones to optimize is critical in both hardware and software design.
Driver optimizations aren't as important as you would believe they are in this day and age. Applications are now expected to be designed to hit the "fast paths" in a driver/hardware, and if they don't, well, there's not a whole lot a 'thin' driver implementation can do to save them. Great power (pipeline states/persistent mapping/bindless/GPU-driven rendering) comes with great responsibility (barriers/PSOs/explicit memory management), since these features frequently reduce the amount of 'blackbox' contained in the driver, which ties the hands of IHVs ...
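To make the "great responsibility" half concrete, here is a minimal Metal sketch (Apple's own explicit API) of up-front pipeline state (PSO) creation: the application, not the driver, decides when the expensive compilation happens. The shader source and names are purely illustrative, not taken from any real renderer.

```swift
import Metal

// Minimal illustration of explicit pipeline-state (PSO) creation in Metal.
// The shader source, function names and pixel format are placeholders; the
// point is that the app compiles state up front instead of relying on the
// driver to patch things together behind its back at draw time.
let shaderSource = """
#include <metal_stdlib>
using namespace metal;

vertex float4 vs_main(uint vid [[vertex_id]]) {
    float2 pos[3] = { float2(-1, -1), float2(3, -1), float2(-1, 3) };
    return float4(pos[vid], 0, 1);
}

fragment float4 fs_main() {
    return float4(1, 0, 0, 1);
}
"""

guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("No Metal device available")
}

let library = try! device.makeLibrary(source: shaderSource, options: nil)

let desc = MTLRenderPipelineDescriptor()
desc.vertexFunction = library.makeFunction(name: "vs_main")
desc.fragmentFunction = library.makeFunction(name: "fs_main")
desc.colorAttachments[0].pixelFormat = .bgra8Unorm

// The expensive part: shaders plus fixed-function state get baked into one
// immutable object here, ideally at load time, never mid-frame.
let pipelineState = try! device.makeRenderPipelineState(descriptor: desc)
print("PSO compiled up front, before any draw call is recorded.")
```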

An IHV today might frequently do some shader replacement to bypass the driver compiler, or set an application to use an alternate API implementation path in the driver, but the days of IHVs straight up fixing/working around buggy graphics code or replacing major sections of a game's renderer are well behind us ...

If specific modern API usage or patterns are found to be slow then it's either a case of genuinely mediocre hardware (HW functionality itself is slow) or complex API emulation in the driver (HW has no native functionality for the API). If Apple wants to be more competitive in high-end graphics technology then they have to design their future graphics architectures to be closer to AMD/NV (HW does MOST of the API work for them) or end up like the others (Intel/QCOM), because no PC/console developer will create an "Apple specific renderer" where they have to make content changes just to avoid pitfalls their other rivals don't have, since those rivals have downright fast API/HW implementations that developers mostly stick to using! What *worked* for Apple in mobile land isn't going to work in console/desktop land ...
 
I don't really see which part of what you said contradicts what I said. :) The term "driver optimization" means many things, including what you said, such as shader replacements (which is probably the major part of today's driver optimizations).
We don't really know what the problem (or problems) is with Apple's poor performance in games, but it's certainly not because the hardware is too weak. Therefore, the most likely problems are probably either poorly optimized games, poorly optimized drivers (which includes the API layer), or some hardware inefficiencies. If I have to guess I'd say it's probably all of them. Years of experience writing drivers for those games also helps when you need to diagnose what the problem is in a game's rendering pipeline. I doubt that Apple is able to provide the same level of support and insight to game developers as NVIDIA and AMD can.
 
Shader replacement can't really explain the major gap, and if a game is "poorly optimized" on their hardware, that's almost certainly because Apple GPU designs aren't used to PC/console graphics workloads (they don't see many releases of graphically high-end games, so they hardly have any incentive to optimize their hardware around them) with deferred renderers, async compute, GPU-driven rendering, bindless, virtual texturing/geometry, or all sorts of state-of-the-art graphics technology to which mobile graphics technology hasn't made those leaps yet ...

Their shader architecture might look fine on the surface, but what about their fixed function hardware, which they disclose very little about? Tile-based renderers (Apple designs belong in this family) aren't known to scale well with very high geometry density, and quite a few implementations don't even feature hardware blending units! Apple HW goes so far as to NOT implement texture units compliant with industry standards (D3D12/VK), so we have to wonder whether games had to do "performance degrading workarounds" since their API itself is deficient (and by extension the HW design too) ...

In the past, the Metal API had to support multiple vendors since Apple didn't offer in-house GPU solutions in the desktop space, but that eventually changed too, so now Metal, and especially its newer iterations, should be a MUCH CLOSER fundamental reflection of their HW design!
 

You talked about a lot of possible reasons why Apple's hardware performs badly in those games, but you don't know which ones (if any) are actually the reason. That's exactly my point: both NVIDIA and AMD have years of experience supporting games, so they know. Apple doesn't, so they probably don't. If Apple had similar experience they'd have more suitable hardware designs, or they'd know how to advise game developers to work around these issues.
I also don't think the games are very well optimized for Apple's hardware. There were some games ported to mobile phones and they didn't run very well either, so they didn't sell very well. I don't think many game developers are willing to spend a lot of resources on optimizing for Macs, unfortunately.
 
GPU-driven rendering is still a pain point for mobile GPUs in general, and many PC/console ports don't make good use of renderpass APIs either (because D3D12 doesn't require it!), so I believe we DO actually have a good idea of the source of the bad performance, and I think Apple's own engineers know as well, since it's their job to performance profile these applications too ...
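For the renderpass point specifically, here is a hedged sketch of what "good use of renderpass APIs" looks like on a tile-based GPU: telling the hardware which attachments it can avoid loading from or flushing back to memory. The textures are assumed to exist elsewhere; this is illustrative only, not lifted from any shipping renderer.

```swift
import Metal

// Sketch: explicit load/store actions on a render pass, which is what
// tile-based GPUs (Apple's included) care deeply about. Desktop-style
// D3D12 ports that ignore this end up loading and storing every
// attachment through main memory each pass. `colorTexture` and
// `depthTexture` are assumed to exist; this function is illustrative.
func makeRenderPassDescriptor(colorTexture: MTLTexture,
                              depthTexture: MTLTexture) -> MTLRenderPassDescriptor {
    let pass = MTLRenderPassDescriptor()

    // Color: no need to read previous contents, so clear on-tile instead
    // of loading from DRAM; do store the final image.
    pass.colorAttachments[0].texture = colorTexture
    pass.colorAttachments[0].loadAction = .clear
    pass.colorAttachments[0].clearColor = MTLClearColor(red: 0, green: 0, blue: 0, alpha: 1)
    pass.colorAttachments[0].storeAction = .store

    // Depth: only needed while rasterizing this pass, so never write it
    // back to memory at all -- it lives and dies in tile memory.
    pass.depthAttachment.texture = depthTexture
    pass.depthAttachment.loadAction = .clear
    pass.depthAttachment.clearDepth = 1.0
    pass.depthAttachment.storeAction = .dontCare

    return pass
}
```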

Sure it helps that AMD and Nvidia have more experience but they're first and foremost proactive about how they approach high-end graphics rather than Apple's proscriptive stance of preferring to do nearly NOTHING about their problems because they don't want to spend their resources on making TWO separate architectures (one for desktop & one for mobile) with different drivers. They're well AWARE that what they're doing for mobile graphics isn't translating well to desktop graphics ...
 

I think if they wanted to, and knew how, they wouldn't even have to make two separate architectures. It's entirely possible to make something workable for both settings, but since Apple never actually liked games (despite the fact that the majority of App Store apps are games), they only recently started actively trying to push games on Macs. It's just not that easy to overcome those years of negligence.

Personally I'm not bullish on Mac games because Macs are expensive and many people bought them as work machines. The market share is just not big enough to justify the cost of development for many game studios (unless Apple is willing to subsidize them, which I believe is probably the case for many projects). Today's game developers already face huge hurdles optimizing their games well for PC; it just does not make much sense to do more work for a much smaller market.
 
How can they not make separate architectures at this point? Mobile graphics lives in an alternate reality where forward rendering is still the default lighting technique, with questionable compute shader perf. The low geometry density of mobile content and its carefully minimized renderpasses (which go out the window with desktop-style deferred) also work well for tile-based renderers ...

The only architecture that has somewhat succeeded in targeting both markets is AMD's RDNA 2/3, but their future designs are looking less suitable for that purpose because the direction of PC graphics technology (ML upscaling & ray tracing) forced them to pivot to more hardware implementation overhead for features that aren't important in the mobile space ...
I think what Apple should do (if they're truly serious about high-end graphics) is establish an independent graphics division, with its own management, able to create D3D12/Windows-compatible graphics cards. That way they'd have more freedom to compete directly against AMD/NV, because right now Apple refuses to open themselves up to competitive pressure in that segment ...
 
Apple makes the most money on games from iOS, that is, mobile games with things like loot boxes and other distasteful business models.

In fact, I believe Apple makes more money from games than anyone else, if you can call mobile games that try to get addicts to spend a lot on microtransactions real games.

It's doubtful that they are interested in catering to the gaming market. Sure they will have demos at some keynotes but they know they're not selling a lot of M4 Max and M3 Ultra Macs to gamers.
 
When it comes to productivity applications the performance is better aligned. It seems those software makers have a better grip on the hardware than many of the game companies. Just because a port is native AArch64 and uses Metal doesn't mean it is well done. Blender shows the M4 Max performance is clearly better than what people seem to suggest here.

Interestingly the M4 Max gives around 130-139 points per GPU core and the M3 Max / Ultra only gives us 90-105 points per GPU core in Blender. Perhaps Apple felt that an M4 Ultra would be too high of a generational jump or something silly.
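Rough arithmetic behind that per-core comparison; the GPU core counts are my assumption (the usual top configurations), while the points-per-core ranges are the ones quoted above:

```swift
// Rough arithmetic behind the per-core figures quoted above. The GPU core
// counts are my assumption (top configurations: 40 for M4 Max, 80 for
// M3 Ultra); the points-per-core ranges are from the Blender Open Data
// numbers mentioned in the post.
let estimates: [(name: String, cores: Double, perCore: ClosedRange<Double>)] = [
    ("M4 Max",   40, 130...139),
    ("M3 Ultra", 80,  90...105),
]

for e in estimates {
    // Implied total score if per-core scaling held exactly.
    let low  = e.cores * e.perCore.lowerBound
    let high = e.cores * e.perCore.upperBound
    print("\(e.name): roughly \(Int(low))-\(Int(high)) points total")
}
// M4 Max: roughly 5200-5560 points total
// M3 Ultra: roughly 7200-8400 points total
```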


In other applications like the Adobe suite, the M4 Max also shows great performance in Photoshop, Lightroom and Premiere Pro.

Games have also shown consistently bad scaling with the chips using UltraFusion (CoWoS-S), from the M1 Ultra and M2 Ultra to the current M3 Ultra. I'm sure Apple could build a powerful gaming GPU if they were less conservative about where they land on the power efficiency curve and accepted NVIDIA/AMD levels of power usage. Except that would ruin their laptop-first approach (which is where the majority of their sales are).

The business practices of individual game companies are not really Apple's "business" as long as they play by the rules (set by society and elected officials), but yes, they do make more money than Microsoft, Nintendo and Sony combined in the gaming segment. A lot of the segment with dubious practices also happens to be available on consoles and Android. App/game stores are filled with gacha/gambling mechanics.

It's silly saying that Apple is "not interested in the gaming market". They have consistently raised the lower end of the market with each generation of Apple Silicon. The M4 is now 4TFlops and offers much more advanced features.

They should really have a stronger business presence with developers though, so more games would get a native port.

I believe many people use Macs for both business and gaming. It may not be the primary reason for using Macs.
 

They might be now, but they certainly weren't before. It took many years for Apple to actually show some interest in the gaming market. Otherwise you'd think Apple would have already paid Epic Games to make a good port of Unreal Engine on Macs.
It's better that they seem to be putting more effort into games now, and I think it's good that there is some competition, but as I said it will take many years. The hardware is probably fine (or at least some issues should be solvable), but the software side still has a lot of catching up to do.
 
I don't know if they're specifically trying to improve gaming per se.

They just lean into their strengths, one of which is access to leading edge fabs and using that access to load it up with GPU cores.

So they have to give reasons for why having an SOC with a lot of GPU power is a benefit and so they give some lip service about gaming.

I'm considering purchasing a Mac Studio this year and my most taxing use will be photo and video editing. There's some benefit from GPU acceleration, but probably not enough to justify the M3 Ultra over the M4 Max, other than being able to load up on a ton of RAM.

I would spend more money on RAM and maybe SSD storage over CPU and GPU cores.

But I'm actually also interested in the display update that may come from them, like an Apple Studio Display with miniLED to support HDR and also ProMotion.
 
Well, these integrated GPUs do have pretty good tech specs. For example, the M3 Max has more than 400GB/s of main memory bandwidth, which is about 90% of a 3060 Ti or 3070, or a PS5. Its theoretical FP32 performance is 16TFLOPS, which is again roughly the same as a 3060 Ti, and actually almost at the PS5 Pro level (which is 18TFLOPS). So in raw performance a full M3 Max should be roughly the same as a 3060 Ti. The M3 Ultra, which is basically two M3 Max dies welded together (which means double everything, including memory bandwidth), should be just slightly weaker than a 3090.
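For the bandwidth part of that comparison, the arithmetic works out roughly like this; the bus widths and per-pin data rates below are my assumptions for the usual configurations, not figures taken from this thread:

```swift
// Rough check of the "about 90% of a 3060 Ti" bandwidth claim above.
// Bus widths and per-pin data rates are my assumptions for the usual
// configurations, not official figures from this thread.
func bandwidthGBps(busBits: Double, gbpsPerPin: Double) -> Double {
    busBits * gbpsPerPin / 8   // total bits/s across the bus, then bits -> bytes
}

let m3Max     = bandwidthGBps(busBits: 512, gbpsPerPin: 6.4)  // LPDDR5-6400 -> ~409.6 GB/s
let rtx3060Ti = bandwidthGBps(busBits: 256, gbpsPerPin: 14)   // GDDR6 14 Gbps -> 448 GB/s

print("M3 Max \(m3Max) GB/s vs 3060 Ti \(rtx3060Ti) GB/s")
print("Ratio: \(Int(100 * m3Max / rtx3060Ti))%")   // about 91%
```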
All of this comparison packs the same assumptions as console performance -- yeah, lots of gigabytes per second and mips and tops and flops and ints, but it's also a shared power limit, host memory bus, I/O die and cache hierarchy with the CPU core complex. It's not as simple as a dedicated card with all of those things with its own power budget, its own I/O control, its own dedicated memory pool, and its own caching mechanisms. Yes, the M-series has some cool capabilities around dedicated cache pools for GPU vs CPU workloads, however the simple reality is it's all still shared.

All the work the CPU is doing to feed that GPU is operating out of the same power, memory, cache, and I/O budget.

Anyone who has ever tried to "optimize" a gaming laptop has somewhat run into a similar set of problems... My Gigabyte Aero 15v8 uses a dedicated 1070MQ GPU alongside the 8750H CPU, however they both share the same cooling system and overall tiny laptop power delivery budget. By forcing the 8750H to fully disable turbo boost functionality (capped at 2.2GHz instead of ~4.1GHz) the GPU can scale almost 50% higher and ultimately delivers far better gaming performance at overall better temperatures. The M-silicon is in the same boat, only worse...
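The laptop anecdote is basically a budgeting problem. A toy model of it follows; every wattage below is made up purely for illustration and is not measured from that machine or from any Mac:

```swift
// Toy model of a shared power/thermal budget. Every number here is made
// up purely for illustration -- not measured from any real laptop or Mac.
let totalBudgetWatts = 80.0

func gpuHeadroom(cpuDrawWatts: Double) -> Double {
    max(0, totalBudgetWatts - cpuDrawWatts)
}

let boosted = gpuHeadroom(cpuDrawWatts: 40)  // CPU free to boost toward ~4.1GHz
let capped  = gpuHeadroom(cpuDrawWatts: 20)  // CPU capped around ~2.2GHz

print("GPU headroom: \(boosted) W boosted vs \(capped) W capped")
print("Relative gain: \(Int(100 * (capped - boosted) / boosted))%")  // 50%
```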
 

Power is indeed another problem for MacBooks, but probably not so bad for the Mac mini and Mac Studio. I don't know if those MacBooks were throttling when running these games though.
I remember trying to use my MacBook Pro to run Final Fantasy 14; the fan was running but probably not to the point of throttling. The performance was not great but it was playable. I think that's impressive considering it's running an x86 binary (the Mac version of Final Fantasy 14 is basically just a wrapper running the Windows version).
 
Yeah, I'd be interested to see what (if any) performance changes exist when moving from the gorgeous but very svelte MacBook Pro chassis to a much larger, better ventilated Studio chassis. Also, for what it's worth, even if the CPU + GPU aren't "throttling", the fact that they're both busy means neither one is going to get all the I/O, cache, and memory capacity to itself. The dynamic slicing of cache in the M silicon is pretty awesome and avoids some types of cache-thrashing behavior, however it also downsizes the total cache available to both CPU and GPU while they're both "exceptionally busy." And in a gaming scenario, I think we can be sure both are exceptionally busy.

All things considered, I'm aligned with your general thoughts around the M4 performance being pretty impressive for what it is. Even for native code, having an IGP perform at this level in such a small power envelope and on basically all-Apple IP is pretty damned nifty IMO. I wish we could get X3D chips with this kind of IGP power.
 
The performance delta between the 16-inch MacBook Pro with M4 Max and the Mac Studio is within a few percent. The design is well-balanced for their primary market, which is laptops.

For sustained performance (like gaming) the beefier cooling system of the Mac Studio wins out over a long gaming session but not in a drastic way. The clear difference being that the Mac Studio is essentially silent and the MacBook Pro is starting to sound like a vacuum. Remember that the GPU is only clocked at 1.6GHz in the latest generation.
 
It's true that resource sharing between CPU and GPU means they're competing for limited bandwidth/cache/power.
It's also the case that unified memory saves bandwidth and power since there's no need to copy data between CPU and GPU. And the GPU has access to lots of memory, if needed.
If the limiting factor is power, a unified design is more efficient.
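To make the "no copy" point concrete, here is a minimal Metal sketch in which the CPU fills a shared-storage buffer and a compute kernel reads and writes that very same allocation, with no staging copy in between. The kernel and names are illustrative only.

```swift
import Metal

// Minimal sketch of unified memory in practice: the CPU writes into a
// shared buffer and the GPU reads/writes the same allocation in place.
// There is no staging copy like a discrete card would need.
// Kernel source and names are illustrative.
let kernelSource = """
#include <metal_stdlib>
using namespace metal;

kernel void double_values(device float *data [[buffer(0)]],
                          uint id [[thread_position_in_grid]]) {
    data[id] = data[id] * 2.0f;
}
"""

guard let device = MTLCreateSystemDefaultDevice(),
      let queue = device.makeCommandQueue() else {
    fatalError("No Metal device available")
}

let library = try! device.makeLibrary(source: kernelSource, options: nil)
let pipeline = try! device.makeComputePipelineState(function: library.makeFunction(name: "double_values")!)

// CPU side: write directly into memory the GPU will also use.
let input: [Float] = [1, 2, 3, 4]
let buffer = device.makeBuffer(bytes: input,
                               length: input.count * MemoryLayout<Float>.stride,
                               options: .storageModeShared)!

let commandBuffer = queue.makeCommandBuffer()!
let encoder = commandBuffer.makeComputeCommandEncoder()!
encoder.setComputePipelineState(pipeline)
encoder.setBuffer(buffer, offset: 0, index: 0)
encoder.dispatchThreads(MTLSize(width: input.count, height: 1, depth: 1),
                        threadsPerThreadgroup: MTLSize(width: input.count, height: 1, depth: 1))
encoder.endEncoding()
commandBuffer.commit()
commandBuffer.waitUntilCompleted()

// CPU reads the results straight out of the same allocation.
let results = buffer.contents().bindMemory(to: Float.self, capacity: input.count)
print((0..<input.count).map { results[$0] })   // [2.0, 4.0, 6.0, 8.0]
```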
 
A unified device can certainly be more power efficient for a given performance level. However, a unified device lacks the ability to be the most performant when compared alongside dedicated units which can have their own specific memory, power, and I/O requirements met without being constrained by the preferences or peculiarities of the other parts of a unified model. Memory access patterns for GPUs and CPUs are incongruent, because the compute instructions they process are not similar.

In a GPU, thousands of tiny compute ALUs all chew through an enormous amount of in-memory data in a very sequential fashion and with a consistent memory access stride, all performing the same instruction or sequence of instructions (GPUs are massive SIMD engines, after all), which lends itself to a very specific memory access pattern and cache coherency, population and eviction model. Compare this to CPUs, which do not tend to work this way: a dozen-ish general compute units enact a lot of disparate instructions against a lot of disparate, dissociated data in irregular stride sizes. In a device-centric implementation, the memory and cache hierarchy designs for these compute targets are very different because they work in very different ways. Forcing them to use the same memory and cache hierarchy solves an efficiency and space problem, yet causes a performance problem. And despite the "numbers are big" approach to gigabytes and petaflops and teraops, how the devices are fed by these hierarchies matters greatly in how effective they are at getting work done.

None of this is unique to M-series silicon, these are the same challenges every unified device faces. The fact that Apple's M-silicon is so performant even in the face of a unified model speaks to how well Apple architected the platform.
 