Spinoff - Let's talk about Apple, gaming, and GPU performance

It's silly to say that Apple is "not interested in the gaming market". They have consistently raised the lower end of the market with each generation of Apple Silicon. The M4 is now around 4 TFLOPS and offers much more advanced features.

They should really build a stronger business presence with developers, though, so more games would get native ports.

I believe many people use Macs for both business and gaming, even if gaming isn't the primary reason they bought one.

They might be now, but they certainly weren't before. It has taken many years for Apple to actually show some interest in the gaming market; otherwise you'd think Apple would have already paid Epic Games to make a good port of Unreal Engine to macOS.
It's good that they seem to be putting more effort into games now, and I think it's good that there's some competition, but as I said it will take many years. The hardware is probably fine (or at least its issues should be solvable), but the software side still has a lot of catching up to do.
 
I don't know if they're specifically trying to improve gaming per se.

They just lean into their strengths, one of which is access to leading-edge fabs, and they use that access to load the SoC up with GPU cores.

So they have to give reasons why an SoC with a lot of GPU power is a benefit, and so gaming gets some lip service.

I'm considering purchasing a Mac Studio this year, and my most taxing use will be photo and video editing. There's some benefit from GPU acceleration, but probably not enough to justify the M3 Ultra over the M4 Max, other than being able to load up a ton of RAM.

I would spend more money on RAM and maybe SSD storage over CPU and GPU cores.

But I'm actually also interested in the display update that may come from them, like an Apple Studio Display with mini-LED to support HDR and ProMotion.
 
Well, these integrated GPUs do have pretty good tech specs. For example, the M3 Max has more than 400 GB/s of main memory bandwidth, which is about 90% of a 3060 Ti, 3070, or PS5. Its theoretical FP32 performance is 16 TFLOPS, which is again roughly the same as a 3060 Ti, and actually almost at PS5 Pro level (which is 18 TFLOPS). So in raw performance a full M3 Max should be roughly on par with a 3060 Ti. The M3 Ultra, which is basically two M3 Max dies welded together (meaning double everything, including memory bandwidth), should be just slightly weaker than a 3090.
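
Just to put those quoted numbers side by side (treating them as the rough, theoretical figures they are -- this is a spec-sheet comparison, not measured game performance), here's a quick Swift sketch using the figures as quoted above plus the commonly cited numbers for the NVIDIA cards:

Code:
import Foundation

// Spec-sheet comparison: the Apple figures are as quoted above, the NVIDIA figures
// are the commonly cited theoretical ones. None of this is measured frame rate.
struct GPUSpec {
    let name: String
    let bandwidthGBs: Double   // main memory bandwidth, GB/s
    let fp32TFLOPS: Double     // theoretical FP32 throughput
}

let specs = [
    GPUSpec(name: "M3 Max",      bandwidthGBs: 400, fp32TFLOPS: 16.0),
    GPUSpec(name: "M3 Ultra",    bandwidthGBs: 800, fp32TFLOPS: 32.0),  // ~2x M3 Max
    GPUSpec(name: "RTX 3060 Ti", bandwidthGBs: 448, fp32TFLOPS: 16.2),
    GPUSpec(name: "RTX 3090",    bandwidthGBs: 936, fp32TFLOPS: 35.6),
]

let baseline = specs[0]  // everything relative to the full M3 Max
for s in specs {
    let bw = String(format: "%.2f", s.bandwidthGBs / baseline.bandwidthGBs)
    let fl = String(format: "%.2f", s.fp32TFLOPS / baseline.fp32TFLOPS)
    print("\(s.name): \(s.bandwidthGBs) GB/s (\(bw)x), \(s.fp32TFLOPS) TFLOPS (\(fl)x)")
}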
All of this comparison carries the same assumptions as console performance -- yeah, lots of gigabytes per second and mips and tops and flops and ints, but it's also a shared power limit, host memory bus, I/O die, and cache hierarchy with the CPU core complex. It's not as simple as a dedicated card that has all of those things with its own power budget, its own I/O control, its own dedicated memory pool, and its own caching mechanisms. Yes, the M-series has some cool capabilities around dedicated cache pools for GPU vs CPU workloads, however the simple reality is it's all still shared.

All the work the CPU is doing to feed that GPU is operating out of the same power, memory, cache, and I/O budget.

Anyone who has ever tried to "optimize" a gaming laptop has run into a similar set of problems... My Gigabyte Aero 15 v8 uses a dedicated GTX 1070 Max-Q GPU alongside the i7-8750H CPU, however they both share the same cooling system and the same tiny overall laptop power delivery budget. By forcing the 8750H to fully disable turbo boost (capped at 2.2GHz instead of ~4.1GHz), the GPU can scale almost 50% higher and ultimately delivers far better gaming performance at overall better temperatures. The M-silicon is in the same boat, only worse...
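
To illustrate that shared-envelope point with a toy model (every number below is invented purely to show the shape of the trade-off; it's not a model of any real chip):

Code:
// Toy model of a shared power budget; all numbers are made up for illustration.
// The point: in a shared envelope, every watt the CPU burns on turbo is a watt
// the GPU can't have, and frame rate usually scales better with GPU watts.
let packageBudgetW = 80.0               // total sustained power the cooling can handle

func estimatedFPS(cpuPowerW: Double) -> Double {
    let gpuPowerW = packageBudgetW - cpuPowerW
    let cpuBound = cpuPowerW * 4.0      // how fast the CPU can feed draw calls (invented scaling)
    let gpuBound = gpuPowerW * 1.5      // how fast the GPU can render them (invented scaling)
    return min(cpuBound, gpuBound)      // whichever side is slower limits the frame rate
}

for cpuW in stride(from: 10.0, through: 45.0, by: 5.0) {
    let fps = estimatedFPS(cpuPowerW: cpuW)
    print("CPU \(cpuW) W / GPU \(packageBudgetW - cpuW) W -> ~\(Int(fps)) fps")
}
// Past the point where the CPU can keep the GPU fed, extra CPU watts only steal
// GPU headroom -- which is roughly what disabling turbo boost exploits.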
 
All of this comparison carries the same assumptions as console performance -- yeah, lots of gigabytes per second and mips and tops and flops and ints, but it's also a shared power limit, host memory bus, I/O die, and cache hierarchy with the CPU core complex. It's not as simple as a dedicated card that has all of those things with its own power budget, its own I/O control, its own dedicated memory pool, and its own caching mechanisms. Yes, the M-series has some cool capabilities around dedicated cache pools for GPU vs CPU workloads, however the simple reality is it's all still shared.

All the work the CPU is doing to feed that GPU is operating out of the same power, memory, cache, and I/O budget.

Anyone who has ever tried to "optimize" a gaming laptop has run into a similar set of problems... My Gigabyte Aero 15 v8 uses a dedicated GTX 1070 Max-Q GPU alongside the i7-8750H CPU, however they both share the same cooling system and the same tiny overall laptop power delivery budget. By forcing the 8750H to fully disable turbo boost (capped at 2.2GHz instead of ~4.1GHz), the GPU can scale almost 50% higher and ultimately delivers far better gaming performance at overall better temperatures. The M-silicon is in the same boat, only worse...

Power is indeed another problem for MacBooks, but probably not so bad for the Mac mini and Mac Studio. I don't know if those MacBooks were throttling when running these games, though.
I remember trying to run Final Fantasy XIV on my MacBook Pro; the fans were running, but probably not to the point of throttling. The performance wasn't great, but it was playable. I think that's impressive considering it's running an x86 binary (the Mac version of Final Fantasy XIV is basically just a wrapper around the Windows version).
 
Yeah, I'd be interested to see what (if any) performance changes exist when moving from the gorgeous but very svelte MacBook Pro chassis to the much larger, better-ventilated Studio chassis. Also, for what it's worth, even if the CPU and GPU aren't "throttling", the fact that they're both busy means neither one gets all the I/O, cache, and memory capacity to itself. The dynamic slicing of cache in the M silicon is pretty awesome and avoids some types of cache-thrashing behavior, but it also shrinks the total cache available to the CPU and GPU while they're both "exceptionally busy." And in a gaming scenario, I think we can be sure both are exceptionally busy.

All things considered, I'm aligned with your general thoughts around the M4 performance being pretty impressive for what it is. Even for native code, having an IGP perform at this level in such a small power envelope and on basically all-Apple IP is pretty damned nifty IMO. I wish we could get X3D chips with this kind of IGP power.
 
Power is indeed another problem for MacBooks, but probably not so bad for the Mac mini and Mac Studio. I don't know if those MacBooks were throttling when running these games, though.
I remember trying to run Final Fantasy XIV on my MacBook Pro; the fans were running, but probably not to the point of throttling. The performance wasn't great, but it was playable. I think that's impressive considering it's running an x86 binary (the Mac version of Final Fantasy XIV is basically just a wrapper around the Windows version).
The performance delta between the 16-inch MacBook Pro with M4 Max and the Mac Studio is within a few percent. The design is well-balanced for their primary market, which is laptops.

For sustained performance (like gaming), the beefier cooling system of the Mac Studio wins out over a long gaming session, but not in a drastic way. The clear difference is that the Mac Studio stays essentially silent while the MacBook Pro starts to sound like a vacuum. Remember that the GPU is only clocked at 1.6GHz in the latest generation.
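
For what it's worth, that 1.6GHz clock is enough to reconstruct the headline FP32 number, if you assume the commonly reported 128 FP32 lanes per GPU core (the lane count and the 40-core count below are my assumptions, not official Apple figures):

Code:
// Theoretical FP32 throughput = cores x FP32 lanes per core x 2 (FMA) x clock.
// 128 lanes/core and 40 cores are assumptions based on commonly reported figures.
let gpuCores = 40.0            // e.g. a full M4 Max configuration
let fp32LanesPerCore = 128.0   // assumed: 4 x 32-wide SIMD groups per core
let flopsPerLanePerClock = 2.0 // a fused multiply-add counts as two FLOPs
let clockGHz = 1.6             // the sustained clock mentioned above

let tflops = gpuCores * fp32LanesPerCore * flopsPerLanePerClock * clockGHz / 1000.0
print("Theoretical FP32: \(tflops) TFLOPS")  // ~16.4 TFLOPS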
 
It's true that resource sharing between CPU and GPU means they're competing for limited bandwidth/cache/power.
It's also the case that unified memory saves bandwidth and power since there's no need to copy data between CPU and GPU. And the GPU has access to lots of memory, if needed.
If the limiting factor is power, a unified design is more efficient.
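
That "no copy" point shows up directly in the Metal API, for what it's worth: on Apple Silicon a shared-storage buffer is the same physical memory for the CPU and the GPU, so there's no staging upload like you'd do for a discrete card's VRAM. A minimal sketch (error handling kept to a fatalError, and no actual GPU pass is dispatched here):

Code:
import Metal

// On Apple Silicon, a .storageModeShared buffer lives in the single unified memory
// pool: the CPU writes through contents() and a GPU pass can bind the same buffer
// directly, with no explicit upload/copy step.
guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("No Metal device available")
}

let elementCount = 1_000_000
let byteCount = elementCount * MemoryLayout<Float>.stride

guard let buffer = device.makeBuffer(length: byteCount, options: .storageModeShared) else {
    fatalError("Buffer allocation failed")
}

// CPU-side fill: write straight into the buffer's memory, no staging buffer needed.
let ptr = buffer.contents().bindMemory(to: Float.self, capacity: elementCount)
for i in 0..<elementCount {
    ptr[i] = Float(i)
}

// A compute or render encoder would now bind this same allocation directly, e.g.
//   encoder.setBuffer(buffer, offset: 0, index: 0)
// and the GPU reads what the CPU just wrote through the shared memory hierarchy.
print("Filled \(elementCount) floats in a CPU/GPU-shared buffer")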
 
A unified device can certainly be more power efficient for a given performance level. However, a unified device can't be the most performant option when compared with dedicated units, which can have their own specific memory, power, and I/O requirements met without being constrained by the preferences or peculiarities of the other parts of a unified design. Memory access patterns for GPUs and CPUs are incongruent, because the compute instructions they process are not similar.

In a GPU, thousands of tiny compute ALUs all chew through an enormous amount of in-memory data in a very sequential fashion and with a consistent memory access stride, all performing the same instruction or sequence of instructions (GPUs are massive SIMD engines, after all), which lends itself to a very specific memory access pattern and cache coherency, population, and eviction model. Compare this to CPUs, which do not tend to work this way: a dozen-ish general compute units execute lots of disparate instructions against lots of disparate, dissociated data with irregular strides. In a device-centric implementation, the memory and cache hierarchy designs for these compute targets are very different because they work in very different ways. Forcing them to use the same memory and cache hierarchy solves an efficiency and space problem, yet causes a performance problem. And despite the "numbers are big" approach to gigabytes and petaflops and teraops, how the devices are fed by these hierarchies matters greatly in how effectively they get work done (the toy comparison at the end of this post tries to make that concrete).

None of this is unique to M-series silicon; these are the same challenges every unified device faces. The fact that Apple's M-silicon is so performant even in the face of a unified model speaks to how well Apple architected the platform.
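
Here's the CPU-side toy comparison I mentioned: same data, same number of additions, but a predictable streaming pattern versus chasing random indices. It's obviously not a GPU benchmark, just an illustration of how much the access pattern (and therefore how the cache hierarchy gets used) matters:

Code:
import Foundation

// Toy illustration: identical work, very different memory access patterns.
// Streaming with a fixed stride (GPU-like) is friendly to caches and prefetchers;
// random, irregular access (closer to messy CPU workloads) is not.
let count = 1 << 22
let data = [Float](repeating: 1.0, count: count)
let randomOrder = Array(0..<count).shuffled()

func timed(_ label: String, _ body: () -> Float) {
    let start = Date()
    let result = body()
    let ms = Date().timeIntervalSince(start) * 1000
    print("\(label): sum=\(result), \(Int(ms)) ms")
}

timed("sequential streaming") {
    var sum: Float = 0
    for i in 0..<count { sum += data[i] }
    return sum
}

timed("random access") {
    var sum: Float = 0
    for i in 0..<count { sum += data[randomOrder[i]] }
    return sum
}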
 
The performance delta between the 16-inch MacBook Pro with M4 Max and the Mac Studio is within a few percent. The design is well-balanced for their primary market, which is laptops.
In one of the links I posted, the 14-inch MacBook Pro with M3 Max faced severe throttling issues (timestamped), maybe because of its smaller size vs the 16-inch model? Anyway, with more cooling the tester gained ~10fps in Stray.


On the other hand, Digital Foundry tested Assassin's Creed Shadows and their conclusions are the same: very low performance across the board (720p, sub-30fps) for the M1 and M2 Max, the M3 and M4 barely scrape by, and even the M3 Ultra barely does 1080p60 at low settings.


Possible reasons include the heavy reliance on mesh shading, and the heavy compute shader cost of GPU simulations (for weather, cloth, destruction, etc.).
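
On the mesh shading point: mesh shaders arrived with Metal 3, so a title leaning on them effectively needs Metal 3-class hardware and OS support. A quick runtime capability probe looks something like this (just a sketch of checking what the device advertises, not a claim about how the actual port gates its features):

Code:
import Metal

// Capability probe: mesh shaders were introduced with Metal 3, so a renderer built
// around them has to check for (or simply require) a Metal 3-class device.
guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("No Metal device available")
}

print("GPU: \(device.name)")
print("Metal 3 family supported: \(device.supportsFamily(.metal3))")
print("Hardware ray tracing supported: \(device.supportsRaytracing)")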
 
Apple GPUs are architecturally very different. I don't remember the details off the top of my head, but the bottlenecks are quite different. Even if you port the game to be Mac-native, that doesn't necessarily mean it will run well unless you change the architecture of the renderer to suit the GPU. That's a big downside of them being such a small player in gaming: companies are unlikely to make large changes when porting. Not that I think Apple would suddenly get 4090 performance or anything, but I'd guess they could do better with particular care.
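
One concrete example of the kind of renderer change that pays off on Apple's tile-based GPUs, and that a straight port usually won't bother with: intermediate attachments that are only consumed within a single render pass (depth, or G-buffer planes in a single-pass deferred setup) can be marked memoryless, so they stay in on-chip tile memory and never touch the shared DRAM bandwidth at all. A sketch, assuming a depth target that never needs to be read back later:

Code:
import Metal

// On a tile-based deferred renderer, a render-pass-local attachment can be
// .memoryless: it lives entirely in on-chip tile memory, with no allocation in
// (and no traffic to) the shared main memory pool.
guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("No Metal device available")
}

let desc = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .depth32Float,
                                                    width: 2560,
                                                    height: 1440,
                                                    mipmapped: false)
desc.usage = .renderTarget
desc.storageMode = .memoryless   // no backing store in main memory

guard let depthTarget = device.makeTexture(descriptor: desc) else {
    fatalError("Memoryless textures require a TBDR (Apple-silicon) GPU")
}
print("Created memoryless depth target: \(depthTarget.width)x\(depthTarget.height)")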
 
Apple GPUs are architecturally very different. I don't remember the details off the top of my head, but the bottlenecks are quite different. Even if you port the game to be Mac-native, that doesn't necessarily mean it will run well unless you change the architecture of the renderer to suit the GPU. That's a big downside of them being such a small player in gaming: companies are unlikely to make large changes when porting. Not that I think Apple would suddenly get 4090 performance or anything, but I'd guess they could do better with particular care.
Does anyone know how different they are from the standard Imagination architectures Apple still licenses? (Yes, they tried dropping Imagination once, but apparently that didn't work out, probably because they were infringing its patents.)
 
Even for native code, having an IGP perform at this level in such a small power envelope and on basically all-Apple IP is pretty damned nifty IMO. I wish we could get X3D chips with this kind of IGP power.

I was hopeful to see how Strix Halo evolves, but in the near future... apparently not so much. The next gen still being based on RDNA 3.5 sucks, as any GPU without decent AI upscaling is a non-starter these days IMO, so I really hope they can get FSR4 working on these APUs at some point.
 
I was hopeful to see how Strix Halo evolves, but in the near future... apparently not so much. The next gen still being based on RDNA 3.5 sucks, as any GPU without decent AI upscaling is a non-starter these days IMO, so I really hope they can get FSR4 working on these APUs at some point.
AMD has said (repeatedly, I think?) that RDNA4 isn't aimed at the iGPU (or even mobile) space; reportedly Zen 6 APUs will stick with RDNA 3.5 too.
 
Mod mode: my fault for getting us into the RDNA4 / X3D tangent. This is the Apple Existential Gaming Dread thread ( ;) ) so let's wander back into the intended topic. I think the RDNA stuff would be interesting over in the RDNA4 thread...
 
Does anyone know how different they are from the standard Imagination architectures Apple still licenses? (Yes, they tried dropping Imagination once, but apparently that didn't work out, probably because they were infringing its patents.)
Their fixed-function HW design hasn't really changed all that much since the point of divergence (much of it is still based on Imagination), but their ISA is supposedly custom...
 
Mod mode again: @DavidGraham dropped a suggestion into the mod inbox that perhaps all the Apple gaming talk makes more sense as its own thread versus kinda hidden in the Existential Dread Thread. I think it was a good idea, so here we are with a nice shiny place to talk Apple gaming. Enjoy!
 