Apple is an existential threat to the PC

Ouch. Don't use a Mac running macOS if you need robust data integrity and durability. It's a long thread with a lot of information.

The guy is highlighting that Apple's fsync() differs from that of modern Linux. When macOS (then OS X) was created, fsync behaved the same on both platforms; over time, around kernel 2.5/2.6, Linux changed the behaviour of fsync to include a flush of data to permanent storage (not just the drive cache). On macOS, Apple instead added a new fcntl operation (F_FULLFSYNC).
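To make the difference concrete, here is a minimal sketch of a portable "really flush this" helper in Python. The filename is hypothetical; on non-Apple platforms the `fcntl` module does not define `F_FULLFSYNC`, so the helper falls back to a plain fsync there:

```python
import fcntl
import os

def full_sync(fd: int) -> None:
    """Flush a file descriptor all the way to permanent storage.

    On macOS, plain os.fsync() only pushes data to the drive, which may
    keep it in its volatile cache; the F_FULLFSYNC fcntl additionally
    asks the drive to purge its cache. On modern Linux, fsync() itself
    is documented to include the device cache flush.
    """
    os.fsync(fd)  # always push OS buffers down to the device
    # F_FULLFSYNC only exists on Apple platforms.
    full_fsync = getattr(fcntl, "F_FULLFSYNC", None)
    if full_fsync is not None:
        fcntl.fcntl(fd, full_fsync)

# Usage: make a write durable before acknowledging it.
with open("journal.log", "ab") as f:
    f.write(b"commit\n")
    f.flush()              # Python buffer -> OS
    full_sync(f.fileno())  # OS -> drive -> permanent storage
```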

I don't think the issue here is macOS; it is the drives that ignore the lower-level Force Unit Access mode. There are two levels of flush: the first flushes what's in the OS buffers to the drive (including the drive's cache), and the second flushes everything to permanent storage. Some drive firmware does not honour the latter instruction, which is why this also impacts some non-Apple drives.

So let's not go crazy overboard with the data integrity issue, eh?
 
It's interesting for sure, but bear in mind that fsync is just a command telling a device it needs to be consistent. Issuing the command does not guarantee a flush will do as you ask, which is why it's handled up a level in the filesystem rather than internally in the device. This isn't isolated to macOS, by the way. In fact, it is mostly ignored by many drives and isn't even enabled by default in most kernels (be it Unix or Windows) because synchronous writes are slow.

 


The problem isn't with drives that don't implement it correctly; the problem is that with macOS it doesn't matter whether your drive implements it properly or not, because macOS doesn't use it correctly. Replicating the behaviour on macOS requires a special-case fcntl, F_FULLFSYNC, and unfortunately using it absolutely cripples writes to the Apple SSD in M1 Macs.

Single threaded, simple Python file rewrite test:

MacBook Air M1 (macOS):
- flushing: 46 IOPS
- not: 40000 IOPS

iMac + WD SN550 1TB NVMe (Linux):
- flushing: 2000 IOPS
- not: 20000 IOPS

x86 laptop + Samsung SSD 860 EVO 500GB SATA:
- flushing: 143 IOPS
- not: 5000 IOPS

Basically, in order to make the Apple SSD faster, it's possible they made a conscious decision not to fully implement that functionality on their SSD, and thus their SSD appears to be much faster than it would be if data integrity and robustness actually mattered to Apple. All drives lose speed when flushing writes, but the Apple SSD in M1 Macs is incredibly bad when actually flushing data to the medium. So rather than building a performant I/O subsystem that properly implements the flush in order to preserve data robustness, they decided not to use that functionality by default, and on top of that they made it more obscure to enable in the first place.

Now, it's important to note that not everyone needs data durability; most people wouldn't notice if their Mac lost power and they lost data. However, there are users who do require it and had erroneously assumed that fsync() would operate as it should on macOS, but it doesn't. And if they were benchmarking their Mac with fsync() enabled, they would erroneously conclude that it was not only protecting their data in the event of power loss, but also incredibly fast on Apple SSDs.

It's likely a bug and not Apple doing this to deliberately cheat on an admittedly niche function for consumer devices. Pointing it out is what's needed to alert Apple to this bug and hopefully have them fix it.

Regards,
SB
 
Unless they put in an SSD with PLP (power loss protection), you don't want to suddenly lose power regardless of the OS. The SSD is perfectly capable of screwing up on its own. But unless a cosmic ray scrambles the OS/hardware, it isn't relevant on a laptop: when the battery runs down, the OS will shut down gracefully before it happens.

fsync is a giant mess; all OSes and drives delay it and don't treat it like a proper barrier either. File system integrity is mostly a happy accident. 5 minutes is a bit beyond the pale though.
 

How can you claim to be worried about data integrity yet dismiss drives that fake the drive cache purge command? If the drive doesn't respect the OS insisting it purge its cache to permanent storage, what on earth are you doing to ensure the integrity of your data before it is safely saved?

If you read the guy's more recent tweets, he's cottoned on to the fact that it is the drives causing the disparity in performance. He has named two drives that work properly but has refused to name the drives that revealed the original discrepancy, you know, because of the fake cache purging.

It's also a bit mad to expect the hybrid Unix kernel that is macOS/Darwin to behave functionally identically to the monolithic Linux kernel in terms of I/O. This is why on macOS you have fsync, F_FULLFSYNC and F_BARRIERFSYNC to better handle distinct usage needs, and I believe F_BARRIERFSYNC would provide what he wants without the performance hit (although not the original performance hit because, as above, the guy discovered the drives he was using were faking their cache purges).
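For reference, a sketch of issuing F_BARRIERFSYNC from Python. Python's `fcntl` module does not expose this constant, so the value 85 from Apple's `<sys/fcntl.h>` is hard-coded here as an assumption (verify against your SDK); non-Apple platforms fall back to a plain fsync:

```python
import fcntl
import os
import sys

# F_BARRIERFSYNC orders writes without forcing a full cache purge.
# 85 is the value in Apple's <sys/fcntl.h>; this is an assumption,
# as Python's fcntl module does not define the constant itself.
F_BARRIERFSYNC = 85

def barrier_sync(fd: int) -> None:
    """Issue a write barrier on macOS; fall back to fsync elsewhere."""
    if sys.platform == "darwin":
        fcntl.fcntl(fd, F_BARRIERFSYNC)
    else:
        os.fsync(fd)

with open("wal.log", "ab") as f:
    f.write(b"record\n")
    f.flush()
    barrier_sync(f.fileno())  # later writes won't be reordered before this
```

The design trade-off: a barrier guarantees ordering relative to later writes, which is what journaling workloads usually need, without paying the full durability cost of purging the drive cache on every commit.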
 
Traditionally, flush pushed buffered data from a process into kernel space, i.e. your app/service can crash, but your data will be fine as long as the OS doesn't crash. fsync pushes data from the OS all the way to the disk. Traditionally, Windows fsyncs a file every time it is closed (because Windows users have traditionally been very keen to power-cycle their machine whenever it was unresponsive). Traditionally, disk blocks were the same size as memory pages, or smaller.

Enter flash storage. Flash is written in large 256+ KB chunks. If your app performs fsync frequently (like a DBMS does), with memory-page-sized writes, 256 KB is read, 4 KB updated, and the resulting 256 KB chunk rewritten to the flash. Besides the performance penalty, the resulting write amplification is a good way to ensure a very short life for your flash device.
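The arithmetic behind that write amplification is worth spelling out, using the 4 KB page and 256 KB chunk sizes from above:

```python
# Worked example of the write amplification described above:
# fsync'ing a single 4 KB page when flash is programmed in 256 KB chunks.
page_size = 4 * 1024       # bytes the application actually changed
flash_chunk = 256 * 1024   # bytes the device must read and rewrite

amplification = flash_chunk / page_size
print(f"write amplification: {amplification:.0f}x")  # prints "write amplification: 64x"

# 64x means each 4 KB durable write wears the flash as much as writing
# 256 KB would, drastically shortening device life under a workload
# that fsyncs small records frequently, such as a DBMS commit log.
```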

So all kinds of games are played with fsync. At the OS level, fsync guarantees the data is persisted. If you have DRAM on your flash controller and a capacitor big enough to ensure buffered data can be written to flash in case of power loss, the device can tell the OS the data is persisted right away, internally coalesce a bunch of writes, and maintain high performance and low power.

Fine in theory, but firmware bugs and DRAM-less controllers complicate things, so an extra fsync_for_real() is put in place, and it is this that kills performance (and your drive, because write amplification is a thing again). The discrepancy is because fsync now means different things on different OSes.

Cheers
 
It's not a problem for laptops, I assume; you have a built-in UPS. I compared NVMe benchmarks on my Asus laptop versus my sister's M1 Max; they perform about the same, with a slight speed advantage to the Asus NVMe drive. Maybe the M1 has better seek times.
 
Remember there are differences between capacity sizes with the 4TB and 8TB performing the best with the Apple NAND flash controller.
 
PCIe 4.0 has that ~7 GB/s raw limit, right? Both platforms were quite close to that limit.
@Pressure is saying that SSD performance scales up with storage capacity on the Apple T2, M1, and M1 Pro/Max chips (and all Apple A-series SoCs). You would need either a 4TB or 8TB MacBook Pro to hit peak NAND storage performance on M1 Pro/Max.

I'm not sure what protocol M1 Pro/Max use for NAND storage. I've seen benchmarks with read speeds hitting Apple's claimed 7.4GB/s figure.
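For a rough sanity check of that ceiling, assuming a PCIe 4.0 x4 link (the link width is an assumption here), the raw bandwidth works out to just under 8 GB/s before protocol overhead:

```python
# Back-of-envelope bandwidth ceiling for an assumed PCIe 4.0 x4 link.
gt_per_s = 16e9       # PCIe 4.0 signals at 16 GT/s per lane
lanes = 4
encoding = 128 / 130  # 128b/130b line encoding overhead

bytes_per_s = gt_per_s * lanes * encoding / 8
print(f"{bytes_per_s / 1e9:.2f} GB/s")  # prints "7.88 GB/s"
```

Packet and protocol overhead shave a few percent off that figure in practice, so an observed ~7.4 GB/s read speed is consistent with a link of roughly this width and generation.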
 
It's an interesting approach, and Apple didn't delve into the details of how two or more M1 Max chips interconnect, but it reminds me of when Sony put a 2,560-bit embedded DRAM bus into the PlayStation 2's Graphics Synthesizer when conventional wisdom was that this was not practical. It was never my field of expertise, but I recall that from an electronics perspective, going wide could be preferable to going faster where signal integrity was an issue, which it definitely would be here.
 
Just for yields alone this must be a nice setup. That's a lotta transistors :p
Yup. The ability to easily interconnect two dies and get a flat 2x boost, with pretty much no overhead, is such a no-brainer in terms of production and yields. It's kind of astonishing that Apple Silicon suffers almost no perceivable performance loss by interconnecting two M1 Max chips. That feels like the real achievement here. It's been the aim since dual-processor setups like the PowerPC BeBox, dual-GPU setups like SLI, and dual/quad/multi-processor configurations in general, but there is almost always a performance hit.

This bodes well for ARM Mac Pro.
 

Apple's teaser for the Mac Pro was pretty cool too:
"There's only one more Mac left -- Mac Pro -- but that... is for another day."

Apple mentioned that M1, Pro, Max, and Ultra are the complete lineup for the M1 family. So my guess is that Mac Pro will transition to Apple Silicon in the "M2" generation. Perhaps developing a chiplet-based solution which can handle both unified memory and expandable DRAM modules was a bit of a challenge. After all, I don't think current Mac Pro customers would be happy with a 128GB RAM limit.

EDIT: In other news, it seems like Apple discontinued the 27-inch iMac. :( Sad times. It would've been nice to have a large desktop AIO with M1 Pro/Max/Ultra.
 
I just sold my old Mac Pro in anticipation of this. I'll be honest, I have no use for the M1 Ultra and might just get the baseline model with the M1 Max.
 