BSD team discovered a new hardware issue with Ryzen
https://svnweb.freebsd.org/base?view=revision&revision=321899
Only a few of those Phoronix segfaults were real segfaults due to Ryzen hardware. Most were "conftest" segfaults, i.e. accesses through null pointers due to sloppy software, which had nothing to do with Ryzen.
Nonetheless, the Ryzen bug is very real, as I have discovered on my own Ryzen, where this hardware bug causes very rare random illegal instruction faults or memory page faults.
I believe that does not quite address whether or not Epyc is affected, given that the poster on RWT only has access to a consumer Ryzen SKU - please correct me if I am wrong.
That would go to the question of whether there is a smaller increment to the steppings besides just B1 and B2.
It doesn't affect every Ryzen chip apparently, only some. So it could be something related to early samples perhaps.
That was my thinking as well, as it seems to be an obscure timing issue with some event happening too quickly. Might actually be memory timings in conjunction with Infinity, as TR/Epyc have relatively low memory clocks and are not widely tested yet.
Perhaps another reason might be that the multi-die setup of Threadripper and EPYC might be injecting a bit of latency somewhere that is slowing down some borderline portion of the memory pipeline due to extra synchronization or longer stalls.
Doesn't occur on Windows, only Linux and only with some models according to the Phoronix article, so they might be able to fix it through software. Delaying the scheduler a couple cycles or whatever is required shouldn't be overly harmful to performance. Strange they haven't addressed it, but it could be in a future kernel update that hasn't been tested.
There's no clear indication that this can be fixed by an update, which may mean something like the microcode can't work around a timing path hit by bad binning, or something else that it cannot readily change due to characterization data written or fused by an older evaluation suite. Even if the bad data is in microcode, without the necessary testing environment there may not be a way to set the correct values in the wild.
At least some of the descriptions of the reported issue indicate this may not be limited to memory speeds above what EPYC officially supports.
Without knowing the cause, we may not be able to rule out a difference in the way the platforms handle specific kernel functions or how and where they allocate resources. Hitting a TLB corner case or hitting kernel and user space addresses in a certain sequence can be unique to the OS architectures. However, it would be disruptive to change a kernel function or remap buffers for a few steppings.
Got the NH-U12s with the AM4 kit, looks quite nice
Now I can get stable 4 GHz with 1.38 vcore, max temp doesn't exceed 70C during stress testing and it is dead silent!
Yes, but for the love of everything holy, they do not have any taste in the colors they use.
Noctua makes nice coolers. My previous one lasted me about 10 years.