AMD RyZen CPU Architecture for 2017

That looks like one of those minor quirks that gets fixed with an OS/BIOS tweak though?
Not like the FPU is calculating wrong ...
 
http://www.realworldtech.com/forum/?threadid=170454&curpostid=170458

Only a few of those Phoronix segfaults were real segfaults due to Ryzen hardware. Most were "conftest" segfaults, i.e. accesses through null pointers due to sloppy software, which had nothing to do with Ryzen.

Nonetheless, the Ryzen bug is very real, as I have discovered on my own Ryzen, where this hardware bug causes very rare random illegal instruction faults or memory page faults.
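
For anyone who wants to separate the two kinds of crash in their own logs, here is a rough sketch of the idea: count kernel segfault lines and split off the "conftest" null-pointer ones from everything else. The regex assumes the usual Linux "name[pid]: segfault at ADDR ip ..." message shape, which can differ between kernel versions, and the sample lines, the 0x1000 near-null cutoff and the function names are all made up for illustration. It only sorts log lines; it cannot tell you whether an "other" fault was actually caused by the hardware.

```python
import re
from collections import Counter

# Assumed shape of a Linux kernel segfault message; real logs may differ.
SEGFAULT_RE = re.compile(
    r"(?P<proc>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
)

# Assumption: a fault address inside the first page counts as a null-pointer access.
NULL_PAGE_LIMIT = 0x1000


def classify(line):
    """Return 'conftest-null', 'other', or None if the line is not a segfault."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        return None
    near_null = int(match.group("addr"), 16) < NULL_PAGE_LIMIT
    if match.group("proc") == "conftest" and near_null:
        return "conftest-null"
    return "other"


def tally(lines):
    """Count segfault lines per category, skipping unrelated log lines."""
    return Counter(kind for kind in map(classify, lines) if kind is not None)


# Hypothetical log lines, invented for this example only.
sample = [
    "[ 101.1] conftest[4242]: segfault at 0 ip 00000000004004f6 "
    "sp 00007ffd1c2b3a10 error 4 in conftest[400000+1000]",
    "[ 202.2] bash[4343]: segfault at 7f3a00000010 ip 000000000046b2c1 "
    "sp 00007ffe5d0e1f20 error 4 in bash[400000+f4000]",
    "[ 303.3] usb 1-1: new high-speed USB device number 2 using xhci_hcd",
]

if __name__ == "__main__":
    print(tally(sample))
```

Feeding it the output of dmesg line by line would give a quick count of how many reported segfaults are just configure-script probes.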
 
EPYC purportedly uses a B2 stepping, which might have a fix or general improvement in manufacturing that handled whatever is off about Ryzen under heavy utilization.
I thought the last time it was discussed for Threadripper that it was using B1, which isn't different from Ryzen in general unless there's a minor revision number difference between affected B1 chips. However, didn't AMD state that the highest binning chips went into Threadripper? If there's something variable like timings or physical characterization in early chips, perhaps they flake out on chips that don't meet the highest bins or don't get the same level of power-delivery or overall platform quality that an X399 board would.

Perhaps another reason might be that the multi-die setup of Threadripper and EPYC might be injecting a bit of latency somewhere that is slowing down some borderline portion of the memory pipeline due to extra synchronization or longer stalls.
 
It doesn't affect every Ryzen chip apparently, only some. So it could be something related to early samples perhaps.
 
It indicates there definitely was a revision at some point though.
 
It doesn't affect every Ryzen chip apparently, only some. So it could be something related to early samples perhaps.
That would go to the question of whether there is a smaller increment to the steppings besides just B1 and B2.
If the circuit revision is consistent between affected and unaffected chips, then it's not immediately clear that AMD has fixed something.
If there were more sample data, it might be possible to see if there was a range of affected weeks, or if specific SKUs were more likely to be affected. The binning process might have been tightened after some point, which may explain why the less-forgiving high-end or high-reliability products don't see this within their factory clock ranges.

There's no clear indication that this can be fixed by an update, which may mean something like the microcode can't work around a timing path hit by bad binning, or something else that it cannot readily change due to characterization data written or fused by an older evaluation suite. Even if the bad data is in microcode, without the necessary testing environment there may not be a way to set the correct values in the wild.
 
Perhaps another reason might be that the multi-die setup of Threadripper and EPYC might be injecting a bit of latency somewhere that is slowing down some borderline portion of the memory pipeline due to extra synchronization or longer stalls.
That was my thinking as well, as it seems to be an obscure timing issue with some event happening too quickly. Might actually be memory timings in conjunction with Infinity Fabric, as TR/Epyc have relatively low memory clocks and haven't been widely tested yet.

There's no clear indication that this can be fixed by an update, which may mean something like the microcode can't work around a timing path hit by bad binning, or something else that it cannot readily change due to characterization data written or fused by an older evaluation suite. Even if the bad data is in microcode, without the necessary testing environment there may not be a way to set the correct values in the wild.
Doesn't occur on Windows, only Linux and only with some models according to the Phoronix article, so they might be able to fix it through software. Delaying the scheduler a couple cycles or whatever is required shouldn't be overly harmful to performance. Strange they haven't addressed it, but it could be in a future kernel update that hasn't been tested.
 
That was my thinking as well, as it seems to be an obscure timing issue with some event happening too quickly. Might actually be memory timings in conjunction with Infinity Fabric, as TR/Epyc have relatively low memory clocks and haven't been widely tested yet.
At least some of the descriptions of the reported issue indicate this may not be limited to memory speeds above what EPYC officially supports.

Doesn't occur on Windows, only Linux and only with some models according to the Phoronix article, so they might be able to fix it through software. Delaying the scheduler a couple cycles or whatever is required shouldn't be overly harmful to performance. Strange they haven't addressed it, but it could be in a future kernel update that hasn't been tested.
Without knowing the cause, we may not be able to rule out a difference in the way the platforms handle specific kernel functions or how and where they allocate resources. Hitting a TLB corner case or hitting kernel and user space addresses in a certain sequence can be unique to the OS architectures. However, it would be disruptive to change a kernel function or remap buffers for a few steppings.
I am not sure which scheduler you are referencing in this case. OS thread scheduling in a high load case is probably not trying to intercede that frequently, and the internal scheduling of the SOC, CPU, or non-OS software routines may not be up to the kernel to modify.
 
I've run two tests with AoTS GPU/CPU with the 1700 at 3.8 GHz and my optimized 3466 timings:

ashesofthesingularityn4xq7.png

ashesofthesingularityotbwj.png


I've also ordered the Noctua U12s to replace my Wraith Spire, hopefully I'll be able to run the 1700 at 4.0 GHz with that :D
 
Got the NH-U12s with the AM4 kit, looks quite nice :)

dsc07904kobew.jpg

Now I can get stable 4 GHz with 1.38 vcore, max temp doesn't exceed 70C during stress testing and it is dead silent!
 
I have the same cooler for my i7-6800K rig. Naturally, it barely sweats @4GHz and 1.25V with much headroom left, but I like it quiet foremost. :p
 
Aida64 CPU benchmarks with the 1700 at 4.0 GHz:

a1nmb9c.png
a2xox36.png
a3tfbmf.png
a47eb3a.png
a5wqy2d.png

:)

Edit: 7-zip if anyone is interested:

7zipj5xhu.png
 
Noctua makes nice coolers. My previous one lasted me about 10 years.
 