Information concerning Ryzen has been added to Agner's optimization guide.
http://www.agner.org/optimize/
My skim of the optimization document turned up a few tidbits about Ryzen:
The perceptron-based predictor predicts certain patterns well that the prior generation's perceptron did not handle efficiently, such as nested loops. There are some corner-case quirks concerning whether Ryzen mispredicts once after a loop that repeats more than 12 times.
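For anyone who wants to poke at the nested-loop case, here is a minimal sketch of the kind of kernel one might time. The function name and the use of a volatile counter are my own assumptions and are not taken from Agner's test code. It only sets up the branch pattern (an inner loop with a fixed trip count inside an outer loop). Actually counting mispredictions would need performance counters, which this does not show.

```c
#include <stdint.h>

/* Hypothetical micro-benchmark kernel (not from the guide).
   The inner loop's backward branch is taken (inner - 1) times and then
   falls through once, and that pattern repeats for every outer iteration. */
uint64_t nested_loop_count(uint32_t outer, uint32_t inner)
{
    /* volatile keeps the compiler from collapsing both loops into a multiply */
    volatile uint64_t count = 0;

    for (uint32_t i = 0; i < outer; i++) {
        for (uint32_t j = 0; j < inner; j++) {
            count = count + 1;
        }
    }
    return count;
}
```

Calling it with an inner count on either side of 12, e.g. nested_loop_count(1000000, 12) versus nested_loop_count(1000000, 13), and comparing mispredict counters would be one way to look for the quirk mentioned above.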
The mispredict penalty was measured at roughly 18 cycles.
The fetch bandwidth for the front end is given as 32 bytes by AMD, but measurements did not get much more than 16.
FMA does partially occupy the issue capability of FADD, with a mix of FMA and FADD getting 2 cycles' worth of issue out of 3 cycles.
Store forwarding is significantly more robust than in previous generations.
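To illustrate what "store forwarding" covers, here is a rough sketch of two store-then-load patterns: a same-size reload, and two narrow stores followed by one wide load that spans both. The function names are made up for illustration, and which patterns were slow on older chips is not something I am claiming here. Note that an optimizing compiler may keep the second case entirely in registers, so a real test would be written in assembly.

```c
#include <stdint.h>
#include <string.h>

/* Same-size store followed by a load of the same address.
   volatile forces the store and the reload to actually be emitted. */
uint64_t forward_same_size(uint64_t v)
{
    volatile uint64_t slot;
    slot = v;        /* 8-byte store */
    return slot;     /* 8-byte load from the same address */
}

/* Two 4-byte stores followed by one 8-byte load that covers both.
   The load needs data from two separate earlier stores. */
uint64_t forward_narrow_to_wide(uint32_t first, uint32_t second)
{
    unsigned char buf[8];
    uint64_t out;

    memcpy(buf, &first, 4);       /* narrow store to bytes 0..3 */
    memcpy(buf + 4, &second, 4);  /* narrow store to bytes 4..7 */
    memcpy(&out, buf, 8);         /* wide load of bytes 0..7 */
    return out;
}
```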
A brief check of the instruction tables does show Ryzen's gather instruction support is rather perfunctory, being poorer than Haswell's first attempt.
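For reference, this is what a gather computes, written as plain scalar loads. It is only a sketch with a hypothetical function name, and I have not measured whether a loop like this beats the hardware gather on Ryzen. It is just the obvious fallback to compare against when the gather instruction is slow.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar equivalent of a 32-bit indexed gather:
   dst[i] receives the element of base selected by idx[i].
   The caller must make sure every idx[i] is a valid index into base. */
void gather_scalar(int32_t *dst, const int32_t *base,
                   const int32_t *idx, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = base[idx[i]];
    }
}
```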