That sounds a bit dubious. The inter-die (intra-package) GMI links are allegedly synced to the memory frequency (like the on-die Infinity Fabric links) and provide 256 bits of bandwidth per clock (1.33 GHz at 2.66 GT/s memory) per direction. That is totally expected. Having the exact same bandwidth as provided by the ports to the intra-die Infinity Fabric makes a lot of sense.
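A quick sanity check of that figure (a sketch, assuming the 256-bit-per-clock, memory-synced numbers quoted above):

```python
# Hedged sanity check of the GMI per-direction bandwidth figure.
# Assumption: 256-bit-wide link clocked at the memory clock
# (DDR4-2666 -> 2.666 GT/s -> 1.333 GHz clock), per direction.
link_width_bits = 256
fabric_clock_ghz = 2.666 / 2  # DDR transfers twice per clock

bandwidth_gb_s = link_width_bits / 8 * fabric_clock_ghz
print(f"{bandwidth_gb_s:.1f} GB/s per direction")  # ~42.7 GB/s
```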
The external (between-socket) xGMI links are supposed to be higher in frequency and lower in width to provide the same bandwidth. That is something I would shoot for too. But we know Epyc uses the PCIe lanes for xGMI (64 in total, probably 4x16, i.e., each die in one package uses 16 lanes to connect to exactly one die in the other package [basically setting up a cube topology, where the "faces" in each package may be fully connected], with a lesser probability of 8x8, meaning each die in one package connects to two dies in the other package). Those lanes are basically capable of pushing 8 GT/s over PCIe slot connectors, equalling 16 GB/s for an x16 link.

One may expect they can hit higher speeds when just connecting the two sockets. But each xGMI link (assuming there are 4 in total between the dies) would not be able to provide anywhere close to 42.6 GB/s (there was recently a demo of a combined PCIe/CCIX PHY pushing 25 GT/s over a normal PCIe slot, but that was under lab conditions; I wouldn't expect Epyc to run its PCIe PHYs at up to 21.3 GT/s). The maximum one may allow for is a doubling of the signaling rate between the sockets (I would deem that pretty aggressive already), giving 32 GB/s per x16 link. If AMD could do north of 20 GT/s, they would probably have said so. Or at least I would expect them to tell us.

Or the topology between the dies in a 2S system looks a bit different. But bundling all 32 lanes from each of two dies to get two x32 links between the sockets appears weird (especially as the two x16 parts per die are physically separated), and I don't see a clear advantage, albeit it's not impossible (and the required 10.6 GT/s to get 42.6 GB/s on an x32 link appears totally doable).
And having only 42 GB/s per direction over all 64 PCIe lanes combined appears too low to me and would probably limit scaling.
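The arithmetic behind those per-link numbers can be checked quickly (a sketch using raw signaling rates, ignoring encoding overhead, as the post does):

```python
# Rough per-direction bandwidth for the xGMI scenarios discussed above.
def link_gb_s(lanes, gt_s):
    """GB/s for a link of `lanes` lanes at `gt_s` GT/s per lane (1 bit/transfer)."""
    return lanes * gt_s / 8

print(link_gb_s(16, 8.0))    # PCIe 3.0 x16 at 8 GT/s: 16.0 GB/s
print(link_gb_s(16, 16.0))   # doubled signaling rate:  32.0 GB/s
print(link_gb_s(16, 21.3))   # what x16 would need for ~42.6 GB/s
print(link_gb_s(32, 10.65))  # x32 at 10.65 GT/s:      ~42.6 GB/s
```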
For quick reference, I will link what I think the source for the PHY description is: https://support.amd.com/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf
http://man7.org/linux/man-pages/man3/numa.3.html

This is fine for most enterprise applications, but Ryzen performance clearly points out that consumer applications (including games) like a big shared LLC much more than slightly lower LLC latency. Future programmers need to be thinking more about memory locality. A big shared LLC is starting to become too expensive (and it gets slower as you add more cores and more capacity).
So just want to point out that the SerDes interface AMD is using is rated for 12.5 GHz, and they don't have to run just PCIe on it; who knows how high it can be "pushed" in a controlled environment. The other thing to consider is that a 16x PCIe block on Zeppelin is 3x4+2x2 interfaces, so there would be quite a few extra pairs that might potentially be usable, but I think this is unlikely. I also took The Stilt as meaning per Zeppelin, not for the entire socket, in terms of xGMI, but I have a hard time reconciling how it gets to 42.6 GB/s for xGMI.
*Supports PCI Express 3.1, SATA 6G, Ethernet 40GBASE-KR4, 10GBASE-KR, 10GBASE-KX4, 1000BASE-KX, 40GBASE-CR4, 100GBASE-CR10, XFI, SFI (SFF-8431), QSGMII, and SGMII
Nice find!

For quick reference, I will link what I think the source for the PHY description is: https://support.amd.com/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf
This sounds like a subset of the Synopsys DesignWare Enterprise 12G product: https://www.synopsys.com/dw/ipdir.php?ds=dwc_ether_enterprise12g
Just thinking about it, the biggest block the 12G PHY supports is quad channel; that quad channel can run 100GBASE-CR10, which is 10 lanes @ 10.3125 GHz (I'm assuming this can be done because it doesn't use differential pairs).

Actually, 100GBASE-CR10 uses differential signaling on 10 pairs (per direction) at slightly above 10 GT/s. I doubt this can be done with just 4 lanes. Probably one has to bundle more (exactly ten, I suppose) for that.
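The 10.3125 GHz lane rate quoted above lines up with the 100 Gb/s payload once encoding overhead is counted (a sketch, assuming the usual 64b/66b encoding for this PHY family):

```python
# Why 100GBASE-CR10 ends up at 10.3125 GT/s per lane (assumption: 64b/66b coding).
lanes = 10
line_rate_gt_s = 10.3125          # per-lane signaling rate from the post
payload_fraction = 64 / 66        # 64b/66b encoding overhead

raw_gb_s = lanes * line_rate_gt_s        # 103.125 Gb/s on the wire
payload_gb_s = raw_gb_s * payload_fraction
print(raw_gb_s, payload_gb_s)            # 103.125 -> 100.0 Gb/s payload
```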
As Naples is specced with a maximum of DDR4-2667, and server systems use OC RAM very rarely, I guess they are okay.

One issue in this overall scenario is that if the PHY for the xGMI links cannot be pushed, DDR4-3200 doesn't quite fit under the 12.5 Gb/s ceiling without running something out of spec or at a different clock.
The APU's math still barely works out with the 12.5 max, but it is also stated that its links are GMI rather than xGMI, and the APU has a full 64 lanes of external PCIe. There's no external product requirement that GMI match the 12G PHY's specs; or it can get back under the ceiling, since The Stilt said GMI is twice as wide but clocked half as fast as xGMI.
Naples the server MCM may not officially be able to scale to 3200 without some adjustment, however.
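To make the "doesn't quite fit" point concrete (a sketch, under the assumptions from this thread: GMI stays synced to the memory clock at 32 B/clock per direction, and xGMI has to match that bandwidth over 32 lanes per die pair):

```python
# Hypothetical per-lane xGMI rate needed if GMI bandwidth scales with memclk.
def xgmi_lane_rate_gt_s(ddr_mt_s, lanes=32):
    memclk_ghz = ddr_mt_s / 2 / 1000     # DDR: two transfers per clock
    gmi_gb_s = 32 * memclk_ghz           # 256-bit = 32 B/clock, per direction
    return gmi_gb_s * 8 / lanes          # GT/s per lane to match over `lanes`

print(xgmi_lane_rate_gt_s(2667))  # ~10.7 GT/s -- fits under the 12.5 ceiling
print(xgmi_lane_rate_gt_s(3200))  # 12.8 GT/s  -- just over 12.5
```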
There is plenty of room in the $500+ segment. They just need to reposition their chips to be more favorable against Intel's stack at the "lower" end.
This clearly points out that they need room for the 12-core and 16-core Threadrippers. They are going to be aggressively priced. My guess: $999 for the 16-core flagship and $699 for the 12-core flagship.

Substantial price cuts on Ryzen?
http://wccftech.com/amd-ryzen-7-prices-drop-ahead-of-threadripper-launch/
checking daily here, 'cos I still have a 3200 MHz memory module running at 2666 MHz, but I guess the next BIOS update for the MSI B350M Gaming Pro should be out next week or so.

It's a pretty good processor.
I've been running all cores at 3.9 GHz with 1.34 V load voltage, and temps don't go over 72°C at full load with the stock cooler (although I did re-paste using Arctic MX-4). Try to find a BIOS with AGESA 1.0.0.6 for your motherboard; it's very useful when trying to run memory at rated speeds.
But obviously you can't be selling the 1800X and the lower-end Threadripper at the same price. I'm thinking $500 will be the starting point for TR, all the way up to $999.
Due to the sockets they're locked to different platforms, so they could overlap prices if they really wanted. Motherboard, RAM, cooling, etc. will make Threadripper more expensive.