“There is machine learning training and there is inference; if I start with inference, the Xeon is still the most widely deployed landing space for that part of the workload and that has not been broadly advertised,” Wuischpard explains. “It’s an interesting and evolving area and we think Xeon Phi is going to be a great solution here and are in trials with a number of customers. We believe ultimately we’ll get a faster and more scalable result than GPUs—and we say GPUs because that’s gotten a lot of the attention.”
While the claim that Xeon handles the majority of the inference side of the workload rests on Intel’s own estimates, it is a tough figure to poke holes in, in part because the training end, despite getting most of the attention, is rarely a CPU-only conversation. “These codes tend to be tough to scale; they tend to live in single boxes. So people are buying these big boxes and chocking them full of high-power graphics cards, and there is an efficiency loss here,” he says, noting that users in this area want to keep an entire workload on a single machine where possible, scale out across a cluster with a highly parallel implementation where not, and avoid an offload model on the programming front (a sketch of that contrast follows below). It is in this collection of needs that the strongest case for Knights Landing is made, at least for deep learning training and inference.
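To make the offload-model objection concrete, here is a minimal, purely illustrative sketch. It assumes NumPy and, for the offload path, CuPy as a stand-in for a generic GPU library; the matrix multiply is a hypothetical workload, not anything from Intel or the frameworks discussed here. The point is the structural difference: the offload path must stage data to a device and copy results back, while a self-hosted many-core processor runs the same computation natively on the host.

```python
# Illustrative contrast: offload model vs. native host execution.
# The workload (a dense matrix multiply) is a hypothetical example.
import numpy as np

def offload_style(a, b):
    """Offload model: data is staged to the device and the result is
    copied back, a round trip a self-hosted processor avoids."""
    import cupy as cp          # GPU offload library (assumed available)
    d_a = cp.asarray(a)        # host -> device copy
    d_b = cp.asarray(b)
    d_c = d_a @ d_b            # compute runs on the device
    return cp.asnumpy(d_c)     # device -> host copy

def native_style(a, b):
    """Native model: the same computation runs directly on the host's
    cores; a threaded BLAS spreads the multiply across them."""
    return a @ b

a = np.random.rand(2048, 2048).astype(np.float32)
b = np.random.rand(2048, 2048).astype(np.float32)
c = native_style(a, b)
```

The code on both paths computes the same product; what changes is that the offload version carries two explicit transfers and a second memory space to manage, which is the programming-model overhead the quoted users say they want to avoid.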
So why is Knights Landing a suitable competitor in the deep learning training and inference market? The answer starts with the fact that this is a bifurcated workload: many users maintain two separate clusters, one for training and one for inference, each with limited scalability. Indeed, scaling deep learning frameworks across GPUs has been a persistent challenge. The straightforward answer is to address those underlying challenges with a common architecture that scales, lets training and inference run on the same cluster with a simplified code base (i.e., one that does not require offload/CUDA), and moves training beyond a single node.
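As a sketch of what “beyond a single node” means in practice, the snippet below shows the standard data-parallel pattern: each node computes gradients on its own shard of data, and a single allreduce averages them so every node applies the same update. It assumes mpi4py and NumPy, and uses synthetic least-squares data as a placeholder for a real training loop; none of this is Intel’s implementation, just the general technique.

```python
# A minimal data-parallel training sketch, assuming mpi4py and NumPy.
# Each MPI rank (e.g., one per node) trains on its own data shard.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

rng = np.random.default_rng(seed=rank)
w = np.zeros(128, dtype=np.float64)   # shared model weights (hypothetical size)
lr = 0.01

for step in range(100):
    # Synthetic shard of data; a placeholder for a real per-node loader.
    x = rng.standard_normal((32, 128))
    y = rng.standard_normal(32)
    grad_local = x.T @ (x @ w - y) / len(y)  # least-squares gradient on the local shard

    # Average gradients across all ranks; this is the only communication step.
    grad_global = np.empty_like(grad_local)
    comm.Allreduce(grad_local, grad_global, op=MPI.SUM)
    grad_global /= size

    w -= lr * grad_global  # identical update on every rank keeps weights in sync
```

Run with, for example, `mpirun -n 4 python train_sketch.py`. Because the same code runs natively on each CPU node, the pattern needs no offload directives, which is the combination, scale-out plus a single code base, that the Knights Landing argument rests on.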