Nvidia Pascal Speculation Thread


The GTC keynote states that it has NVLink (a new GPU-GPU link) and 3D memory (stacked DRAM).

Is this GPU a replacement for Volta, or a renamed Volta?
 
The AnandTech liveblog made mention of very wide bus widths for the stacked memory, which is consistent with HBM or some future variety of Wide I/O rather than the narrower Hybrid Memory Cube bus.

I haven't seen mention of an interposer, however, which would be needed for the highest bus widths.
 
So NVLink is going to require both compatible motherboards and CPUs? And since there's no chance AMD will support this, it's basically reliant on Intel doing so.

Anyone care to wager on how likely that is? Would they need to support this to be able to compete with IBM in the supercomputer market when used in combination with NV GPUs? Or is it more likely Intel will use a competitive technology?

I find it hard to get excited about this until there's some hint that we might actually see it in desktop PCs, which so far I'm not seeing.
 
A low-latency interconnect is absolutely vital for a supercomputing chip (even PC users complain about PCIe).
AMD has had HyperTransport since 2003, Intel has had QPI since 2009, and the next-gen Xeon Phi gets QPI as well, which is a similar move.
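To put a rough number on the PCIe complaint, here's a minimal sketch using only the standard CUDA runtime API (nothing Pascal-specific; the buffer size is arbitrary) that times a pinned host-to-device copy, about the best case PCIe offers:

Code:
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 256u << 20;      // 256 MiB test buffer (arbitrary)
    void *host = nullptr, *dev = nullptr;
    cudaMallocHost(&host, bytes);         // pinned memory: best case over PCIe
    cudaMalloc(&dev, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // PCIe 3.0 x16 tops out near 16 GB/s in theory, ~12 GB/s in practice;
    // raising exactly this number is NVLink's pitch.
    printf("Host-to-device: %.1f GB/s\n", (bytes / 1e9) / (ms / 1e3));

    cudaFree(dev);
    cudaFreeHost(host);
    return 0;
}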

The most likely use of NVLink in early systems is shown in their later block diagram: up to four GPUs communicate among themselves, probably accessing each other's data in a NUMA fashion, while communication with what's called the "CPU" is still done through slow PCI Express.
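For what it's worth, that "NUMA fashion" access already has a programming model today: CUDA peer-to-peer access, currently carried over PCIe, with NVLink presumably just a faster transport underneath. A rough sketch (device IDs illustrative):

Code:
#include <cuda_runtime.h>

// Let gpu_a dereference pointers that live in gpu_b's memory, if the
// topology allows it. Over PCIe today; over NVLink in a Pascal-era system.
void enable_peer_access(int gpu_a, int gpu_b) {
    int can_access = 0;
    cudaDeviceCanAccessPeer(&can_access, gpu_a, gpu_b);
    if (can_access) {
        cudaSetDevice(gpu_a);
        cudaDeviceEnablePeerAccess(gpu_b, 0);  // flags must be 0
    }
}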

The candidate for an NVLink-enabled CPU would certainly be an ARMv8 SoC. ARM SoCs are friendly to customization: they revolve around an internal bus that is extremely fast, into which various accelerators and high-speed interfaces plug. The best-known example is maybe AMD Seattle, for now.
From the GPU's viewpoint, the prize is access to at least the system CPU's ~1 TB of memory.
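That "GPU reaches into the system CPU's memory" idea also has an existing, slow analogue in CUDA: zero-copy mapped host memory. A sketch with the standard runtime calls (buffer size and function name are mine):

Code:
#include <cuda_runtime.h>

// Return a device-side pointer aliasing pinned system memory; kernel
// loads/stores through it travel over the interconnect on every access
// (PCIe today, which is why a fat ~1 TB host pool needs NVLink to be usable).
float *map_host_buffer(size_t n) {
    float *host_ptr = nullptr, *dev_ptr = nullptr;
    cudaHostAlloc((void **)&host_ptr, n * sizeof(float), cudaHostAllocMapped);
    cudaHostGetDevicePointer((void **)&dev_ptr, host_ptr, 0);
    return dev_ptr;
}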

An ugly chipset could conceivably bridge QPI and NVLink, but that would increase latency and Intel would have to allow it.
 
I wonder who NVIDIA is working with for the memory. Hynix too? HBM seems to be the only wide-I/O memory that will be ready soon (HMC is not wide-I/O memory).

PS. I guess they might be doing it with eDRAM at IBM ... expensive, but hell, that silicon interposer is probably $1,000 all by itself.
 
This is from one of the press releases linked above. It seems that POWER is the first to use NVLink.


NVIDIA will add NVLink technology into its Pascal GPU architecture -- expected to be introduced in 2016 -- following this year's new NVIDIA Maxwell compute architecture. The new interconnect was co-developed with IBM, which is incorporating it in future versions of its POWER CPUs.

"NVLink enables fast data exchange between CPU and GPU, thereby improving data throughput through the computing system and overcoming a key bottleneck for accelerated computing today," said Bradley McCredie, vice president and IBM Fellow at IBM. "NVLink makes it easier for developers to modify high-performance and data analytics applications to take advantage of accelerated CPU-GPU systems. We think this technology represents another significant contribution to our OpenPOWER ecosystem."

With NVLink technology tightly coupling IBM POWER CPUs with NVIDIA Tesla® GPUs, the POWER data center ecosystem will be able to fully leverage GPU acceleration for a diverse set of applications, such as high performance computing, data analytics and machine learning.
 
http://techreport.com/news/26226/nvidia-pascal-to-use-stacked-memory-proprietary-nvlink-interconnect

Turns out Volta remains on the roadmap, but it comes after Pascal and will evidently include more extensive changes to Nvidia's core GPU architecture.

Nvidia has inserted Pascal into its plans in order to take advantage of stacked memory and other innovations sooner. (I'm not sure we can say that Volta has been delayed, since the firm never pinned down that GPU's projected release date.) That makes Pascal intriguing even though its SM will be based on a modified version of the one from Maxwell. Memory bandwidth has long been one of the primary constraints for GPU performance, and bringing DRAM onto the same substrate opens up the possibility of substantial performance gains.

Compared to today's GPU memory subsystems, Huang claimed Pascal's 3D memory will offer "many times" the bandwidth, two and a half times the capacity, and four times the energy efficiency. The Pascal chip itself will not participate in the 3D stacking, but it will have DRAM stacks situated around it on the same package. Those DRAM stacks will be of the HBM type being developed at Hynix. You can see the DRAM stacks cuddled up next to the GPU in the picture of the Pascal test module below.
 
I guess last year's news was easily forgotten or overlooked, by me at least.

http://www.forbes.com/sites/davealt...celerator-and-strategic-partnership-with-ibm/

From Wikipedia's article on the POWER8 we even have a name for the bus: CAPI.
It says it is layered on top of PCIe 3.0, but that would suck. I would think the Nvidia GPU (and presumably the POWER8 CPU) use the physical PCIe 3.0 lanes but run a different protocol over them.
Otherwise (more likely), a POWER8 variant would be fabbed with a more appropriate bus that still speaks "CAPI" at a high level but otherwise has the huge speed, the low latency, and compatibility with the NVLink GPU.

Where previous POWER processors use the GX++ bus for external communication, POWER8 removes this from the design and replaces it with the CAPI port (Coherent Accelerator Processor Interface) which is layered on top of PCI Express 3.0. The CAPI port is used to connect auxiliary specialized processors such as GPUs, ASICs and FPGAs.[5][6] Units attached to the CAPI bus can use the same memory address space as the CPU. At the 2013 ACM/IEEE Supercomputing Conference, IBM and Nvidia announced an engineering partnership to closely couple POWER8 with Nvidia GPUs in future HPC systems.[7]

/edit: Wikipedia's sources; I'll let you look for yourselves regarding the CAPI bus (not sure I like how that name sounds)
http://www.pcworld.idg.com.au/article/524768/ibm_new_power8_doubles_performance_watson_chip/
http://wccftech.com/ibm-power8-processor-architecture-detailed/
 
MfA said:
I wonder who NVIDIA is working with for the memory. Hynix too? HBM seems to be the only wide-I/O memory that will be ready soon (HMC is not wide-I/O memory).

It is indeed Hynix HBM. I confirmed this talking to a Hynix rep at GTC. It may be that Pascal uses 2nd-gen HBM, which would give 1 TB/s throughput. This would match Jen-Hsun's rough numbers.
 
I wonder who NVIDIA is working with for the memory. Hynix too? HBM seems to be the only wide-I/O memory that will be ready soon (HMC is not wide-I/O memory).

PS. I guess they might be doing it with eDRAM at IBM ... expensive, but hell, that silicon interposer is probably $1,000 all by itself.

I wish these companies would be more forthcoming with HBM details.

Edit:
It is indeed Hynix HBM. I confirmed this talking to a Hynix rep at GTC. It may be that Pascal uses 2nd-gen HBM, which would give 1 TB/s throughput. This would match Jen-Hsun's rough numbers.
Well that's good to hear.
Perhaps we can expect ~500 GB/s first-gen HBM on the upcoming 20 nm parts?
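The arithmetic seems to work out, assuming the publicly stated HBM figures (1024-bit bus per stack, ~1 Gb/s per pin for gen 1, ~2 Gb/s for gen 2) and a four-stack package, which is my assumption:

Code:
#include <cstdio>

int main() {
    const double bus_bits = 1024.0;  // bus width per HBM stack
    const int stacks = 4;            // assumed package layout

    // bytes/s = bus width (bits) * per-pin rate (bit/s) / 8, times stacks
    double gen1 = bus_bits * 1.0e9 / 8.0 * stacks;  // ~1 Gb/s per pin
    double gen2 = bus_bits * 2.0e9 / 8.0 * stacks;  // ~2 Gb/s per pin

    printf("HBM gen1, 4 stacks: %.0f GB/s\n", gen1 / 1e9);  // 512 GB/s
    printf("HBM gen2, 4 stacks: %.0f GB/s\n", gen2 / 1e9);  // 1024 GB/s, ~1 TB/s
    return 0;
}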
 