Yes, in what SP compute? SGEMM? Seriously?
That was not promised, just the stacked memory.
Stack one module on top of another and you're stacking a full PCB, a multi-hundred-watt GPU, and all those hot VRMs right on top of each other.
(addendum: The full PCB is admittedly small, but it's still a thermal barrier if nothing else.)
NVLink is nothing special; every HPC company has been working for a long time, and still is, to provide the same thing. If IBM was working on this, that's no mystery, and I can assure you it will not be much different from what others have to offer in this respect (they were just the first to put it on a roadmap). And I can assure you that AMD has had the same type of link in preparation since long before this was announced.
The numbers are big and so is the news. The U.S. Department of Energy today unveiled plans to build two GPU-powered supercomputers. Each will deliver at least 100 petaflops of compute performance.
And one – the Summit system at Oak Ridge National Laboratory, designed for open science – is expected to be 150 petaflops. That’s more than three times the peak speed of today’s fastest supercomputer.
...
NVIDIA GPUs and IBM POWER CPUs, connected with the NVLink interconnect technology, will power both machines.
Already first design win for Pascal !!!
more at the source: http://blogs.nvidia.com/blog/2014/11/14/what-is-nvlink/
Summit features more than 3,400 compute nodes, enough to deliver a peak performance between 150 and 300 petaFLOPS, and is expected to deliver more than five times the system-level application performance of Titan while consuming only 10% more power. Each compute node includes multiple next-generation IBM POWER9 CPUs and multiple NVIDIA Tesla® GPUs based on the NVIDIA Volta architecture. Each node is expected to deliver more than 40 TFLOPS, which is enough to outperform an entire rack of Haswell x86 CPU servers. In fact, just four nodes of the Summit system would be powerful enough to qualify for the Top500 list, as of June 2014.
Not sure if this is the right place to ask, but it was said at the recent conference that Pascal will have 4x the mixed precision performance of Maxwell. What is the mixed precision performance of Maxwell? Can I assume it is the same as single precision?

No, Jensen said Pascal has 2x the single precision performance, with FP16 being an additional 2x.

Interesting, I take it that means that FP16 Pascal performance is 4x Maxwell's single precision performance?
I got more the impression that Pascal has 2x the FP32 performance of Maxwell, and that if you compare FP16 performance, you get 2x the FP16 performance of Maxwell.
Some sources are saying 4x the mixed precision performance over Maxwell; I take it that means 4x the FP16 performance. I'm not sure what Maxwell's mixed precision performance is.

Discrete Maxwell currently has to do FP16 ops on its FP32 ALUs, 1:1. Tegra X1 introduced bundled FP16 instructions, which allow two FP16 instructions (of the same operation) to be executed together in one FP32 unit, doubling FP16 throughput. This feature is coming to discrete GPUs with Pascal and will give Pascal 2x the FP16 throughput of Maxwell, all else held equal. Meanwhile, Pascal is also supposed to offer 2x the perf-per-watt of Maxwell (NVIDIA is careful not to say raw performance), hence 2x * 2x = 4x.
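The packing itself is easy to illustrate on the CPU side: two IEEE half-precision values occupy exactly one 32-bit word, which is what lets a 32-bit datapath carry a pair of FP16 operands. A minimal NumPy sketch of the storage layout (this shows only the packing, not the GPU's paired execution):

```python
import numpy as np

# Two FP16 values fit in one 32-bit word: this is the register-level packing
# that lets a 32-bit datapath carry a pair of half-precision operands.
pair = np.array([1.5, -2.0], dtype=np.float16)   # 2 x 16 bits
packed = pair.view(np.uint32)[0]                 # ...reinterpreted as 1 x 32 bits
print(hex(packed))

# Unpacking recovers both halves unchanged.
unpacked = np.array([packed], dtype=np.uint32).view(np.float16)
print(unpacked)
```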
Pascal has 2x the throughput in FP16 compared to FP32. The 4x comes because Pascal can do 2x the FP32 throughout at the same power, or 4x the FP16 throughput. Jensen referred to mixed precision because most algorithms will still use FP32 for a few numerically critical parts of the application. I certainly will - FP16 is great for training deep neural networks, except in the reductions where we overflow and need FP32.
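The overflow point is easy to demonstrate: FP16's largest finite value is 65504, so even a short reduction over moderate values can blow up, which is why the accumulation gets done in FP32. A small NumPy illustration:

```python
import numpy as np

# FP16 tops out at 65504, so a reduction over moderate values overflows...
vals = np.array([30000, 30000, 30000], dtype=np.float16)
s16 = np.float16(0)
for v in vals:
    s16 = np.float16(s16 + v)   # accumulate in half precision
print(s16)                      # inf: 60000 + 30000 exceeds the FP16 range

# ...while the same reduction with an FP32 accumulator is exact here.
s32 = vals.astype(np.float32).sum()
print(s32)                      # 90000.0
```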
Pascal is Nvidia’s follow-up to Maxwell, and the first desktop chip to use TSMC’s 16nmFF+ (FinFET+) process. This is the second-generation follow-up to TSMC’s first FinFET technology — the first generation is expected to be available this year, while FF+ won’t ship until sometime next year.
Jen-Hsun claims that Pascal will achieve over 2x the performance per watt of Maxwell in Single Precision General Matrix multiplication. But there are two caveats to this claim, as far as gamers are concerned. First, recall that improvements to performance per watt, while certainly vital and important, are not the same thing as improvements to top-line performance. The second thing to keep in mind is that boosting the card’s SGEMM performance doesn’t necessarily tell us much about gaming.
Pascal will be the first Nvidia product to debut with variable precision capability. If this sounds familiar, it's because AMD appears to have debuted a similar capability last year. It's not clear yet how Nvidia's lower-precision capabilities dovetail with AMD's, but Jen-Hsun referred to 4x the FP16 performance in mixed mode compared with standard (he might have been referencing single or double precision).
http://www.extremetech.com/gaming/2...mance-gains-from-upcoming-pascal-architecture
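For reference, SGEMM is the standard BLAS single-precision matrix-multiply kernel, C ← αAB + βC in FP32 throughout; it's a favorite throughput benchmark precisely because it is compute-bound rather than gaming-like. A minimal NumPy rendering of the operation (illustrative only; real benchmarks call a tuned BLAS):

```python
import numpy as np

def sgemm(alpha, A, B, beta, C):
    """BLAS-style SGEMM: C <- alpha * A @ B + beta * C, all in float32."""
    return (alpha * (A @ B) + beta * C).astype(np.float32)

def sgemm_flops(m, n, k):
    # 2*m*n*k floating-point operations per call (one multiply and one add
    # per inner-product term): the FLOP count behind SGEMM perf/watt claims.
    return 2 * m * n * k
```

Perf-per-watt in SGEMM is then just `sgemm_flops` per second divided by board power, which is why it tracks raw FP32 ALU throughput rather than gaming performance.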
July 2015 for volume FF+ is what I've seen for the last four months. Not that it matters at all for Pascal; if Nvidia says 2016 at this point in time, we have no real reason to doubt that this is their (realistic) target.
Tonga (GCN 1.2), though, does not get any peak-flops throughput increase from its fp16 support, unlike Pascal. It might still be faster in practice due to needing fewer registers, however. (And of course that doesn't mean future GCN iterations couldn't support 2x fp16 packed operations.)
The window is annoyingly wide though, 9-21 months from now. HBM2 may be just as much of a restriction timing-wise.

My guess would be 12 months on the nose (if all goes well), 18 if it doesn't. One thing I am curious about is that they did not mention the 64-bit throughput from mixed precision.
I assume the use of FP16 would have to be something specifically supported/implemented by developers and won't just automatically work in existing games?

Theoretically games could already support this, since D3D11.1 supports specifying variables that are allowed to have less precision (the lower precision isn't guaranteed, so this should always work, as the driver is free to ignore the hint). Doubtful though that anyone already does this...
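The reason the D3D11.1 hint only *allows* lower precision is that dropping to 16 bits genuinely changes results: half precision keeps only a 10-bit stored significand. A small NumPy illustration of what gets lost:

```python
import numpy as np

# float16 stores a 10-bit significand, so its machine epsilon is 2**-10.
print(np.finfo(np.float16).eps)              # 0.000977

# A small increment that float32 tracks fine simply vanishes at half
# precision, so a shader flagged as lower-precision-allowed may compute
# a visibly different value than its full-precision version.
x32 = np.float32(1.0) + np.float32(0.0004)   # ~1.0004
x16 = np.float16(1.0) + np.float16(0.0004)   # rounds back to 1.0
print(x32, x16)
```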