Nvidia Pascal Speculation Thread

That was not promised, just the stacked memory.
Stack one module on top of another and you're stacking a full PCB, a multi-hundred-watt GPU, and all those hot VRMs right on top of each other.
(addendum: The full PCB is admittedly small, but it's still a thermal barrier if nothing else.)

Agreed.
What IBM is able to do is bundle many modules next to each other and use powerful liquid cooling.
That will do, along with the usual low-tech measure of stacking computers (or nodes) on top of each other.
 
NVLink is nothing special; every HPC company has been working for a long time to provide the same thing. If IBM was working on this, that's no mystery, and I can assure you it won't be much different from what others have to offer in this area (they were just the first to put it on a roadmap). And I can assure you that AMD has had the same type of link in preparation since long before this was announced.


Right, that's why the technology behind NVLink won a best paper award at ISSCC.
http://isscc.org/about/awards_2013.html
 
Already the first design win for Pascal!!!
The numbers are big and so is the news. The U.S. Department of Energy today unveiled plans to build two GPU-powered supercomputers. Each will deliver at least 100 petaflops of compute performance.
And one – the Summit system at Oak Ridge National Laboratory, designed for open science – is expected to be 150 petaflops. That’s more than three times the peak speed of today’s fastest supercomputer.
...
NVIDIA GPUs and IBM POWER CPUs, connected with the NVLink interconnect technology, will power both machines.

more at the source: http://blogs.nvidia.com/blog/2014/11/14/what-is-nvlink/
 
Correction: in the end it's not Pascal but Volta.
source: http://info.nvidianews.com/rs/nvidi...k at Summit and Sierra Supercomputers-3-1.pdf
page 4:
Summit features more than 3,400 compute nodes, enough to deliver a peak performance between 150 and 300 petaFLOPS, and is expected to deliver more than five times the system-level application performance of Titan while consuming only 10% more power. Each compute node includes multiple next-generation IBM POWER9 CPUs and multiple NVIDIA Tesla® GPUs based on the NVIDIA Volta architecture. Each node is expected to deliver more than 40 TFLOPS, which is enough to outperform an entire rack of Haswell x86 CPU servers. In fact, just four nodes in the Summit system would be powerful enough to qualify for the Top500 list, as of June 2014.
 
300 PetaFLOPS :eek:

EDIT: If I remember correctly, Skynet from Terminator 3 ran at about 55 TFLOPS. Skynet had better watch out!
 
Not sure if this is the right place to ask, but it was said at the recent conference that Pascal will have 4x the mixed precision performance of Maxwell. What is the mixed precision performance of Maxwell? Can I assume it is the same as single precision?
 
Not sure if this is the right place to ask, but it was said at the recent conference that Pascal will have 4x the mixed precision performance of Maxwell. What is the mixed precision performance of Maxwell? Can I assume it is the same as single precision?
No, Jensen said Pascal has 2x the single precision performance, with FP16 being an additional 2x.
 
Interesting. I take it that means Pascal's FP16 performance is 4x Maxwell's single precision performance?

I got more the impression that Pascal has 2x the FP32 performance of Maxwell, and if you compare FP16 performance, you get 2x the FP16 performance of Maxwell.
 
I got more the impression that Pascal has 2x the FP32 performance of Maxwell, and if you compare FP16 performance, you get 2x the FP16 performance of Maxwell.

Some sources are saying 4x the mixed precision performance over Maxwell; I take it that means 4x the FP16 performance. I'm not sure what Maxwell's mixed precision performance is; I assume by mixed precision they mean FP16. So if Pascal has 4x the performance, that is either 4x Maxwell's single precision performance (if Maxwell's FP16 rate is the same as its FP32 rate), or, if Maxwell's FP16 rate is twice its FP32 rate, then Pascal's FP16 performance would be 8x Maxwell's single precision performance.
 
Some sources are saying 4x the mixed precision performance over Maxwell; I take it that means 4x the FP16 performance. I'm not sure what Maxwell's mixed precision performance is,
Discrete Maxwell currently has to do FP16 ops on its FP32 ALUs, 1:1. Tegra X1 introduced bundled FP16 instructions, which allow two FP16 instructions (performing the same operation) to be executed together in an FP32 unit, doubling FP16 throughput. This feature is coming to discrete GPUs with Pascal and will give Pascal 2x the FP16 throughput of Maxwell, all things held equal. Meanwhile, Pascal is also supposed to offer 2x the perf-per-watt of Maxwell (NVIDIA is careful not to say raw performance), hence 2x * 2x = 4x.
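
For anyone curious what those bundled FP16 instructions look like in practice, here is a minimal CUDA sketch of the packed-FP16 idea: two FP16 values share one 32-bit register and a single instruction operates on both lanes. The kernel name and shape are illustrative, not anything NVIDIA has shown; the intrinsics are the real ones from cuda_fp16.h and require a GPU with native FP16 arithmetic (compute capability 5.3+, i.e. Tegra X1 today).

```
#include <cuda_fp16.h>

// y = a*x + y over packed half2 data: each __hfma2 performs a fused
// multiply-add on two FP16 lanes at once, which is where the 2x
// FP16-over-FP32 throughput comes from. n counts half2 pairs, not halves.
__global__ void saxpy_half2(int n, __half2 a, const __half2 *x, __half2 *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = __hfma2(a, x[i], y[i]);
}
```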
 
[Image: Pascal1]


Pascal is Nvidia’s follow-up to Maxwell, and the first desktop chip to use TSMC’s 16nmFF+ (FinFET+) process. This is the second-generation follow-up to TSMC’s first FinFET technology — the first generation is expected to be available this year, while FF+ won’t ship until sometime next year.

Jen-Hsun claims that Pascal will achieve over 2x the performance per watt of Maxwell in Single Precision General Matrix multiplication. But there are two caveats to this claim, as far as gamers are concerned. First, recall that improvements to performance per watt, while certainly vital and important, are not the same thing as improvements to top-line performance. The second thing to keep in mind is that boosting the card’s SGEMM performance doesn’t necessarily tell us much about gaming.


[Image: SGEMM performance per watt]



[Image: Pascal2]


Pascal will be the first Nvidia product to debut with variable precision capability. If this sounds familiar, it’s because AMD appears to have debuted a similar capability last year. It’s not clear yet how Nvidia’s lower-precision capabilities dovetail with AMD’s, but Jen-Hsun referred to 4x the FP16 performance in mixed mode compared with standard (he might have been referencing single or double-precision).

http://www.extremetech.com/gaming/2...mance-gains-from-upcoming-pascal-architecture
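
As a side note for anyone unfamiliar with the benchmark: SGEMM is single-precision general matrix multiply, C = alpha*A*B + beta*C in FP32. A minimal cuBLAS invocation might look like the sketch below (host code only; the function name is made up, and device allocation, data transfer, and handle creation are assumed to have happened already, with matrices stored column-major as cuBLAS expects). It is a dense, compute-bound kernel, which is part of why it is a favorite for peak-throughput and perf-per-watt figures.

```
#include <cublas_v2.h>

// C = 1.0*A*B + 0.0*C for n x n matrices already resident on the device.
void run_sgemm(cublasHandle_t handle, int n,
               const float *dA, const float *dB, float *dC)
{
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, n, n, &alpha, dA, n, dB, n, &beta, dC, n);
}
```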
 
Pascal will be the first Nvidia product to debut with variable precision capability. If this sounds familiar, it’s because AMD appears to have debuted a similar capability last year. It’s not clear yet how Nvidia’s lower-precision capabilities dovetail with AMD’s, but Jen-Hsun referred to 4x the FP16 performance in mixed mode compared with standard (he might have been referencing single or double-precision).
Pascal has 2x the throughput in FP16 compared to FP32. The 4x comes because Pascal can do 2x the FP32 throughput at the same power, or 4x the FP16 throughput. Jensen referred to mixed precision because most algorithms will still use FP32 for a few numerically critical parts of the application. I certainly will: FP16 is great for training deep neural networks, except in the reductions, where we overflow and need FP32.
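
To illustrate the pattern being described (my own hypothetical sketch, not NVIDIA's code): store and multiply in FP16, but accumulate in FP32 so a long reduction doesn't overflow FP16's maximum representable value of about 65504.

```
#include <cuda_fp16.h>

// Single-block dot-product sketch with FP16 inputs and an FP32
// accumulator: each product is widened with __half2float before the
// running sum, and the final cross-thread reduction is an FP32
// atomicAdd. *result must be zeroed before launch.
__global__ void dot_fp16_fp32(int n, const __half *x, const __half *y,
                              float *result)
{
    float acc = 0.0f;
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        acc += __half2float(x[i]) * __half2float(y[i]);
    atomicAdd(result, acc);
}
```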
 
Pascal is Nvidia’s follow-up to Maxwell, and the first desktop chip to use TSMC’s 16nmFF+ (FinFET+) process. This is the second-generation follow-up to TSMC’s first FinFET technology — the first generation is expected to be available this year, while FF+ won’t ship until sometime next year.
July 2015 for volume FF+ is what I've seen for the last four months. Not that it matters at all for Pascal; if nVidia says 2016 at this point in time, we have no real reason to doubt that this is their (realistic) target.

The window is annoyingly wide though, 9-21 months from now. HBM2 may be just as much of a restriction timing-wise.
 
Pascal will be the first Nvidia product to debut with variable precision capability. If this sounds familiar, it’s because AMD appears to have debuted a similar capability last year. It’s not clear yet how Nvidia’s lower-precision capabilities dovetail with AMDs, but Jen-Hsun referred to 4x the FP16 performance in mixed mode compared with standard (he might have been referencing single or double-precision).
Tonga (GCN 1.2), though, does not get any peak FLOPS throughput increase from its FP16 support, unlike Pascal. It might still be faster in practice due to needing fewer registers, however. (And of course that doesn't mean future GCN iterations couldn't support 2x packed FP16 operations.)
 
Tonga (GCN 1.2), though, does not get any peak FLOPS throughput increase from its FP16 support, unlike Pascal. It might still be faster in practice due to needing fewer registers, however. (And of course that doesn't mean future GCN iterations couldn't support 2x packed FP16 operations.)

I assume the use of FP16 would have to be something specifically supported/implemented by developers and won't just automatically work in existing games?
 
The window is annoyingly wide though, 9-21 months from now. HBM2 may be just as much of a restriction timing-wise.
My guess would be 12 months on the nose (if all goes well), 18 if it doesn't. One thing I am curious about is that they did not mention the 64-bit (FP64) throughput alongside the mixed precision figures.
 
I assume the use of FP16 would have to be something specifically supported/implemented by developers and won't just automatically work in existing games?
Theoretically, games could already support this, since D3D11.1 supports marking variables as allowed to have reduced precision (the lower precision isn't guaranteed, so this should always work, as the driver is free to ignore the hint). It's doubtful anyone does this yet, though...
A driver compiler could also try to recognize operations which don't require full precision, but that is probably more or less impossible to get fully correct in a way that would really help. Of course, in the past nothing stopped certain IHVs from implementing shaders requiring FP24 precision at some lower precision, or just replacing them with something different providing "similar" results (or, in some cases, probably the same result), which would be one way to benefit from this without explicit app support. Unless it's a high-profile game used a lot in benchmarks, though, I wouldn't really expect much...
 