Discussion in 'Architecture and Products' started by DSC, Mar 19, 2013.
I can only think of it meaning:
Power of sram + sys ram > Power of sys ram
I guess, but energy is what matters, and you're going to use much less of it while fetching something from an SRAM cache than from off-chip DRAM in all cases that I know of.
Yes, but they are talking about a special type of application. And my assumption is that the processor does have L2 cache, while the additional SRAM functions as an L3. I'm guessing that such an L3 acts just as an intermediate step that simply consumes energy while doing almost nothing in the cases they are describing.
Volta steps up:
"ORNL researchers have figured out how to harness the power and intelligence of Summit’s state-of-art architecture to successfully run the world’s first exascale scientific calculation. A team of scientists led by ORNL’s Dan Jacobson and Wayne Joubert has leveraged the intelligence of the machine to run a 1.88 exaops comparative genomics calculation relevant to research in bioenergy and human health. The mixed precision exaops calculation produced identical results to more time-consuming 64-bit calculations previously run on Titan."
Faster and smarter. Nice work, IBM, NV and Volta.
"ORNL scientists were among the scientific teams that achieved the first gigaflops calculations in 1988, the first teraflops calculations in 1998, the first petaflops calculations in 2008 and now the first exaops calculations in 2018."
I sense... a pattern (although I am pretty sure the first gigaflops system went up in 1985).
Were the previous records all on double precision?
NVIDIA announced the TITAN V CEO Edition at the Computer Vision and Pattern Recognition conference yesterday. 20 of these GPUs were given away at the conference, but there is no general release or pricing information at this time.
I honestly thought the name was a joke when I first saw it (from a secondary source).
The TITAN V CEO Edition has specs similar to those of the Tesla V100:
32 GB memory,
125 Tensor Core TFLOPS.
AnandTech has a spec table and speculation here.
I wonder if bandwidth was a big reason for the CEO Edition. From the AnandTech article, "bandwidth-bound scenarios are more common than one might think, as the regular Titan V can fully saturate its memory bandwidth on compute alone and still come up short," which is not surprising to me after reading posts on Beyond3D. If this product gets a wider release in the future then the TITAN line would have a higher bandwidth option.
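A rough roofline-style check of that point: the break-even arithmetic intensity (FLOPs per byte moved from HBM2) needed to stay compute-bound. The 12 GB Titan V figures (110 tensor TFLOPS, 652.8 GB/s over three HBM2 stacks) are from public spec sheets; the CEO Edition bandwidth used below is an assumption based on its V100-like four-stack configuration, since NVIDIA hasn't published a number.

```python
# Roofline-style sketch: minimum arithmetic intensity (FLOPs per byte
# read from HBM2) a kernel needs before the tensor cores, rather than
# memory bandwidth, become the bottleneck.
def breakeven_intensity(tflops, gbytes_per_s):
    """FLOPs per byte at which compute and bandwidth limits meet."""
    return tflops * 1e12 / (gbytes_per_s * 1e9)

titan_v = breakeven_intensity(110, 652.8)  # 12 GB Titan V, 3 HBM2 stacks
ceo_ed = breakeven_intensity(125, 900)     # ASSUMED V100-class bandwidth

print(f"Titan V:     {titan_v:.0f} FLOPs/byte to stay compute-bound")
print(f"CEO Edition: {ceo_ed:.0f} FLOPs/byte to stay compute-bound")
```

Anything below the break-even intensity is bandwidth-bound, so a wider HBM2 bus lowers the bar for real workloads even at the same peak TFLOPS.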
The 12GB Titan V is around 9-17% slower in Amber for Solvent FP32 compute than the 16GB V100 PCIe.
Gives a bit of an indicator, but only partially helpful.
Deep Learning SDK Documentation
June 18, 2018
New GPU-Accelerated Supercomputers Change the Balance of Power on the TOP500
AI Can Now Fix Your Grainy Photos by Only Looking at Grainy Photos
Researchers from NVIDIA, Aalto University, and MIT developed a deep learning based method that can fix photos by simply looking at examples of corrupted photos only.
The NVIDIA Titan V Deep Learning Deep Dive: It's All About The Tensor Cores
July 3, 2018
NVIDIA has introduced a new DGX-2H with 450 W Tesla V100 GPUs and some other upgrades, according to ServeTheHome.
Regarding the 2 PFLOPS listed for the DGX-2H,
The number of SPs is still the same so the new 450 W V100 appears to have a clock speed of ~1.6 GHz.
UPDATE: NVIDIA's DGX-2H data sheet now states 2.1 PFLOPS.
So that one is SXM4 then? SXM3 was 350 W, SXM2 is 300 W (the NVLink version) and SXM1 is 250 W (the PCIe version).
RichReport asks Nvidia about that overclocked DGX-2H
Impressive! Just one DGX-2H would place at about #62 on the TOP500 list.
I was surprised when I heard Brookhaven National Laboratory was getting one but now it makes sense.
I believe that was 36 DGX-2H systems, not one. They chose 36 because that’s the number of ports in the normal Infiniband switch.