Nvidia Volta Speculation Thread

I guess, but energy is what matters, and you're going to use much less of it while fetching something from an SRAM cache than from off-chip DRAM in all cases that I know of. :-|
 
Yes, but they are talking about a special type of application. And my assumption is that the processor does have an L2 cache, while the additional SRAM functions as an L3. I'm guessing that such an L3 acts as just an intermediate step that simply consumes energy while doing almost nothing in the cases they are describing.
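A rough way to see the "intermediate step that mostly burns energy" point: if the access pattern these applications generate rarely hits in the extra SRAM level, every request still pays for the probe and then goes out to DRAM anyway. A minimal sketch of the average energy per L2 miss, with purely illustrative per-access energies (the pJ figures are assumptions, not measurements):

```python
# Illustrative model of energy per L2 miss, with and without an extra SRAM
# level (call it L3). All energy figures below are assumptions for the sketch.
E_L3   = 100e-12    # ~100 pJ per probe of a large on-chip SRAM (assumed)
E_DRAM = 2000e-12   # ~2 nJ per off-chip DRAM access (assumed)

def energy_per_miss(l3_hit_rate, with_l3=True):
    """Average energy for a request that has already missed in L2."""
    if not with_l3:
        return E_DRAM
    # Every request probes the L3; misses still pay the DRAM cost on top.
    return E_L3 + (1.0 - l3_hit_rate) * E_DRAM

for hit in (0.05, 0.5, 0.9):
    print(f"L3 hit rate {hit:.0%}: with L3 {energy_per_miss(hit)*1e12:5.0f} pJ, "
          f"without {energy_per_miss(hit, False)*1e12:5.0f} pJ")
```

With these numbers the break-even hit rate is E_L3 / E_DRAM = 5%; below that, the extra SRAM level costs more energy than it saves, which matches the "consumes energy while doing almost nothing" scenario.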
 
Volta steps up:

"ORNL researchers have figured out how to harness the power and intelligence of Summit’s state-of-art architecture to successfully run the world’s first exascale scientific calculation. A team of scientists led by ORNL’s Dan Jacobson and Wayne Joubert has leveraged the intelligence of the machine to run a 1.88 exaops comparative genomics calculation relevant to research in bioenergy and human health. The mixed precision exaops calculation produced identical results to more time-consuming 64-bit calculations previously run on Titan."​

https://www.scientificcomputing.com/news/2018/06/ornl-launches-summit-supercomputer

Faster and smarter. Nice work, IBM, NV and Volta.
 
"ORNL scientists were among the scientific teams that achieved the first gigaflops calculations in 1988, the first teraflops calculations in 1998, the first petaflops calculations in 2008 and now the first exaops calculations in 2018."

I sense... a pattern (although I am pretty sure first gigaflop system went up in 1985).
 
"ORNL scientists were among the scientific teams that achieved the first gigaflops calculations in 1988, the first teraflops calculations in 1998, the first petaflops calculations in 2008 and now the first exaops calculations in 2018."

I sense... a pattern (although I am pretty sure first gigaflop system went up in 1985).
Were the previous records all on double precision?
 
NVIDIA announced the TITAN V CEO Edition at the Computer Vision and Pattern Recognition conference yesterday. 20 of these GPUs were given away at the conference, but there is no general release or pricing information at this time.

I honestly thought the name was a joke when I first saw it (from a secondary source).

The TITAN V CEO Edition has specs similar to those of the Tesla V100:
  • 32 GB memory,
  • 125 Tensor Core TFLOPS.
AnandTech has a spec table and speculation here.

I wonder if bandwidth was a big reason for the CEO Edition. From the AnandTech article, "bandwidth-bound scenarios are more common than one might think, as the regular Titan V can fully saturate its memory bandwidth on compute alone and still come up short," which is not surprising to me after reading posts on Beyond3D. If this product gets a wider release in the future then the TITAN line would have a higher bandwidth option.
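A quick roofline-style check of why "bandwidth-bound on compute alone" is plausible. The Titan V figures below are the public ones (~14.9 FP32 TFLOPS, 652.8 GB/s from three HBM2 stacks); the CEO Edition bandwidth is an assumption based on a V100-class four-stack configuration, not an official spec:

```python
# Arithmetic intensity needed to stay compute-bound: FLOPS / bytes-per-second.
# Figures are approximate; the CEO Edition bandwidth is an assumption.
cards = {
    "Titan V (12 GB)":               {"fp32_tflops": 14.9, "bw_gbs": 652.8},
    "Titan V CEO Edition (assumed)": {"fp32_tflops": 14.9, "bw_gbs": 900.0},
}

for name, c in cards.items():
    breakeven = c["fp32_tflops"] * 1e12 / (c["bw_gbs"] * 1e9)  # FLOP per byte
    print(f"{name}: kernels under ~{breakeven:.0f} FLOP/byte are bandwidth-bound")
```

Any kernel whose arithmetic intensity sits below that ratio is limited by memory bandwidth rather than the shader array, so the extra HBM2 stack would lift exactly the scenarios AnandTech describes.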
 
The 12GB Titan V is around 9-17% slower in Amber for Solvent FP32 compute than the 16GB V100 PCIe.
Gives a bit of an indicator, but only partially helpful.
 
New GPU-Accelerated Supercomputers Change the Balance of Power on the TOP500
In the latest TOP500 rankings announced this week, 56 percent of the additional flops were a result of NVIDIA Tesla GPUs running in new supercomputers – that according to the Nvidians, who enjoy keeping track of such things. In this case, most of those additional flops came from three top systems new to the list: Summit, Sierra, and the AI Bridging Cloud Infrastructure (ABCI).

Summit, the new TOP500 champ, pushed the previous number one system, the 93-petaflop Sunway TaihuLight, into second place with a Linpack score of 122.3 petaflops. Summit is powered by IBM servers, each one equipped with two Power9 CPUs and six V100 GPUs. According to NVIDIA, 95 percent of the Summit’s peak performance (187.7 petaflops) is derived from the system’s 27,686 GPUs.
...
As dramatic as that 56 percent number is for new TOP500 flops, the reality is probably even more impressive. According to Ian Buck, vice president of NVIDIA's Accelerated Computing business unit, more than half the Tesla GPUs they sell into the HPC/AI/data analytics space are bought by customers who never submit their systems for TOP500 consideration. Although many of these GPU-accelerated machines would qualify for a spot on the list, these particular customers either don’t care about all the TOP500 fanfare or would rather not advertise their hardware-buying habits to their competitors.
...
While companies like Intel, Google, Fujitsu, Wave Computing, Graphcore, and others are developing specialized deep learning accelerators for the datacenter, NVIDIA is sticking with an integrated AI-HPC design for its Tesla GPU line. And this certainly seems to be paying off, given the growing trend of using artificial intelligence to accelerate traditional HPC applications. Although the percentage of users integrating HPC and AI is still relatively small, this mixed-workflow model is slowly being extended to nearly every science and engineering domain, from weather forecasting and financial analytics, to genomics and oil & gas exploration.
...
And, thanks in large part to these deep-learning-enhanced V100 GPUs, mixed-workload machines are now popping up on a fairly regular basis. For example, although Summit was originally going to be just another humongous supercomputer, it is now being groomed as a platform for cutting-edge AI as well. By contrast, the ABCI system was conceived from the beginning as an AI-capable supercomputer that would serve users running both traditional simulations and analytics, as well as deep learning workloads. Earlier this month, the MareNostrum supercomputer added three racks of Power9/V100 nodes, paving the way for serious deep learning work to commence at the Barcelona Supercomputing Centre. And even the addition of just 12 V100 GPUs to the Nimbus cloud service at the Pawsey Supercomputing Centre was enough to claim that AI would now be fair game on the Aussie system.
https://www.top500.org/news/new-gpu...rs-change-the-balance-of-power-on-the-top500/
 
AI Can Now Fix Your Grainy Photos by Only Looking at Grainy Photos

Researchers from NVIDIA, Aalto University, and MIT developed a deep learning based method that can fix photos by simply looking at examples of corrupted photos only.
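The trick behind the headline (this is the Noise2Noise idea from the paper the post refers to) is that the regression target itself can be noisy: with an L2 loss and zero-mean corruption, the expected minimizer is the clean image. A minimal, hedged PyTorch sketch of that training setup, using toy data and a placeholder network rather than the paper's actual model:

```python
# Noise2Noise-style training sketch: both input and target are independently
# corrupted copies of the same image; an L2-trained denoiser still converges
# toward the clean signal because the noise is zero-mean. Toy data, toy model.
import torch
import torch.nn as nn

denoiser = nn.Sequential(                 # stand-in for a real U-Net-style model
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)
clean = torch.rand(16, 1, 64, 64)         # placeholder "ground truth" batch

for step in range(100):
    noisy_in  = clean + 0.1 * torch.randn_like(clean)   # noisy input
    noisy_tgt = clean + 0.1 * torch.randn_like(clean)   # *different* noisy copy
    loss = nn.functional.mse_loss(denoiser(noisy_in), noisy_tgt)
    opt.zero_grad()
    loss.backward()
    opt.step()
```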
 
The NVIDIA Titan V Deep Learning Deep Dive: It's All About The Tensor Cores
July 3, 2018
The most eye-catching of Volta’s new features are the new specialized processing blocks – tensor cores – but as we will see, this is very much integrated with the rest of Volta's microarchitectural improvements and surrounding software/framework support for deep learning (DL) and high performance compute (HPC). Matching up with the NVIDIA Titan V are the Titan Xp and GeForce GTX Titan X (Maxwell), with the AMD Radeon RX Vega 64 also present for some tests.
https://www.anandtech.com/show/12673/titan-v-deep-learning-deep-dive
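For reference on what those tensor cores actually compute: each Volta tensor core performs a 4×4×4 matrix multiply-accumulate per clock, with FP16 inputs and FP32 accumulation (D = A×B + C). A small NumPy sketch of that mixed-precision pattern; it illustrates the data types involved, not the hardware's exact internal rounding:

```python
# One tensor-core-style tile operation: D = A @ B + C with FP16 inputs and an
# FP32 accumulator. Real hardware rounding/ordering details will differ.
import numpy as np

A = np.random.rand(4, 4).astype(np.float16)   # FP16 input tile
B = np.random.rand(4, 4).astype(np.float16)   # FP16 input tile
C = np.random.rand(4, 4).astype(np.float32)   # FP32 accumulator tile

D = A.astype(np.float32) @ B.astype(np.float32) + C   # products accumulated in FP32
print(D.dtype, D.shape)   # float32 (4, 4)
```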
 
NVIDIA has introduced a new DGX-2H with 450 W Tesla V100 GPUs and some other upgrades, according to ServeTheHome.

[Image: DGX-2H vs. DGX-2 spec comparison (ServeTheHome)]


Regarding the 2 PFLOPS listed for the DGX-2H,
Patrick Kennedy (ServeTheHome) said:
We reached out to NVIDIA regarding the 2 petaflop number. NVIDIA said that it should be 2.1 petaflops and will be updated accordingly.
The number of SPs is still the same so the new 450 W V100 appears to have a clock speed of ~1.6 GHz.

UPDATE: NVIDIA's DGX-2H data sheet now states 2.1 PFLOPS.
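The ~1.6 GHz estimate falls straight out of the tensor throughput: each V100 has 640 tensor cores, and each core delivers 64 FP16 FMAs (128 FLOPs) per clock, so the clock is just the quoted 2.1 PFLOPS divided by the rest. A quick check:

```python
# Back out the implied GPU clock from the DGX-2H tensor throughput.
tensor_flops = 2.1e15            # DGX-2H peak, mixed-precision tensor FLOPS
gpus = 16                        # V100s per DGX-2H
tensor_cores = 640               # tensor cores per V100
flops_per_core_per_clk = 128     # 64 FP16 FMAs = 128 FLOPs per clock

clock_hz = tensor_flops / (gpus * tensor_cores * flops_per_core_per_clk)
print(f"~{clock_hz / 1e9:.2f} GHz")   # ~1.60 GHz
```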
 
Impressive! Just one DGX-2H would place you at about #62 on the TOP500 list.
I was surprised when I heard Brookhaven National Laboratory was getting one but now it makes sense.
 
Good stuff; it probably goes along with being a first-gen tensor core product, and I'd expect some change in later products. I'd like to see the results of a reverse-engineered Turing, since the tensor cores are different, and of next year's Ampere product.
 
Why would they change it when it was apparently good enough for a high-end, HPC-focused GPU and Turing is far more consumer-oriented? Do we know that the tensor cores are different in Turing?
 