Nvidia shows signs in [2019]

I guess "building something new" is a good source of motivation too. New teams, new projects, a lot of ressources... A nice change of pace if you feel like stagnating in another place...
And I'll be the first to admit that when they got Raja and Chris I thought they had actually scored a real team that could do it, but I never expected them to go this far with it and go in this deeply. I am excited! :D
 
NVIDIA Brings CUDA to Arm, Enabling New Path to Exascale Supercomputing
June 17, 2019
NVIDIA today announced its support for Arm CPUs, providing the high performance computing industry a new path to build extremely energy-efficient, AI-enabled exascale supercomputers.

NVIDIA is making available to the Arm® ecosystem its full stack of AI and HPC software — which accelerates more than 600 HPC applications and all AI frameworks — by year's end. The stack includes all NVIDIA CUDA-X AI™ and HPC libraries, GPU-accelerated AI frameworks and software development tools such as PGI compilers with OpenACC support and profilers.

Once stack optimization is complete, NVIDIA will accelerate all major CPU architectures, including x86, POWER and Arm.
https://www.nasdaq.com/press-release/nvidia-brings-cuda-to-arm-enabling-new-path-to-exascale-supercomputing-20190617-00072
 
Nvidia's Lead Exceeds Intel's in Cloud

Nvidia’s GPUs now account for 97.4% of infrastructure-as-a-service (IaaS) instance types of dedicated accelerators deployed by the top four cloud services. By contrast, Intel’s processors are used in 92.8% of compute instance types, according to one of the first reports from Liftr Cloud Insights’ component tracking service.

AMD’s overall processor share of instance types is just 4.2%. Cloud services tend to keep older instance types in production as long as possible, so we see AMD increasing its share with expected deployments of its second-generation Epyc processor, aka Rome, in the second half of this year.

Among dedicated accelerators, AMD GPUs currently have only a 1.0% share of instance types, the same share as Xilinx’s Virtex UltraScale+ FPGAs. AMD will have to up its game in deep-learning software to make significant headway against the Nvidia juggernaut and its much deeper, more mature software capabilities.

Intel’s Arria 10 FPGA accounts for only 0.6% of dedicated accelerator instance types. Xilinx and Intel must combat the same Nvidia capabilities that AMD is facing, but FPGAs face additional challenges in data center development and verification tools.

[Charts: Liftr Cloud Insights instance-type share data]

https://www.eetimes.com/author.asp?section_id=36&doc_id=1334812&_mc=RSS_EET_EDT
 
Summit, the World's Fastest Supercomputer, Triples Its Performance Record

DOING THE MATH: THE REALITY OF HPC AND AI CONVERGENCE
June 17, 2019

There is a more direct approach to converging HPC and AI, and that is to retrofit some of the matrix math libraries that are commonly used in HPC simulations so they can take advantage of dot product engines such as the Tensor Core units at the heart of the “Volta” Tesla GPU accelerators, which power so-called AI supercomputers such as the “Summit” system at Oak Ridge National Laboratory.

As it turns out, a team of researchers at the University of Tennessee, Oak Ridge National Laboratory, and the University of Manchester, led by Jack Dongarra, one of the creators of the Linpack and HPL benchmarks used to gauge the raw performance of supercomputers, has come up with a mixed precision iterative refinement solver that can make use of the Tensor Core units inside the Volta and get raw HPC matrix math calculations, like those at the heart of Linpack, done quicker than if they used the 64-bit math units on the Volta.

The underlying math behind this iterative refinement approach, now applied to the Tensor Core units, is itself not new; in fact, it dates from the 1940s, according to Dongarra.
...
The good news is that a new and improved iterative refinement technique works pretty well, pushing the bulk of the math to the 4×4, 16-bit floating point Tensor Core engines and doing a little 32-bit accumulation and a tiny bit of 64-bit math on top of that to produce a result equivalent to what was produced using only the 64-bit math units on the Volta GPU accelerator – but in a much shorter time.
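
To make the technique concrete, here is a minimal NumPy sketch of mixed precision iterative refinement under simplifying assumptions: the low-precision solve is only simulated by rounding to FP16 and solving in FP32 (on Volta that bulk work runs as FP16 GEMMs with FP32 accumulation on the Tensor Cores), and the sketch re-solves from scratch each iteration instead of reusing an LU factorization as a real solver would. It illustrates the idea, not the actual HPL-AI code.

```python
import numpy as np

def solve_low_precision(A, b):
    # Stand-in for the FP16/FP32 Tensor Core solve: round the inputs to half
    # precision, solve in single precision, return the result in double.
    A_lo = A.astype(np.float16).astype(np.float32)
    b_lo = b.astype(np.float16).astype(np.float32)
    return np.linalg.solve(A_lo, b_lo).astype(np.float64)

def iterative_refinement(A, b, iters=10, tol=1e-12):
    x = solve_low_precision(A, b)             # cheap low-precision solution
    for _ in range(iters):
        r = b - A @ x                         # residual computed in float64
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        x = x + solve_low_precision(A, r)     # low-precision correction step
    return x

rng = np.random.default_rng(0)
n = 512
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
b = rng.standard_normal(n)
x = iterative_refinement(A, b)
print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))  # close to float64 precision
```

The point is the division of labor: almost all of the flops sit in the low-precision solves, while the 64-bit work is just a residual check per iteration, which is exactly the split the article describes for the Summit run.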

To put the iterative refinement solver to the test, techies at Nvidia worked with the team from Oak Ridge, the University of Tennessee, and the University of Manchester to port the HPL implementation of the Linpack benchmark, which is a 64-bit dense matrix calculation that is used by the Top500, to the new solver – creating what they are tentatively calling HPL-AI – and ran it both ways on the Summit supercomputer. The results were astoundingly good.

Running regular HPL on the full Summit worked out to 148.8 petaflops of aggregate compute, while running the HPL-AI variant with the iterative refinement solver in mixed precision works out to an aggregate of 445 petaflops, roughly a 3X speedup.

And to be super-precise, about 92 percent of the calculation time in the HPL-AI run was spent in the general matrix multiply (GEMM) library running in FP16 mode, with a little more than 7 percent of wall time being in the accumulate unit of the Tensor Core in FP32 mode and a little less than 1 percent stressing the 64-bit math units on Volta.

Now, the trick is to apply this iterative refinement solver to real HPC applications, and Nvidia is going to be making it available in the CUDA-X software stack so this can be done. Hopefully more and more work can be moved to mixed precision and take full advantage of those Tensor Core units. It’s not quite like free performance – customers are definitely paying for those Tensor Cores on the Volta chips – but it will feel like it is free, and that means Nvidia is going to have an advantage in the HPC market unless and until both Intel and AMD add something like Tensor Core to their future GPU accelerators.


[Image: Jack Dongarra's HPL-AI slide from ISC19]


https://www.nextplatform.com/2019/06/17/doing-the-math-the-reality-of-hpc-and-ai-convergence/
 
Researchers at VideoGorillas Use AI to Remaster Archived Content to 4K Resolution and Above
August 23, 2019
With 4K as the current standard and 8K experiences becoming the new norm, older content doesn’t meet today’s visual standard. The remastering process aims to revitalize older content to match these new standards. It has become a common practice in the industry, allowing audiences to revisit older favorites and enjoy them in a modern viewing experience.

Los Angeles-based VideoGorillas develops state-of-the-art media technology that incorporates AI techniques built on NVIDIA CUDA-X and Studio Stack. By integrating GPU-accelerated machine learning, deep learning, and computer vision, their techniques allow studios to achieve higher visual fidelity and increased productivity when it comes to remastering.

A recent innovation they’re developing is a new production-assisted AI technique called Bigfoot super resolution. This technique converts films from native 480p to 4K by using neural networks to predict the missing pixels at very high quality, so the original content almost appears as if it were filmed in 4K.

The networks are trained with PyTorch using CUDA and cuDNN on millions of images per film. However, loading thousands of images creates a bottleneck in their pipeline, so VideoGorillas is integrating DALI (the NVIDIA Data Loading Library) to accelerate training times.
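
As a rough illustration of what a DALI input pipeline looks like, here is a minimal sketch that decodes and normalizes images on the GPU. The directory path, batch size and image dimensions are made up, and this is generic DALI usage rather than VideoGorillas' actual pipeline.

```python
from nvidia.dali import pipeline_def, fn, types

@pipeline_def(batch_size=32, num_threads=4, device_id=0)
def frame_pipeline(data_root="/data/film_frames"):       # hypothetical path
    encoded, labels = fn.readers.file(file_root=data_root, random_shuffle=True)
    frames = fn.decoders.image(encoded, device="mixed")   # JPEG decode on the GPU
    frames = fn.resize(frames, resize_x=720, resize_y=480)
    frames = fn.crop_mirror_normalize(
        frames, dtype=types.FLOAT, output_layout="CHW",
        mean=[127.5] * 3, std=[127.5] * 3)                # scale pixels to [-1, 1]
    return frames, labels

pipe = frame_pipeline()
pipe.build()
frames, labels = pipe.run()   # one batch of GPU-resident, training-ready tensors
```

In practice the pipeline would be wrapped in DALI's PyTorch iterator so decoded batches stay on the GPU all the way into the training loop, which is where the data-loading bottleneck was.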

A cornerstone of video is the aggregation of visual information across adjacent frames. VideoGorillas uses Optical Flow to compute the relative motion of pixels between images. It provides frame consistency and minimizes any contextual or style loss within the image.
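
The article does not say which optical flow implementation VideoGorillas uses, but the idea of aggregating information across adjacent frames can be sketched with OpenCV's Farneback flow: estimate per-pixel motion, then warp the previous frame onto the current one so the two can be compared or blended consistently.

```python
import cv2
import numpy as np

def warp_previous_frame(prev_bgr, curr_bgr):
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2GRAY)
    # Dense flow from the current frame to the previous one: for each pixel in
    # the current frame, where its content came from in the previous frame.
    flow = cv2.calcOpticalFlowFarneback(curr_gray, prev_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = curr_gray.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    # Pull pixels from the previous frame along the flow field, producing a
    # motion-compensated estimate of the current frame.
    return cv2.remap(prev_bgr, map_x, map_y, cv2.INTER_LINEAR)
```

One common use of such a warped frame is as a temporal consistency reference, so that upscaled frames do not flicker from one frame to the next.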

This new level of visual fidelity augmented by AI is only possible with NVIDIA RTX, which delivers 200x performance gains over CPUs for their mixed precision and distributed training workflows. VideoGorillas trains super resolution networks on the RTX 2080, and on NVIDIA Quadro GPUs for larger-scale projects.
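
For the mixed precision side, a common way to engage the Tensor Cores on RTX hardware from PyTorch is automatic mixed precision. The sketch below uses torch.cuda.amp with a placeholder model, loss and synthetic data; the article does not specify which mixed-precision tooling VideoGorillas actually uses.

```python
import torch
import torch.nn.functional as F

# Placeholder super-resolution-style model; the real networks differ.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv2d(64, 3, 3, padding=1),
).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()   # rescales the loss so FP16 gradients don't underflow

# Synthetic stand-in for a real (low-res input, high-quality target) loader.
loader = [(torch.randn(4, 3, 64, 64, device="cuda"),
           torch.randn(4, 3, 64, 64, device="cuda")) for _ in range(8)]

for lowres, target in loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():    # forward pass runs in FP16 where safe
        pred = model(lowres)
        loss = F.l1_loss(pred, target)
    scaler.scale(loss).backward()      # backward pass on the scaled loss
    scaler.step(optimizer)             # unscale gradients and apply the update
    scaler.update()
```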
https://news.developer.nvidia.com/r...-archived-content-to-4k-resolution-and-above/
 
Google and Nvidia Post New AI Benchmarks
October 7, 2019
In the second slate of training results (V 0.6) released today, both Nvidia and Google have demonstrated their abilities to reduce the compute time needed to train the underlying deep neural networks used in common AI applications from days to hours.

The new results are truly impressive. Both Nvidia and Google claim #1 performance spots in three of the six “Max Scale” benchmarks. Nvidia was able to reduce its run-times dramatically (by up to 80%) using the same V100 Tensor Core accelerator in the DGX-2H building block. Many silicon startups are now probably explaining to their investors why their anticipated performance advantage over Nvidia has suddenly diminished, all due to Nvidia’s software prowess and ecosystem.

So, who “won” and does it matter? Since the companies ran the benchmarks on a massive configuration that maximizes the results with the shortest training time, being #1 may mean that the team was able to gang over a thousand accelerators to train the network, a herculean software endeavor.

Since both companies sell 16-chip configurations and submitted results for them to MLPerf, I have also provided a figure of normalized performance at that scale.

[Table: MLPerf v0.6 training results, including performance normalized to 16-chip configurations]


I find it interesting that Nvidia’s best absolute performance is on the more complex neural network models (reinforcement learning and heavy-weight object detection with Mask R-CNN), perhaps showing that their hardware programmability and flexibility helps them keep pace with the development of newer, more complex and deeper models. I would also note that Google has wisely decided to cast a wider net to capture TPU users, working now to support the popular PyTorch AI framework in addition to Google’s TensorFlow tool set. This will remove one of the two largest barriers to adoption, the other being the exclusivity of the TPU to the Google Cloud Platform (GCP).
https://www.eetimes.com/author.asp?section_id=36&doc_id=1334907#
 
Nvidia Introduces Aerial — Software to Accelerate 5G on NVIDIA GPUs
October 21, 2019

5G offers plenty of speed, of course, delivering 10x lower latency, 1,000x the bandwidth and millions of connected devices per square kilometer.
5G also introduces the critical concept of “network slicing.” This allows telcos to dynamically — on a session-by-session basis — offer unique services to customers.

Traditional solutions cannot be reconfigured quickly, so telco operators need a new network architecture: one that is high performance and reconfigurable by the second.

With NVIDIA Aerial, the same computing infrastructure required for 5G networking can be used to provide AI services such as smart cities, smart factories, AR/VR and cloud gaming.

Aerial provides two critical SDKs — CUDA Virtual Network Function (cuVNF) and CUDA Baseband (cuBB) — to simplify building highly scalable and programmable, software-defined 5G RAN networks using off-the-shelf servers with NVIDIA GPUs.

The NVIDIA Aerial SDK runs on the NVIDIA EGX stack, bringing GPU acceleration to carrier-grade Kubernetes infrastructure.

The NVIDIA EGX stack includes an NVIDIA driver, NVIDIA Kubernetes plug-in, NVIDIA Container runtime plug-in and NVIDIA GPU monitoring software.
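
As a rough illustration of the Kubernetes side, the NVIDIA device plug-in exposes GPUs as a schedulable `nvidia.com/gpu` resource, so a workload simply requests GPUs in its pod spec. The snippet below uses the standard Kubernetes Python client with hypothetical names and a generic CUDA image; it shows ordinary Kubernetes GPU scheduling, not an Aerial- or EGX-specific API.

```python
from kubernetes import client, config

config.load_kube_config()   # or load_incluster_config() when running inside the cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-smoke-test"),      # hypothetical name
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="cuda",
                image="nvidia/cuda:10.1-base",                # generic CUDA image
                command=["nvidia-smi"],
                resources=client.V1ResourceRequirements(
                    # Resource name advertised by the NVIDIA Kubernetes device plug-in.
                    limits={"nvidia.com/gpu": "1"},
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```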

https://blogs.nvidia.com/blog/2019/10/21/aerial-application-framework-5g-networks/

 
Sharp rise in global GPU shipments in Q3’19, reports Jon Peddie Research

Nvidia’s shipments showed a giant 38.3% increase from last quarter. The company leads in discrete GPUs and this quarter shipped more than all of AMD’s total GPUs (including APUs).

For comparison, AMD’s overall unit shipments increased 8.7% quarter-to-quarter, Intel’s total shipments increased 5.4% from last quarter, and as mentioned, Nvidia’s increased 38.3%. Last quarter was transitional for Nvidia, and the company ramped up at the end of the quarter. Channel inventory is now healthy, says the company, and Q4 notebook shipments will be seasonally down. Nvidia says RTX is doing well and represents 66% of its gaming revenue.

  • The attach rate of GPUs (includes integrated and discrete GPUs) to PCs for the quarter was 128% which was up 1.7% from last quarter.
  • Discrete GPUs were in 32.1% of PCs, which is up 5.13% from last quarter.
  • The overall PC market increased by 9.21% quarter-to-quarter and increased by 3.7% year-to-year.
  • Desktop graphics add-in boards (AIBs) that use discrete GPUs increased 42.2% from last quarter.

In seasonal cycles of the past, overall graphics shipments in the third quarter are typically up from the previous quarter. For Q3'19, shipments increased by almost 11% from last quarter, an increase greater than the ten-year average.

https://www.jonpeddie.com/press-rel...shipments-in-q319-reports-jon-peddie-research
 
Nvidia Touts Chip Deals With China's Alibaba, Baidu and Didi
December 17, 2019

Nvidia Corp on Wednesday said it has won a series of deals in which some of China's biggest technology companies are using its chips to make product recommendations and to develop self-driving vehicles.

Nvidia told reporters that e-commerce giant Alibaba Group Holding Ltd and search engine provider Baidu Inc have started using its chips to run systems that make recommendations to users with the aim of increasing the number of times users click on those recommendations.

Nvidia also said ride-hailing service Didi Chuxing has adopted its chips both for developing self-driving cars on the road and for its back-end data centres.

At its event, Nvidia also plans to announce tools that will let participating car makers from Germany, China and North America learn from each other's training data without having to share the data directly - a system called "federated" learning.

"This is a way to aggregate different types of data sets form different companies," Danny Shapiro, senior director of automotive at Nvidia, said at the briefing. "The key thing here is that each (carmaker) or each region can maintain and protect their own data. It's owned wholly. It's not shared."

https://www.reuters.com/article/nvi...h-chinas-alibaba-baidu-and-didi-idUSL1N28S029
 
Tencent partners with Nvidia to launch a game streaming service

While console and PC gaming remains the current way that most gamers choose to play their AAA games, many companies are taking an interest in introducing cloud gaming as an alternative. While PlayStation has PS Now, Xbox has Project xCloud, and Google has Stadia, it seems that Nvidia and Tencent want a piece of the pie in China, as the two companies have announced a partnership to launch Tencent’s ‘START’ cloud game streaming service in the country.

Making the announcement yesterday, they stated that “NVIDIA’s GPU technology will power Tencent Games’ START cloud gaming service, which began testing earlier this year. START gives gamers access to AAA games on underpowered devices anytime, anywhere. Tencent Games intends to scale the platform to millions of gamers, with an experience that is consistent with playing locally on a gaming rig”. START appears to be making similar promises to those of other streaming services such as Stadia – although so far these promises have yet to come to fruition.

https://www.kitguru.net/tech-news/mustafa-mahmoud/tencent-partners-with-nvidia-to-launch-a-game-streaming-service/

 