Nvidia Turing Product Reviews and Previews (Super, Ti, 2080, 2070, 2060, 1660, etc.)

Benchmarks: Premiere Pro with NVENC: Rendering videos in 20% of the time
May 26, 2020
With version 14.2 of Premiere Pro and Media Encoder, Adobe has added a new hardware-acceleration option: on export, H.264 and H.265 (HEVC) encoding is accelerated via the installed GPU. This is supported across AMD and NVIDIA GPU platforms and a variety of GPU models.
...
We ran the tests on different platforms. On the one hand, an Intel Core i9-10900K was used, which can encode via Quick Sync. We also exported the test sequence on an AMD Ryzen Threadripper 1950X. To demonstrate GPU hardware acceleration, we used a GeForce RTX 2080 Ti in the reference version and a Radeon Pro WX 8200.
...
When exporting the video to H.264, the GeForce RTX 2080 Ti needs only one-fifth of the time of the Intel Core i9-10900K. Compared to the AMD Ryzen Threadripper 1950X, admittedly no longer the freshest model, the NVIDIA card's lead is even larger. A Radeon Pro WX 8200 saves a little time as well, but NVIDIA's hardware encoder appears to be much faster.

For H.265 encoding, the picture changes slightly. A Core i9-10900K is about as fast as a Radeon Pro WX 8200. The GeForce RTX 2080 Ti does the work in about half the time. The Ryzen Threadripper 1950X can't quite keep up despite its 16 cores, but it doesn't fall as far behind as it does with H.264.
https://www.hardwareluxx.de/index.php/news/software/anwendungprogramme/53246-premiere-pro-mit-nvenc-videos-rendern-in-20-der-zeit.html
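For anyone wanting to reproduce the NVENC-vs-CPU gap outside of Premiere Pro, a rough sketch along these lines works with any ffmpeg build compiled with NVENC support. The file names and bitrate below are placeholders, and Premiere's internal pipeline is not identical to ffmpeg's, so this only approximates the comparison in the article:

import subprocess

# Hypothetical input path; any decodable source clip will do.
SRC = "test_sequence.mov"

def encode(codec: str, out: str) -> None:
    """Encode SRC with the given ffmpeg video encoder at a fixed target bitrate."""
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-i", SRC,
            "-c:v", codec,   # "h264_nvenc"/"hevc_nvenc" use the GPU's NVENC block,
                             # "libx264"/"libx265" encode on the CPU for comparison
            "-b:v", "16M",   # constant target bitrate so the runs are comparable
            "-c:a", "copy",  # pass audio through untouched
            out,
        ],
        check=True,
    )

# GPU-accelerated exports (requires an NVENC-capable NVIDIA GPU and driver)
encode("h264_nvenc", "out_h264_nvenc.mp4")
encode("hevc_nvenc", "out_h265_nvenc.mp4")

# CPU reference exports
encode("libx264", "out_h264_cpu.mp4")
encode("libx265", "out_h265_cpu.mp4")

Timing the four calls (e.g. with time.perf_counter) gives a rough equivalent of the export-time comparison quoted above.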



 
Dual NVIDIA Quadro RTX 8000 Review with NVLink Performance
July 6, 2020
Today we round out our high-end graphics cards with two NVIDIA Quadro RTX 8000s in NVLink. The Quadro RTX 8000 matches many of the Titan RTX specifications. Perhaps the biggest delta is memory, which gets boosted from 24GB to 48GB with ECC support. In an NVLink configuration, we now have a massive amount of memory at 96GB total.
[Image: dual NVIDIA Quadro RTX 8000 cards in NVLink]



https://www.servethehome.com/dual-nvidia-quadro-rtx-8000-review-with-nvlink-performance/
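A quick way to confirm what such a dual-card NVLink setup exposes to software is to query NVML. A rough sketch using the pynvml bindings might look like the following; whether the links report as active depends on driver and NVLink bridge configuration, so treat it as illustrative rather than a reproduction of STH's setup:

import pynvml

pynvml.nvmlInit()
total_bytes = 0

for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    name = pynvml.nvmlDeviceGetName(handle)      # bytes on older pynvml versions
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    total_bytes += mem.total
    print(f"GPU {i}: {name}, {mem.total / 1024**3:.0f} GiB")

    # Count NVLink links that report as enabled on this GPU
    active_links = 0
    for link in range(pynvml.NVML_NVLINK_MAX_LINKS):
        try:
            if pynvml.nvmlDeviceGetNvLinkState(handle, link) == pynvml.NVML_FEATURE_ENABLED:
                active_links += 1
        except pynvml.NVMLError:
            break  # link index not present / NVLink not supported
    print(f"  active NVLink links: {active_links}")

print(f"Combined memory across GPUs: {total_bytes / 1024**3:.0f} GiB")
pynvml.nvmlShutdown()

Note that the 96GB figure is the combined pool across both cards; applications still have to be NVLink-aware (or use peer-to-peer access) to make use of the second card's memory.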
 
When Turing appeared, its 12nm process tech was already in a very comfortable spot on the yield curve, partly due to its very close relationship to TSMC's 16nm (some say it's virtually identical, essentially a 16nm++). With a salvage part like the 2080 Ti, you could already afford up to 5 crippling errors across the SMs and memory controllers, so maybe there were not that many fully dysfunctional dies after all.
 

Maybe. That would be pretty amazing for such a large chip even on a mature process.
 
Yeah. And maybe, knowing how large the chip would end up anyway, Nvidia built in some more fine-grained redundancy in less-replicated areas of the chip as well.
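The "how many dies are truly unusable" question can be sanity-checked with the standard Poisson yield model. The defect density below is just an illustrative guess, not TSMC data, and the model ignores defect clustering, so the numbers only show the shape of the argument:

import math

DIE_AREA_CM2 = 7.54      # TU102 is roughly 754 mm^2
DEFECT_DENSITY = 0.2     # assumed defects per cm^2 on a mature node (illustrative)

lam = DIE_AREA_CM2 * DEFECT_DENSITY   # expected random defects per die

def p_at_most(k: int, lam: float) -> float:
    """Probability of at most k random defects under a Poisson model."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

print(f"Expected defects per die: {lam:.2f}")
print(f"Perfect die (full TU102, Titan-class): {p_at_most(0, lam):.1%}")
print(f"Salvageable with <= 5 defects (2080 Ti-like): {p_at_most(5, lam):.1%}")

In reality not every defect lands in an SM or memory controller that can simply be fused off, but it illustrates why a generous salvage SKU can soak up most of the imperfect dies.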
 
18-Way NVIDIA GPU Performance With Blender 2.90 Using OptiX + CUDA
September 6, 2020
A few days ago I published a deep dive into the CPU and GPU performance with Blender 2.90 as a major update to this open-source 3D modeling software. Following that I kept on testing more and older NVIDIA GPUs with the CUDA and OptiX back-end targets to now have an 18-way comparison from Maxwell to Turing with the new Blender 2.90.


https://phoronix.com/scan.php?page=news_item&px=Blender-2.90-18-NVIDIA-GPUs
 
Some of the performance deltas there just don't make sense. 2070 vs. 2070S? Almost no improvement for 2080Ti over 2080S?
 
I think this has something to do with TU104 having 8 SMs per GPC, compared to TU102 and TU106 having 12 SMs per GPC.

There are some TU104-based RTX 2060s out there which outperformed the regular TU106 variants in workstation tasks, but showed no difference in game benchmarks.
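For reference, the SM-per-GPC split across the Turing dies discussed here works out as below (full-die counts as per NVIDIA's Turing documentation; which GPCs stay enabled on a given salvage SKU is down to binning):

# Full-die configurations of the Turing GPUs mentioned in this thread
turing_chips = {
    #  chip:   (GPCs, total SMs)
    "TU102": (6, 72),   # RTX 2080 Ti (68 SMs enabled), Titan RTX (full 72)
    "TU104": (6, 48),   # RTX 2080 / 2080 Super, some RTX 2060 variants
    "TU106": (3, 36),   # RTX 2070 (full), regular RTX 2060 (cut down)
    "TU116": (3, 24),   # GTX 1660 / 1660 Super / 1660 Ti
}

for chip, (gpcs, sms) in turing_chips.items():
    print(f"{chip}: {gpcs} GPCs x {sms // gpcs} SMs/GPC = {sms} SMs")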

 
You're right, performance seems grouped by the number of GPCs with CUDA, while in OptiX the presence or absence of RT cores seems to play a more dominant role; and yet the 3-GPC 1660 Super is at the level of the 4-GPC 1070/1080.
That's quite strange, given that Cycles is a path tracer.
 
A factor here could be that some GPUs (e.g. the 2080 Ti) are actually power limited in this usage scenario. It would be interesting to see power usage, clock rates, and utilization measurements during these tests, and also in comparison to a gaming workload. The 2080 Ti, for instance, has much more hardware than the 2080 Super relative to the power budget available to each (which is almost the same).

As an aside, I have a problem with how power consumption is tested (including for CPUs), which also influences how people look at power consumption. The way it's mostly done today, what you're really testing and showing is just the behavior of the GPU's (or CPU's) power limiter, which creates the illusion that all workloads draw the same amount of power and behave the same way.
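Logging those three quantities during a render is straightforward with the NVML bindings; a minimal sketch (polling interval and device index are arbitrary) could be run in a second terminal while Blender is working:

import time
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)   # first GPU; adjust as needed

try:
    while True:
        power_w = pynvml.nvmlDeviceGetPowerUsage(gpu) / 1000          # NVML reports milliwatts
        sm_mhz = pynvml.nvmlDeviceGetClockInfo(gpu, pynvml.NVML_CLOCK_SM)
        util = pynvml.nvmlDeviceGetUtilizationRates(gpu)
        print(f"{power_w:6.1f} W  {sm_mhz:4d} MHz  GPU {util.gpu:3d}%  mem {util.memory:3d}%")
        time.sleep(1.0)
except KeyboardInterrupt:
    pass
finally:
    pynvml.nvmlShutdown()

Comparing such a trace against the card's rated power limit shows whether it actually pins at the limiter (as in a typical gaming load) or sits below it, which is the distinction being made here.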
 
I checked a 2080 Ti FE under Windows 10 in the classroom scene. It averages just shy of 200 watts in CUDA and around 175 watts via OptiX, consistently boosting to north of 1900 MHz. Power budget should not be the main culprit here.
 
You're right, performance seems grouped by the number of GPCs with CUDA, while in OptiX the presence or absence of RT cores seems to play a more dominant role; and yet the 3-GPC 1660 Super is at the level of the 4-GPC 1070/1080.
That's quite strange, given that Cycles is a path tracer.

The GPC and cache saw some major improvements between the 1660 Super and the 1070. If you look at this list, it looks heavily frontend-bound.

That's also why the 2080 Ti is in front of the Titan: its clock speed is higher.
 
Not so sure about that. At least compared to the Founders Edition (1635 MHz), the Titan RTX clocks higher (1770 MHz) on paper. And it has not only more ALUs, but also more control logic for them (72 vs. 68 SMs). Maybe though those were not
 