Nvidia Ampere Discussion [2020-05-14]

You think NVidia, the game's sponsor, has the press copy?

I would say wait until tomorrow for performance benchmarks. It's one more day. There's no point in discussing unreliable internet rumours at this point. Even if that number is true, we don't know how ultra compares visually to something lighter. Ultra is often a huge perf hit for a very minor increase in quality. This game has a ton of settings and effects to configure, and there are multiple ray tracing effects, not just ray tracing on/off.
 
Rumor: NVIDIA's Next Generation 5nm GPU Architecture To Be Called Lovelace, Hopper MCM GPU Delayed? (wccftech.com)
According to the leaker (@kopite7kimi via Videocardz) singlehandedly responsible for pretty much all of the Ampere specs, NVIDIA is working on a next-generation GPU architecture named after the mathematician Ada Lovelace. At the same time, it appears that the MCM-based Hopper architecture has been delayed for now, as it is nowhere to be seen, and the Lovelace architecture might take its place instead.
...
[Image: NVIDIA GTC 2018 "Heroes" slide featuring Ada Lovelace]

Videocardz actually managed to find a major hint in NVIDIA's own merchandise store that appears to confirm this rumor about the Lovelace architecture being the next generation of GPUs from the company. If you look at the heroes showcased during the GTC 2018 keynote, you find not only Ada Lovelace but also what are potentially all future architectural codenames from NVIDIA.
...
There are now multiple rumors which seem to suggest that the Lovelace architecture will be based on a 5nm process. Since NVIDIA has transitioned to Samsung's foundry, it is unclear whether 5nm refers to a TSMC process or Samsung's. Keep in mind, however, that a recent report out of Korea also confirmed a 6nm order from NVIDIA, which means that either there is another generation from NVIDIA before Lovelace or the 6nm process is for the refresh lineup.
 
Besides MCM is there anything exciting on the horizon? I imagine the next few generations will focus on performance and not features as lots of new features haven’t seen much usage yet (RT, mesh shaders, VRS).

The verdict’s still out on big GPU caches but maybe things will move in that direction.
 
I don't even know what features programmers would want anymore, hardware-wise.

Also, I'm not surprised at the MCM delay. It's a fundamentally different type of arch, and Nvidia hasn't really laid any groundwork for it.
 
I wonder how nVidia (and others) will deal with bandwidth needs. Big caches like RDNA2's are not THE solution imo, because they take up a not insignificant amount of die area that is needed for more compute power, I guess... That's why I still have Imagination Tech in mind and am happy to see TBDR still alive with the Series A and B. But I'm kind of stuck in the '90s, when TBDR was a huge BW saver. I don't know how such an approach would do now.
 
Rumor: ASUS Confirms GeForce RTX 3080 Ti 20 GB & GeForce RTX 3060 12 GB ROG STRIX Custom Graphics Cards
December 25, 2020
Both graphics cards were spotted by HXL (@9550pro) within ASUS's support and services page for graphics cards. The list includes the GeForce RTX 30 series ROG STRIX variants and also mentions the unreleased GeForce RTX 3080 Ti & GeForce RTX 3060 graphics cards which we talked about in previous leaks.
...
The leak confirms that ASUS is working on both overclocked and non-overclocked variants for their ROG STRIX series. The GeForce RTX 3080 Ti will feature 20 GB of GDDR6X memory and the GeForce RTX 3060 will come equipped with 12 GB of GDDR6 memory. Since both of these cards are custom variants, they will make use of the brand-new ROG STRIX cooling solution that we have seen on the existing GeForce RTX 30 ROG STRIX lineup, so expect top-of-the-line cooling performance and a beefy custom PCB design.


ASUS Confirms GeForce RTX 3080 Ti 20 GB & GeForce RTX 3060 12 GB ROG STRIX Custom Graphics Cards (wccftech.com)
 
Blender 2.91: Best CPUs & GPUs For Rendering & Viewport
December 23, 2020
With Blender 2.91 recently released, as well as a fresh crop of hardware from both AMD and NVIDIA, we’re tackling performance from many different angles here. On tap is rendering with the CPU, GPU, and CPU+GPU, as well as viewport – with wireframe testing making a rare appearance for important reasons. Let’s dig in.
...
An interesting thing about CPU rendering in Blender is that for most of the software's life, that was the only option. That of course meant that the more powerful the CPU, the faster your render times would be. Fast-forward to today, though, and current GPUs are so fast that they almost make the CPU seem irrelevant in some ways.

Take, for example, the fact that it takes AMD’s 64-core Ryzen Threadripper 3990X to hit a 43 second render time with the BMW project, a value roughly matched by the $649 Radeon RX 6800 XT. However, that ignores NVIDIA’s even stronger performance, allowing the $399 GeForce RTX 3060 Ti to hit 34 seconds with CUDA, or 20 seconds with OptiX. You’re reading that right: NVIDIA’s (currently) lowest-end Ampere GeForce renders these projects as fast or faster than AMD’s biggest CPU.
Blender 2.91: Best CPUs & GPUs For Rendering & Viewport – Techgage
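Back-of-the-envelope speedups implied by the quoted BMW render times (approximate, since the article only says the RX 6800 XT "roughly matched" the 3990X):

```python
# Relative BMW-scene speedups vs. the 64-core Threadripper 3990X,
# using the render times quoted in the article above.
times_s = {
    "Threadripper 3990X (CPU)": 43,
    "Radeon RX 6800 XT":        43,   # "roughly matched" the 3990X
    "RTX 3060 Ti (CUDA)":       34,
    "RTX 3060 Ti (OptiX)":      20,
}
baseline = times_s["Threadripper 3990X (CPU)"]
for name, t in times_s.items():
    print(f"{name}: {baseline / t:.2f}x")   # OptiX works out to ~2.15x
```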
 
NVIDIA AD102 (Lovelace) GPU rumored to offer up to 18432 CUDA cores - VideoCardz.com
This is according to a new tweet from the well-known and proven leaker @kopite7kimi. This means that the GPU could hold as many as 72 Texture Processing Clusters (TPCs) and 144 Streaming Multiprocessors. With 144 SMs, the GPU could see as many as 18432 CUDA cores (144 x 128), 71% more than the GA102 GPU.
...
The NVIDIA Lovelace AD102 GPU may arrive under the GeForce RTX 40 series, possibly after an RTX 30 SUPER refresh (which has not been confirmed yet). NVIDIA is now expected to refresh its lineup (only 3 months after release) with RTX 3080 Ti and RTX 3070 Ti graphics cards next quarter.
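For what it's worth, the core-count arithmetic in the quote checks out against the full GA102 configuration. A quick sanity check, assuming Lovelace keeps Ampere's 128 FP32 CUDA cores per SM (the GA102 figures are public, the AD102 figures are only the rumor):

```python
# Rumored AD102 vs. full GA102, assuming 128 FP32 CUDA cores per SM.
cores_per_sm = 128
ga102_sms = 7 * 6 * 2     # 7 GPCs * 6 TPCs * 2 SMs per TPC = 84 SMs
ad102_sms = 12 * 6 * 2    # rumored "12*6" structure         = 144 SMs

ga102_cores = ga102_sms * cores_per_sm    # 10752
ad102_cores = ad102_sms * cores_per_sm    # 18432
print(ad102_cores, f"{ad102_cores / ga102_cores - 1:.0%} more")   # 18432, 71% more
```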


 
BTW, kopite7kimi didn't say it is a leak or anything.

"GA102 has a "7*6" structure. Maybe AD102 will get a "12*6" structure." So it's not news or a rumor; it's nothing...
 
Whether the rumor is true or not, they'll probably go all out with the RTX 4000 series, doubling rasterization performance again, while ray tracing performance could receive a massive jump, even larger than last time. It wouldn't surprise me if NV launches those toward the end of 2021. There's more competition, and Intel is seemingly trying to enter the market. This is the result of NV going full steam.

I'm going to wait until the latter half of 2021 and get a new system. There's no need to upgrade for anyone with a higher-end 2018 Turing PC for now; aside from 2077 on PC, everything on PS5/XSX is basically cross-gen as well.

A fast Zen 3 12-core, a PCIe 4 NVMe drive at 14 GB/s and higher (DS should be mature by then), and something like a 4070/4080, or RDNA3 if it has proper RT by then, seems like a nice machine for next gen.
 
Well you still need a good process to make all that... And the bandwidth...

It's weird; I'm not seeing a whole lot of bandwidth usage on the 3090 in a few games I've looked at in the profiler. As in, it doesn't go over 10% at any time during the entire frame. Hopefully I'm interpreting the stats wrong.

None of the functional units on the chip really get anywhere near full utilization, though. Register usage gets maxed out sometimes, but that's not necessarily a bottleneck if there's enough work to do. L1 usage seems to be the most obvious bottleneck as it hovers between 60 and 80% at times. FP and ALU usage basically max out around 50%.

It's not really clear why they should consider scaling up the same architecture. There already seems to be a severe underutilization problem.
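For a coarse cross-check of that bandwidth observation outside a frame profiler, the NVML utilization counters can be sampled while a game runs. A minimal sketch, assuming the pynvml bindings are installed; note that the "memory" figure is the fraction of time the memory controller was busy, not bandwidth as a percentage of peak, so it's not directly comparable to per-frame profiler stats:

```python
# Sample SM and memory-controller utilization once per second via NVML.
# Much coarser than a per-frame profiler capture, but a quick way to see
# whether the memory controller is sitting mostly idle.
import time
import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)
try:
    for _ in range(30):
        util = pynvml.nvmlDeviceGetUtilizationRates(gpu)
        print(f"SM busy: {util.gpu:3d}%   memory controller busy: {util.memory:3d}%")
        time.sleep(1.0)
finally:
    pynvml.nvmlShutdown()
```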
 
I'm still curious to see if a next-gen geometry pipeline like mesh shaders, or a software rasterizer like UE5's, will improve utilization by being able to generate fragments faster.
 
There were rumblings of a 3DMark mesh shader test on the horizon, but I think Nvidia's Asteroids demo is the only thing out there so far. I just tried it, and SM usage is much higher, but it doesn't seem to be due to mesh shaders directly.

The most efficient workloads were the GBuffer and Fog passes with SM usage at 80-90% (purple graph). Mostly pixel shader (green) and compute (orange) under SM occupancy. Mesh shaders were relatively light (blue).

The deferred lighting pass seems to be very register limited. L2 hit rates are high throughout the frame.

FPS: ~31
Resolution: 5120*2880
Total asteroids: 302 thousand per frame
Drawn triangles: 92 million per frame
Max LOD triangles: 3.4 trillion per frame
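Some rough throughput math from those numbers, assuming the quoted per-frame figures and the ~31 fps are accurate:

```python
# Per-second throughput and LOD/culling reduction implied by the
# asteroids demo stats above (all inputs are the quoted per-frame values).
fps = 31
drawn_tris = 92e6         # drawn triangles per frame
max_lod_tris = 3.4e12     # triangles per frame if everything were at max LOD

print(f"~{drawn_tris * fps / 1e9:.1f} billion drawn triangles per second")         # ~2.9
print(f"~{max_lod_tris / drawn_tris:,.0f}x reduction from LOD selection/culling")  # ~36,957x
```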

 
@trinibwoy I wonder if Nvidia is pushing for 4K and now 8K because it's the easiest way to increase utilization. It does seem weird to essentially double Ampere when utilization is so low, especially at common resolutions like 1080p and 1440p.
 