Nvidia Ampere Discussion [2020-05-14]

That leads me to believe that even Nvidia doesn’t think the inflated numbers are worth talking about.

It leads me to believe they are not a false reflection of their performance either. They instead let the benchmarks speak, and that's what impressed people: an 80 to 100% increase in raw raster performance in multiple well-known modern games this early on, a 200% increase in ray tracing, and then DLSS going into berserk mode.
All that at prices that seem humane again. And let's not forget all the other features they added along the way.
The card design seems very good too, with the airflow and build quality on top of that.
I'm sure those 20-36 TFLOPS can come in handy in many UE5 games.

People expected a 30% increase before that day.
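For anyone wondering where that 20-36 TFLOPS range comes from, it's just the usual shader count x 2 FLOPs (FMA) x boost clock math, taking the announced specs at face value (real cards will deviate from the rated boost):

    RTX 3070:  5888 cores x 2 x ~1.73 GHz ≈ 20.4 TFLOPS
    RTX 3080:  8704 cores x 2 x ~1.71 GHz ≈ 29.8 TFLOPS
    RTX 3090: 10496 cores x 2 x ~1.70 GHz ≈ 35.7 TFLOPS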
 
No doubt, it's the end result that matters.
 
And most importantly, price. I think NV feels the heat from AMD, who I think are advancing in the GPU department. I wouldn't mind another '9700 Pro' moment, shaking up the PC GPU space a bit more.

Edit: very good memories of the 9700 Pro; that thing tore through DX9 games like HL2, which was the game of the century. It held up very well in other titles too.
 
Yeah, the compiler knows for sure but I'm not sure it matters. The compiler's job is to statically schedule math instructions within a warp. It can do so because it knows how many cycles each math operation will take and when the output of that operation will be available for input to the next math op. The dispatcher then has a bunch of ready warps to choose from each cycle based on hints from the compiler. Presumably, none of that changes with Ampere.

The only thing that changes is that now there are more opportunities for a ready FP32 instruction to be issued each clock by the dispatcher. With Turing those instructions could be blocked because the lone FP32 pipeline was busy.
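To make that concrete, here's a minimal CUDA-style sketch (kernel names and constants are mine, purely for illustration) of the two instruction mixes being discussed: a loop of independent FP32 FMAs, which is the kind of stream where Ampere's second FP32 datapath can be kept fed given enough warps in flight, versus a loop that interleaves integer address arithmetic with the FP32 work, which is the mix Turing's separate INT pipe was aimed at. What actually gets issued where is of course up to ptxas and the hardware scheduler.

    // Illustrative only -- not from Nvidia's docs or any real benchmark.
    __global__ void fp32_heavy(float *out, float x, int iters)
    {
        // Two independent accumulators give the scheduler back-to-back FP32 work.
        float a = x, b = x * 0.5f;
        for (int i = 0; i < iters; ++i) {
            a = fmaf(a, 1.0001f, 0.5f);   // FP32 FMA
            b = fmaf(b, 0.9999f, 0.25f);  // independent FP32 FMA
        }
        out[blockIdx.x * blockDim.x + threadIdx.x] = a + b;
    }

    __global__ void int_mixed(float *out, const float *in, int n)
    {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        float a = 0.0f;
        for (int i = idx; i < n; i += blockDim.x * gridDim.x)  // integer index math every iteration
            a = fmaf(in[i], 1.0001f, a);                       // mixed with FP32 FMAs and loads
        out[idx] = a;
    }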

Well, the compiler can emit different instructions for different GPU architectures as well, and in practice it can reorganize instructions to fit the pipeline better for each one. A good programmer can also make the most of the resources on the target architecture by tuning resource allocation, changing read/write patterns, and hiding latency better for a particular GPU, giving the compiler plenty of hints for optimization without having to write machine code.

Benchmarking via OpenCL/CUDA is quite a bit different from game benchmarking. In the latter case, the programmer just deals with lots of APIs and usually doesn't need to get involved with the underlying compute resources, while the code behind the APIs has already been fine-tuned and optimized by the vendor.
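As a sketch of the kind of hints meant above (just standard CUDA; the kernel and numbers are made up): __launch_bounds__ tells the compiler the occupancy you're targeting so it can budget registers, __restrict__ promises no aliasing so loads can be reordered and cached more freely, and #pragma unroll exposes independent iterations it can interleave to hide latency, all without touching machine code.

    // Illustrative only.
    __global__ void __launch_bounds__(256, 4)   // <=256 threads/block, aim for 4 blocks per SM
    scale_add(float *__restrict__ out,
              const float *__restrict__ a,
              const float *__restrict__ b,
              float k, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int stride = blockDim.x * gridDim.x;

        #pragma unroll 4                        // expose independent iterations to the scheduler
        for (; i < n; i += stride)
            out[i] = fmaf(k, a[i], b[i]);       // maps to an FP32 FMA; exact scheduling is ptxas's job
    }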
 
And most importantly, price. I think NV feels the heat from AMD, who I think are advancing in the GPU department. I wouldn't mind another '9700 Pro' moment, shaking up the PC GPU space a bit more.

Edit: very good memories of the 9700 Pro; that thing tore through DX9 games like HL2, which was the game of the century. It held up very well in other titles too.

R300 was special, and unfortunately for NVIDIA, NV30 was a different kind of special. While both companies had their share of misses, R300/NV30 and G80/R600 pairings happen... well, once a decade. Quite frankly, I don't think we will ever see another perfect alignment of great and terrible in the same generation from the duopoly.
 
Here we go again ...

Ethereum Miners Eye NVIDIA’s RTX 30 Series GPU as RTX 3080 Offers 3-4x Better Performance in Eth


https://www.hardwaretimes.com/ether...x-3080-offers-3-4x-better-performance-in-eth/
We shouldn't be worried for now, because I think the gold fever around Bitcoin is so 2017 and things aren't the same as they were. Even back then some gamers used to say "I don't care about gamers" as soon as they saw a dime.

Some people have started to mine again recently now that Bitcoin has recovered slightly. I built some mining rigs for others back in the day, and mining is something that tires me; it's a boring topic. I'd rather invest in Bitcoin and buy some than mine, but to each their own.
 
Perf/area maybe in this case. It’s interesting that Nvidia didn’t showcase specific workloads or games that benefit from the change. They had a slide or 3 during the Turing launch showcasing the speed up from the separate INT pipeline.

Strangely, Nvidia didn't talk about raw shader flops much at all. You would think the first consumer GPUs to break 20 and 30 TFLOPS would be a big deal from a marketing standpoint. That leads me to believe that even Nvidia doesn't think the inflated numbers are worth talking about.
I think it's a matter of target audience. The general (gamer) public is a way larger group than the tech nerds. Look at LTT's success in terms of subscribers and YT views. Those people care more about 4K gaming and are hyped by 8K thrown into the mix, I guess.

Meanwhile, Nvidia did show something in their virtual tech day sessions that at least hints at the possibilities. I'll link the slide here:
https://www.hardwareluxx.de/images/...on-00143_81D3C28EF3204D7D87D13319AC4CDDD8.jpg

[In other news: I am actually aware that caches and memory, as well as the datapaths inside the SMs, can also sizeably affect those results.]
 
Perf/area maybe in this case.
I remember FP64 energy cost per ALU being very minor in comparison with data movement, per Bill Dally's presentations. The same goes for ALU area cost.
Also, the tensor cores are much more computationally dense, and there are no troubles with energy there since they don't move more bits around compared with the SIMD units.
The data paths were already there in Turing, so I guess it didn't cost much area- or energy-wise to add these ALUs, and it couples well with the doubled L1 bandwidth and other changes.
 
The Luxmark results are astonishing, and as far as I remember it doesn't even support HW-accelerated RT.
Nope, that's purely OpenCL (and I don't think it has an extension for RT). Goes to show how much the combination of 2x FP32 and a larger, 2x faster L1 (IMO the main factors here) can yield when you're not rasterizing.
 
Power consumption matters, it just doesn't matter more than price. The other thing is that people are only looking at the numbers theoretically right now; they haven't actually put the GPU in their case and had to deal with the impact on their CPU, or the noise of their fans.
 
https://www.bilibili.com/video/BV1m5411b7NG?from=search&seid=11312601183193646556

Benchmarks in the video.

The CPU usage with DLSS on is interesting though not surprising. The framerate and CPU usage are much higher with DLSS on. Maybe a hint as to why Nvidia gave reviewers tools to isolate GPU power.
I only skimmed the video and did not notice, but did he in fact show the card even once? A graphics card readout through the driver I can driver-mod for you in under 2 minutes (+ install time).
 
Nvidia gets the 1.9X figure not from fps/W, but rather by looking at the amount of power required to achieve the same performance level as Turing. If you take a Turing GPU and limit performance to 60 fps in some unspecified game, and do the same with Ampere, Nvidia claims Ampere would use 47% less power.

That's not all that surprising. We've seen power limited GPU designs for a long time in laptops. The RTX 2080 laptops for example can theoretically clock nearly as high as the desktop parts, but they're restricted to a much lower power level, which means actual clocks and performance are lower. A 10% reduction in performance can often deliver a 30% gain in efficiency when you near the limits of a design.

AMD's R9 Nano was another example of how badly efficiency decreases at the limit of power and voltage. The R9 Fury X was a 275W TDP part with 4096 shaders clocked at 1050 MHz. The R9 Nano took the same 4096 shaders but clocked them at a maximum of 1000 MHz and applied a 175W TDP limit. In practice the card usually ran closer to 925 MHz, but still at one third less power.
https://www.tomshardware.com/features/nvidia-ampere-architecture-deep-dive
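Spelling that out, since it's easy to misread as a perf/W claim at stock settings: at matched performance, "47% less power" and "1.9x" are the same statement, just inverted.

    same fps, 47% less power:  P_Ampere = (1 - 0.47) x P_Turing = 0.53 x P_Turing
    iso-performance perf/W ratio = P_Turing / P_Ampere = 1 / 0.53 ≈ 1.9x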
 

I would be pretty hesitant to read anything into these Ashes of the Singularity results.
The MSI Gaming X Trio 2080 Ti has the same memory clocks as the FE 2080 Ti, and only ~7% higher boost clocks on the core.

... and somehow almost 15% higher FPS? I think people are comparing apples to oranges here, on different CPU / system memory configs.
 

"The set was spotted by videocardz who posted this first. They also mention that this channel where all this was posted has already been caught using fake review samples and publishing a review earlier (they hid the name of the Ryzen processor). This time leakers do not show any review sample, so we cannot confirm if they actually tested the card or have done a bit of 'guestimation'".
https://www.guru3d.com/news-story/first-alleged-benchmark-results-geforce-rtx-3080-surface.html
 