Nvidia Ampere Discussion [2020-05-14]

That leads me to believe that even Nvidia doesn’t think the inflated numbers are worth talking about.

It leads me to believe they are not a false reflection of the performance either. They instead let the benchmarks speak, and that's what impressed people: an 80% to 100% increase in raw raster performance across several well-known modern games this early on, a 200% increase in ray tracing, and then DLSS going into berserk mode.
All that with prices that seem humane again. And let's not forget all the other features they added along the way.
The card design seems very good too, with the airflow and build quality on top of that.
I'm sure those 20-36 TFLOPs will come in handy in many UE5 games.

People expected a 30% increase before that day.
 
No doubt, it's the end result that matters.
 
And most importantly, price. I think NV feels the heat from AMD, who I think are advancing in the GPU department. I wouldn't mind another '9700 Pro' moment shaking up the PC GPU space a bit more.

Edit: very good memories of the 9700 Pro; that thing tore through DX9 games like HL2, which was the game of the century. It held up very well in other titles too.
 
Yeah, the compiler knows for sure but I'm not sure it matters. The compiler's job is to statically schedule math instructions within a warp. It can do so because it knows how many cycles each math operation will take and when the output of that operation will be available for input to the next math op. The dispatcher then has a bunch of ready warps to choose from each cycle based on hints from the compiler. Presumably, none of that changes with Ampere.

The only thing that changes is that now there are more opportunities for a ready FP32 instruction to be issued each clock by the dispatcher. With Turing those instructions could be blocked because the lone FP32 pipeline was busy.
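To make that concrete, here's a minimal device-side sketch (my own illustrative example, not from the post; kernel and variable names are made up, compile with nvcc -c): each thread carries two independent FP32 FMA chains, so there is almost always a ready FP32 instruction from one chain or the other. Turing's lone FP32 pipe can only take one per issue slot, while Ampere's second FP32 datapath gives the dispatcher somewhere else to put it.

Code:
// Illustrative only: two independent FP32 FMA chains per thread.
// 'a' and 'b' never feed each other, so their instructions become
// ready independently of one another.
__global__ void fp32_chains(float *out, int iters)
{
    float a = threadIdx.x * 0.5f;
    float b = threadIdx.x * 0.25f;
    for (int i = 0; i < iters; ++i) {
        a = fmaf(a, 1.000001f, 0.5f);   // chain A
        b = fmaf(b, 0.999999f, 0.25f);  // chain B
    }
    out[blockIdx.x * blockDim.x + threadIdx.x] = a + b;
}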

Well, the compiler can emit different instructions for different GPU architectures as well, and in practice it can reorganize instructions to fit the pipeline better for each one. A good programmer can also make the most of the target architecture's resources by managing allocation, changing read/write patterns, and hiding latency better for a particular GPU, and by giving the compiler plenty of hints for optimization, all without writing machine code (a rough sketch of such hints is below).
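A minimal CUDA C++ sketch (my own example; the kernel name and parameters are made up, and it assumes compilation for a specific target such as nvcc -arch=sm_86 for Ampere):

Code:
// __launch_bounds__: promise a max block size and ask for a minimum number
// of resident blocks per SM, so the compiler can budget registers accordingly.
// __restrict__: promise no aliasing, so loads can be hoisted and reordered.
// #pragma unroll: let the compiler unroll and interleave loads with FMAs.
__global__ void __launch_bounds__(128, 8)
scale_add(const float * __restrict__ x,
          const float * __restrict__ y,
          float * __restrict__ out,
          int n, float k)
{
    #pragma unroll 4
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < n;
         i += gridDim.x * blockDim.x)
    {
        out[i] = fmaf(k, x[i], y[i]);   // out = k*x + y
    }
}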

Benchmarking via OpenCL/CUDA is quite a bit different from game benchmarking. In the latter case, the programmer just deals with lots of APIs and usually doesn't need to get involved with the underlying computing resources, while the code behind those APIs has already been fine-tuned and optimized by the vendor.
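As a concrete illustration of the difference (a minimal sketch of my own, not tied to Luxmark or any vendor code): in a CUDA micro-benchmark you write, launch and time the kernel yourself with events, instead of measuring frames produced behind a graphics API.

Code:
#include <cuda_runtime.h>
#include <cstdio>

// Trivial bandwidth-bound kernel used only as a timing target.
__global__ void saxpy(float a, const float *x, float *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 24;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));
    cudaMemset(y, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    saxpy<<<(n + 255) / 256, 256>>>(2.0f, x, y, n);   // warm-up
    cudaEventRecord(start);
    saxpy<<<(n + 255) / 256, 256>>>(2.0f, x, y, n);   // timed run
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // 2 reads + 1 write per element -> rough effective bandwidth in GB/s.
    printf("saxpy: %.3f ms, ~%.1f GB/s\n", ms, 3.0 * n * sizeof(float) / ms / 1e6);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(x);
    cudaFree(y);
    return 0;
}

Everything measured here is code you control; in a game benchmark the equivalent work sits behind D3D/Vulkan calls that the vendor's driver has already tuned.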
 
R300 was special, and unfortunately for NVIDIA, NV30 was a different kind of special. While both companies have had their share of misses, pairings like R300/NV30 and G80/R600 happen... well, once a decade. Quite frankly, I don't think we will ever see another such perfect alignment of great and terrible in the same generation from the duopoly.
 
Here we go again ...

Ethereum Miners Eye NVIDIA’s RTX 30 Series GPU as RTX 3080 Offers 3-4x Better Performance in Eth


https://www.hardwaretimes.com/ether...x-3080-offers-3-4x-better-performance-in-eth/
Gamers shouldn't be worried for now, because I think the gold fever around Bitcoin is so 2017 and things aren't the same as they were. Back then, even some gamers used to say "I don't care about gamers" as soon as they saw a dime.

Some people have started to mine again recently now that Bitcoin has recovered slightly. I built some mining rigs for others back in the day, and mining is something that tires me; it's a boring topic. I'd rather invest in Bitcoin and buy some than mine it, but to each their own.
 
Perf/area maybe in this case. It's interesting that Nvidia didn't showcase specific workloads or games that benefit from the change. They had a slide or three during the Turing launch showcasing the speed-up from the separate INT pipeline.

Strangely, Nvidia didn't talk about raw shader flops much at all. You would think the first consumer GPUs to break 20 and 30 TFLOPs would be a big deal from a marketing standpoint. That leads me to believe that even Nvidia doesn't think the inflated numbers are worth talking about.
I think it's a matter of target audience. The general (gamer) public is a far larger group than the tech nerds. Look at LTT's success in terms of subscribers and YT views. Those people care more about 4K gaming and are hyped by 8K thrown into the mix, I guess.

Meanwhile, Nvidia did show something in their virtual tech day sessions that at least hints at the possibilities. I'll link the slide here:
https://www.hardwareluxx.de/images/...on-00143_81D3C28EF3204D7D87D13319AC4CDDD8.jpg

[In other news: I am actually aware that caches and memory, as well as the datapaths inside the SMs, can also sizeably affect those results.]
 
Perf/area maybe in this case.
I remember that FP64 energy cost per ALU was very minor in comparison with data movement, per Bill Dally's presentations. The same goes for ALU area cost.
Also, there are the much more computationally dense tensor cores, and there is no trouble with energy there, since they don't move more bits around in comparison with the SIMD units.
The data paths were already there in Turing, so I guess it didn't cost anything area- or energy-wise to add these ALUs, and it couples well with the doubled L1 bandwidth and the other changes.
 
Luxmark results are astonishing, and as far as I remember it doesn't even support hw-accelerated RT.
Nope, that's purely OpenCL (and I don't think it has an extension for RT). Goes to show how much the combination of 2x FP32 and a larger, 2x faster L1 (IMO the main factors here) can yield when you're not rasterizing.
 
Power consumption matters, it just doesn't matter more than price. The other thing is that people are only looking at the numbers theoretically right now; they haven't actually put the GPU in their case and had to deal with the impact on their CPU or the noise of their fans.
 
https://www.bilibili.com/video/BV1m5411b7NG?from=search&seid=11312601183193646556

Benchmarks in the video.

The CPU usage with DLSS on is interesting though not surprising. The framerate and CPU usage are much higher with DLSS on. Maybe a hint as to why Nvidia gave reviewers tools to isolate GPU power.
I only skimmed the video and did not notice, but did he in fact show the card even once? A graphics card read-out through the driver is something I could driver-mod for you in under 2 minutes (+ install time).
 
Nvidia gets the 1.9X figure not from fps/W, but rather by looking at the amount of power required to achieve the same performance level as Turing. If you take a Turing GPU and limit performance to 60 fps in some unspecified game, and do the same with Ampere, Nvidia claims Ampere would use 47% less power.
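As a quick sanity check on how those two figures line up (same-performance comparison, using only the numbers quoted above):

\[ \frac{P_{\mathrm{Turing}}}{P_{\mathrm{Ampere}}} = \frac{1}{1 - 0.47} \approx 1.9 \]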

That's not all that surprising. We've seen power limited GPU designs for a long time in laptops. The RTX 2080 laptops for example can theoretically clock nearly as high as the desktop parts, but they're restricted to a much lower power level, which means actual clocks and performance are lower. A 10% reduction in performance can often deliver a 30% gain in efficiency when you near the limits of a design.

AMD's R9 Nano was another example of how badly efficiency decreases at the limit of power and voltage. The R9 Fury X was a 275W TDP part with 4096 shaders clocked at 1050 MHz. The R9 Nano took the same 4096 shaders but clocked them at a maximum of 1000 MHz and applied a 175W TDP limit. Clocks were usually closer to 925 MHz in practice, but at one third less power.
https://www.tomshardware.com/features/nvidia-ampere-architecture-deep-dive
 

I would be pretty hesitant to read anything into these Ashes of the Singularity results.
The MSI Gaming X Trio 2080 Ti has the same memory clocks as the FE 2080 Ti, and only ~7% higher boost clocks on the core.

... and somehow almost 15% higher FPS? I think people are comparing apples to oranges here, on different CPU / system memory configs.
 

"The set was spotted by videocardz who posted this first. They also mention that this channel where all this was posted has already been caught using fake review samples and publishing a review earlier (they hid the name of the Ryzen processor). This time leakers do not show any review sample, so we cannot confirm if they actually tested the card or have done a bit of 'guestimation'".
https://www.guru3d.com/news-story/first-alleged-benchmark-results-geforce-rtx-3080-surface.html
 