Nvidia Volta Speculation Thread

Discussion in 'Architecture and Products' started by DSC, Mar 19, 2013.

  1. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    499
    Likes Received:
    220
    Gigaflops is basically just a Linpack-style measure. So performance of other things per gigaflop means the architecture is more optimized towards that "other" thing compared to the generic Linpack benchmark. Games tend to change a lot, so a good optimization now won't necessarily translate into a good optimization down the road. Similarly, we don't know how much AI might change over time, though perhaps massive vector performance, half-precision floats and whatnot will remain relevant.

    Either way Nvidia seems to be on it for the moment, at least for those 2 things. Cryptocurrency is another matter, but then whether anyone wants to optimize for that volatile market is another question.
     
  2. Infinisearch

    Veteran Regular

    Joined:
    Jul 22, 2004
    Messages:
    739
    Likes Received:
    139
    Location:
    USA
    Theoretical teraflops/gigaflops is FLOPS = clock speed * processors * 2.
    That is the number quoted by websites and IHVs. Perf/GFLOP in different applications is a good measure of compute and other functionality, and it inherently takes into account how something was programmed. As far as AI goes, are we talking about GV100 with the tensor cores or one of the consumer chips?
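    A minimal Python sketch of the FLOPS formula above (the boost clocks and shader counts below are approximate reference figures, used purely for illustration):

    # Theoretical peak FP32 rate: clock * shader count * 2 (one FMA = 2 FLOPs per cycle).
    def peak_tflops(clock_mhz, shaders):
        return clock_mhz * 1e6 * shaders * 2 / 1e12

    # Approximate reference boost clocks; real cards boost differently.
    print(peak_tflops(1582, 3584))  # GTX 1080 Ti -> ~11.3 TFLOPS
    print(peak_tflops(1546, 4096))  # RX Vega 64  -> ~12.7 TFLOPS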
     
  3. gamervivek

    Regular Newcomer

    Joined:
    Sep 13, 2008
    Messages:
    738
    Likes Received:
    229
    Location:
    india
    AMD might have more stream processors, but there is more to a GPU than stream processors. And nvidia has a bigger advantage there than AMD's on-paper GFLOPs advantage, which, as you can see with real-world boost clocks, is illusory. So when you're just looking at GFLOPs (theoretical ones at that) and performance, you're losing the bigger picture.

    A well-optimized game for AMD that is more shader-centric can allow AMD to be competitive on the GFLOPs/fps metric, and if you have the clock speeds to go along with it, you can come close to the 1080 Ti and even beat it.

     
  4. Infinisearch

    Veteran Regular

    Joined:
    Jul 22, 2004
    Messages:
    739
    Likes Received:
    139
    Location:
    USA
    It is an interesting comparison, but IIRC Wolfenstein II uses async compute, intrinsics and FP16. Are intrinsics available yet for nvidia? Anyway, this is getting a bit OT for a Volta thread, so I'll just end by saying something like this is the exception and not the norm... at least for now. And in a way the numbers aren't comparable, since the FP16 flop rate is double for the part of the rendering where it is used.
     
  5. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,289
    Likes Received:
    3,550
    This comparison is not valid though. The NVIDIA footage is not his own, it's from another channel, so different systems, different CPU and RAM speeds, different Windows versions, and the Vega 64 LC GPU is OC'ed as well, while the 1080 Ti is not.
     
    Geeforcer and Lightman like this.
  6. seahawk

    Regular

    Joined:
    May 18, 2004
    Messages:
    511
    Likes Received:
    141
    If you test Vega and Pascal at a fixed 1300 MHz, Vega shows what it can do and eats the GeForces for lunch in Wolfenstein II, one of the few game engines designed for modern GPUs.
     
  7. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    692
    Likes Received:
    441
    Location:
    Slovenia
    So? Why not test Fury, Vega and Pascal at a fixed 1000 MHz then?
     
  8. Infinisearch

    Veteran Regular

    Joined:
    Jul 22, 2004
    Messages:
    739
    Likes Received:
    139
    Location:
    USA
    I know this is kinda OT, but I can't help but ask: can you please justify this statement?
     
  9. seahawk

    Regular

    Joined:
    May 18, 2004
    Messages:
    511
    Likes Received:
    141
    That would be equally interesting.

    Well, it uses Vulkan, makes use of FP16 and lots of async compute - lots of new API stuff in real use there.
     
    chris1515 likes this.
  10. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    So you are then going below the base clock of the Pascal GPU, meaning you cripple one design relative to the other, when 1300 MHz is the sweet spot for Vega according to some publications' analysis.
    Unfortunately one cannot tell much from a clock comparison when the architectures are so widely different and also have widely different performance envelopes when it comes to voltage/clocks/power demand/performance.
    As an example, the Tesla P4 runs below the base clock and enables a GP104 to reach 5.5 TFLOPs FP32 at 75W....
    So comparing Vega to Pascal at a fixed 1300 MHz is seriously skewed without taking all variables into account.
    As a reference, setting 1300 MHz on a reference consumer 1080 model in a demanding game at the most demanding settings resulted in a power consumption of only 100W, isolated and measured with a scope.
    How much performance would one get from Vega if set to 100W for the same game and settings? Quite a lot less than what one sees at 1300 MHz; the point being it is very difficult to compare using fixed matching clock rates due to the very different performance envelopes and architectures of each manufacturer's GPU design.

    Sort of reminds me of the situation when some others tried to do a clock-for-clock comparison between Maxwell and Pascal and used a 980 Ti against a 1080, which is not the same core spec design, while also ignoring all of the performance envelope context; unfortunately some YouTube commentators still compare a 980 Ti to a 1080, and this is just wrong from a technical standpoint (different number of cores but, importantly, different numbers of GPCs/PolyMorph engines/etc.), let alone ignoring the voltage/clocks/power/performance envelope for context.
     
    #810 CSI PC, Nov 2, 2017
    Last edited: Nov 2, 2017
  11. Geeforcer

    Geeforcer Harmlessly Evil
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,297
    Likes Received:
    465
    If you test Vega and Pascal fixed at 2560 SPs, Pascal shows what it can do and eats the Radeons for lunch in Wolfenstein II, one of the few game engines designed for modern GPUs.
     
    Putas likes this.
  12. gamervivek

    Regular Newcomer

    Joined:
    Sep 13, 2008
    Messages:
    738
    Likes Received:
    229
    Location:
    india
    I didn't mean to put it forward as any kind of rule; even I'm saying it's an exception. Just looking at GFLOPs isn't useful for predicting gaming performance when there are quite a few other things that dictate gaming performance as well.

    https://www.anandtech.com/show/11717/the-amd-radeon-rx-vega-64-and-56-review/18

    So perf/GFLOPs isn't a useful metric in isolation.

    The comparison is valid in that I'm making a point about clock speeds between AMD and nvidia, so your point about the Vega 64 LC being OC'ed doesn't matter here. I'd suggest following the conversation.
    The systems might be different, but I don't think you have that kind of variability in system performance with the same CPU, a 7700K at 4.5 GHz, which you can clearly see in the video. Different Windows, really?

    Even if you disregard that video, the original video comparison between the Vega 64, 1080 and 1080 Ti is good enough: a 1.4-1.5 GHz Vega 64 keeping close to an 1860 MHz 1080 Ti would easily blaze past the latter if it were running at the same clocks. In the other video, of AC Origins, it'd be within 10% of the 1080 Ti in a game that favors nvidia.
     
  13. Infinisearch

    Veteran Regular

    Joined:
    Jul 22, 2004
    Messages:
    739
    Likes Received:
    139
    Location:
    USA
    I didn't say it was for predicting gaming performance, although it is sometimes useful for that. It's also potentially useful for discussing shader unit efficiency if you can match other things like ROPs or texture units. Even if you can't match them exactly it's still useful, since shaders are becoming dominant with regard to performance. You're right that there are all sorts of things that can affect gaming performance, but if you're comparing architectures there are some comparisons that don't make sense. How can you compare Nvidia FP32 perf/TFLOP vs AMD mixed FP16/FP32 perf/TFLOP and call it a fair comparison of shader efficiency?
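    A rough Python sketch of why the two numbers aren't comparable (the FP16 fraction below is a made-up illustrative value, not a measurement):

    # Effective throughput when a fraction of the ALU work runs at double-rate FP16.
    def effective_tflops(fp32_peak, fp16_fraction):
        # fp16_fraction is hypothetical, chosen only to show the scaling.
        return fp32_peak * ((1 - fp16_fraction) + 2 * fp16_fraction)

    print(effective_tflops(12.7, 0.3))  # ~16.5 "effective" TFLOPS if 30% of the work is FP16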
     
  14. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,289
    Likes Received:
    3,550
    Yep, different RAM speeds too. That can affect results.

    I also suggest that you rely less on 10-second snapshot YouTube videos to prove a point. Sites usually test runthroughs of 30 seconds or more to include a variety of scenes and scenarios within a single area, and they repeat them several times to exclude any variability. I see none of that here, and to add insult to injury you are using two different systems, one of which has an OC'ed CPU, RAM and GPU. The 1080 Ti is also running without async compute because the developers disabled it temporarily.
     
    #814 DavidGraham, Nov 2, 2017
    Last edited: Nov 3, 2017
  15. seahawk

    Regular

    Joined:
    May 18, 2004
    Messages:
    511
    Likes Received:
    141
    Or because NV cards still cannot handle it?
     
    chris1515 likes this.
  16. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,289
    Likes Received:
    3,550
  17. Gelanin

    Newcomer

    Joined:
    Aug 27, 2006
    Messages:
    94
    Likes Received:
    44
    Location:
    Norway
    Personally I think it could be very interesting to test/know the results of the following:

    Comparing cards at, say, 1000 MHz, 1250 MHz, 1500 MHz, 1750 MHz, etc.
    Comparing cards at, say, 100W, 150W, 200W, 250W, etc.

    Knowing the results of those tests could tell a lot about the different architectures and their efficiency.
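    A small Python sketch of how such a test matrix could be normalised (the fps and wattage numbers below are placeholders, not measurements):

    # Hypothetical fixed-clock results; real values would come from actual test runs.
    runs = [
        {"card": "Vega 64",  "clock_mhz": 1300, "watts": 220, "fps": 90.0},
        {"card": "GTX 1080", "clock_mhz": 1300, "watts": 130, "fps": 85.0},
    ]
    for r in runs:
        print(r["card"],
              round(r["fps"] / (r["clock_mhz"] / 1000), 1), "fps per GHz,",
              round(r["fps"] / r["watts"], 2), "fps per watt")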
     
  18. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Worth looking at Tom's Hardware reviews when a new model is launched, as they tend to do this for the reference cards or whatever initially launches as that model; ironically, though, they have not for the 1070 Ti :)
    But you will see the envelope mapped for Vega/1070/1080/Maxwell; you just need to correlate it to either the fps or voltage charts. The good news is that they are consistent in using Witcher 3 at 4K, as for them it is one of the more power-demanding games.
     
  19. ieldra

    Newcomer

    Joined:
    Feb 27, 2016
    Messages:
    149
    Likes Received:
    116
    I don't quite understand the point of comparing at clock parity. If you want to glean the relative "IPC" (or rather shader efficiency) of the designs, you'd be better off comparing performance at a set compute throughput (fixed FLOPS rather than fixed clocks). At the end of the day, if you impose the same clocks across the board, what you'll end up comparing is the size of the shader array and shader efficiency fused together.

    It would make more sense to, say, fix Vega at 13 TFLOPs, do the same with the 1080 Ti, and compare.
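    The arithmetic behind that in a quick Python sketch (shader counts are the real ones for each card; the 13 TFLOPs target is the one suggested above):

    # Clock required to hit a given FP32 target: clock = target / (shaders * 2).
    def clock_for_tflops(target_tflops, shaders):
        return target_tflops * 1e12 / (shaders * 2) / 1e6  # in MHz

    print(clock_for_tflops(13, 4096))  # Vega 64 (4096 SPs)     -> ~1587 MHz
    print(clock_for_tflops(13, 3584))  # GTX 1080 Ti (3584 SPs) -> ~1814 MHz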
     
  20. entity279

    Veteran Regular Subscriber

    Joined:
    May 12, 2008
    Messages:
    1,264
    Likes Received:
    447
    Location:
    Romania
    But then you'd want to achieve parity on other metrics too, such as fillrate & texturing and whatnot.
     