Nvidia Volta Speculation Thread

Don't quite understand the point of comparing at clock parity. If you want to glean the relative "IPC" (or rather shader efficiency) of the designs, you'd be better off comparing performance at a set compute throughput, with clocks fixed to hit it. At the end of the day, if you impose the same clocks across the board, what you end up comparing is the size of the shader array and the shader efficiency fused together.

Would make more sense to, say, fix Vega at 13 TFLOPS, do the same with the 1080 Ti, and compare.
Clock parity has meaning if one has the rest of the performance-envelope data to give it context, specifically power demand/voltage/performance.
But it makes more sense shown as an envelope over a broad performance/frequency/power-demand range, which is what Tom's Hardware does.
 
But then you'd want to achieve parity at other metrics such as fillrate & texturing and whatnot
Certainly worth considering, memory bandwidth as well at that point. The point is they all scale proportionally with clock unless there is clock gating on the die, except for bandwidth naturally. Eh, it's interesting looking at clock parity if the shader arrays are of similar size/throughput; Vega vs. Fiji comes to mind. Past that I'm uncertain why clock parity would be a convenient way to look at things. Take Pascal vs. Maxwell, for instance: would you push Maxwell well beyond its efficient clock/voltage range, or push Pascal well into diminishing-returns territory in terms of efficiency? It's still going to skew your results in terms of perf/W etc. I guess if you underclock a card far enough you can remove any semblance of memory bandwidth limitations, so that's a bonus.
 
Theoretical teraflops/gigaflops is FLOPS = clock speed * processors * 2.
That is the number quoted by websites and IHVs. Perf/GFLOP in different applications is a good measure of compute and other functionality, and it inherently takes into account how something was programmed. As far as AI goes, are we talking about GV100 with the Tensor Cores, or one of the consumer chips?

Huh, I'd heard from plenty it was just "run linpack". Besides, the formula makes absolutely no sense and would in no way take IPC into account. I mean what makes a "processor" anyway? A software thread, a hardware thread? Gigaflops is used for CPUs as well. The top 500 supercomputers list is Linpack, so I'm not seeing it; or if it's true, it's absurdly useless instead of only somewhat useless. Besides, Linpack gives you a "(G)flops" number after its run. It's a benchmark of floating-point operations per second; theoretical input is useless compared to the most basic empirical test, which is what Linpack is.
 
Huh, I'd heard from plenty it was just "run linpack".
Nope, theoretical FLOPS for GPUs is the equation I posted. If you don't believe me, multiply Vega 64's peak clock * 4096 * 2 and you'll get the quoted 13.7 TFLOPS.
If you still don't believe me https://www.google.com/search?clien...graphic+card&sourceid=opera&ie=UTF-8&oe=UTF-8
edit - that's a Google search for "how to compute gflops of a graphic card"
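To spell it out, here is a quick host-side sketch of that arithmetic; the 1677 MHz value is simply the peak boost clock that reproduces the quoted 13.7 TFLOPS figure:

```cpp
#include <cstdio>

int main() {
    // Theoretical peak = clock speed * shader count * 2 (an FMA counts as two FLOPs).
    const double peak_clock_hz = 1677e6;  // Vega 64 peak boost clock (~1677 MHz)
    const double shaders       = 4096.0;  // stream processors
    const double peak_flops    = peak_clock_hz * shaders * 2.0;
    printf("%.1f TFLOPS\n", peak_flops / 1e12);  // prints 13.7 TFLOPS
    return 0;
}
```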
Besides, the formula makes absolutely no sense and would in no way take IPC into account.
Actually it sort of does... that's what the times-two is for (IIRC, a multiply-add counts as two operations). If you want IPC like CPU IPC, I suppose the best metric would be Perf/TFLOP in a compute-only workload.
I mean what makes a "processor" anyway?
IIRC Nvidia calls them CUDA cores and AMD calls them stream processors...
Gigaflops is used for CPUs as well. The top 500 supercomputers list is Linpack, so I'm not seeing it;
Yeah it never made much sense to me at first but I got used to it after a while, especially after I started paying attention to console specs.
or if it's true, it's absurdly useless instead of only somewhat useless. Besides, Linpack gives you a "(G)flops" number after its run. It's a benchmark of floating-point operations per second; theoretical input is useless compared to the most basic empirical test, which is what Linpack is.
I guess that's why it's "theoretical peak FLOPS"; you'll only achieve/approach it in a micro-benchmark created to do just that. That's why I like the Perf/GFLOP metric for GPUs, especially for compute workloads. But even for game benchmarks it is still useful.
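Something like this, say (pure illustration: the frame rates and TFLOPS values below are made-up placeholders, not real benchmark results):

```cpp
#include <cstdio>

// Perf per theoretical TFLOP: how much measured work each unit of paper FLOPS buys.
double perf_per_tflop(double measured_fps, double theoretical_tflops) {
    return measured_fps / theoretical_tflops;
}

int main() {
    // Hypothetical numbers purely for illustration.
    printf("Card A: %.2f fps per TFLOP\n", perf_per_tflop(100.0, 13.7));
    printf("Card B: %.2f fps per TFLOP\n", perf_per_tflop(105.0, 11.3));
    return 0;
}
```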
 
Search for "FlopsCL" and "FlopsCUDA" by "Kamil Rocki"; they measure the raw performance values for GPUs, in GFLOP/s, pretty accurately.
 
CUTLASS: Fast Linear Algebra in CUDA C++
Our CUTLASS primitives include extensive support for mixed-precision computations, providing specialized data-movement and multiply-accumulate abstractions for handling 8-bit integer, half-precision floating point (FP16), single-precision floating point (FP32), and double-precision floating point (FP64) types. One of the most exciting features of CUTLASS is an implementation of matrix multiplication that runs on the new Tensor Cores in the Volta architecture using the WMMA API. Tesla V100’s Tensor Cores are programmable matrix-multiply-and-accumulate units that can deliver up to 125 Tensor TFLOP/s with high efficiency.
https://devblogs.nvidia.com/parallelforall/cutlass-linear-algebra-cuda/#more-8708
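For reference, the WMMA API the blog post mentions boils down to something like this minimal sketch. This is not CUTLASS code; the kernel name and the single 16x16x16 FP16-input/FP32-accumulate tile are just illustrative:

```cpp
// One warp computes a single 16x16x16 matrix multiply-accumulate on Volta's
// Tensor Cores via the WMMA API.
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

__global__ void wmma_16x16x16(const half* A, const half* B, float* C) {
    // Fragments live in registers; the whole warp cooperates on one tile.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);               // start the accumulator at zero
    wmma::load_matrix_sync(a_frag, A, 16);           // leading dimension = 16
    wmma::load_matrix_sync(b_frag, B, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // C += A * B on the Tensor Cores
    wmma::store_matrix_sync(C, c_frag, 16, wmma::mem_row_major);
}
```

It needs to be compiled for sm_70 or newer and launched with at least one full warp (32 threads) per tile.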
 
The most impressive part is how leak-proof Nvidia has been as of late. Nothing-nothing-Bam, brand new card, in stock.
 
The most impressive part is how leak-proof Nvidia has been as of late. Nothing-nothing-Bam, brand new card, in stock.
Acktually it was leaked ages ago (there was a photo of a card with very similar looking golden shroud from an nVidia intern).
So about as leak-proof as AMD (looks at Threadripper).
 
Acktually it was leaked ages ago (there was a photo of a card with very similar looking golden shroud from an nVidia intern).
So about as leak-proof as AMD (looks at Threadripper).

A single picture with no specs or anything else is hardly a leak: no one knew what the heck that thing really was. Heck, Titan Xp (which also came out of nowhere) was released since then. I, for one, had no idea what this new card was or when it was coming until AFTER it went on sale. That’s completely unlike Threadripper, IMO, which was well known long before the official announcement.
 
A single picture with no specs or anything else is hardly a leak: no one knew what the heck that thing really was.
Everyone inferred a new Titan.
It was the new Titan.
The end result is basically the same.
At least you can now buy a somewhat reasonably priced V100 if you need one.
That’s completely unlike Threadripper, IMO, which was well known long before the official announcement.
TR leaks appeared in April, roughly a month before the FAD announcement.
It didn't exist on the roadmaps, and what makes it even sillier is that AMD denied any future entry into the HEDT market back on Ryzen launch day.
 
Ah well not the Volta Titan I was looking to buy in December lol.
Kinda blows for general consumers, but not bad if you want to play around with the Tensor cores or DL, or want FP64. It's half the price of the previous-gen Quadro GP100, which has less functionality/performance, so I wonder how Nvidia will manage this to ensure Quadro sales are not too cannibalised by this Titan.
They improved the GeForce drivers to be better with professional visual applications, although obviously in that situation this Titan would not be using the Tensor cores.
That aside it will be a strong card for universities and various other labs.
 
NVIDIA, out of the blue, just launched their Titan V (a Volta GV100-based GPU) with 12 GB of HBM2 memory, clocked at 1200 MHz base and 1450 MHz boost. It also comes with Tensor cores.

https://nvidianews.nvidia.com/news/nvidia-titan-v-transforms-the-pc-into-ai-supercomputer

So you're telling me Nvidia is charging $3000 for the new "rose gold" color?

Ridiculous.

[Images: photos of the Titan V card]


Seriously though, we all knew there would be a new Titan for Q1 2018, but a $3000 GV100 Titan is a surprise. I guess we know the cost of 1/2-rate DP and full tensor now, eh?
 