The key enabler for this is that Titan, unlike any consumer GeForce card before it, will feature full FP64 performance, allowing GK110’s FP64 potency to shine through. Previous NVIDIA cards either had very few FP64 CUDA cores (GTX 680) or artificial FP64 performance restrictions (GTX 580), in order to maintain the market segmentation between cheap GeForce cards and more expensive Quadro and Tesla cards. NVIDIA will still be maintaining this segmentation, but in new ways.
Anand said: "This is why NVIDIA’s official compute performance figures are 4.5 TFLOPS for FP32, but only 1.3 TFLOPS for FP64. The former assumes that boost is enabled, while the latter is calculated around GPU Boost being disabled. The actual execution rate is still 1/3."
For those confused by the advertised DP rate.
And yet, Anand is wrong.
896*2*837 != 1.3 TFLOPS but rather 1.5 TFLOPS.
1.3 would indicate a clock-rate around 725 MHz for the DP-Units.
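To sanity-check the arithmetic above, here is a minimal Python sketch. It assumes the usual peak-FLOPS convention of 2 floating-point operations per unit per clock (one fused multiply-add). The 896 DP units, the 837 MHz clock, and the 4.5 / 1.3 TFLOPS figures are taken from the posts and the quoted article; the function and variable names are made up for illustration.

```python
# Peak-throughput arithmetic for the figures quoted in this thread.
# Assumption: each unit retires one fused multiply-add per clock,
# which is counted as 2 floating-point operations.

def peak_tflops(units: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Peak throughput in TFLOPS for `units` running at `clock_mhz`."""
    return units * flops_per_clock * clock_mhz * 1e6 / 1e12


def clock_mhz_for(units: int, tflops: float, flops_per_clock: int = 2) -> float:
    """Clock in MHz that `units` would need to reach `tflops`."""
    return tflops * 1e12 / (units * flops_per_clock) / 1e6


DP_UNITS = 896          # figure used in the post
BASE_CLOCK_MHZ = 837    # clock observed in full-speed DP mode

dp_at_base = peak_tflops(DP_UNITS, BASE_CLOCK_MHZ)
implied_clock = clock_mhz_for(DP_UNITS, 1.3)
one_third_of_fp32 = 4.5 / 3  # cross-check against the quoted 1/3 ratio

print(f"DP at 837 MHz:           {dp_at_base:.2f} TFLOPS")      # 1.50
print(f"Clock implied by 1.3 TF: {implied_clock:.0f} MHz")      # 725
print(f"1/3 of 4.5 TFLOPS FP32:  {one_third_of_fp32:.2f} TFLOPS")  # 1.50
```

Both routes land on roughly 1.5 TFLOPS at 837 MHz, so the advertised 1.3 TFLOPS only fits if the chip runs at about 725 MHz in full-speed DP mode.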
The penalty for enabling full speed FP64 mode is that NVIDIA has to reduce clockspeeds to keep everything within spec. For our sample card this manifests itself as GPU Boost being disabled, forcing our card to run at 837MHz (or lower) at all times.
The $999 asking price appears to be prohibitive to the gamer but can also be construed as a Tesla K20X on the (relative) cheap. This fact is further substantiated by the knowledge that, just like Tesla K20X, TITAN can run double-precision compute at 1/3rd of single-precision speeds, leading to over 1TFLOPS DP throughput. However, being a gamer's card at heart, TITAN's DP rate is set to 1/24th of SP, just like GTX 680, as no games use double-precision calculations. The full 1/3rd ratio can be set via the control panel, yet doing so forces the GPU's clocks down.
NVIDIA hasn’t gone into depth for launch quantities, but they did specifically shoot down the 10,000 card rumor; this won’t be a limited run product and we don’t have any reason at this time to believe this will be much different from the GTX 690’s launch (tight at first, but available and increasingly plentiful).
So, virtualization and dynamic parallelism are off the table, I guess.
As you’ll soon see, the combination generally falls between a GeForce GTX 690 and Radeon HD 7970 GHz Edition in our benchmarks.
The DP units don't have a separate clock domain of their own; the whole chip is limited to a maximum of 837MHz when full-speed DP is enabled.
"Indeed. Either that or the 1.3 TFLOPS figure in the slides is wrong."
I have a request in to NVIDIA for clarification on that. 1.3 TFLOPS shows up multiple times, but so far I'm always hitting 837MHz when in full speed DP mode.
"So, virtualization and dynamic parallelism are off the table, I guess."
I didn't ask about virtualization, but it wasn't mentioned as being removed. Dynamic parallelism was also not on the list of disabled features.
If you are on a budget, 2 x 680s are the way to go, especially if you are a gamer. If not, then knock yourself out with 2 x Titans.