Nvidia BigK GK110 Kepler Speculation Thread

Remi · Feb 20, 2013

elect said:
I am also interested in this, can anyone confirm it?

Ryan Smith have some answers here: http://www.anandtech.com/show/6760/nvidias-geforce-gtx-titan-part-1/4

Ryan Smith @ Anandtech said:
But most of all, Titan brings with it NVIDIA’s Kepler marquee compute features: HyperQ and Dynamic Parallelism, which allows for a greater number of hardware work queues and for kernels to dispatch other kernels respectively.

With that said, there is a catch. NVIDIA has stripped GK110 of some of its reliability and scalability features in order to maintain the Tesla/GeForce market segmentation, which means Titan for compute is left for small-scale workloads that don’t require Tesla’s greater reliability. ECC memory protection is of course gone, but also gone is HyperQ’s MPI functionality, and GPU Direct’s RDMA functionality (DMA between the GPU and 3rd party PCIe devices). Other than ECC these are much more market-specific features, and as such while Titan is effectively locked out of highly distributed scenarios, this should be fine for smaller workloads.

xDxD · Feb 20, 2013

psolord said:
Guru3D reports 800€ + VAT. That means +23% in my country, so 984 euros. LOL!

7950 CFX at 500 euros (game bundle value deducted) here I come!

ohhh yes, and good luck with driver support.

UniversalTruth · Feb 20, 2013

xDxD said:
ohhh yes, and good luck with driver support.

You always have to sacrifice something. Either to follow the big corporation and their charge to take from you a kidney or something from your body, or to pay much less but wait better days software wise. I would prefer to stay with my whole body healthy.

Seriously, you think a marginally better driver support is worth of several hundreds of dollars?

Blazkowicz · Feb 20, 2013

keldor314 said:
Titan appears to be targeted very much toward the GPGPU crowd. Full speed dp support and dynamic parallelism especially are killer features. Think of it as more a consumer friendly (relatively) version of Tesla or Quadro, lacking just the various esoteric features that are needed for supercomputers and render farms.

Perhaps Nvidia is moving toward something like Tesla -> supercomputing, Quadro -> render farms, and Titan -> workstation?

Come to think of it, Mac Pro is due for an update. Hmm...

Is there evidence for this beside your own speculation? nvidia can always dial down artificially the double precision performance, as was done on GF100 and GF110. To me it's almost a given and I wonder if they will go as far as disabling dynamic parallelism - maybe not, it's a nice new defining features, perhaps all future Maxwell will have dynamic parallelism and HyperQ etc., consumer or not.

My opinion : geforce Titan has slowed down FP64, and disabled ECC support (as you imply). It's a gaming monster, and a FP32 computing monster (for e.g. offline rendering and other media tasks) but for the latter a GTX 690 or a pair of GTX 670/680 with 4GB could be usable too.
"Serious" users still need a Quadro or Tesla.

xDxD · Feb 20, 2013

UniversalTruth said:
You always have to sacrifice something. Either to follow the big corporation and their charge to take from you a kidney or something from your body, or to pay much less but wait better days software wise. I would prefer to stay with my whole body healthy.

Seriously, you think a marginally better driver support is worth of several hundreds of dollars?

A spot in my country said: "power is nothing without control". Perfect.

trinibwoy · Feb 20, 2013

Does anyone have hard info on Kepler's register file bandwidth? The maximum throughput I'm seeing on a 680 is 128 instr/clk per SM using gpubench. Same goes for these guys - http://hal.inria.fr/docs/00/78/99/58/PDF/112_Lai.pdf.

Is this another case of the "missing MUL" where it's hard to find any evidence of dual-issue actually taking place? Can the register file even support 192 instr/clk?

Alexko · Feb 20, 2013

NVIDIA claims about 93% of peak efficiency in LINPACK, so it must be doable somehow.

trinibwoy · Feb 20, 2013

Alexko said:
NVIDIA claims about 93% of peak efficiency in LINPACK, so it must be doable somehow.

Yeah on DGEMM

elect · Feb 20, 2013

trinibwoy said:
Does anyone have hard info on Kepler's register file bandwidth? The maximum throughput I'm seeing on a 680 is 128 instr/clk per SM using gpubench. Same goes for these guys - http://hal.inria.fr/docs/00/78/99/58/PDF/112_Lai.pdf.

Is this another case of the "missing MUL" where it's hard to find any evidence of dual-issue actually taking place? Can the register file even support 192 instr/clk?

Same SMX same bandwidth I guess

psolord · Feb 20, 2013

xDxD said:
ohhh yes, and good luck with driver support.

I am pretty conscious regarding dual gpu shortcomings as well its advantages. Been using dual gpu since the 4870X2.

I still use my old 5850 cfx system and actually have been using it alongside my 570 sli system for testing/benchmarkig purposes and it was performing quite fine since 2009. I gave the 13.2 beta 5 a thorough testing the other day, when I heard about the Titans price.

The driver operates admirably on most of the games I tested. I only saw stuttering on Rage, Anno 2070 which probably was due to the old cpu which could not keep a high framerate and some minor Battlefield 3 stuttering, which was surprising since I didn't notice it before. All in all, the gpu limited games I tested, worked pretty well with cfx.

Truth be told Nvidia was a bit better on the driver support all this time, but we are talking a score of 5-3 in favor of Nvidia.

Still, 1000 euros vs 500 euros for essentially the same performance (Gigabyte WF3s) is a no brainer. Not to mention the sweet games that come with the cfx bundle.

Xalion · Feb 20, 2013

Blazkowicz said:
Is there evidence for this beside your own speculation? nvidia can always dial down artificially the double precision performance, as was done on GF100 and GF110. To me it's almost a given and I wonder if they will go as far as disabling dynamic parallelism - maybe not, it's a nice new defining features, perhaps all future Maxwell will have dynamic parallelism and HyperQ etc., consumer or not.

My opinion : geforce Titan has slowed down FP64, and disabled ECC support (as you imply). It's a gaming monster, and a FP32 computing monster (for e.g. offline rendering and other media tasks) but for the latter a GTX 690 or a pair of GTX 670/680 with 4GB could be usable too.
"Serious" users still need a Quadro or Tesla.

There are some posts earlier in the thread that address this, but from what we have been told Titan has full FP64 and dynamic parallelism. It does not have ECC or some of the multi-card features (like MPI). The review at anandtech and Ryan's answer to the FP64 discrepency confirmed this.

From my point of view, this essentially limits the number of Titan cards that could be used for a given problem to 3 (the 3 you can have in SLI). In reality, it probably is best to not use more than 1. So you cannot use Titan cards to build a new super computer. However, it seems to have all single card features enabled - meaning if you have a problem that only needs a small amount of power and/or you want a machine that can be used for training the Titan is a really good fit.

flopper · Feb 20, 2013

xDxD said:
ohhh yes, and good luck with driver support.

I would hold AMD as the better one currently.
The changes they are doing behind the scenes indicate one thing, driver support is becoming a class A act.

Alexko · Feb 20, 2013

trinibwoy said:
Yeah on DGEMM

Whoops, good point.

trinibwoy · Feb 20, 2013

Alexko said:
Whoops, good point.

I'm pretty convinced that nVidia's SP flop numbers for Kepler are pure bullshit. A Fermi SM had sufficient register bandwidth for 64 MADs/clk. nVidia claims that a Kepler SMX has 2x the bandwidth so 128 MADs/clk makes sense. I have no idea where they're getting 192 from.

CarstenS · Feb 20, 2013

trinibwoy said:
Does anyone have hard info on Kepler's register file bandwidth? The maximum throughput I'm seeing on a 680 is 128 instr/clk per SM using gpubench. Same goes for these guys - http://hal.inria.fr/docs/00/78/99/58/PDF/112_Lai.pdf.

Is this another case of the "missing MUL" where it's hard to find any evidence of dual-issue actually taking place? Can the register file even support 192 instr/clk?

With scalar code and repeating patterns, I am seeing up to 69,x% efficiency in GK104, GF110 was at 99, GF114 at 68 (max.).

So, dual issue seems to be the same as seen with GK104.

Kyyla · Feb 20, 2013

ams said:
True, but on the flip side, Bugatti also doesn't have a full lineup of cars that are priced between 1/10 to 1/2 the price of the Veyron, and Bugatti also doesn't have any other car in it's lineup that has higher performance at the same price as the Veyron, so the comparison is a bit imperfect

Well actually bugatti is a VAG group brand, so it comes from the makers of Seat, Skoda and VW.

Grall · Feb 20, 2013

Also, Audi, which is a premium brand, as we all probably know. (And, I'm off-topic, which I apologise for.)

DuckThor Evil · Feb 20, 2013

Also Bentley, Lamborghini and Porsche

UniversalTruth · Feb 20, 2013

Design wise, eye-candy, I don't think anyone can beat Italian cars- Lamborghini and Ferrari.

Kaotik · Feb 20, 2013

UniversalTruth said:
Design wise, eye-candy, I don't think anyone can beat Italian cars- Lamborghini and Ferrari.

You mean Audi and Fiat?

Nvidia BigK GK110 Kepler Speculation Thread

Remi

xDxD

UniversalTruth

Blazkowicz

xDxD

trinibwoy

Meh

Alexko

trinibwoy

Meh

elect

psolord

Xalion

flopper

Alexko

trinibwoy

Meh

CarstenS

Moderator

Kyyla

Grall

Invisible Member

DuckThor Evil

UniversalTruth

Kaotik

Drunk Member

Similar threads