Nvidia BigK GK110 Kepler Speculation Thread

OK, my mind is blown. This would appear to imply that Titan has about the same SGEMM performance as HD5870 (nearly 2 TFLOPS).

DGEMM should be fine though (should be 90%+ of theoretical).

Hmmmm, actually Titan throws a wrench into my theory. It gets ~3.2 TFLOPS in CUBLAS SGEMM. That's ~72% of peak, similar to K20. Maybe there's more to GK110 than nVidia is letting on but I can't find anything on observed peak instruction throughput.

http://on-demand.gputechconf.com/gtc-express/2012/presentations/inside-tesla-kepler-k20-family.pdf

http://www.anandtech.com/show/6774/nvidias-geforce-gtx-titan-part-2-titans-performance-unveiled/3
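As a sanity check on that ~72% figure, here's the back-of-envelope arithmetic. The clock used during the benchmark is my assumption (I'm using Titan's 837 MHz base clock; 2688 CUDA cores is the published spec):

```python
# Back-of-envelope SGEMM efficiency check for GTX Titan.
# Assumed: 2688 CUDA cores, 837 MHz base clock (actual clock during the
# benchmark run is not reported, so this is a guess).
cores = 2688
clock_hz = 837e6
flops_per_core_per_clock = 2  # one FMA counts as 2 FLOPs

peak_tflops = cores * flops_per_core_per_clock * clock_hz / 1e12
measured_tflops = 3.2  # CUBLAS SGEMM figure quoted above

efficiency = measured_tflops / peak_tflops
print(f"peak = {peak_tflops:.2f} TFLOPS, efficiency = {efficiency:.0%}")
```

At boost clocks the computed peak is higher and the efficiency a few points lower, so the ~72% figure is consistent with a clock near base.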
 
GK110 has a different instruction encoding that allows it to address 255 registers, while GK104 can only address 63. Although the aggregate register file space and the overall SM architecture is the same for both, SGEMM can get better performance by using more registers. The extra flops you mentioned are real, they're just near impossible to access - they can only be used in limited circumstances in carefully scheduled instruction sequences.
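A quick sketch of why the bigger register budget matters for SGEMM: more registers per thread allows larger blocking (more data reuse per thread), at the cost of fewer resident warps. The 65536-registers-per-SMX and 64-warp figures are the published Kepler limits; the function itself is just illustrative arithmetic:

```python
# Sketch: how the per-thread register budget limits resident warps on an SMX.
# Assumed: 65536 32-bit registers per SMX and 32 threads per warp (published
# Kepler figures); hardware cap of 64 resident warps per SMX.
REGFILE_SIZE = 65536
WARP_SIZE = 32

def max_resident_warps(regs_per_thread, hw_warp_limit=64):
    # Warps that fit in the register file at this per-thread budget,
    # capped by the hardware resident-warp limit.
    warps_by_regs = REGFILE_SIZE // (regs_per_thread * WARP_SIZE)
    return min(warps_by_regs, hw_warp_limit)

print(max_resident_warps(63))   # GK104 encoding limit
print(max_resident_warps(255))  # GK110 encoding limit
```

So a GK110 kernel maxing out its 255-register budget runs at only 8 resident warps per SMX, which is the trade-off a carefully blocked SGEMM is making.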
 
Sorry for the self-quote but this is still a mystery to me. It's pretty clear at this point that a Kepler SMX can only issue 128 ALU instr/clk peak, regardless of what NVidia claims.

Does anyone have any idea why they would have 6 SIMDs per SMX when only 4 warps can be issued to them? Do they have some sort of CPU-like setup where instruction issue ports are shared by the execution units, and having more SIMDs helps to resolve port conflicts?

I'm sure they serve some purpose - just wish I knew what it was! :devilish:
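For what it's worth, here's the lane arithmetic behind the 128 instr/clk observation, assuming 4 schedulers and 192 SP lanes per SMX (my numbers, matching the published SMX layout):

```python
# Rough SMX lane math behind the observed 128 ALU instr/clk ceiling.
# Assumed: 4 warp schedulers and 192 SP ALU lanes per SMX, organized as
# 6 warp-wide (32-lane) SIMD groups.
schedulers = 4
alu_simds = 6
lanes_per_simd = 32

lanes_total = alu_simds * lanes_per_simd        # nominal ALU lanes
lanes_single_issue = schedulers * lanes_per_simd  # if each scheduler feeds
                                                  # exactly one SIMD per clock
print(lanes_total, lanes_single_issue)
print(f"utilization without dual-issue: {lanes_single_issue / lanes_total:.0%}")
```

That 128-out-of-192 ratio is exactly the ~66% scalar-code throughput reported in the LAL slides below.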

ftp://ftp.lal.in2p3.fr/pub/PetaQCD/talks-ANR-Final/Review_Junjie_LAI.pdf

The compiler and the GPU need to extract at least 50% "vec2" ops to reach the full rate. Kepler SIMD unit behavior is not strictly scalar, as it was on previous NV GPUs (except GF114).
FMUL R0.x, R0.x, R1.y -> ~66% of max throughput
FMUL R0.xy, R0.xy, R1.xy -> 100% of max throughput achievable
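Those two data points fit a simple model (my speculation, not anything NVidia has documented): each of the 4 schedulers issues one ALU warp per clock, plus a co-issued second one when an independent "vec2" pair is available, with throughput capped by the 6 SIMD units:

```python
# Speculative issue model: 4 schedulers, optional dual-issue, 6 SIMD units.
def alu_utilization(vec2_fraction, schedulers=4, simds=6):
    # Warps issued per clock: one per scheduler, plus a co-issued second
    # warp on a vec2_fraction of cycles, capped by the SIMD count.
    issued = schedulers * (1 + vec2_fraction)
    return min(issued, simds) / simds

print(f"{alu_utilization(0.0):.0%}")  # scalar-only code -> 67% (~ the 66% above)
print(f"{alu_utilization(0.5):.0%}")  # 50% vec2 pairs  -> 100%
```

The model reproduces both reported throughputs, which at least makes the "50% vec2 to reach full rate" claim internally consistent.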
 
Right, the problem is that I can't find any evidence of instruction issue rates anywhere close to 100%, even in code with lots of co-issue opportunities. This was trivial on Fermi and pretty much all AMD architectures (VLIW and GCN).

I'm betting that RecessionCone is right about it being nearly impossible to realize peak flops on Kepler. The limiting factor is something before the ALUs - the reg file or scheduler maybe.
 
So it's still 3GB in a >$650 card.. Launching in the year when next-gen consoles will have about 5-6GB of graphics memory available.

Unless it is a photoshopped job :LOL:

What percentage improvement over the normal 780 do you expect?

3 GB would be enough, unless AMD decides to push developers to recommend 4 GB in some of their games... as they have already done with some games recommending 3 GB, making nvidia's 2 GB cards look obsolete.
 
I'm sure 3GB will be sufficient but I'm even more sure that 3GB will simply be the reference model. I'd be extremely surprised not to see 6GB variants out there too.

On a side note, this thing should be a MONSTER! Comparisons to the 290x at 4K and using Mantle at lower resolutions will be interesting though. AMD may still have the edge in those scenarios.

Plus AMD has TruAudio, I won't let anyone forget that ;)
 
Actually, doing the calculations on clock speeds, I wouldn't be surprised to see this thing beat the 290X at 4K as well; even beating it with Mantle seems like a real possibility.

On clock speeds, assuming they both hit full boost, it's between 16-18% faster than Titan in both memory and core. Add in the extra SMX and you're looking at a 25% shader/texture boost over Titan. That's a serious boost for a same-generation product. I can see why a 780 GHz Edition is needed now, since the gap between the normal 780 and the 780 Ti would otherwise be massive.
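The arithmetic behind that 25%, assuming Titan's 14 enabled SMXes versus a rumored full 15-SMX GK110 part, and taking the 17% midpoint of the quoted clock gain (both hedged, since 780 Ti specs aren't confirmed):

```python
# Checking the "25% shader/texture boost" claim with assumed specs:
# Titan = 14 SMX; rumored 780 Ti = full GK110 with 15 SMX; ~17% clock gain.
titan_smx, ti_smx = 14, 15
clock_gain = 0.17  # midpoint of the 16-18% quoted above

shader_gain = (ti_smx / titan_smx) * (1 + clock_gain) - 1
print(f"shader/texture throughput gain = {shader_gain:.0%}")
```

If the part ships with only 2688 SPs (14 SMX) instead, the gain falls back to just the clock delta.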

How did my 670 start feeling slow all of a sudden??
 
They should have called the 780 GHz Edition the 780 Ti, and the full GK110 SKU could have been named GTX 785.
 
Well, with my card which is actually quite a bit slower than yours, I feel very happy running the beautiful Crysis 3 at 1080p and relatively high settings. Also, F1 2013 runs perfectly smooth at max at 1080p.

Yes, those cards deliver considerably higher frame rates, but will the visual satisfaction be higher to the same degree?
 
Nah, my card's plenty powerful enough (well, most of the time anyway). It just feels slow compared to all these recent behemoths!
 
From Videocardz: "NVIDIA GeForce GTX 780 Ti has 3GB memory and GK110 GPU."

[Attached BIOS screenshot: NVIDIA-GeForce-GTX-780-TI-BIOS.png]
 
Holy moly... this 780Ti should easily surpass the 290X with lower noise levels to boot. And unlike Titan vs GTX780, the 780Ti will actually justify its higher price tag.

AMD should have done a little better on the cooler for the 290X, especially considering how much its performance scales with temperature.
 
Did you expect Nvidia to counter the 290X with a lower-performing card? As for the $699 price, it all depends on how much faster it is than the 290X whether the extra $100 is well placed. (I don't care about the cooler, and there are custom AIB coolers for people who don't want to use the stock cooling.)

Anyway, the ~50 MHz gain over Titan is not extremely high, but the question is still whether the board has 2688 or 2880 SPs. If the card finally shows up with 2880 SPs... that's another story.
 