Nvidia BigK GK110 Kepler Speculation Thread

OK, my mind is blown. This would appear to imply that Titan has about the same SGEMM performance as HD5870 (nearly 2 TFLOPS).

DGEMM should be fine though (should be 90%+ of theoretical).

Hmmmm, actually Titan throws a wrench into my theory. It gets ~3.2 TFLOPS in CUBLAS SGEMM. That's ~72% of peak, similar to K20. Maybe there's more to GK110 than nVidia is letting on but I can't find anything on observed peak instruction throughput.

http://on-demand.gputechconf.com/gtc-express/2012/presentations/inside-tesla-kepler-k20-family.pdf

http://www.anandtech.com/show/6774/nvidias-geforce-gtx-titan-part-2-titans-performance-unveiled/3
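As a sanity check on that ~72% figure, here's the back-of-envelope arithmetic. The clock used during the benchmark is my assumption (I'm using Titan's 837 MHz base clock; 2688 CUDA cores is the published spec):

```python
# Back-of-envelope SGEMM efficiency check for GTX Titan.
# Assumed: 2688 CUDA cores, 837 MHz base clock (actual clock during the
# benchmark run is not reported, so this is a guess).
cores = 2688
clock_hz = 837e6
flops_per_core_per_clock = 2  # one FMA counts as 2 FLOPs

peak_tflops = cores * flops_per_core_per_clock * clock_hz / 1e12
measured_tflops = 3.2  # CUBLAS SGEMM figure quoted above

efficiency = measured_tflops / peak_tflops
print(f"peak = {peak_tflops:.2f} TFLOPS, efficiency = {efficiency:.0%}")
```

At boost clocks the computed peak is higher and the efficiency a few points lower, so the ~72% figure is consistent with a clock near base.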
 
GK110 has a different instruction encoding that allows it to address 255 registers, while GK104 can only address 63. Although the aggregate register file space and the overall SM architecture is the same for both, SGEMM can get better performance by using more registers. The extra flops you mentioned are real, they're just near impossible to access - they can only be used in limited circumstances in carefully scheduled instruction sequences.
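A quick sketch of why the bigger register budget matters for SGEMM: more registers per thread allows larger blocking (more data reuse per thread), at the cost of fewer resident warps. The 65536-registers-per-SMX and 64-warp figures are the published Kepler limits; the function itself is just illustrative arithmetic:

```python
# Sketch: how the per-thread register budget limits resident warps on an SMX.
# Assumed: 65536 32-bit registers per SMX and 32 threads per warp (published
# Kepler figures); hardware cap of 64 resident warps per SMX.
REGFILE_SIZE = 65536
WARP_SIZE = 32

def max_resident_warps(regs_per_thread, hw_warp_limit=64):
    # Warps that fit in the register file at this per-thread budget,
    # capped by the hardware resident-warp limit.
    warps_by_regs = REGFILE_SIZE // (regs_per_thread * WARP_SIZE)
    return min(warps_by_regs, hw_warp_limit)

print(max_resident_warps(63))   # GK104 encoding limit
print(max_resident_warps(255))  # GK110 encoding limit
```

So a GK110 kernel maxing out its 255-register budget runs at only 8 resident warps per SMX, which is the trade-off a carefully blocked SGEMM is making.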
 
Sorry for the self-quote but this is still a mystery to me. It's pretty clear at this point that a Kepler SMX can only issue 128 ALU instr/clk peak, regardless of what NVidia claims.

Does anyone have any idea why they would have 6 SIMDs per SMX when only 4 warps can be issued to them? Do they have some sort of CPU-like setup where instruction issue ports are shared by the execution units, and having more SIMDs helps to resolve port conflicts?

I'm sure they serve some purpose - just wish I knew what it was! :devilish:
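For what it's worth, here's the lane arithmetic behind the 128 instr/clk observation, assuming 4 schedulers and 192 SP lanes per SMX (my numbers, matching the published SMX layout):

```python
# Rough SMX lane math behind the observed 128 ALU instr/clk ceiling.
# Assumed: 4 warp schedulers and 192 SP ALU lanes per SMX, organized as
# 6 warp-wide (32-lane) SIMD groups.
schedulers = 4
alu_simds = 6
lanes_per_simd = 32

lanes_total = alu_simds * lanes_per_simd        # nominal ALU lanes
lanes_single_issue = schedulers * lanes_per_simd  # if each scheduler feeds
                                                  # exactly one SIMD per clock
print(lanes_total, lanes_single_issue)
print(f"utilization without dual-issue: {lanes_single_issue / lanes_total:.0%}")
```

That 128-out-of-192 ratio is exactly the ~66% scalar-code throughput reported in the LAL slides below.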

ftp://ftp.lal.in2p3.fr/pub/PetaQCD/talks-ANR-Final/Review_Junjie_LAI.pdf

The compiler and the GPU need to extract at least 50% "vec2" ops to reach the full rate. Kepler SIMD unit behavior is not strictly scalar, as it was on previous NV GPUs (except GF114).
FMUL R0.x, R0.x, R1.y -> ~66% of max throughput
FMUL R0.xy, R0.xy, R1.xy -> 100% of max throughput achievable
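Those two data points fit a simple model (my speculation, not anything NVidia has documented): each of the 4 schedulers issues one ALU warp per clock, plus a co-issued second one when an independent "vec2" pair is available, with throughput capped by the 6 SIMD units:

```python
# Speculative issue model: 4 schedulers, optional dual-issue, 6 SIMD units.
def alu_utilization(vec2_fraction, schedulers=4, simds=6):
    # Warps issued per clock: one per scheduler, plus a co-issued second
    # warp on a vec2_fraction of cycles, capped by the SIMD count.
    issued = schedulers * (1 + vec2_fraction)
    return min(issued, simds) / simds

print(f"{alu_utilization(0.0):.0%}")  # scalar-only code -> 67% (~ the 66% above)
print(f"{alu_utilization(0.5):.0%}")  # 50% vec2 pairs  -> 100%
```

The model reproduces both reported throughputs, which at least makes the "50% vec2 to reach full rate" claim internally consistent.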
 
Right, the problem is that I can't find any evidence of instruction issue rates anywhere close to 100%, even in code with lots of co-issue opportunities. This was trivial on Fermi and pretty much all AMD architectures (VLIW and GCN).

I'm betting that RecessionCone is right about it being nearly impossible to realize peak flops on Kepler. The limiting factor is something before the ALUs - the reg file or scheduler maybe.
 
So it's still 3GB in a >$650 card.. Launching in the year when next-gen consoles will have about 5-6GB of graphics memory available.

Unless it is a photoshopped job :LOL:

What percentage improvement over the normal 780 do you expect?

3 GB would be enough, unless AMD decides to push developers to recommend 4 GB in some of their games... as they have already done with some games recommending 3 GB, making nvidia's 2 GB cards look obsolete.
 
I'm sure 3GB will be sufficient but I'm even more sure that 3GB will simply be the reference model. I'd be extremely surprised not to see 6GB variants out there too.

On a side note, this thing should be a MONSTER! Comparisons to the 290x at 4K and using Mantle at lower resolutions will be interesting though. AMD may still have the edge in those scenarios.

Plus AMD has TruAudio, I won't let anyone forget that ;)
 
Actually, doing the calculations on clock speeds, I wouldn't be surprised to see this thing beat the 290X at 4K as well; even beating it with Mantle seems like a real possibility.

On clock speeds, assuming they both hit full boost, it's between 16-18% faster than Titan in both memory and core. Add in the extra SMX and you're looking at a 25% shader/texture boost over Titan. That's a serious boost for a same-generation product. I can see why a 780 GHz Edition is needed now, since the gap between the normal 780 and the 780 Ti would otherwise be massive.
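The arithmetic behind that 25%, assuming Titan's 14 enabled SMXes versus a rumored full 15-SMX GK110 part, and taking the 17% midpoint of the quoted clock gain (both hedged, since 780 Ti specs aren't confirmed):

```python
# Checking the "25% shader/texture boost" claim with assumed specs:
# Titan = 14 SMX; rumored 780 Ti = full GK110 with 15 SMX; ~17% clock gain.
titan_smx, ti_smx = 14, 15
clock_gain = 0.17  # midpoint of the 16-18% quoted above

shader_gain = (ti_smx / titan_smx) * (1 + clock_gain) - 1
print(f"shader/texture throughput gain = {shader_gain:.0%}")
```

If the part ships with only 2688 SPs (14 SMX) instead, the gain falls back to just the clock delta.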

How did my 670 start feeling slow all of a sudden??
 
They should have called the 780 GHz Edition the 780 Ti, and the full GK110 SKU could have been named GTX 785.
 
Well, with my card which is actually quite a bit slower than yours, I feel very happy running the beautiful Crysis 3 at 1080p and relatively high settings. Also, F1 2013 runs perfectly smooth at max at 1080p.

Yes, those cards deliver considerably higher frame rates, but will the visual satisfaction be higher to the same degree?
 
Nah, my card's plenty powerful enough (well, most of the time anyway). It just feels slow compared to all these recent behemoths!
 
From Videocardz: "NVIDIA GeForce GTX 780 Ti has 3GB memory and GK110 GPU."

[Attached BIOS screenshot: NVIDIA-GeForce-GTX-780-TI-BIOS.png]
 
Holy moly... this 780Ti should easily surpass the 290X with lower noise levels to boot. And unlike Titan vs GTX780, the 780Ti will actually justify its higher price tag.

AMD should have done a little better on the cooler for the 290X, especially considering how much its performance scales with temperature.
 
Did you expect Nvidia to counter the 290X with a lower-performing card? As for the $699 price, it all depends on how much faster it is than the 290X whether the extra $100 is well placed. (I don't care about the cooler, and there are custom AIB coolers for people who don't want to use the stock cooling.)

Anyway, the ~50 MHz gain over Titan is not extremely high, but the question is still whether the board has 2688 or 2880 SPs. If the card finally shows up with 2880 SPs... that's another story.
 