Nvidia BigK GK110 Kepler Speculation Thread

Discussion in 'Architecture and Products' started by A1xLLcqAgt0qc2RyMz0y, Apr 21, 2012.

  1. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,059
    Likes Received:
    3,119
    Location:
    New York
    Hmmmm, actually Titan throws a wrench into my theory. It gets ~3.2 TFLOPS in CUBLAS SGEMM. That's ~72% of peak, similar to K20. Maybe there's more to GK110 than nVidia is letting on but I can't find anything on observed peak instruction throughput.

    http://on-demand.gputechconf.com/gtc-express/2012/presentations/inside-tesla-kepler-k20-family.pdf

    http://www.anandtech.com/show/6774/nvidias-geforce-gtx-titan-part-2-titans-performance-unveiled/3
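
    A rough sanity check of the ~72% figure, as a sketch only. It assumes Titan has 2688 SPs (the count quoted later in the thread), a base clock of 837 MHz (my assumption, not stated in the post), and counts one FMA as 2 FLOPs per SP per clock. The function name is hypothetical.

```python
def peak_sp_tflops(sp_count, clock_mhz):
    """Theoretical single-precision peak: one FMA (2 FLOPs) per SP per clock."""
    return sp_count * 2 * clock_mhz * 1e6 / 1e12

# Assumed Titan figures: 2688 SPs at an 837 MHz base clock.
titan_peak = peak_sp_tflops(2688, 837)

# ~3.2 TFLOPS is the CUBLAS SGEMM result quoted in the post.
efficiency = 3.2 / titan_peak

print(f"peak ~{titan_peak:.2f} TFLOPS, SGEMM efficiency ~{efficiency:.0%}")
```

    With those inputs the peak comes out at about 4.5 TFLOPS, so 3.2 TFLOPS lands a little above 70% of peak, in line with the figure in the post. Using the boost clock instead would lower the percentage slightly.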
     
  2. RecessionCone

    Regular Subscriber

    Joined:
    Feb 27, 2010
    Messages:
    505
    Likes Received:
    189
    GK110 has a different instruction encoding that allows it to address 255 registers per thread, while GK104 can only address 63. Although the aggregate register file space and the overall SM architecture are the same for both, SGEMM can get better performance by using more registers. The extra flops you mentioned are real, they're just nearly impossible to access: they can only be used in limited circumstances, in carefully scheduled instruction sequences.
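
    To illustrate why more addressable registers help SGEMM, here is a toy model, not how CUBLAS is actually written. It assumes each thread keeps an r x r tile of accumulators in registers plus r values each of A and B, with a guessed overhead of 8 registers for addresses and loop counters. The function names and the overhead figure are hypothetical.

```python
def largest_tile(max_regs, overhead=8):
    """Largest r where r*r accumulators + 2*r operands + overhead fit in max_regs."""
    r = 0
    while (r + 1) ** 2 + 2 * (r + 1) + overhead <= max_regs:
        r += 1
    return r

def fmas_per_load(r):
    """Each inner-loop step loads r values of A and r of B, then does r*r FMAs."""
    return (r * r) / (2 * r)

for name, limit in (("GK104", 63), ("GK110", 255)):
    r = largest_tile(limit)
    print(f"{name}: {limit} regs -> {r}x{r} tile, {fmas_per_load(r):.1f} FMAs per load")
```

    Under these assumptions the 63-register limit caps the tile at 6x6 (3 FMAs per operand load), while 255 registers allow 14x14 (7 FMAs per load), so the larger tile spends proportionally fewer issue slots on loads. Real kernels also have to balance this against occupancy, since the total register file is the same size on both chips.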
     
  3. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Ah, that's much better. I was surprised by the very small per work-item register allocation in that slide deck. SGEMM thrives on registers.
     
  4. Tridam

    Regular Subscriber

    Joined:
    Apr 14, 2003
    Messages:
    541
    Likes Received:
    47
    Location:
    Louvain-la-Neuve, Belgium
    The compiler and the GPU need to extract at least 50% of "vec2" ops to reach the full rate. The behavior of Kepler's SIMD units is not strictly scalar, as it was with previous NV GPUs (except GF114).
    FMUL R0.x, R0.x, R1.y -> ~66% of max throughput
    FMUL R0.xy, R0.xy, R1.xy -> 100% of max throughput achievable
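
    One way to read those two percentages, as a back-of-envelope sketch. It assumes an SMX has 192 FP32 lanes fed by 4 warp schedulers, each issuing one or two instructions per clock for a 32-thread warp (figures from Nvidia's public Kepler material, not from this post). The function name and the simple min() model are illustrative and ignore any register file or operand bandwidth limits.

```python
LANES_PER_SMX = 192   # assumed FP32 lanes per SMX
WARP_SIZE = 32
SCHEDULERS = 4        # assumed warp schedulers per SMX

def alu_utilization(dual_issue_fraction):
    """Fraction of FP32 lanes kept busy for a given share of dual-issued slots."""
    # Warp-instructions issued per clock: one per scheduler, plus a second
    # one on the fraction of slots where a co-issuable ("vec2") pair exists.
    issued = SCHEDULERS * (1 + dual_issue_fraction)
    # Warp-instructions per clock needed to occupy every lane.
    needed = LANES_PER_SMX / WARP_SIZE
    return min(1.0, issued / needed)

print(f"no co-issue:  {alu_utilization(0.0):.0%}")
print(f"50% co-issue: {alu_utilization(0.5):.0%}")
```

    With no co-issue, 4 schedulers cover 128 of the 192 lanes, which is the ~66% case. Full rate needs 6 warp-instructions per clock, so half of the issue slots have to carry a second independent instruction.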
     
  5. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,059
    Likes Received:
    3,119
    Location:
    New York
    Right, the problem is that I can't find any evidence of instruction issue rates anywhere close to 100%, even in code with lots of co-issue opportunities. This was trivial on Fermi and pretty much all AMD architectures (VLIW and GCN).

    I'm betting that RecessionCone is right about it being nearly impossible to realize peak flops on Kepler. The limiting factor is something before the ALUs - the reg file or scheduler maybe.
     
  6. DSC

    DSC
    Banned

    Joined:
    Jul 12, 2003
    Messages:
    689
    Likes Received:
    3
  7. UniversalTruth

    Veteran

    Joined:
    Sep 5, 2010
    Messages:
    1,747
    Likes Received:
    22
    Unless it is a photoshopped job :lol:

    What percentage improvement over the normal 780 do you expect?

    3 GB would be enough, unless AMD decides to push developers to recommend 4 GB in some of their games, as they have already done with some game recommendations being 3 GB, so that Nvidia's 2 GB cards become morally obsolete.
     
    #1609 UniversalTruth, Oct 30, 2013
    Last edited by a moderator: Oct 30, 2013
  8. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,237
    Likes Received:
    4,260
    Location:
    Guess...
    I'm sure 3GB will be sufficient but I'm even more sure that 3GB will simply be the reference model. I'd be extremely surprised not to see 6GB variants out there too.

    On a side note, this thing should be a MONSTER! Comparisons to the 290x at 4K and using Mantle at lower resolutions will be interesting though. AMD may still have the edge in those scenarios.

    Plus AMD has TruAudio, I won't let anyone forget that :wink:
     
  9. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,237
    Likes Received:
    4,260
    Location:
    Guess...
    Actually, having done the calculations on clock speeds, I wouldn't be surprised to see this thing beat the 290X at 4K as well, and beating it even with Mantle seems like a real possibility at least.

    On clock speeds, assuming they both hit full boost, it's between 16% and 18% faster than Titan in both memory and core. Add in the extra SMX and you're looking at a 25% shader/texture boost over Titan. That's a serious boost for a same-generation product. I can see why a 780 GHz Edition is needed now, since the gap between the normal 780 and the 780 Ti would otherwise be massive.
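
    The 25% figure can be reproduced with a quick calculation. This sketch assumes Titan has 14 SMX enabled (2688 SPs) and the new card has all 15 (2880 SPs), which is the open question raised later in the thread, and that shader/texture throughput scales linearly with both SMX count and clock. The function name is hypothetical.

```python
TITAN_SMX = 14        # assumed: 14 x 192 = 2688 SPs
FULL_GK110_SMX = 15   # assumed: 15 x 192 = 2880 SPs

def shader_uplift(clock_gain, smx_new=FULL_GK110_SMX, smx_old=TITAN_SMX):
    """Relative throughput gain from a higher clock combined with more SMX."""
    return (1 + clock_gain) * smx_new / smx_old - 1

low = shader_uplift(0.16)    # 16% clock advantage
high = shader_uplift(0.18)   # 18% clock advantage

print(f"uplift over Titan: {low:.1%} to {high:.1%}")
```

    The extra SMX alone is worth about 7%, and stacked on a 16% to 18% clock advantage it gives roughly 24% to 26%, hence the ~25%.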

    How did my 670 start feeling slow all of a sudden??
     
  10. boxleitnerb

    Regular

    Joined:
    Aug 27, 2004
    Messages:
    407
    Likes Received:
    0
    Custom and 6 GB models should be available as well afaik.
     
  11. LittleJ

    Newcomer

    Joined:
    Oct 8, 2012
    Messages:
    54
    Likes Received:
    0
    They should have called the 780GHz edition the 780 Ti and the full GK110 sku could have been named GTX 785.
     
  12. UniversalTruth

    Veteran

    Joined:
    Sep 5, 2010
    Messages:
    1,747
    Likes Received:
    22
    Well, with my card which is actually quite a bit slower than yours, I feel very happy running the beautiful Crysis 3 at 1080p and relatively high settings. Also, F1 2013 runs perfectly smooth at max at 1080p.

    Yes, those cards deliver considerably higher frame rates, but the visual satisfaction... will it be higher to the same degree?
     
  13. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,237
    Likes Received:
    4,260
    Location:
    Guess...
    Nah, my card's plenty powerful enough (well, most of the time anyway). It just feels slow compared to all these recent behemoths!
     
  14. iMacmatician

    Regular

    Joined:
    Jul 24, 2010
    Messages:
    797
    Likes Received:
    223
  15. xDxD

    Regular

    Joined:
    Jun 7, 2010
    Messages:
    412
    Likes Received:
    1
  16. homerdog

    homerdog donator of the year
    Legend Subscriber

    Joined:
    Jul 25, 2008
    Messages:
    6,294
    Likes Received:
    1,075
    Location:
    still camping with a mauler
    Holy moly... this 780Ti should easily surpass the 290X with lower noise levels to boot. And unlike Titan vs GTX780, the 780Ti will actually justify its higher price tag.

    AMD should have done a little better on the cooler for the 290X, especially considering how much its performance scales with temperature.
     
  17. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland
    You expect Nvidia, who want to counter the 290X, to come out with a card with lower performance? As for the $699 price ($100 more), whether it is actually well placed will depend on how far the performance is above the 290X. (I don't care about the cooler, and there are custom AIB coolers for people who do care about "stock cooling".)

    Anyway, the ~50 MHz gain (vs Titan) is not especially large, but the question is still whether it carries 2688 or 2880 SPs. If the card finally shows up with 2880 SPs... that's another story.
     
    #1619 lanek, Oct 31, 2013
    Last edited by a moderator: Oct 31, 2013
  18. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,237
    Likes Received:
    4,260
    Location:
    Guess...
    I'm disappointed if the clocks are lower than in that first leak. I was stoked for that GPU!
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.