Nvidia BigK GK110 Kepler Speculation Thread

Discussion in 'Architecture and Products' started by A1xLLcqAgt0qc2RyMz0y, Apr 21, 2012.

Tags:
  1. Cookie Monster

    Cookie Monster Newcomer

    I thought the performance was a bit higher than that, around 33% and 25~27% vs the GTX 680 and 7970 GHz Edition respectively?
     
  2. lanek

    lanek Veteran

  3. tviceman

    tviceman Newcomer

    Most reviews show Titan to be 45%+ faster than the GTX 680, and 33%+ faster than the 7970 GE.

    Anandtech probably has the most fair review and even has several AMD-friendly games in their benchmarking suite showing 34% faster on average: http://www.anandtech.com/show/6774/nvidias-geforce-gtx-titan-part-2-titans-performance-unveiled
    Hardocp also shows 28-43% faster in FC3, 37-40% faster in Hitman, 8% faster in Sleeping Dogs, 25-45% faster in Max Payne, and 47% faster in BF3: http://www.hardocp.com/article/2013/02/21/nvidia_geforce_gtx_titan_video_card_review/#.UUkGBRysh8E
    Techpowerup shows Titan is 32% faster at 2560x1600: http://www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_Titan/27.html

    Your 25% figure is definitely low balling Titan's performance vs. 7970GE.
     
  4. CarstenS

    CarstenS Legend Subscriber

  5. dbz

    dbz Newcomer

    33% for the 7970GE (@ 1920x1200/1080 and 2560x1440/1600) is pretty much on the money.
    From 28 site reviews (counting only the single highest in-game image-quality setting per resolution, per review):
    [IMG]
     
  6. DSC

    DSC Banned

    Last edited by a moderator: Mar 20, 2013
  7. Blazkowicz

    Blazkowicz Legend

    Wow, where is this chip coming from?
    Alright, it's a dedicated GPU, probably what we sometimes referred to as "GK208", and Logan is a future Tegra made with Cortex A15 and Kepler.. I had no idea they were going to do this :)
    It looks like they intend to compete with Jaguar more than with cell phone SoC, with that one.

    Unusually, there's information on Wikipedia: Logan would be released in Q2 2014, and a Denver+Maxwell Tegra in 2015.
    I thought Maxwell would be "Tegra 5", but the next gen silicon process is so incredibly hard to develop that there's enough time for an interim product, it seems.

    I thought GK208 would have only 1 SMX.. Or they might release a GK107 or GK208 card with 1 SMX disabled. Anyway I'd want it for low power, cheap desktop linux PC.
     
  8. Blazkowicz

    Blazkowicz Legend

    BTW, from the screenshot shown in the German article, I can say this thing is running on Ubuntu 12.04 :runaway:
    The panel shown is the usual nvidia-settings, and it says Xorg 1.11 is running.

    In theory, Valve could launch Steam on that platform (only needing linux games to be ported to ARM, all other things equal). In practice, you can at least fully use the future Tegra and successor as a classic desktop/laptop, and it runs the same driver as on Windows x86.
     
  9. pjbliverpool

    pjbliverpool B3D Scallywag Legend

    Faceworks is amazing. That avatar combined with a really advanced AI routine linked into something like google search and good voice recognition could make your computer seem like one of those living AI's from a sci-fi movie. I want it!!
     
  10. psolord

    psolord Regular

    If it can cook, I want it too!
     
  11. Davros

    Davros Legend

    Yes we all remember how awesome Matrox's headCasting was :D
     
  12. LiXiangyang

    LiXiangyang Newcomer

    Doing some non-trivial CUDA programming on GK110 now, I have to say Fermi is a very efficient design; it is much easier to achieve optimal efficiency on Fermi than on GK110.

    For GK110, ILP has to be at a pretty good level to obtain maximum efficiency, and that's not an easy task for non-trivial applications.

    But the reward sometimes justifies the effort: after careful tuning, CUDA code on GK110 can achieve a very significant speed-up compared to running it on a CPU or even MIC, etc.

    It really reminds me of the good old days when people programmed in machine code or assembly. Kepler is really a rough, low-level system; it gives you so much more room to achieve maximum efficiency, or to mess things up.
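    A minimal sketch of what that tuning looks like (illustrative only, kernel name and unrolling factor are my own): a partial-sum kernel that keeps four independent accumulators in flight per thread, so the warp schedulers always have independent instructions to issue instead of stalling on one dependency chain.

    ```cuda
    #include <cuda_runtime.h>

    // Illustrative only: four independent dependency chains per thread,
    // so Kepler can overlap the adds instead of serializing on a single
    // accumulator.
    __global__ void partial_sums_ilp(const float *in, float *out, int n)
    {
        int tid    = blockIdx.x * blockDim.x + threadIdx.x;
        int stride = gridDim.x * blockDim.x;

        float s0 = 0.f, s1 = 0.f, s2 = 0.f, s3 = 0.f;
        int i = tid;
        for (; i + 3 * stride < n; i += 4 * stride) {
            s0 += in[i];                // four independent adds per
            s1 += in[i + stride];       // iteration: no chain depends
            s2 += in[i + 2 * stride];   // on another's previous result
            s3 += in[i + 3 * stride];
        }
        for (; i < n; i += stride)      // tail elements
            s0 += in[i];

        out[tid] = s0 + s1 + s2 + s3;   // per-thread partial sum
    }
    ```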
     
  13. CarstenS

    CarstenS Legend Subscriber

    Even Acceleware did not really pay attention to ILP in their optimization course on Kepler at GTC - in their occupancy example they used an algorithm without any ILP, thus achieving only ~2/3 of nominal throughput.
     
  14. Dade

    Dade Newcomer

    I accept bets that a 28nm Fermi would outperform Kepler by miles in GPGPU tasks.

    I'm old enough to know that old times were just horrible :wink:
     
  15. iMacmatician

    iMacmatician Regular

  16. Arun

    Arun Unknown. Legend

    Sorry for the late reply - LiXiangyang, are you still focusing on FP32? My understanding is that NVIDIA basically decided to only optimise GPGPU around FP64 and don't care much (enough?) about Kepler's efficiency for FP32 GPGPU. In addition to the lack of ILP requirements, one obvious example is shared memory working in 64-bit bursts now and therefore requiring Vec2 accesses in FP32 mode while it's trivial to use in FP64...
     
  17. LiXiangyang

    LiXiangyang Newcomer

    Not really. GK110's fp32 performance is considerably better than its fp64, not just because of the raw FLOPS, but also because when fp64 is disabled GK110 can boost its clock quite a bit (in my case, under high load with 64-bit mode disabled, it clocks itself up to around 980MHz). As for the doubled shared-memory bandwidth in 64-bit mode: measured in the number of variables that can be accessed per cycle, fp32 is twice fp64 if you use packed fp32 accesses a lot.
     
  18. AlexV

    AlexV Heteroscedasticitate Moderator Veteran

    Eh, how many half-warps are (theoretically) reading from LDS each cycle?
     
  19. keldor314

    keldor314 Newcomer

    Kepler has configurable shared memory banking, so you can either use 32 bit banking or 64 bit, as set with cudaDeviceSetSharedMemConfig(). Hence, you can either configure it for fp32 or fp64.
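    For reference, the device-wide call looks like this (there is also a per-kernel cudaFuncSetSharedMemConfig() if you only want to change it for specific kernels):

    ```cuda
    #include <cuda_runtime.h>

    int main(void)
    {
        // Kepler defaults to four-byte banks, which suits fp32 data.
        // Switch to eight-byte banks before launching fp64 kernels so
        // consecutive doubles fall into consecutive banks.
        cudaDeviceSetSharedMemConfig(cudaSharedMemBankSizeEightByte);
        // ... launch double-precision kernels here ...

        // Restore four-byte banks for fp32 work.
        cudaDeviceSetSharedMemConfig(cudaSharedMemBankSizeFourByte);
        return 0;
    }
    ```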
     
  20. LiXiangyang

    LiXiangyang Newcomer

    With that shared memory configuration, I think accessing an 8-byte primitive data type, or any packed data type that is 8 bytes in size, will result in doubled shared memory bandwidth; that's why I said packed fp32.
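    A sketch of what I mean (illustrative only; the tile size and block size are made up): with eight-byte banks, each thread moving a float2 reads a full 8-byte bank per access, so a warp moves twice the data per cycle compared to scalar float loads.

    ```cuda
    // Illustrative only: packed fp32 (float2) shared-memory traffic.
    __global__ void rotate_packed(const float2 *in, float2 *out)
    {
        __shared__ float2 tile[256];      // 8-byte elements

        int t = threadIdx.x;              // assumes blockDim.x == 256
        tile[t] = in[t];                  // one 8-byte access per thread
        __syncthreads();

        // Stride-1 indexing keeps the warp conflict-free with 8-byte
        // banks: each lane reads from a different bank.
        out[t] = tile[(t + 1) & 255];
    }
    ```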
     