NVIDIA Kepler speculation thread

Discussion in 'Architecture and Products' started by Kaotik, Sep 21, 2010.

  1. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,532
    Likes Received:
    957
    It could simply be NVIDIA's answer to AMD calling Barts the 6800 series, Cayman 6900, and so forth.

    Come on! Would NVIDIA ever do that? :D
     
  2. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,052
    Likes Received:
    4,263
    Location:
    Finland
Which is ironic, considering that moving Barts to the 6800 series was apparently triggered by nV using "GTX" (high-end naming) on the midrange 460 (and then the 560) :grin:
     
  3. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,532
    Likes Received:
    957
Just give it a few generations and all SKUs, from bottom to top end, will be called HD N950 to HD N999 on AMD's side and GTX N90 to GTX N99 on NVIDIA's… :razz:
     
  4. Vardant

    Newcomer

    Joined:
    Sep 1, 2009
    Messages:
    96
    Likes Received:
    1
    March.
     
  5. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,021
    Likes Received:
    119
I thought April was the magic date, with GK110 only following in September or so?

    R600?

Meanwhile, I'm still trying to figure out how NVIDIA could fit 4 GPCs / 16 SMs / 1536 ALUs on a ~350mm² chip.
I think it's doable, though. Essentially the SMs would be "GF104"-style, just with 3x32 shader ALUs instead of 3x16 to compensate for the lack of a hot clock (from a scheduling point of view nothing would actually change). That would certainly make the SMs somewhat larger, so it seems somewhat unlikely it would fit on a chip the same size as GF104/114 (granted, the ROPs wouldn't double up, but almost everything else would). There's another trick NVIDIA could pull, though: eliminate the SFUs and integrate that functionality into the normal shader ALUs (which AMD did too), which should save some transistors. I don't think separate SFUs really make sense any longer (and I'm not sure they did for Fermi either).
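To put rough numbers on that SM layout, here is a minimal back-of-the-envelope sketch in Python. The 16 SM / 3x32 configuration is the rumour discussed above, the ~1 GHz base clock for the rumoured part is an assumption, and only the GF114 figures are shipping-product specs.

```python
# Back-of-the-envelope throughput comparison: hot-clocked GF104-style SMs
# vs. the rumoured non-hot-clocked Kepler-style SMs (speculative numbers).

def fma_gflops(num_sms, alus_per_sm, alu_clock_ghz):
    """Peak single-precision GFLOPS, counting each FMA as 2 flops."""
    return num_sms * alus_per_sm * alu_clock_ghz * 2

# GF114 (GTX 560 Ti): 8 SMs x 48 ALUs, ALUs on a ~1.645 GHz hot clock.
gf114 = fma_gflops(num_sms=8, alus_per_sm=48, alu_clock_ghz=1.645)

# Rumoured GK104: 16 SMs x 96 ALUs (3x32), no hot clock, assumed ~1 GHz clock.
gk104_rumour = fma_gflops(num_sms=16, alus_per_sm=96, alu_clock_ghz=1.0)

print(f"GF114 peak:          {gf114:7.0f} GFLOPS")
print(f"Rumoured GK104 peak: {gk104_rumour:7.0f} GFLOPS")
# From the scheduler's point of view, 3x32 ALUs at base clock issue the same
# number of operations per base-clock cycle as 3x16 ALUs at a 2x hot clock.
```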
     
    #1685 mczak, Feb 13, 2012
    Last edited by a moderator: Feb 13, 2012
  6. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    11,683
    Likes Received:
    2,601
    Location:
    New York
    The scenario I laid out doesn't have it beating Tahiti.

    Exactly.
     
  7. psolord

    Regular

    Joined:
    Jun 22, 2008
    Messages:
    444
    Likes Received:
    55
PhysX support for Borderlands 2? Yikes!
     
  8. Vardant

    Newcomer

    Joined:
    Sep 1, 2009
    Messages:
    96
    Likes Received:
    1
    Aliens: Colonial Marines more likely.
     
  9. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    11,683
    Likes Received:
    2,601
    Location:
    New York
If AIBs have sample boards in hand already, then late March should be doable, unless they need more time to build inventory.
     
  10. CarstenS

    Legend Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,659
    Likes Received:
    3,662
    Location:
    Germany
What is it exactly, btw, that keeps GPUs from reaching the frequencies that CPUs have been more or less comfortable at for years, i.e. 3 GHz-ish?

IF - I know that's a big if - Nvidia managed to triple or even quadruple the hot clock relative to a common base clock (with more options, probably, for mobile at reduced voltages), they could reach the performance of a 1536-ALU part (non-hot-clocked) with only 512 ALUs, and would have easier routing throughout the chip, probably less overhead for instruction scheduling, and what-else-not.

Plus, if leakage and variance are still a big issue on 28nm - which they apparently were, at least at the start of production - they'd have fewer transistors, leaking proportionally less.
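A minimal sketch of the hot-clock arithmetic above, assuming a purely hypothetical ~0.9 GHz common base clock; the point is the ratio, not the absolute numbers.

```python
# Hypothetical comparison: many slow ALUs vs. few fast (hot-clocked) ALUs.
# The base clock value is a made-up placeholder for illustration.

base_clock_ghz = 0.9          # assumed common base clock
wide_alus = 1536              # non-hot-clocked design
narrow_alus = 512             # hot-clocked design

# Peak FMA throughput (2 flops per ALU per cycle) of the wide design:
wide_gflops = wide_alus * base_clock_ghz * 2

# Hot clock the narrow design would need to match it:
required_hot_clock = wide_gflops / (narrow_alus * 2)
print(f"Wide design: {wide_gflops:.0f} GFLOPS at {base_clock_ghz} GHz")
print(f"512 ALUs would need a {required_hot_clock:.2f} GHz ALU clock "
      f"({required_hot_clock / base_clock_ghz:.0f}x the base clock) to match.")
```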
     
  11. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,484
    Likes Received:
    1,844
    Location:
    London
    Isn't NVidia planning to run its ALUs at 2GHz+ for its exascale GPUs?
     
  12. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    2,292
    Likes Received:
    1,733
    Location:
    France
I would guess density and cooling area? Look at a big CPU cooler - you can't put that on a GPU.
     
  13. entity279

    Veteran Regular Subscriber

    Joined:
    May 12, 2008
    Messages:
    1,313
    Likes Received:
    485
    Location:
    Romania
My guess would be that GPUs lack: proper foundry processes, a fair share of custom logic, and complex, deep pipelines.

Higher power consumption would probably also be a key factor here.
     
  14. whitetiger

    Newcomer

    Joined:
    Feb 5, 2012
    Messages:
    57
    Likes Received:
    0
    I'm fairly sure you asked me about the GF110 compared to the GF114, but perhaps I was mistaken
    :wink:

In which case, I think the GK110 is likely to end up just as bandwidth-constrained as the GK104 (proportionately)
- assuming it has 50% more SPs
(AFAIK, the current rumor is for 6 GPCs compared to the 4 GPCs on the GK104)
- and the currently rumored 384-bit bus gives it 50% more bandwidth...

    unless it has a 512-bit bus...
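A quick ratio check of the "proportionately as bandwidth-constrained" point, using the rumoured SP counts and bus widths from this thread; the 5.5 Gbps GDDR5 data rate is an assumption used only to keep the comparison concrete.

```python
# Bandwidth-per-ALU ratio for the rumoured GK104 vs. GK110 configurations.
# SP counts and bus widths are the rumours quoted above; the GDDR5 data rate
# is an illustrative assumption.

def bandwidth_gbps(bus_width_bits, data_rate_gbps):
    """Memory bandwidth in GB/s for a given bus width and per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps

configs = {
    "GK104 (rumoured)": {"sps": 1536, "bus_bits": 256},
    "GK110 (rumoured)": {"sps": 2304, "bus_bits": 384},  # +50% SPs, +50% bus
}

for name, cfg in configs.items():
    bw = bandwidth_gbps(cfg["bus_bits"], data_rate_gbps=5.5)
    print(f"{name}: {bw:6.1f} GB/s, {bw / cfg['sps'] * 1000:.1f} MB/s per SP")
# With both SP count and bus width scaled by 1.5x, bytes/s per SP is unchanged,
# i.e. GK110 would be exactly as bandwidth-constrained as GK104.
```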
     
  15. keldor314

    Newcomer

    Joined:
    Feb 23, 2010
    Messages:
    132
    Likes Received:
    13
The big problem with running a GPU at CPU-level frequencies is that a high-end GPU die easily has 5 times as much surface area as a CPU die. Add to that the fact that roughly 50% of a CPU is cache, which uses less power and generates less heat than active logic, whereas a GPU has most of its area dedicated to active logic. Combine those two and you can see that a GPU at CPU-like clocks would consume an order of magnitude more power and generate that much more heat. Ouch.

Basically, the laws of physics dictate that doubling core count uses less energy - and therefore generates less heat - than doubling frequency. Since GPUs are pretty much at the energy consumption limit (at least without exotic cooling methods), and since GPU programming expects very high levels of parallelism anyway, it's more efficient to increase core count rather than frequency to stay within the power budget.
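The "doubling cores beats doubling frequency" point follows from the usual dynamic power relation P ≈ C·V²·f, since a higher clock generally also demands a higher voltage. A minimal sketch with made-up voltage numbers:

```python
# Why doubling cores is cheaper (in power) than doubling frequency.
# Dynamic power scales roughly as P ~ C * V^2 * f; the voltage values
# below are illustrative placeholders, not measured figures.

def dynamic_power(capacitance, voltage, frequency):
    """Relative dynamic switching power (arbitrary units)."""
    return capacitance * voltage**2 * frequency

baseline = dynamic_power(capacitance=1.0, voltage=1.00, frequency=1.0)

# Doubling the core count: twice the switching capacitance, same V and f.
double_cores = dynamic_power(capacitance=2.0, voltage=1.00, frequency=1.0)

# Doubling the frequency: same capacitance, but assume V must rise ~30%
# to close timing at the higher clock.
double_freq = dynamic_power(capacitance=1.0, voltage=1.30, frequency=2.0)

print(f"Baseline:     {baseline:.2f}x")
print(f"2x cores:     {double_cores / baseline:.2f}x power for ~2x throughput")
print(f"2x frequency: {double_freq / baseline:.2f}x power for ~2x throughput")
```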
     
  16. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,532
    Likes Received:
    957
Dynamic power would explode. Plus, you'd need much deeper pipelines to reach such clocks, which means additional stages, so more transistors, and in the end leakage wouldn't decrease by that much. In fact, you'd need a pretty high voltage too, so it might not decrease at all.
     
  17. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,021
    Likes Received:
    119
Essentially the same reason why neither Bobcat nor Atom reaches anywhere close to 3 GHz: it's just not power efficient. Both go for dual core at "half the frequency" instead - and that is with a piece of silicon where single-thread performance still matters. Higher frequencies might also require custom logic (which at least Bobcat doesn't have either).
     
  18. CarstenS

    Legend Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,659
    Likes Received:
    3,662
    Location:
    Germany
So, the general consensus here seems to be that it's got nothing to do with graphics-related functions, but rather with power issues.
     
  19. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,797
    Location:
    Well within 3d
    Power has been the ultimate limiter for the last few generations, and it's getting worse.

    Also, it's not like pushing timings closer to the bleeding edge makes a circuit more resistant to device variation and interference.
    There are inflection points where the required voltage and area needed for a high-speed pipeline make density and power sacrifices far higher than the actual gains, especially if you move towards the upper end of what the process tech can support.

Let's note that CPUs, with their limited core counts and aggressive turbo implementations, toe that line as a matter of routine. A few hundred MHz, or at most a few tens of percent extra throughput, can take a chip running at a fraction of its TDP to way over it.

    The density figures for high-speed logic are not that great, and the scaling across nodes is noticeably worse than density-optimized portions of the chip like memory.
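To illustrate how quickly a turbo step can cross the TDP line, a small sketch using the same V²·f relation; the frequency/voltage pairs and the 70%-of-TDP starting point are invented for illustration.

```python
# How a modest clock/voltage bump can push a chip past its TDP.
# All numbers (clocks, voltages, starting utilisation) are illustrative only.

def relative_power(freq_ghz, voltage, base_freq_ghz=3.0, base_voltage=1.00):
    """Dynamic power relative to the base operating point (P ~ V^2 * f)."""
    return (voltage / base_voltage) ** 2 * (freq_ghz / base_freq_ghz)

# Suppose the chip sits at 70% of its TDP at base clock under a given load.
load_fraction_of_tdp = 0.70

for freq, volt in [(3.0, 1.00), (3.3, 1.05), (3.6, 1.12), (3.9, 1.20)]:
    power = load_fraction_of_tdp * relative_power(freq, volt)
    print(f"{freq:.1f} GHz @ {volt:.2f} V -> {power * 100:5.1f}% of TDP")
# A 20-30% clock bump, with the voltage needed to sustain it, is enough to
# take this workload from comfortably inside the TDP to over it.
```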
     
  20. Lux_

    Newcomer

    Joined:
    Sep 22, 2005
    Messages:
    206
    Likes Received:
    1
GPUs have around 4 times more transistors than CPUs (link), and it is easier to achieve higher "whole package" utilization on a GPU than on a CPU. Take FurMark and Linpack, for example - both are high-load benchmarks, but FurMark causes much more trouble.

GPUs already have a TDP around 3 times higher than CPUs while running at a frequency around 3 times lower. I'm sure there are refinements that could be made to GPUs, but so far it seems that the path to greater performance comes from rearchitecting the chip, not from pushing existing GPUs to higher frequencies.
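Putting rough numbers on this post's ratios (4x transistors, 3x TDP, one third the frequency), the implied power per transistor-cycle works out quite differently for the two. A small illustrative calculation, using only the round figures quoted above:

```python
# Rough per-transistor, per-cycle power comparison using the round figures
# from the post above (4x transistors, 3x TDP, 1/3 the frequency).
# These are illustrative ratios, not measured values.

cpu = {"transistors": 1.0, "tdp": 1.0, "freq_ghz": 3.0}
gpu = {"transistors": 4.0, "tdp": 3.0, "freq_ghz": 1.0}

def power_per_transistor_cycle(chip):
    """TDP divided by (transistor count x clock), in arbitrary units."""
    return chip["tdp"] / (chip["transistors"] * chip["freq_ghz"])

cpu_e = power_per_transistor_cycle(cpu)
gpu_e = power_per_transistor_cycle(gpu)
print(f"GPU spends ~{gpu_e / cpu_e:.2f}x the power per transistor-cycle of a CPU")
# ~2.25x here, despite the much lower clock - consistent with the point that a
# far larger fraction of a GPU's transistors are actively switching each cycle.
```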
     