Nvidia BigK GK110 Kepler Speculation Thread

Discussion in 'Architecture and Products' started by A1xLLcqAgt0qc2RyMz0y, Apr 21, 2012.

  1. A1xLLcqAgt0qc2RyMz0y

    Regular

    Joined:
    Feb 6, 2010
    Messages:
    808
    The Big Kepler adds 3.5 billion transistors to the GK104.

    Some of the improvements are:

    reorganized processing cores with new instructions
    an improved memory system with faster atomic processing and low-overhead ECC

    So what are the additional changes that Nvidia has added to the Big Kepler that could use lots of transistors?

    ------------------------


    Source: https://registration.gputechconf.com/?form=schedule
    Change Drop Down date to Wednesday 5/16
     
  2. iMacmatician

    Regular

    Joined:
    Jul 24, 2010
    Messages:
    627
    Location:
    United States of America
    Well there's this set of rumors/speculation from 3DCenter (translated) saying 3072 CCs, so that would use lots of transistors. According to that rumor, GK110 seems close to an overall doubled GK104 in terms of basic specs.
     
  3. tunafish

    Regular

    Joined:
    Aug 19, 2011
    Messages:
    476
    Big K isn't going to just be a doubled GK104. For Nvidia, the HPC/workstation segment is bigger than the high-end GPU one. So Big K will likely emphasize 64-bit throughput, with a healthy helping of caches.
     
  4. iMacmatician

    Regular

    Joined:
    Jul 24, 2010
    Messages:
    627
    Location:
    United States of America
    I was thinking along the lines of CC count and bus width, but yeah you're right.

    But if the 3072 CC stuff is true, then I'm interested to know how they could squeeze that many CCs into GK110, especially considering the additional compute features would presumably make the die bigger for the same CC count.
     
  5. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    1,244
    512-bit memory bus?
     
  6. AnarchX

    Veteran

    Joined:
    Apr 19, 2007
    Messages:
    1,557
    The numbers of GPCs, SMXs and TMUs probably do not scale the same way as in GK104.

    The more interesting questions are power consumption and the possibility of partly deactivated units on the top SKU.
     
    #6 AnarchX, Apr 21, 2012
    Last edited by a moderator: Apr 21, 2012
  7. UniversalTruth

    Veteran

    Joined:
    Sep 5, 2010
    Messages:
    1,747
    How big will its die be? :???:
    If they keep the same transistor density of ~12.04 MTr/mm², then this 7000 M transistor beast will need around 580 mm². :shock:
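
    The estimate above can be reproduced with a short sketch. The GK104 figures (3540 M transistors on 294 mm²) are Nvidia's published numbers; the 7000 M count for GK110 is only the rumored figure from this thread, and the function names are made up for illustration.

```python
# Back-of-the-envelope die-size estimate from transistor density.
# GK104 numbers are published; the GK110 transistor count is a rumor.

GK104_TRANSISTORS_M = 3540.0   # million transistors (published)
GK104_DIE_MM2 = 294.0          # die area in mm^2 (published)
GK110_TRANSISTORS_M = 7000.0   # rumored, unconfirmed


def density_mtr_per_mm2(transistors_m: float, area_mm2: float) -> float:
    """Transistor density in million transistors per mm^2."""
    return transistors_m / area_mm2


def estimated_area_mm2(transistors_m: float, density: float) -> float:
    """Die area needed for a transistor count at a given density."""
    return transistors_m / density


density = density_mtr_per_mm2(GK104_TRANSISTORS_M, GK104_DIE_MM2)
area = estimated_area_mm2(GK110_TRANSISTORS_M, density)

print(f"GK104 density:  {density:.2f} MTr/mm^2")
print(f"GK110 estimate: {area:.0f} mm^2")
```

    This assumes GK110 would reach exactly the same density as GK104, which large caches (denser) or extra I/O (less dense) could shift in either direction.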
     
  8. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,125
    Location:
    Switzerland
    That's because the 7 billion transistors (7000M) are not confirmed yet...

    I really doubt that Nvidia, given their experience with 28nm, will end up with a 550 mm² chip. Don't forget that in reality we are absolutely not speaking about Kepler. We are speaking about a card that could see the light of day in 5-6 months.
     
  9. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,641
    Location:
    London
    merge this pointless thread back with the main one
     
  10. A1xLLcqAgt0qc2RyMz0y

    Regular

    Joined:
    Feb 6, 2010
    Messages:
    808
    You mean make this thread disappear in the useless noise of Bitcoin mining, how much tax the EU adds vs. the USA, physics jobs in Germany vs. the USA, etc., etc., etc.

    If anything, the other thread is the bloated, pointless one, especially in relation to the Tesla line.

    Having a thread specifically on the BigK GK110 Tesla/HPC GPU, without the above-mentioned useless posts, is useful.

    I expect that the GK110 will be fully dedicated to the professional market, and I would like to see what others expect the additional 3.5 billion transistors to have added over the GK104 GPU.

    And if you really like the other thread so much, you can stay and post on that one and ignore this one.

    Back to the speculation on what was added to make up the +3.5 billion transistors. Here are the guesses so far:

    3072 CCs
    64-bit throughput
    healthy helping of caches
    512-bit memory bus
     
  11. AlphaWolf

    AlphaWolf Specious Misanthrope
    Legend

    Joined:
    May 28, 2003
    Messages:
    8,170
    Location:
    Treading Water
    So should we expect GK110 to be a lot better at bitcoin mining per transistor?
     
  12. jaredpace

    Newcomer

    Joined:
    Sep 28, 2009
    Messages:
    157
    :yes:
     
  13. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    3,698
    Location:
    Germany
    3072 ALUs
    -> 6x GPCs (à 512 SPs)
    --> 4 SMK to each GPC, 128 ALUs/SMK
    --> each SMK has
    ---> 4 groups of 32 ALUs
    ----> two of which are 64 Bit capable, re-using data-paths from the other ALUs
    ----> two groups share a quad TMU
    ----> 4x 32 kiB L1-Cache shared among the ALU blocks, configurable as scratchpad memory in block sizes of 32 kiB.
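
    A quick tally of the hierarchy guessed above, as a sketch only. Every number is the speculation from this post, not a confirmed spec, and the class and field names are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeculatedLayout:
    """Speculated GK110 shader hierarchy (all values are guesses)."""
    gpcs: int = 6
    smk_per_gpc: int = 4
    alu_groups_per_smk: int = 4
    alus_per_group: int = 32
    fp64_groups_per_smk: int = 2   # groups guessed to be 64-bit capable
    l1_blocks_per_smk: int = 4
    l1_block_kib: int = 32

    @property
    def smk_total(self) -> int:
        return self.gpcs * self.smk_per_gpc

    @property
    def alus_per_smk(self) -> int:
        return self.alu_groups_per_smk * self.alus_per_group

    @property
    def alus_per_gpc(self) -> int:
        return self.smk_per_gpc * self.alus_per_smk

    @property
    def alus_total(self) -> int:
        return self.gpcs * self.alus_per_gpc

    @property
    def fp64_alus_total(self) -> int:
        return self.smk_total * self.fp64_groups_per_smk * self.alus_per_group

    @property
    def l1_total_kib(self) -> int:
        return self.smk_total * self.l1_blocks_per_smk * self.l1_block_kib


layout = SpeculatedLayout()
print("ALUs per SMK:   ", layout.alus_per_smk)
print("ALUs per GPC:   ", layout.alus_per_gpc)
print("ALUs total:     ", layout.alus_total)
print("FP64-capable:   ", layout.fp64_alus_total)
print("L1 total (KiB): ", layout.l1_total_kib)
```

    With these inputs the per-GPC count matches the 512 SPs listed and the total matches the rumored 3072, with half of the ALUs being 64-bit capable.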

    512 Bit MI
    -> 8x 64-Bit memory partitions
    -> 4 GiB default memory size for gaming cards, twice for Tesla, Quadro
    -> (probably) 2048, rather still 1024 kiB L2-Cache

    850 MHz core clock plus advanced turbo (independently clockable GPCs?) and probably 1.40ish GHz GDDR5 speed, not pushing the envelope here as much.
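
    For scale, here is a rough sketch of what those clocks would mean in peak numbers. It assumes one fused multiply-add (2 FLOPs) per ALU per clock, reads the GDDR5 figure as roughly 1.40 GHz, and assumes four data transfers per pin per quoted memory clock. None of this is confirmed for GK110, and the helper names are made up.

```python
# Peak-rate arithmetic for the speculated configuration.
# All inputs are guesses from this post; the per-clock factors are assumptions.

ALUS = 3072
CORE_CLOCK_GHZ = 0.85
FLOPS_PER_ALU_PER_CLOCK = 2      # assumption: one FMA per clock

MEMORY_PARTITIONS = 8
BITS_PER_PARTITION = 64
MEM_CLOCK_GHZ = 1.40             # assumption: "1.40ish" read as GHz
TRANSFERS_PER_CLOCK = 4          # assumption: GDDR5 data rate = 4x quoted clock


def peak_sp_gflops(alus: int, clock_ghz: float, flops_per_clock: int) -> float:
    """Theoretical single-precision peak in GFLOPS."""
    return alus * clock_ghz * flops_per_clock


def peak_bandwidth_gb_s(bus_bits: int, clock_ghz: float, transfers: int) -> float:
    """Theoretical memory bandwidth in GB/s (bus width in bits)."""
    return bus_bits / 8 * clock_ghz * transfers


bus_width_bits = MEMORY_PARTITIONS * BITS_PER_PARTITION
gflops = peak_sp_gflops(ALUS, CORE_CLOCK_GHZ, FLOPS_PER_ALU_PER_CLOCK)
bandwidth = peak_bandwidth_gb_s(bus_width_bits, MEM_CLOCK_GHZ, TRANSFERS_PER_CLOCK)

print(f"Bus width:      {bus_width_bits} bit")
print(f"Peak SP:        {gflops:.1f} GFLOPS")
print(f"Peak bandwidth: {bandwidth:.1f} GB/s")
```

    These are theoretical peaks without turbo, so sustained rates would be lower.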

    Making close to 7 bln transistors and 550 mm² die size as agreed upon here.
    Hm?

    Plus as Big-K special sauce:
    - one dedicated physx processor per SMK
    - a broken and unfixable design
    *SCNR*
     
  14. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    12,774
    I thought PhysX was adapted to run on standard shaders, hence there is no dedicated PhysX unit (unless they've put the Ageia stuff on-chip)
    scnr ???
     
  15. Alexko

    Veteran

    Joined:
    Aug 31, 2009
    Messages:
    3,928
    Sorry, Could Not Resist.

    In other words, the PhysX part was a joke. ;)
     
  16. Arun

    Arun Unknown.
    Moderator Veteran

    Joined:
    Aug 28, 2002
    Messages:
    4,971
    Location:
    UK
    Seems reasonable to me. I don't think you're describing the intra-SMX 'groups' correctly (a group of two schedulers share three 32-wide ALUs plus other units in GK104, and there are two such groups per SMX sharing L1/Shared Memory plus a few other things) and who knows how that'll evolve (see GF100 vs GF104) but the final numbers make a fair bit of sense.

    Extremely unlikely, that makes absolutely no sense for a GPU. Independent clocking makes sense on CPUs because single-threaded performance is key. That should never matter on GPUs, although in practice it might because of static tile allocation to specific GPCs/SMXs. Fermi was certainly static, I think GK104 is as well, but they haven't really talked about it. They really should just switch to dynamic tile allocation à la SGX! ;)
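
    For comparison, the GK104 SMX grouping described in this post can be tallied the same way. This is a sketch with invented constant names; the count of 8 SMX per GK104 is the published configuration.

```python
# GK104 SMX organization as described above:
# two schedulers share three 32-wide ALU blocks, two such groups per SMX.

SCHEDULERS_PER_GROUP = 2
ALU_BLOCKS_PER_GROUP = 3
ALU_BLOCK_WIDTH = 32
GROUPS_PER_SMX = 2
GK104_SMX_COUNT = 8   # published GK104 configuration

alus_per_group = ALU_BLOCKS_PER_GROUP * ALU_BLOCK_WIDTH
alus_per_smx = GROUPS_PER_SMX * alus_per_group
schedulers_per_smx = GROUPS_PER_SMX * SCHEDULERS_PER_GROUP
gk104_alus = GK104_SMX_COUNT * alus_per_smx

print("ALUs per group:     ", alus_per_group)
print("ALUs per SMX:       ", alus_per_smx)
print("Schedulers per SMX: ", schedulers_per_smx)
print("GK104 ALUs total:   ", gk104_alus)
```

    The 192 ALUs per SMX here versus the 128 per SMK guessed earlier is the organizational difference under discussion.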
     
  17. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    3,698
    Location:
    Germany
    I know that GK104 is organized differently, but I think it is possible that Nvidia did not follow the same route for their GPU-Compute optimized chip.

    WRT advanced GPU Boost: depending on how high you could go when enough GPCs idle, I think this could make a difference for serial performance. In other words, the more power-limited Big-K turns out to be, the higher your possible gains for compiler-identifiable, latency-dominated tasks.
     
  18. AnarchX

    Veteran

    Joined:
    Apr 19, 2007
    Messages:
    1,557
    An independent clock for all GPCs or for each GPC?

    With a ~850 MHz base clock, GK110 could offer a much higher Boost in cases where performance is limited by the GPCs.
    On the other hand, NV could use this to present a GeForce version with fewer than 3072 SPs at a ~1 GHz clock, since gaming performance favors a faster front-end.
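
    To put numbers on that trade-off, here is a sketch of how many SPs a ~1 GHz part would need to match the raw ALU throughput of the full 3072-SP chip at ~850 MHz. It assumes ALU throughput scales linearly with SP count times clock, and the 2560-SP example is purely hypothetical.

```python
# Break-even between "more SPs, lower clock" and "fewer SPs, higher clock".
# Assumption: raw ALU throughput is proportional to SP count x clock.

FULL_SPS = 3072
FULL_CLOCK_GHZ = 0.85     # speculated base clock
FAST_CLOCK_GHZ = 1.0      # speculated GeForce clock


def break_even_sps(full_sps: int, full_clock: float, fast_clock: float) -> float:
    """SP count at fast_clock that matches the full chip's ALU throughput."""
    return full_sps * full_clock / fast_clock


def relative_alu_throughput(sps: int, clock_ghz: float) -> float:
    """ALU throughput relative to the full chip at its base clock."""
    return (sps * clock_ghz) / (FULL_SPS * FULL_CLOCK_GHZ)


needed = break_even_sps(FULL_SPS, FULL_CLOCK_GHZ, FAST_CLOCK_GHZ)
front_end_gain = FAST_CLOCK_GHZ / FULL_CLOCK_GHZ - 1.0
example = relative_alu_throughput(2560, FAST_CLOCK_GHZ)   # hypothetical cut-down part

print(f"Break-even SP count at 1 GHz: {needed:.1f}")
print(f"Front-end clock gain:         {front_end_gain:.1%}")
print(f"2560 SPs at 1 GHz vs full:    {example:.1%}")
```

    Under these assumptions a cut-down part gives up little ALU throughput while everything clocked with the core, including the front-end, runs faster.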
     
  19. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    3,698
    Location:
    Germany
    What I meant was a common clock throughout each GPC, but individually adjustable, possibly based on available power and maybe even on thread priority or type.

    In any case, Nvidia would need to cut down on something if they are going to stay within a 300-watt power budget.
     
  20. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    12,712
    I don't think Nvidia are that concerned with gaming performance for big Kepler, hence the latter would be doubtful if it impacts compute performance.

    Likewise, the same could be applied to what CarstenS is suggesting with individually clocked GPCs. Don't compute-oriented workloads generally push all compute units relatively uniformly? Hence even the current turbo on GK104 might be determined to be unneeded and hence a waste of transistors.

    IMO, for big Kepler, compute performance will matter most, with gaming performance being secondary. Unlike GK104, where game performance was king and compute performance secondary.

    Regards,
    SB
     
