Nvidia BigK GK110 Kepler Speculation Thread

Discussion in 'Architecture and Products' started by A1xLLcqAgt0qc2RyMz0y, Apr 21, 2012.

  1. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,245
    Likes Received:
    4,465
    Location:
    Finland
So is it really feasible that they can do "almost 2x GK104" in twice the transistors, with added caches, added FP64 speed, etc.? I really don't see that happening.
     
  2. Ernestds

    Newcomer

    Joined:
    Dec 10, 2011
    Messages:
    19
    Likes Received:
    0
I think what he says is that there is no "polymorph engine"; it's just that they put some stuff that wasn't part of anything else inside a box in the slides and called it a day.
     
  3. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    The loss of the hotclock makes that just a 1:2 ratio of the ALU throughput. And the L2 cache bandwidth or the speed for atomics actually went up with GK104 compared to Fermi (bandwidth about +70% compared to GF110, about +150% compared to GF114, atomics quite a bit more). I don't think the culprit can be found in the 33% lower size of the L2.
    At the same time, the requirements of the DirectCompute CS5.0 and also OpenCL limit the usable size of the L1 cache to 16kB for Fermi because it can't be partitioned 32kB : 32kB, while GK104 allows that. That enables twice the L1 cache per SM(X) (32kB instead of 16kB) reflecting the doubling of computing throughput per clock (but I don't know if it is used or if GK104 defaults for a larger shared memory region, considering it has to be shared by more threads).

    The last point is a significant regression for compute (they should have doubled the L1/shared memory to 128kB, GCN has 64kB dedicated shared memory [+16kB L1] for just 64 ALUs, not 48kB max for 192 ALUs, GK110 will probably do it). And connected to that, the L/S capabilities are about halved per ALU (but overall still a plus considering the clock speed). Then you have of course the much more static scheduling reducing performance (but that shouldn't have that much of an effect).
Another thing which comes to my mind is the quite low performance for some instructions (integer adds and logic operations are fine [albeit not full rate], integer multiplies/mads are somewhat okay with the 1:6 ratio). But just having a loop counter in a normal-sized to compact loop probably slows down GK104 considerably (it has only a 1:24 ratio for compares! It was 1:2 or 1:3 with Fermi). So it's not only the shift performance which is lacking in GK104.
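Those ratios can be turned into a back-of-the-envelope issue-cost model. This is a sketch using only the numbers quoted above (a 1:24 ratio modeled as 24 full-rate issue slots per op); it's illustrative, not measured.

```python
# Issue-cost model for a tight loop body, using the per-instruction
# throughput ratios quoted above. A "1:24 ratio" is modeled as one op
# consuming 24 full-rate issue slots.

def loop_slots(body_ops, slot_cost):
    """Full-rate issue slots consumed per loop iteration."""
    return sum(slot_cost[op] for op in body_ops)

# Slot costs relative to full rate (from the post: compares are 1:24 on
# GK104 vs roughly 1:2 on Fermi; integer adds near full rate on both).
gk104 = {"iadd": 1, "icmp": 24}
fermi = {"iadd": 1, "icmp": 2}

# Minimal loop overhead: increment the counter, compare against the bound.
body = ["iadd", "icmp"]
print(loop_slots(body, gk104))  # 25
print(loop_slots(body, fermi))  # 3
```

Even with zero useful work in the body, the loop overhead alone costs roughly 8x more issue slots on GK104 under these numbers, which is the point about compares above.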
     
  4. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
Isn't the LDS size [per thread-block] limited by the API-mandated exposure, anyway? I'm not sure about OCL, but DC5.0 requires a fixed size of 32KBytes. Does OCL set only a lower limit?

    Of course, there's always a way to increase the aggregate LDS size by simply stuffing more multiprocessors in the GPU, without hitting those API limitations.
     
  5. OpenGL guy

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,357
    Likes Received:
    28
    OpenCL requires a minimum of 32KB of LDS. Nvidia exposes 48KB in OpenCL.
     
  6. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    Does it matter?
What I was hinting at is this: what happens if you set a work group / thread block size of 32 and use the full 32kB per work group? Only a single warp gets to run on the whole SMX!
Of course, that's a rather extreme example. But generally, the total size of the local memory may limit the number of work groups which can be scheduled to run simultaneously on a CU (at least when using a bit more local memory), which means it limits the number of threads. IIRC, GCN exposes only 32kB as the maximum local memory allocation per work group (and not the full 64kB), which means you can always run at least two work groups in parallel on each CU. The local memory size per work group itself is normally not a serious limitation.
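The concurrency cap works out to simple integer arithmetic: resident groups = floor(total LDS / LDS per group), resident warps = groups x (group size / warp width). A minimal sketch with the sizes from the post (not a vendor occupancy calculator, which also accounts for register and warp-slot limits):

```python
# How local-memory allocation caps concurrency on a multiprocessor.
# Only the LDS limit is modeled; real occupancy also depends on
# registers and hardware warp slots.

def resident_warps(total_lds, lds_per_group, group_size, warp_width=32):
    groups = total_lds // lds_per_group          # groups that fit in LDS
    return groups * (group_size // warp_width)   # warps those groups supply

# Gipsel's extreme case: 32-thread groups each taking 32 kB of GK104's
# 48 kB -> only one group fits, so a single warp occupies the SMX.
print(resident_warps(48 * 1024, 32 * 1024, 32))  # 1

# GCN: capping per-group allocation at 32 kB of the 64 kB LDS guarantees
# at least two groups coexist (wavefronts are 64 threads wide there).
print(resident_warps(64 * 1024, 32 * 1024, 64, warp_width=64))  # 2
```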
     
  7. Dooby

    Regular

    Joined:
    Jul 21, 2003
    Messages:
    478
    Likes Received:
    3
    So, 690 wasn't BigK, it was dual 680. How utterly boring:

    "In total, the GTX 690 has 3,072 CUDA cores, running at a 915MHz base clock and 1,019MHz boost clock, both slightly reduced from the standard values on a standalone GTX 680 graphics card. The memory clock is unchanged at an effective rate of 6GHz, and you get two batches of 2GB of GDDR5 RAM, totalling 4GB of dedicated video buffer. $999"

So, roughly 100MHz less and the same price as 2x 680s.
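The quoted spec checks out with quick arithmetic, assuming GK104's 256-bit interface per GPU and the standalone GTX 680's 1006/1058 MHz base/boost clocks:

```python
# Per-GPU memory bandwidth of the GTX 690 as quoted: 6 GHz effective
# GDDR5 on a 256-bit bus (GK104's interface width).
bus_bits = 256
effective_rate_ghz = 6.0
bandwidth_gbps = effective_rate_ghz * bus_bits / 8  # GB/s per GPU
print(bandwidth_gbps)  # 192.0

# Clock deficit vs. a standalone GTX 680 (1006 MHz base, 1058 MHz boost):
print(1006 - 915)   # 91 MHz lower base clock
print(1058 - 1019)  # 39 MHz lower boost clock
```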
     
  8. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,237
    Likes Received:
    4,260
    Location:
    Guess...
Yep, pretty boring all in all. Looks like we'll have to wait a while for BigK. That's if it's going to be a consumer-level gaming GPU anyway.
     
  9. Ninjaprime

    Regular

    Joined:
    Jun 8, 2008
    Messages:
    337
    Likes Received:
    1
    The question is, if 690s are $1000, and one 580mm^2 die costs more than two 290mm^2 dies due to yields, and 512bit interface means 4GB, same amount as on a 690, then... Does "BigK" end up costing $1000 or more?
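The "one big die costs more than two small ones" claim follows from yield dropping exponentially with die area. A toy sketch under a Poisson yield model; the defect density is a made-up illustrative number, not a TSMC figure, and edge losses are ignored:

```python
# Toy per-good-die cost comparison under a Poisson yield model:
# yield = exp(-D * A), where D is defect density and A is die area.
import math

def good_dies_per_wafer(die_mm2, defects_per_mm2, wafer_diameter_mm=300):
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    candidates = wafer_area / die_mm2                  # ignores edge loss
    yield_frac = math.exp(-defects_per_mm2 * die_mm2)  # Poisson model
    return candidates * yield_frac

D = 0.004  # defects per mm^2, assumed for illustration only
small = good_dies_per_wafer(290, D)  # ~GK104-sized die
big = good_dies_per_wafer(580, D)    # ~BigK-sized die

# Cost per good die scales as the inverse of good dies per wafer.
# Halving candidates AND losing yield compounds to well over 2x:
print(small / big)
```

Under these assumed numbers one good big die costs over 3x one good small die, i.e. noticeably more than two small dies, which is the premise of the question.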
     
  10. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    The big chip would be targeting compute and professional graphics segments.
    $1000 would be way too low for a top-end Quadro.
     
  11. Ninjaprime

    Regular

    Joined:
    Jun 8, 2008
    Messages:
    337
    Likes Received:
    1
So you think it's a completely professional/compute product with no consumer desktop component? That was my original thought, though the buzz seems to be "wait for BigK, it's going to blow away the 680", etc...
     
  12. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    From how it's described, it seems like it is going to be tailored to fit those markets.
    It might still fit into a high-end enthusiast single-GPU product as long as it doesn't lose to a single 680.
    The transistor count should give it plenty to work with, and Nvidia has left the upper TDP range empty, which in these power-limited scenarios means a chip running in that range should be able to win.

    The 690 card does mean that the big chip may not have the top gaming bracket.
     
  13. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    17,884
    Likes Received:
    5,334
I wonder if NV will try a little experiment: release it as a Quadro-only product and see if the high-end gamers buy it.
     
  14. AlphaWolf

    AlphaWolf Specious Misanthrope
    Legend

    Joined:
    May 28, 2003
    Messages:
    9,470
    Likes Received:
    1,686
    Location:
    Treading Water
    You could build a quad sli system for the price of a high end quadro, so I doubt it.
     
  15. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    19,426
    Likes Received:
    10,320
    Pretty much exactly what I expected though. BigK is unlikely to show up before the 7xx series which is probably slated for the fall or winter quarter.

    So 680 will be the top single chip solution while 690 will be the top card solution for the 6xx line.

Heck, it wouldn't even surprise me if BigK was relegated to the ultra-enthusiast (~1000 USD) segment when it launches in the 7xx series, with the chip's focus being on prosumer/professional/HPC markets; the consumer space would just be there for inventory bleed-off and/or salvage parts. Nvidia would then use smaller dies tailored for consumer use to fit everything from the 780 on down. I certainly wouldn't be surprised if Nvidia abandoned the big-die strategy for the consumer space.

    Now to see what Nvidia comes up with in the lower segments.

    Regards,
    SB
     
  16. Ryan Smith

    Regular

    Joined:
    Mar 26, 2010
    Messages:
    629
    Likes Received:
    1,131
    Location:
    PCIe x16_1
    They would have to release GeForce drivers for it. The Quadro drivers aren't exactly performant (never mind the update schedule).
     
  17. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
GK110 doesn't sound like it'll appear for desktop all that soon. If by that time 28nm yields/capacities, and by extension manufacturing costs, haven't normalized, it won't be good news for either AMD or NVIDIA in desktop sales (well, it'll most likely be high margins, low volume).

As for NV abandoning the big-die strategy in some way for desktop, I wouldn't be much surprised either in the long run, but for the time being it doesn't seem likely that professional market sales (despite big margins) can absorb the R&D expenses for such a high-complexity chip.
     
  18. iMacmatician

    Regular

    Joined:
    Jul 24, 2010
    Messages:
    797
    Likes Received:
    223
    I'm not sure if this has been pointed out before in the long Kepler thread… but someone at the SemiAccurate forums noted that the Kepler GPUs for the Oak Ridge upgrade will have 6 GB memory. That seems to indicate the GK110 will have either a 384-bit bus or a 512-bit bus that's disabled to 384-bit on the particular cards they'll use.
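The inference from 6 GB to bus width is straightforward: GDDR5 hangs one or two (clamshell) chips off each 32-bit channel, and with 2 Gbit (256 MB) devices — the densest parts shipping at the time — the capacities fall out directly. A quick sketch:

```python
# Which GDDR5 bus widths yield exactly 6 GB with 2 Gbit (256 MB) chips?
CHIP_MB = 256  # 2 Gbit GDDR5 device

def capacity_mb(bus_bits, chips_per_channel):
    channels = bus_bits // 32          # GDDR5 uses 32-bit channels
    return channels * chips_per_channel * CHIP_MB

print(capacity_mb(384, 2))  # 6144 MB = 6 GB: clamshell on a 384-bit bus
print(capacity_mb(512, 2))  # 8192 MB = 8 GB: a full 512-bit bus

# 6 GB on a 512-bit chip would mean only 12 of 16 channels active --
# exactly the "disabled to 384-bit" salvage configuration described.
```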
     
  19. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,059
    Likes Received:
    3,119
    Location:
    New York
    Nice find. I doubt they would saddle such a high profile deployment with salvage chips so maybe it's 384-bit.

    Damn them all to hell.
     
  20. ninelven

    Veteran

    Joined:
    Dec 27, 2002
    Messages:
    1,742
    Likes Received:
    152
Meh, I imagine they could sell them at $799 and still make decent coin... The only reason I can see for delaying consumer availability would be supply constraints (which is probably an issue); why sell for $799 when you can sell the same chip for much, much more?
     


  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.