Nvidia BigK GK110 Kepler Speculation Thread

Discussion in 'Architecture and Products' started by A1xLLcqAgt0qc2RyMz0y, Apr 21, 2012.

  1. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,059
    Likes Received:
    3,119
    Location:
    New York
    The price is not an issue but I refuse to give in to the AFR gremlins :D

    Sure, if you have no imagination....
     
  2. Tridam

    Regular Subscriber

    Joined:
    Apr 14, 2003
    Messages:
    541
    Likes Received:
    47
    Location:
    Louvain-la-Neuve, Belgium
I was describing half of an SMX, as two warp schedulers have to share half of an SMX's resources (except the registers/tex units/tex cache, which are tied to each scheduler).
     
  3. denev2004

    Newcomer

    Joined:
    Apr 28, 2010
    Messages:
    143
    Likes Received:
    0
    Location:
    China
    Yes, just like the first version of Fermi's whitepaper...
     
  4. dkanter

    Regular

    Joined:
    Jan 19, 2008
    Messages:
    360
    Likes Received:
    20
Yes, there are. But you're sort of missing my point. Soft errors in on-chip SRAM are a far bigger problem than in DRAM. There is no point in having ECC memory when your on-chip SRAMs do not have sufficient reliability. So while they might have ECC for DRAM, it's just for show.

    It's really a reflection of the fact that GK104 isn't meant for compute...while GK110 is.

    DK
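To make the ECC discussion above concrete, here is a minimal sketch of the single-error-correcting Hamming(7,4) scheme that memory ECC builds on. This is a textbook toy for illustration only, not a claim about how Nvidia protects its SRAMs or DRAM.

```python
# Toy Hamming(7,4) single-error-correcting code: 4 data bits are protected
# by 3 parity bits, and any single flipped bit can be located and fixed.

def encode(nibble):
    """Encode 4 data bits (list of 0/1) into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = nibble
    p1 = d1 ^ d2 ^ d4          # parity over codeword positions 1,3,5,7
    p2 = d1 ^ d3 ^ d4          # parity over codeword positions 2,3,6,7
    p3 = d2 ^ d3 ^ d4          # parity over codeword positions 4,5,6,7
    return [p1, p2, d1, p3, d2, d3, d4]

def decode(code):
    """Return the corrected 4 data bits; any single bit flip is repaired."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 1-based error position, 0 = no error
    if syndrome:
        c[syndrome - 1] ^= 1          # correct the flipped bit
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
word = encode(data)
word[5] ^= 1                  # simulate a soft error (single bit flip)
assert decode(word) == data   # the flip is detected and corrected
```

The point of the post stands regardless of the code used: protection like this only helps end-to-end if every storage array on the path, on-chip SRAMs included, is covered.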
     
  5. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    It was not my intent to imply otherwise for DRAM ECC. I was reflecting on the fact that Nvidia must believe there is a sufficient market with workloads that can live without DP and error correction to justify putting together these cards.
    I suppose it could be a freebie checkbox, since Nvidia has little interest in going through with an ECC and non-ECC controller just to keep marketing honest.


    That's why I'm asking why Nvidia is pushing it into that market. If Nvidia's slides are proven accurate in late 2012, K10 has about 2 quarters in the shadow of its bigger cousin.
    Was it pushed there to meet an upgrade cycle, or to put something in the way of Tahiti-based Firestream cards, which could come out in the meantime?
    The downside is that Nvidia's hobbled compute card can nibble at the shins of the market that could have or should have been served by GK110, making the big fish's pond a little smaller.
     
  6. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
Btw, in a (German) c't interview with two NV guys, they basically insisted that separate FP64 units exist in GK110.
They also said that the ECC implementation costs quite a bit of area.

    Edit: Google translate inverts the meaning of the first sentence of the second answer. ;)
     
  7. Man from Atlantis

    Regular

    Joined:
    Jul 31, 2010
    Messages:
    961
    Likes Received:
    855
    ^^ Bing works just fine :)

The separate SP and DP execution units may give the same level of efficiency for gaming SP loads as GK104. It seems NV kept the simple SP EUs and just added 960 DP EUs. GF110's EUs are more efficient than the ones on GF114, so if the situation were like the Fermi era I would have guessed more than 2x the performance of GK104 at the same clocks, but now it's just ~80%, and after factoring in lower clock speeds due to thermals it should be ~50%.
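The ~80% and ~50% figures in the post can be sanity-checked with back-of-envelope arithmetic, assuming the configurations being speculated about at the time: a full GK110 with 15 SMX x 192 SP ALUs against GK104's 8 SMX x 192. Both unit counts and the clock penalty are assumptions, not confirmed specs.

```python
# Rough check of the post's estimate under assumed unit counts.
gk104_sp = 8 * 192        # 1536 SP ALUs (GK104)
gk110_sp = 15 * 192       # 2880 SP ALUs (full GK110, speculative)

raw_gain = gk110_sp / gk104_sp - 1
print(f"raw SP gain at equal clocks: {raw_gain:.1%}")   # 87.5%

clock_scale = 0.8         # assumed ~20% lower clocks for thermals
net_gain = gk110_sp * clock_scale / gk104_sp - 1
print(f"net gain at lower clocks: {net_gain:.1%}")      # 50.0%
```

So the headline numbers are consistent: ~87.5% more raw SP throughput (close to the quoted "~80%" once overheads are allowed for), dropping to ~50% if clocks come down by a fifth.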
     
  8. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,541
    Likes Received:
    964
  9. tmavr

    Newcomer

    Joined:
    Sep 2, 2010
    Messages:
    10
    Likes Received:
    0
I’d say that

if you don’t need ECC or DP (i.e. the workload/problem/math doesn’t require it),
then you don’t need a Tesla-branded board.

You can get by with an ordinary/gaming card like the 690.

If you need some sort of special/official support, OK, you may buy just one T10…
And then buy dozens of gaming cards to implement the solution.
     
  10. Psycho

    Regular

    Joined:
    Jun 7, 2008
    Messages:
    746
    Likes Received:
    41
    Location:
    Copenhagen
We have a 3D editor/render program (i.e. a regular application, not a service) that can also act as a webserver, so you can set up scenes, request pictures of them, etc., remotely and live from a number of clients. It has a single d3d9 context which is shared among the webclients (only one can hold the context thread and render a picture at a time).
Traditionally (i.e. on all current installations) we have been running with a local user who auto-logs in and starts a few instances of the program (for different applications, or just to maximize throughput).
Given that we're not already totally GPU-limited (i.e. a reported GPU usage of 100%), we can squeeze out more images/sec (on Radeons) when benchmarking with 2 instances (with 2+ clients per instance to hide image compression time) on the same graphics card. On the GeForces, context switching seems to eat up any potential utilization gain.
Of course we would also like to move to a proper remote/virtual context, so we can avoid this ugly auto-login thing, which is not allowed to go to screenlock, requires VNC (from the remote desktop) for maintenance, etc.
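The serialization described above, one shared context that only a single client may use at a time, while image compression overlaps, can be sketched in a few lines. All names here (RenderContext, render, compress) are illustrative stand-ins, not the real application's API.

```python
# Sketch: many client threads sharing one render context behind a lock,
# with compression done outside the lock so it can overlap rendering.
import threading

class RenderContext:
    """Stands in for the single d3d9 context; must never be used concurrently."""
    def __init__(self):
        self._lock = threading.Lock()

    def render(self, scene):
        with self._lock:                 # only one client renders at a time
            return f"raw image of {scene}"

def compress(raw):
    return raw.upper()                   # placeholder for image compression

ctx = RenderContext()
results = []

def handle_client(scene):
    raw = ctx.render(scene)              # serialized on the context lock
    results.append(compress(raw))        # compression overlaps other renders

threads = [threading.Thread(target=handle_client, args=(s,))
           for s in ("scene-a", "scene-b")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With 2+ clients per instance, one thread can be compressing while another holds the lock and renders, which is exactly the overlap the post uses to hide compression time.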
     
  11. A1xLLcqAgt0qc2RyMz0y

    Veteran

    Joined:
    Feb 6, 2010
    Messages:
    1,589
    Likes Received:
    1,490
And how exactly do you fit all those gaming cards into your system or rack-mount server?

Teslas have no external display connectors and thus have much better cooling, or can even be fanless when installed in servers that have built-in fans.

If you need the special software, it is keyed to run only on Quadro/Tesla and will not run on gaming GPUs.

Teslas/Quadros are also screened for the better parts, whereas gaming GPUs are not. Would you really want to run your financial analysis on gaming GPUs that can/will experience an occasional hiccup in the graphics pipeline, just to save a few $?
     
  12. Blazkowicz

    Legend

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
Financial analysis? For this particular domain, they'd be better off asking a donkey what to do (i.e. if it eats this carrot first, then buy; else sell). The results would be cheaper and more accurate.
     
  13. imaxx

    Newcomer

    Joined:
    Mar 9, 2012
    Messages:
    131
    Likes Received:
    1
    Location:
    cracks
Well, anything that averages over 65-70% utilization is considered good, and that would not be the most absurd system I've heard of in use...

The financial projects/libraries I know can use either SP or DP; however, I wonder what the point of this Tesla card would be without on-chip ECC. Your weakest link sets the error rate you have to plan for, and in the financial field... you simply do not want errors.

They could probably be good for lighting, where you can accept both the error rate and the precision loss; yet for geometry such cards would not be good (AutoCAD has used the DP FPU since the start of the nineties or so).
...a marketing move against the upcoming Tahiti card?
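The SP-vs-DP point is easy to demonstrate: accumulating a million small payments in single precision drifts visibly, while double precision stays on the money. A minimal sketch, using the struct module to emulate float32 rounding (the payment amount and loop count are arbitrary illustrative choices):

```python
# Why financial code tends to insist on double precision: a naive
# single-precision running total of many small amounts drifts badly.
import struct

def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack('f', struct.pack('f', x))[0]

payment = 0.01
n = 1_000_000

total64 = 0.0
total32 = 0.0
for _ in range(n):
    total64 += payment
    total32 = to_f32(total32 + to_f32(payment))   # emulated float32 add

print(f"double precision: {total64:.2f}")   # essentially 10000.00
print(f"single precision: {total32:.2f}")   # visibly short of 10000
```

Once the running total grows large, each 0.01 increment is only a few ulps of a float32 and the per-add rounding error accumulates systematically, which is exactly the kind of silent drift that is unacceptable in this field.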
     
  14. jlippo

    Veteran

    Joined:
    Oct 7, 2004
    Messages:
    1,744
    Likes Received:
    1,090
    Location:
    Finland
I've been wondering about the command processor of GK110 that allows it to launch new jobs: could it be the ARM core that Project Denver 'promised'?
It certainly would be flexible enough for the jobs described in the GK110 whitepaper.
     
    #194 jlippo, May 18, 2012
    Last edited by a moderator: May 18, 2012
  15. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland
Hmm, maybe. It would cost a lot on a GPU architecture anyway; it may be better to implement it at the system level instead, as AMD is doing with ARM right now to bring compatibility between x86 and ARM (especially on the memory side), so that ARM and x86 can be used in the same system. (This is on track, but there's no telling when this interoperability will be seen; maybe 2013-2014.) AMD and ARM announced just under 10 months ago that they were working on it together.

Using an ARM part, or implementing one, will cost a lot in area, transistors and watts, and if it's only there to launch some commands, it's better not to have it in the core itself, sitting inactive 99% of the time. Plus, it requires a lot of development to include it.
     
    #195 lanek, May 19, 2012
    Last edited by a moderator: May 19, 2012
  16. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,382
I don't think people realize how many chips out there have ARMs sprinkled all over the die. Obviously not Cortex-A9s, but a simple ARM7TDMI-S goes a long way. The area and power cost is next to nothing, probably less than 0.1mm² in 28nm. Add 16KB of RAM and you're good to go: instead of a complex hard-wired state machine that requires a lot of effort to design and verify, you can move the complexity to software, with the ability to continuously fix bugs. It's trivial to add such a thing to a chip. There's no 'lot of development to include it' at all; quite the contrary. It doesn't necessarily have to be an ARM, of course; it could be MIPS, Tensilica, an 8051, or the rare in-house-developed microcontroller, but in all cases it comes down to replacing an FSM with something programmable for very little area.

I've worked on chips with more than 40 programmable microcontrollers, based off a common macro, with some custom hardware attached to accelerate particular instructions.
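The FSM-to-firmware trade-off above can be sketched in a few lines: instead of hard-wiring states in silicon, a tiny core interprets a command program, so behavior becomes a patchable firmware image. The opcode set here is invented purely for illustration; it does not describe any real command processor.

```python
# Sketch: a hard-wired FSM replaced by a tiny interpreted command loop.
# A bug fix is now a firmware update, not a silicon respin.

def run_firmware(program, pending):
    """Interpret a toy command program against a list of pending events."""
    launched = []
    for op, arg in program:
        if op == "WAIT":
            pending = [e for e in pending if e != arg]  # event consumed
        elif op == "LAUNCH":
            launched.append(arg)                        # kick off a job
        elif op == "HALT":
            break
    return launched

program = [("WAIT", "dma0"),        # wait for a (pretend) DMA completion
           ("LAUNCH", "kernel_a"),
           ("LAUNCH", "kernel_b"),
           ("HALT", None)]
print(run_firmware(program, ["dma0"]))   # ['kernel_a', 'kernel_b']
```

Swapping in a different program changes the sequencing behavior with zero hardware changes, which is the whole appeal of burying a small programmable core where an FSM used to be.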
     
  17. tmavr

    Newcomer

    Joined:
    Sep 2, 2010
    Messages:
    10
    Likes Received:
    0
A1xLLcqAgt0qc2RyMz0y, if cost is a concern: white-box servers, custom made. That’s the way to go for cheap performance. Anyway, as I stated above, it depends on the actual ‘problem’: what you need and what you don’t.

imaxx, I remember the time when I had to use math coprocessor emulation, with AutoCAD 2.7, 2.9 or 2.something.
     
  18. DSC

    DSC
    Banned

    Joined:
    Jul 12, 2003
    Messages:
    689
    Likes Received:
    3
  19. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    22,146
    Likes Received:
    8,533
    Location:
    ಠ_ಠ
    Here comes DX11.0c :runaway:
     
  20. Blazkowicz

    Legend

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
The question is, do we know it has no on-chip ECC?
I'd wager it has. It's non-trivial but not hugely expensive; the R&D is already sunk, and what you don't get relative to BigK is the doubled L2 cache.

I don't believe I've read anything saying GK104 has no ECC.
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.