NVIDIA Kepler speculation thread

Discussion in 'Architecture and Products' started by Kaotik, Sep 21, 2010.

  1. kalelovil

    Regular

    Joined:
    Sep 8, 2011
    Messages:
    568
    Likes Received:
    104
    Based on many reviews and forum threads, the stock clock and voltage of the HD7970 are quite conservative. Almost all mention clocks above 1.1GHz being achievable at stock voltage, and quite a few reach more than 1.2GHz at stock voltage. There have also been some impressive undervolting reports at stock clocks.
    Essentially, many of the cards shipped to reviewers and the public already contain 'magic dies'.

    The default clock of the HD7970 is 925MHz.

    True, but those 'bad' dies can still be used in HD7970s and HD7950s.

    Possibly, although the level of overclockability of the HD7970 on stock voltage has been seldom seen on modern non-cut-down GPUs before.
    My suspicion is that the GTX 680 already is such a 'magic die' part. That would explain the naming confusion over the past months (if the 670 Ti was to be the top end part, and the 680 is a higher clocked 670 Ti quickly introduced when Nvidia realised such a part could beat the 7970) and the rumoured specifications showing the 670 not having any components fused off like most salvage parts do.
     
  2. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    Didn't Dave already confirm that on Tahiti/7900 series timing was everything instead of getting the "best performance reasonably possible", and that Tahiti is being "looked at again", hinting at a higher clocked "GHz edition" or "7980"?
     
  3. DarthShader

    Regular

    Joined:
    Jul 18, 2010
    Messages:
    350
    Likes Received:
    0
    Location:
    Land of Mu
    So when are we going to see the fixed bigK, aka GK100? Or at least GK110? GK112?? ;)
     
  4. psurge

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    955
    Likes Received:
    52
    Location:
    LA, California
    I know what a cache is. Again, I thought one of the options in the paper was to make the 2 level register hierarchy explicit, in which case it should be possible to arrange for at least some values with short live ranges never to occupy space in the main RF.

    Does using faster GDDR5 mean that latency for off-chip memory access shrinks significantly, or does it just give you a bandwidth boost? It could just be that fewer threads are needed to hide memory access latency.
     
  5. DarthShader

    Regular

    Joined:
    Jul 18, 2010
    Messages:
    350
    Likes Received:
    0
    Location:
    Land of Mu
    Speaking of bigK, do you guys think it's going to be a "straight" upscale by 1.5x shaders and almost 2x bandwidth (speculated 512-bit)? The cache and register capacity seem good enough for gaming, but will they be enough for compute? I wonder if they will keep dual dispatch schedulers for the big chip too, or whether the architecture of an SMX will be a bit different.
     
  6. kalelovil

    Regular

    Joined:
    Sep 8, 2011
    Messages:
    568
    Likes Received:
    104
  7. AnarchX

    Veteran

    Joined:
    Apr 19, 2007
    Messages:
    1,559
    Likes Received:
    34
    2304SPs was often rumored for GK110.

    Maybe:
    - 4 GPC (12 rastered pixels each?)
    - each 3 SMX (192SPs, 6 warp schedulers)
    - extended caches

    GK110 should be capable of DP @ 1/2 SP.
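
    As a quick sanity check on the rumoured figure, the speculated layout above multiplies out as shown below. This is only arithmetic on the numbers in this post; the GK104 shader count of 1536 is brought in from outside the post as an assumption for the "1.5x" comparison, and none of it is a confirmed GK110 spec.

```python
# Rumoured GK110 layout from the post (speculation, not a confirmed spec).
GPCS = 4            # graphics processing clusters
SMX_PER_GPC = 3     # SMX units in each GPC
SPS_PER_SMX = 192   # shader processors in each SMX

# Assumption (not stated in the post): GK104 ships with 1536 SPs.
GK104_SPS = 1536


def total_sps(gpcs: int, smx_per_gpc: int, sps_per_smx: int) -> int:
    """Total shader processors for a GPC/SMX/SP hierarchy."""
    return gpcs * smx_per_gpc * sps_per_smx


gk110_sps = total_sps(GPCS, SMX_PER_GPC, SPS_PER_SMX)
scale_vs_gk104 = gk110_sps / GK104_SPS

print(gk110_sps)        # 2304
print(scale_vs_gk104)   # 1.5
```

    So 4 GPCs of 3 SMX each does land exactly on the 2304 figure, and on a 1.5x shader upscale over GK104 under the stated assumption.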
     
  8. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland

    Actually there are 4 ASIC quality levels, and each has its stock vcore fixed:
    3 have been reported on different forums (the last one should be under 1112mV)

    - 1107mV ??? (nobody has seen one yet, but they are reported by TSMC, and some even lower)
    - 90% ASIC = 1112mV: max reported 1200mv OC
    - 80 to 88% = 1115mV: max reported 1225mv OC (a rare one, I believe they are pushed into the 1117mV line)
    - 75 to 80% = 1117mV: max reported limited by AB on vcore with stock cooling (under 80% ASIC quality; the one to choose now for OC)

    All of them go higher than 1100MHz at stock voltage and can be set to 1150MHz without tweaks. The 1117mV parts achieve the best tweaked overclock (1250MHz+ on stock cooling), while the 1112mV parts (90% ASIC quality) reach the highest clocks at stock voltage but can't take a voltage increase as well as the others on stock cooling.

    I have 2 cards here: one early Sapphire which is 1112mV and one HIS which is 1117mV (which is a problem on stock cooling: one does 1275+MHz on stock cooling and the other is stuck at 1200-1225MHz, but the difference disappears under watercooling, both going to 1300+MHz).


    As for the 7970 / 680 fight... I'm not even sure yet that a better part is needed; there are already 1000MHz models available (DirectCUII, MSI etc). (The XFX is a bad example.)

    If their numbers are true (a BF3 delta of 1.7fps, allowing for a 5% difference due to turbo boost), that leaves LP2 and 3DMark11... 3DMark11 was clearly not run on the same system: the 7970's score is too low for an i7 3960K (6 cores vs 4 = a 5000-point difference on the physics test; the 6 cores make an extremely big difference in 3DMark11, Vantage and 3DMark06).

    We will need a complete review and bench...
     
    #2828 lanek, Mar 17, 2012
    Last edited by a moderator: Mar 17, 2012
  9. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    I doubt that 40ish Fps would be VSynced. What could be nice is a 120Hz screen at the center and a 60Hz display on each side - all VSynced (or 60/30Hz, for low-Fps games).
     
  10. OgrEGT

    Newcomer

    Joined:
    Sep 12, 2010
    Messages:
    31
    Likes Received:
    0
    Sorry for OT...
    But why is the XFX a bad example?
     
  11. OlegSH

    Regular

    Joined:
    Jan 10, 2010
    Messages:
    798
    Likes Received:
    1,625
    And there is no hot clock, so relative to gf104 Kepler has same amount of regs per sp (192 per 256 Kb vs 48 @ hot clock = 96 @ base clock per 128 kb in gf104)
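
    The comparison above can be written out as a small calculation. It uses only the figures quoted in the post (256 KB for 192 SPs on Kepler, 128 KB for 48 hot-clocked SPs on GF104) and assumes the hot clock is exactly 2x the base clock, so each GF104 SP counts as two base-clock lanes. The function name is made up for illustration.

```python
def regfile_kb_per_lane(regfile_kb: float, sps: int, clock_multiplier: int = 1) -> float:
    """Register file capacity per SP, normalised to base-clock lanes.

    clock_multiplier is 2 for a hot-clocked shader domain (assumed exactly
    2x base clock) and 1 when the SPs run at base clock.
    """
    return regfile_kb / (sps * clock_multiplier)


# Kepler SMX: 192 SPs sharing 256 KB, no hot clock.
kepler = regfile_kb_per_lane(256, 192)

# GF104 SM: 48 SPs at hot clock = 96 base-clock equivalents sharing 128 KB.
gf104 = regfile_kb_per_lane(128, 48, clock_multiplier=2)

print(f"Kepler: {kepler:.3f} KB per lane")  # Kepler: 1.333 KB per lane
print(f"GF104:  {gf104:.3f} KB per lane")   # GF104:  1.333 KB per lane
```

    Both come out at 4/3 KB per base-clock lane, which is the "same amount of regs per sp" point.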
     
  12. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland

    They will need to be tested again, but it seems the memory was not clocked high enough; the results look bottlenecked by it, especially if you compare with the Asus DirectCUII or the upcoming MSI Lightning.
    ("Bad example" is a bit harsh; let's say that for the OC applied, the gain should be a bit higher.)
    The XFX tests were done at the 7970 release, so with an early driver (8.921), which may have affected the results more than it should.
     
  13. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    19,418
    Likes Received:
    10,311
    A 1050 or 1100 MHz card would be enough to compete with the GTX 680, assuming that recently linked site isn't bogus. This already bumps performance up by ~20%.
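
    For reference, the raw core clock uplift over the 925MHz default mentioned earlier in the thread works out as below. This is a rough sketch that assumes performance scales at best linearly with core clock and ignores memory clock, so it is an upper bound rather than a measured result.

```python
STOCK_MHZ = 925  # HD7970 default core clock, as stated earlier in the thread


def clock_uplift_pct(new_mhz: float, base_mhz: float = STOCK_MHZ) -> float:
    """Percentage increase in core clock over the base clock."""
    return (new_mhz / base_mhz - 1) * 100


for mhz in (1050, 1100):
    print(f"{mhz} MHz: +{clock_uplift_pct(mhz):.1f}% core clock")
# 1050 MHz: +13.5% core clock
# 1100 MHz: +18.9% core clock
```

    So the ~20% figure corresponds to the 1100MHz end of the range with near-perfect clock scaling.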

    Not at the clocks I just mentioned. Those don't require an increase in voltage. And from reviews it results in a very minor increase in board power consumption. Basically still well under 225 watts.

    I'd imagine the same price as the GTX 680. At worst the 7970 base version would be moved down or just replaced entirely by the new card. And even if that happens, early adopters still got a better value than they did back in the Geforce 6800 Ultra and Geforce 7800 GTX days, when card prices generally fell in as short as 2-4 weeks after launch. Plus, more below...

    Easy. All currently clocked 7970's go EOL. If AMD are feeling generous they can potentially release a flashable BIOS that users could use to flash their cards to the "Ghz" version. And even if they don't, it's not like it wouldn't be easily obtainable over the net within minutes of someone getting a "Ghz" 7970.

    Or failing that just OC your current card.

    From what I've seen, a measly 1-2 Celsius increase in load temp at the clocks I mentioned. For ~20% more perf. Aggressive binning could make 1.2 GHz viable at current voltages, but IMO it isn't really needed.

    Hard to say since not only do we still not know how it really performs, we have no idea how much overclocking headroom there is.

    I agree, but not for the reasons you stated. I agree, because it is kind of pointless with all the AIB OC'd cards available.

    What AMD basically has to do is the same thing Nvidia has done in the past.

    Convince review sites to benchmark GTX 680 against AIB factory OC'd 7970's. Just like Nvidia has done for years now every time ATI/AMD has launched a new card. I'm expecting lots of people that formerly said it was OK to do this when Nvidia did it to say it is unfair if AMD does it. :p Personally I don't like it when either vendor does it.

    Regards,
    SB
     
  14. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    And even if, it's just a number on a spec sheet. The cooling solution has lots and lots of untapped potential and if they want to, AMD can always use Powertune to implement any TDP no. they might want.
     
  15. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,112
    Location:
    New York
    Yup, no change in compute:mem ratio.
     
  16. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
  17. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,112
    Location:
    New York
    A Fermi SIMD is physically 16-wide. It only needs 48 input and 16 output regs per hot-clock. If the regfile was providing twice that per clock then it was running at core clock, not hot.
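
    The 48/16 figures can be reproduced with a little arithmetic. The sketch below assumes the worst case is a fused multiply-add (three source operands, one destination) issued across all 16 lanes every hot clock; the FMA assumption and the constant names are mine, not from the post.

```python
SIMD_WIDTH = 16        # physical lanes in a Fermi SIMD, per the post
SRC_PER_FMA = 3        # assumption: worst case is a*b + c, three source operands
DST_PER_FMA = 1        # one result per lane
HOT_CLOCK_RATIO = 2    # hot clock assumed to be 2x core clock

# Register operands the SIMD consumes and produces each hot clock.
reads_per_hot_clock = SIMD_WIDTH * SRC_PER_FMA    # 48
writes_per_hot_clock = SIMD_WIDTH * DST_PER_FMA   # 16

# A register file running at core clock has to supply two hot clocks'
# worth of operands on each of its own cycles to keep the SIMD fed.
reads_per_core_clock = reads_per_hot_clock * HOT_CLOCK_RATIO    # 96
writes_per_core_clock = writes_per_hot_clock * HOT_CLOCK_RATIO  # 32

print(reads_per_hot_clock, writes_per_hot_clock)    # 48 16
print(reads_per_core_clock, writes_per_core_clock)  # 96 32
```

    In other words, a register file delivering 96 reads and 32 writes per cycle matches what a 16-wide SIMD needs if that register file is clocked at core clock rather than hot clock.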
     
  18. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    Is there any word yet on compute abilities, or more specifically, DP speed? Does it have the "midrange ratio" or similar to that of Tahiti (which was 1:2 dp:sp ratio, limited to 1:4 on consumer boards?)
     
  19. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    You've got a point there. :)

    You know, I'd really love to see a link. All I've been told was that there's physically 1:4, nothing throttled.
     
  20. Man from Atlantis

    Regular

    Joined:
    Jul 31, 2010
    Messages:
    960
    Likes Received:
    853