NVIDIA Kepler speculation thread

Discussion in 'Architecture and Products' started by Kaotik, Sep 21, 2010.

  1. xDxD

    Regular

    Joined:
    Jun 7, 2010
    Messages:
    412
    Likes Received:
    1
  2. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,247
    Likes Received:
    4,465
    Location:
    Finland
    The "dynamic clock control" thingy needs serious investigation, i.e. which scenarios affect it (is it really only load related, or is there app detection or some such involved, too?).
     
  3. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    It was Fermi that actually introduced the fixed clock ratio (2:1) for the ALU domain. All the previous architectures from NV used a non-fixed clock ratio for the shaders, which was user-exposed and adjustable within some predefined range.
     
  4. dnavas

    Regular

    Joined:
    Apr 12, 2004
    Messages:
    375
    Likes Received:
    7
    We are getting an increasing amount of agreement that there are 1500ish cores and hot-clocks. We're missing something important about those cores -- that's a huge count growth so something must have changed, no?

    Also, aside from more complex heat and power management, I'm intrigued by the 'between fxaa and msaa' hint from one of the previous stories. Was that a mistranslation, or is there some new AA mode in the offing?

    -Dave
     
  5. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    Alas this is going to cause quite some user confusion until folks can understand how it really works.
     
  6. Bo_Fox

    Banned

    Joined:
    Jun 3, 2010
    Messages:
    40
    Likes Received:
    0
    That, and the overclockability too.

    It would've been nice if AMD allowed the TDP slider control to be adjusted by more than just 20%. 30-40% would've been nice for most cards, should one wish to over-volt the card and overclock the hell out of it without some clock throttling.

    I'm really hoping that NV isn't going to make things more complicated regarding the true overclockability in all scenarios - my GTX 460 1GB's resetting the clocks whenever I clock it too high in some games (stressing, "some" games) is annoying like hell. I miss the old days when I'd just see the artifacts without the drivers resetting the clocks.
     
  7. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    Overvolting won't make a difference to PowerTune, so you already can without changing implied limits. Although voltage can be a variable parameter into the PT calculations, it has been implemented as a constant because PT is tuned to be deterministic across the range of ASICs out there, so it assumes the worst case.
     
  8. psurge

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    955
    Likes Received:
    52
    Location:
    LA, California
    That techpowerup article says there are dozens of power planes... does that mean different parts of the chip will use different voltages?
     
  9. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    But not dozens of different voltages. That is clearly wrong or just some kind of typo. Maybe they wanted to speak of clock domains or the number of individually power gated domains. Or it could mean that there are a lot of power plans (without the "e"), one for each possible combination of clocks in the different clock domains.
     
  10. Fottemberg

    Newcomer

    Joined:
    Mar 8, 2012
    Messages:
    2
    Likes Received:
    0
  11. boxleitnerb

    Regular

    Joined:
    Aug 27, 2004
    Messages:
    407
    Likes Received:
    0
    It probably points to the granularity of this solution. So for instance not only 100 MHz steps but smaller ones. It wouldn't make sense to clock dozens of chip parts differently, would it? How many different domains could there be? ROPs, TMUs, ALUs... that's three.
     
  12. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    Funny I read initially plans and obviously missed the "e". Good thing they didn't go for power plants instead :lol:

    If there's any merit to it, it sounds more like one domain for ROPs/TMUs and the rest of the enchilada, another for rasterizers/tri-setup, one for ALUs, and then possibly some others for any possible combination.
     
  13. psurge

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    955
    Likes Received:
    52
    Location:
    LA, California
    @Gipsel That does seem a lot more realistic.

    Though it does seem like being able to make a leakage/dynamic power tradeoff at sub-chip granularity should have some power benefits. But I'm saying that without having any idea what the costs of making the voltage tunable at that scale are.
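
    To put rough numbers on that tradeoff, here is a minimal sketch using the textbook CMOS approximation: dynamic power is about activity * C * V^2 * f, plus a leakage term that grows with voltage. Everything in it is an assumption for illustration (the Block type, the linear leakage model, the capacitance and leakage values, and the idea that the idle block can hold its clock at 0.8 V); none of it describes a real GPU.

```python
from dataclasses import dataclass

@dataclass
class Block:
    """One hypothetical chip partition; all values are illustrative."""
    name: str
    activity: float        # fraction of capacitance switched per cycle (0..1)
    capacitance_nf: float  # effective switched capacitance, nanofarads
    leak_a_per_v: float    # crude linear leakage model: amps per volt

def dynamic_power_w(b, volts, mhz):
    # Textbook approximation: P_dyn = activity * C * V^2 * f
    return b.activity * (b.capacitance_nf * 1e-9) * volts**2 * (mhz * 1e6)

def leakage_power_w(b, volts):
    # P_leak = V * I_leak, with I_leak assumed proportional to V here
    return volts * (b.leak_a_per_v * volts)

def total_power_w(b, volts, mhz):
    return dynamic_power_w(b, volts, mhz) + leakage_power_w(b, volts)

busy = Block("alu", activity=0.5, capacitance_nf=100.0, leak_a_per_v=5.0)
idle = Block("rop", activity=0.05, capacitance_nf=100.0, leak_a_per_v=5.0)

# Shared rail: both blocks sit at 1.0 V even though one is nearly idle.
shared = total_power_w(busy, 1.0, 1000) + total_power_w(idle, 1.0, 1000)
# Per-block rail: the idle block drops to 0.8 V at the same clock.
split = total_power_w(busy, 1.0, 1000) + total_power_w(idle, 0.8, 1000)
```

    In this toy setup the split-rail case comes out lower than the shared-rail case, but the sketch says nothing about what the extra rails and VRMs would cost.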
     
  14. vking

    Newcomer

    Joined:
    Jun 17, 2007
    Messages:
    15
    Likes Received:
    2
    Individual partitions are power gated (and a bunch of them together can be rail gated since they share the rail). But you won't have different voltages for each partition since the number of power rails itself is not going to be very large.
     
  15. iMacmatician

    Regular

    Joined:
    Jul 24, 2010
    Messages:
    797
    Likes Received:
    223
    Well that explains the different clock numbers we've been hearing about… as well as the hot clocks / no hot clocks conflict. Maybe the CCs can run 1:1 with the core at times.

    I wonder at what clocks its rumored performance levels vs. Tahiti were compared.
     
    #2235 iMacmatician, Mar 8, 2012
    Last edited by a moderator: Mar 8, 2012
  16. psurge

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    955
    Likes Received:
    52
    Location:
    LA, California
    @Ailuros - I went back to the original article, and it does indeed talk about "power plans" (whatever that is), not planes - so much for my reading comprehension skills.

    @vking - thanks. I tried reading up about it a bit more, and it looks like you would need different VRMs on the PCB for each different on chip voltage. If I'm not mistaken, it looks like AMD used 2 voltage planes on some K10s (marketed as Dual Dynamic Power Management) 4-5 yrs ago. I wonder if they are doing this on their GPUs as well...
     
  17. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,062
    Likes Received:
    3,119
    Location:
    New York
    If there are multiple clock domains and each can fluctuate independently then I'm sure everybody will be confused, not just end users. Best of luck to reviewers trying to figure out what's happening under the hood.
     
  18. vking

    Newcomer

    Joined:
    Jun 17, 2007
    Messages:
    15
    Likes Received:
    2
    The way it might work is (a) workload determines voltage (within min/max bounds of course), (b) voltage determines frequency (assuming that we didn't hit the thermal limit; if we do hit it, the result will be voltage/frequency throttling).

    So my guess would be that for any given load, the minimum spec'ed frequency will be guaranteed except when thermal throttling is necessary. So barring a power virus situation, minimum perf is guaranteed.

    Essentially this would be closed-loop overclocking (assuming my guess about how they are doing this is correct - and I don't claim any real info, just a guess), and if done right will result in a very nice perf boost over the spec.
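
    A minimal sketch of the loop guessed at above, purely illustrative: the workload picks a voltage within bounds, the voltage picks a clock, and hitting the thermal limit pulls both down. Every constant and function name here is hypothetical, the linear voltage-to-clock mapping is an assumption, and so is the direction of the load-to-voltage mapping. This is not a description of how Kepler actually does it.

```python
V_MIN, V_MAX = 0.85, 1.10   # hypothetical rail bounds, volts
BASE_MHZ = 700              # hypothetical guaranteed ("spec") clock
MHZ_PER_VOLT = 1200         # hypothetical slope above V_MIN
TEMP_LIMIT_C = 95           # hypothetical thermal limit
THROTTLE_STEP_V = 0.05      # voltage removed per over-temperature step

def voltage_for_load(load):
    """(a) Workload picks a voltage, clamped to the rail bounds."""
    load = min(max(load, 0.0), 1.0)
    return V_MIN + load * (V_MAX - V_MIN)

def clock_for_voltage(volts):
    """(b) Voltage picks a frequency; equals BASE_MHZ at V_MIN."""
    return BASE_MHZ + (volts - V_MIN) * MHZ_PER_VOLT

def step(load, temp_c, throttle_v=0.0):
    """One control iteration. Returns (volts, mhz, new_throttle_v)."""
    if temp_c >= TEMP_LIMIT_C:
        throttle_v += THROTTLE_STEP_V                        # too hot: back off
    else:
        throttle_v = max(0.0, throttle_v - THROTTLE_STEP_V)  # cool: recover
    volts = voltage_for_load(load) - throttle_v
    return volts, clock_for_voltage(volts), throttle_v
```

    With no throttling the clock never drops below BASE_MHZ, which matches the "minimum guaranteed" guess; once the thermal limit is hit, throttle_v can push it under that floor.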
     
  19. psurge

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    955
    Likes Received:
    52
    Location:
    LA, California
    It would also be good if workload measurement involved multiple kinds of units, as bottlenecks shift over the course of rendering a frame (e.g. a deferred shading pass probably has minimal geometry performance requirements compared to, say, rendering shadow buffers). Maybe one could allocate more of the power budget to the bottlenecked bits by up-clocking those, then down-clock the stuff in light use to compensate. No clue if this is what Kepler does or how feasible it is. Since you can't tweak your voltages all over the place, I'm assuming there's a fair bit of wiggle room for clocks even at some fixed voltage.
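
    The budget-shifting idea can be sketched as a greedy split of a fixed power budget across clock domains. This is only a toy under stated assumptions: the domain names, utilization figures, watts-per-MHz costs and the rebalance helper are all made up, and power is assumed linear in clock at a fixed voltage.

```python
def rebalance(domains, budget_w):
    """Greedy split of a power budget across clock domains.

    domains: name -> dict(util, min_mhz, max_mhz, w_per_mhz)
    At a fixed voltage, power is assumed linear in clock (hypothetical model).
    """
    clocks = {n: float(d["min_mhz"]) for n, d in domains.items()}
    spare = budget_w - sum(d["min_mhz"] * d["w_per_mhz"] for d in domains.values())
    if spare < 0:
        raise ValueError("budget cannot cover the minimum clocks")
    # Hand the spare power to the busiest (bottlenecked) domains first.
    for name in sorted(domains, key=lambda n: domains[n]["util"], reverse=True):
        d = domains[name]
        headroom_mhz = d["max_mhz"] - d["min_mhz"]
        extra_mhz = max(0.0, min(headroom_mhz, spare / d["w_per_mhz"]))
        clocks[name] += extra_mhz
        spare -= extra_mhz * d["w_per_mhz"]
    return clocks

def make_domains(geometry_util, shader_util):
    # Illustrative numbers only.
    return {
        "geometry": dict(util=geometry_util, min_mhz=500, max_mhz=1000, w_per_mhz=0.02),
        "shader": dict(util=shader_util, min_mhz=500, max_mhz=1000, w_per_mhz=0.1),
    }

# Shadow-buffer pass: geometry-bound. Deferred shading pass: shader-bound.
shadow_pass = rebalance(make_domains(0.9, 0.3), budget_w=80.0)
shading_pass = rebalance(make_domains(0.1, 0.9), budget_w=80.0)
```

    In the geometry-bound case the geometry domain gets clocked up first and the shaders take what is left; in the shader-bound case the priorities flip while the total stays inside the same budget.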
     
  20. vking

    Newcomer

    Joined:
    Jun 17, 2007
    Messages:
    15
    Likes Received:
    2
    Psurge,

    I think GPUs are already sophisticated enough to do what you are suggesting. Even if a bunch of clock domains share the same rail (and hence run at the same voltage), there is still plenty of room to play with, such as reducing effective frequency through pulse eating, changing dividers, etc. Dynamically reconfigurable PLLs are pretty common these days as well.
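
    For a feel of how those two knobs change the effective clock without touching the rail voltage, here is a small sketch. It assumes "pulse eating" means dropping some fraction of clock pulses out of a repeating window; the function names and the 16-pulse window are hypothetical choices for illustration.

```python
def effective_mhz(pll_mhz, divider=1, eaten=0, window=16):
    """Average clock after an integer divider and pulse eating."""
    if divider < 1 or not 0 <= eaten <= window:
        raise ValueError("bad divider or pulse-eating setting")
    return pll_mhz / divider * (window - eaten) / window

def gate_pattern(eaten, window=16):
    """Which of `window` pulses pass (True), with eaten pulses spread evenly."""
    if not 0 <= eaten <= window:
        raise ValueError("eaten must be between 0 and window")
    passed = window - eaten
    # Pulse i passes whenever the running count of passed pulses ticks up.
    return [(i + 1) * passed // window > i * passed // window
            for i in range(window)]

half_rate = effective_mhz(1000, divider=2)      # divider only
three_quarters = effective_mhz(1000, eaten=4)   # eat 4 of every 16 pulses
pattern = gate_pattern(4)
```

    The divider gives coarse steps, while eating pulses allows finer steps in the average rate between divider settings.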
     