Nvidia Pascal Speculation Thread

Discussion in 'Architecture and Products' started by DSC, Mar 25, 2014.

Thread Status:
Not open for further replies.
  1. Adored

    Newcomer

    Joined:
    Mar 1, 2016
    Messages:
    67
    Likes Received:
    4
    Yes, that's why I mentioned 1.2GHz as the likely reasonable limit on the process.

    http://www.pcper.com/files/review/2016-01-04/polaris-12.jpg

    Obviously Nvidia can push the clock speeds higher, but after a certain point the gains won't be worth the extra power. The extra gains they got with Maxwell from their knowledge of the 28nm node may not carry forward to 16FF.

    Anyway, I'm not saying that they won't increase clocks at 16nm, just that there may be reasons why they don't, or won't by as much as some might believe. I don't expect to see a recurrence of anything like what happened on 28nm. Nvidia made more gains there because they tried.
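    To put a rough number on that power cost, here's a back-of-the-envelope sketch using the usual dynamic-power relation P ≈ C·V²·f. The frequency/voltage pairs below are invented placeholders for illustration, not measured 16FF/14FF figures:

```python
# Why chasing clocks stops paying off: dynamic power scales roughly with
# C * V^2 * f, and past some point higher clocks need higher voltage.
# All operating points below are made-up placeholders.

def relative_dynamic_power(freq_ghz, voltage_v, base_freq_ghz=1.0, base_voltage_v=0.90):
    """Dynamic power relative to the baseline point; capacitance cancels in the ratio."""
    return (freq_ghz / base_freq_ghz) * (voltage_v / base_voltage_v) ** 2

# Hypothetical frequency/voltage points on the same process:
for freq, volt in [(1.0, 0.90), (1.2, 0.95), (1.4, 1.05)]:
    print(f"{freq:.1f} GHz @ {volt:.2f} V -> {freq:.2f}x clock, "
          f"{relative_dynamic_power(freq, volt):.2f}x dynamic power")
```

    Under those assumed voltages, going from 1.2 to 1.4GHz buys roughly 17% more clock for 40%+ more dynamic power, which is the "not worth the extra power" point in a nutshell.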
     
  2. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    It's hard to know exactly what they mean by this. I'm going for a combo of some real transistor optimizations as well as using that term for low-level clock gating optimizations.

    When you look at real transistor optimizations, there's not that much you can do unless you're willing to move major parts from a standard cell based flow to full custom. I don't think that's a very likely option. What's left then are minor improvements that can be deployed widely. Think RAM building blocks that are used in generators, or a few custom standard cells that expand the default library for some specific cases that could be unusually common in a particular design. In any case, these are the kind of optimizations that will gain you a few percentage points of improvement, but since low-level power optimization is a long slog for small gains, they are in the same league as low-level clock gating. You just need to find enough of those cases.

    But all of that pales in comparison to what you can do with architectural stuff. Low-level optimizations are a way to do the things that have to be done a bit more efficiently; architectural stuff is about not doing stuff at all, or doing it in a completely different way. Low-level optimizations have been used extensively for over a decade now, so whatever was done additionally for Maxwell wouldn't be low-hanging fruit anymore. No chance of huge gains.

    Finally: your suggestion that they might be exploiting some process improvements along the way. How would that work?

    Clock speed increases on the same process are pretty much guaranteed to be architectural. How could they not be?

    Yeah, I don't buy Mr. Pony's theories at all.
    He's basing them off a dimensionless chart that says "chart for illustrative purposes only".
    Nvidia didn't have a power problem on 28nm, and 16nm will be much better no matter what. There is no justification whatsoever not to explore the high-speed / relatively lower perf/W corner for their next designs.

    If AMD is willing to trade away perf for reduced power, they'll lose (don't worry, they will not do that) and Nvidia will simply play the absolute performance card. And rightfully so. Maxwell was great not because perf/W was excellent, but because perf/W and absolute performance were both excellent.

    That's the biggest issue: your whole argument rests on this broken premise.

    What exactly is that secret magic? Why are you choosing something mysterious when there are logical explanations: major architectural changes that are well known and visible to anybody who's willing to look for them?
     
    #922 silent_guy, Mar 24, 2016
    Last edited: Mar 24, 2016
  3. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    You can find plenty of ARM CPUs out there on lower-performance, higher-perf/W processes that easily beat Maxwell clock speeds, despite Maxwell being on a high-performance process. Why? Because it's a completely different architecture. It does not have 'a bit to do with clock speed'. On the contrary, it's the single most important factor.
     
    pharma and Razor1 like this.
  4. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland
    http://www.guru3d.com/news-story/micron-starts-sampling-gddr5x-memory-to-customers.html

     
  5. Adored

    Newcomer

    Joined:
    Mar 1, 2016
    Messages:
    67
    Likes Received:
    4
    The same way that, for example, AMD could increase clock speeds from 3GHz to 3.7GHz at lower voltage on the same 45nm node with Phenom, maybe?

    https://en.wikipedia.org/wiki/List_...2Deneb.22_.28C2.2FC3.2C_45_nm.2C_Quad-core.29

    I'm not seeing any major architectural changes there.
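    A quick back-of-the-envelope on that example: the 3.0GHz to 3.7GHz clocks come from the list above, while the voltages below are assumed placeholders consistent with "at lower voltage", again using the simple P ≈ f·V² proxy:

```python
# Phenom II "Deneb" back-of-the-envelope: clocks from the post (3.0 -> 3.7 GHz),
# voltages (1.35 V -> 1.30 V) are assumptions, not datasheet values.
clock_uplift = 3.7 / 3.0                              # ~1.23x
power_ratio = (3.7 / 3.0) * (1.30 / 1.35) ** 2        # ~1.14x with the f * V^2 proxy
print(f"{clock_uplift:.2f}x clock for {power_ratio:.2f}x dynamic power")
# A clock/power win with no architectural change: binning plus process maturity.
```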

    I'm quite willing to believe it's a combination of both; it's you who seems determined to dismiss any explanation that doesn't involve 100% incredible Nvidia engineering.
     
  6. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    Come on, please don't tell me that, all this time, you were talking about increasingly aggressive binning, helped by tightening process variations?

    You know why they call it "black edition"? Because it's a specialty bin that only covers a fraction of the samples. And tightening variations reduce the number of tail-end samples that fail.

    Just like there are GTX 980 Ti cards with base clocks that are 20%+ higher than the reference one.

    What a waste of time.

    Exactly. That's why it's a black edition.

    Whereas every Maxwell GPU in existence runs a faster base clock than its Kepler-based equivalent.

    Sure, when all else fails, pull out the fanboy card.
     
  7. Adored

    Newcomer

    Joined:
    Mar 1, 2016
    Messages:
    67
    Likes Received:
    4
    Sure, SilentGuy, there were no transistor-level improvements, regardless of what Nvidia themselves said about it. Is full custom unlikely? I guess, but unless you work for Nvidia or TSMC, the possibility can't be dismissed.

    They are called Black Edition due to having unlocked multipliers. I should know, I had one. The fact is that over the course of a year or two, the clock speeds on the average part increased by more than 20% - and I do mean average as the only parts available later on were the "Black Edition" parts.

    There are ways to improve a process mid-life, as surely you know. If not, check out the low-k stuff back on 45nm. And no, I'm not suggesting anything similar took place on 28nm, but neither am I dismissing anything, given that Nvidia themselves clearly believe they did something.
     
  8. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    Then why bring it up?

    Believe what you want to believe. We'll know the reality soon enough.

    If Pascal runs at 1.2GHz or less, I'll be more than willing to call you the oracle of silicon engineering and myself a complete fool (and Nvidia idiots for not exploiting the main benefit of 16nm). Are you willing to do the opposite if Pascal runs higher than 1.4GHz?
     
  9. Adored

    Newcomer

    Joined:
    Mar 1, 2016
    Messages:
    67
    Likes Received:
    4
    Because for the sake of accuracy it's good to know that processes can be improved upon without the need for major architectural changes.

    That's an easy bet for you to make, as we can be quite certain of one thing: Nvidia will increase clock speeds to "win" on pure performance vs Polaris 10 rather than be slower, with no regard to perf/Watt. That assumes Polaris launches first, of course.

    Let me make a counter-offer: I believe that Polaris 10 will beat GP104 in perf/Watt. That should also be a safe bet for you to take, given your confidence in Nvidia's ability to out-engineer AMD and given Nvidia's clear lead in that area right now.
     
  10. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    That might also explain the recent rumour about AMD also going with a 256-bit bus.
    I must admit, after seeing how the 980 performs (even overclocked) with a 256-bit bus at resolutions beyond 1080p, I am a bit leery of this. Yes, GDDR5X is a nice theory for the additional bandwidth boost it can provide, but will it pan out that way completely in the real world?
    Has anyone got or seen a real-world report showing GDDR5X benchmarks, and under what circumstances?
    I get the feeling everyone is looking at GDDR5X from the POV of its theoretical maximum, or close to it, with just under 16Gbps being achievable at launch.
    I also wonder what impact/considerations there are, if any, in using this memory in its most efficient way when considering gaming/visual developments/etc.
    Cheers
     
  11. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    Ok. You're now weaseling out of your 1.2GHz claim.


    Go look through my posting history. You'll find plenty of cases where I've stated that I expect Nvidia and AMD to be similar in perf/mm2 and perf/W for the next generation. Back when the latest AMD rebrands were cause for the usual outrage, I claimed that it was the right thing for them to do to have more time to get their perf/W house in order.

    Do I really have to make a 180-degree turn on everything that I've stated earlier just to satisfy your mistaken preconceptions? I'd rather not.

    When one company has similar or better absolute performance, similar or better perf/mm2, and a huge advantage in perf/W on the same process, is it that unreasonable to consider it a case of one out-engineering the other? But again, irrelevant, since I've always expected AMD to fix most, if not all, of that for the next generation.
     
  12. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland
    I think, but that's just my own opinion, that GDDR5X is a bit overestimated at the moment. You take the peak bandwidth, but doubling the access won't do miracles; that said, the impact is more on capacity than on peak access speed. That's maybe due to the "HBM effect"... people seem to think that every new generation of memory is somehow revolutionary (but we are really, really far from HBM here).

    In fact, I don't understand why it hasn't been done before. Doubling the access to 64 bytes is not something that was infeasible even with GDDR3... You would not need a "512-bit memory controller" like on Hawaii.

    I still have to see real-world benchmarks to see what this GDDR5X is capable of...
     
    #932 lanek, Mar 24, 2016
    Last edited: Mar 24, 2016
  13. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    GDDR5X transfers 64 bytes per burst vs 32 for GDDR5. With equal command clocks, that means double the BW.
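    For concreteness, here's a minimal sketch of that peak-bandwidth arithmetic. The per-pin rates are just illustrative (7 Gbps is a common shipping GDDR5 speed, and 10 and 14 Gbps roughly bracket the GDDR5X speeds that have been talked about), not a prediction for any particular card:

```python
# Peak-bandwidth arithmetic only; per-pin rates are illustrative figures.
# Doubling the burst (32 -> 64 bytes) at an equal command clock is what
# doubles the per-pin data rate in the first place.

def peak_bandwidth_gb_s(bus_width_bits, data_rate_gbps_per_pin):
    """Theoretical peak: number of data pins times per-pin rate, in GB/s."""
    return bus_width_bits * data_rate_gbps_per_pin / 8

for name, gbps in [("GDDR5 ", 7), ("GDDR5X", 10), ("GDDR5X", 14)]:
    print(f"256-bit {name} @ {gbps:2d} Gbps -> {peak_bandwidth_gb_s(256, gbps):.0f} GB/s")
```

    None of that says anything about how often real workloads actually approach the peak, of course, which is the separate question being argued here.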

    What is 'a bit overestimated'?

    How so? A GDDR5X system with 8GB will still have the same capacity as a GDDR5 system with 8GB.
     
  14. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland
    Outside of professional GPUs, have we seen a lot of 8GB GPUs? No, and especially not on 256-bit lower-end GPUs. This is why I speak about the memory capacity impact... take a lower-end GPU with a small bus and put a high memory capacity on it.

    As for the access, I say overestimated because you will rarely see the peak bandwidth achieved... not in the state it is implemented today. I hope I'm wrong...
     
    #934 lanek, Mar 24, 2016
    Last edited: Mar 24, 2016
  15. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    You're right to be a little bit suspicious: HBM didn't exactly set the world on fire.

    But we're long past the era of explosive growth, and steady improvement is where we are today. GDDR5X fills that slot nicely. Good enough to support a class up in resolution.

    I expect as much impact as there was with HBM: nothing at all. Real visual changes need to come from improved algorithms.
     
  16. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Yeah, it would be good to know why we have not seen the 64-byte access on GDDR5 for GPUs before, as the interface supports it; maybe it relates to size considerations?
    Cheers
     
  17. Adored

    Newcomer

    Joined:
    Mar 1, 2016
    Messages:
    67
    Likes Received:
    4
    No, I am not. I believe 1.2GHz is about right for Polaris and a good estimate of where 14FF will top out in perf/Watt for GPUs.

    Nvidia might go higher with boost clocks or by throwing perf/Watt out. They might lose some top-end due to mixed precision or improved async capability. Did anyone ever consider that AMD's ACEs could be an issue with their clocks, I wonder...

    All I ever said was that people shouldn't assume a "normal" clock increase for Pascal. If it happens, great. They surely deserve it if they can continue on with the excellence of Maxwell over two nodes.

    I never disagreed that Nvidia is far ahead in every metric on 28nm either. It's actually something I point out quite a lot. I do however believe it's far more complex than your reasoning of "better architecture" alone.
     
  18. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    It's just the logical next step. We already have 12GB for TitanX, 8 for 390 and 6 for 980Ti.

    How many times do you see peak BW achieved on today's GPUs? How about 'never', because it's not something that people actively monitor. It will be just like this for GDDR5X.
     
  19. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland
    Nothing to do with size (I mean on a physical level)... so why hasn't it been done before? That's pretty illogical.
    And there's 32GB on FirePro... 24GB and 16GB on Nvidia Tesla...

    With your last phrase, you have just arrived at what I was saying... memory capacity will have more impact than the peak bandwidth. Doubling the memory (4 to 8GB) has never had a big impact so far on gaming GPUs; by doubling the access to 64 bytes, GDDR5X somewhat solves this problem... hence why I speak about memory size instead of bandwidth.

    With the same bus and the same GPU, having 4 or 8GB made no difference before, but now with GDDR5X, doubling the memory of a 4GB GPU will make a difference, even with a small bus...

    And the question remains: if it was so easy, why hasn't it been done before? There's no physical constraint, and we are far from the problems that can arise from using something like TSVs and an interposer with HBM...

    More seriously, I'm really impatient to test it with OpenCL raytracing to see the real impact (as memory bandwidth, size and speed all have a really deep impact on performance there).
     
    #939 lanek, Mar 24, 2016
    Last edited: Mar 24, 2016
  20. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    "There are for sure reasons to believe that Nvidia may not increase on Maxwell's clock speeds, or not by much. 1.2GHz is the magic number I believe."

    How dumb of me to think that you were talking about 1.2GHz for Maxwell.

    By the standard of your previous post, you're showing some serious Nvidia fanboy tendencies there.

    The only thing you've brought up so far is plain old process maturity (which should have benefited GCN just as much.)

    What else do you have?
     