General Next Generation Rumors and Discussions [Post GDC 2020]

Discussion in 'Console Industry' started by BRiT, Mar 18, 2020.

  1. fehu

    Veteran Regular

    Joined:
    Nov 15, 2006
    Messages:
    1,718
    Likes Received:
    703
    Location:
    Somewhere over the ocean
    We will see in a year, but the wording was so vague and confusing that for now the discussion is split between those who don't understand what Cerny said, and those who don't understand what Cerny said but keep quoting it as if it were the explanation.
    Let's just wait.
     
  2. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    10,448
    Likes Received:
    10,119
    Location:
    The North
    Right, so the dynamic power equation is
    P = C V^2 f a
    where f and V are directly proportional (you need more voltage as frequency increases to keep the switching stable),
    V is squared,
    C is the capacitance, set by the geometry of the gates; this doesn't change,
    f = frequency,
    a = activity factor; some may consider this the workload, more on this in a second.
    So the reason we see power efficiency nosedive as frequency keeps going up is that power increases roughly as the cube of frequency,
    i.e. 2x the frequency also means 2x the voltage, which squared is 4x, so you get 8x the power for 2x the frequency.
    Eventually you hit thermal limits for the chip due to parametric yield, or cooling hits a hard wall (wattage per cm^2) and you cannot proceed any further.
    The cooling requirements get very steep as power rises; eventually silicon starts approaching the power density per cm^2 of a nuclear reactor. So cooling becomes a hard wall as well.

    So let's do some interesting math.

    Let's take a PS5 at 2230 MHz, and denote its power as Ps5,
    and another PS5 at 1825 MHz (XSX speed), and denote its power as Psx.

    1825 MHz is roughly 80% of 2230 MHz, or about 4/5.
    So let's plug in f and V, with V varied proportionally:

    Psx = C (4/5 V)^2 (4/5 f) a

    Or, removing the constants, Psx = (4/5)^3 = 64/125 of Ps5's power, i.e. roughly 50% less power.
    So by downclocking by 405 MHz, we extract a power saving of roughly 50% on the same chip.

    Inversely, going upwards, it's 5/4 the frequency at the cost of 5/4 the voltage, squared.
    Cool, so let's calculate what that means in power for the inverse:
    (5/4)^3 ≈ 1.95, i.e. about 95% more power, nearly 100% more, to go from 1825 to 2230 MHz.
    Let P = power and T = time to execute, with PS5 as b and PSX as a:
    Pb ≈ 2 Pa
    Tb = (4/5) Ta
    So PS5 draws roughly 2x the power of PSX but finishes in 4/5 of the time; the energy per task (power × time) works out to about 1.6x.

    Let's try to use DVFS to rescale the power requirements of PS5 so that it draws the same amount of power as PSX, and see if it's still going to run faster.
    I'm not going to show the math as it would take a while, but using DVFS to match PS5's power level to PSX's gives a ratio of 0.92, which is less than 1. Meaning that even with DVFS it's still slower.
    So this is evidence that the boost is not a neutral position; it's heavily overclocked with respect to its own power curve.
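    To make the arithmetic above easy to check, here's a minimal sketch of the same back-of-the-envelope calculation (assuming the simplified model where V scales with f, so P scales as f^3; the clocks are the publicly stated GPU clocks, everything else cancels out as constants):

    ```python
    # Simplified dynamic power model: P = C * V^2 * f * a, with V proportional to f,
    # so relative power between two clocks on the same chip reduces to (f2 / f1) ** 3.

    def relative_power(f_new, f_ref):
        """Power at f_new relative to the same chip at f_ref (cubic scaling)."""
        return (f_new / f_ref) ** 3

    ps5_clock = 2230.0  # MHz, PS5 GPU max clock
    xsx_clock = 1825.0  # MHz, XSX GPU fixed clock

    # Downclocking the PS5 GPU to XSX speed:
    print(relative_power(xsx_clock, ps5_clock))  # ~0.55 with the exact ratio
    print(relative_power(4, 5))                  # 64/125 = 0.512 with the rounded 4/5 ratio

    # Going the other way, 1825 MHz -> 2230 MHz:
    print(relative_power(ps5_clock, xsx_clock))  # ~1.82x exact, ~1.95x with the 4/5 rounding
    ```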

    So back to the original formula.
    P = C V^2 f a.
    The big thing is that RDNA2 improves performance per watt by up to 50%. This has people napkin-mathing 50% better clocks, but it doesn't work like that.
    Using the dynamic power formula, f and V are proportionally locked, so P is proportional to f^3,
    leaving C, the capacitance and geometry of the gates, as a variable for change. Since we know they aren't changing the gates, that leaves activity level,
    or the amount of switching from 0s -> 1s and 1s -> 0s.

    If AMD has found better ways to use less power in their chip for certain activities, it can score up to a 50% power improvement... so for specific tasks, (1/2)a.
    But for other tasks that light up everything, it will likely remain a. So the improvement will vary with the activity, between 1/2 and 1.0 proportionally.

    Which brings us back to what Cerny was talking about. Locking the power output as part of PS5 leaves voltage, frequency and C tied together, leaving (a) as the lever.

    Developers must find algorithms that use less power, or better put, less power per core. So can we parallelize the algorithms over more cores, such that each core is doing fewer operations (and thus drawing less power), and therefore not go over the power budget?

    You see, going wide has huge power benefits if you can parallelize your work. This is why we have multicore processors.
    Using the dynamic power function again:

    A core at 4 GHz vs a core at 1 GHz (see the sketch below):
    • The 4 GHz core will complete its work 4x faster than the 1 GHz core.
    • But at 64x the power, because of the cubic relationship between power and frequency.
    • But if you use 4 cores at 1 GHz, you can complete the work in the same amount of time at only 4/64 of the power of the single 4 GHz core, i.e. 1/16 of the power.
    This is why, as we approached a megahertz ceiling, we started going multi-core. It saves tons of power.
    And in the same way, if we write algorithms that go parallel instead of single-threaded, we will also save tons of power, which will allow the clocks to stay high on PS5.
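    Here's a minimal sketch of that comparison, under the same "voltage tracks frequency" assumption; the numbers are illustrative, not measurements of any real core:

    ```python
    # Compare one fast core against several slow cores under the cubic power model.
    # Assumption: V scales with f, so per-core power scales as f**3, and total
    # throughput scales linearly with cores * frequency (perfectly parallel work).

    def total_power(cores, freq_ghz):
        """Relative power of `cores` cores at `freq_ghz`, normalised to one 1 GHz core."""
        return cores * freq_ghz ** 3

    def throughput(cores, freq_ghz):
        """Relative throughput, normalised to one 1 GHz core."""
        return cores * freq_ghz

    one_fast  = (throughput(1, 4.0), total_power(1, 4.0))   # (4.0, 64.0)
    four_slow = (throughput(4, 1.0), total_power(4, 1.0))   # (4.0,  4.0)

    # Same throughput, but the four slow cores draw 4/64 = 1/16 of the power.
    print(one_fast, four_slow)
    ```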

    TLDR: the clock speed is very high; even accounting for DVFS, it is still burning more power to hold that clock rate than the chip would normally allow within the architecture. DVFS did not fix that; it just allowed them to market a higher number.
    This power curve could affect parametric yield on chips, could affect the amount of cooling required, and the GPU will likely have to pull additional power away from the CPU to maintain its clock rate, as per their original statement that the design could not hit 2 GHz fixed.
     
  3. Globalisateur

    Globalisateur Globby
    Veteran Regular Subscriber

    Joined:
    Nov 6, 2013
    Messages:
    3,475
    Likes Received:
    2,169
    Location:
    France
    What about a FurMark-style test for CPU + GPU? Because that's the kind of test Cerny would be using for a sustained 2 GHz.
     
  4. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    10,448
    Likes Received:
    10,119
    Location:
    The North
    I don't know what quantities Cerny chose for his power limits.
    But yes, I would say your assumption is a fairly safe bet (at least as a line of thinking). You set the activity level to maximum and you see where your frequency ends up once you put a cap on your power limit + cooling.

    I don't know what the activity level is for FurMark. But I suppose if, in the power equation, you set
    a = some substantial value as a function of the number of transistors and the number of operations possible (say all copies, to really force gates to flip 1 -> 0 continually), you can sort of figure it out without FurMark.
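    Purely as an illustration of that idea, here's a tiny sketch of solving for the sustainable frequency under a fixed power cap with the same cubic model; the constant, the cap and the activity values are made-up placeholders, not anything Sony has published:

    ```python
    # Sketch: find the highest frequency that stays under a fixed power cap,
    # assuming P = k * a * f**3 (V folded into f via the proportionality assumption).
    # K, POWER_CAP and the activity values below are made-up placeholders.

    K = 1.0           # lumps together C and the V/f proportionality constant
    POWER_CAP = 10.0  # arbitrary units

    def max_frequency(activity, power_cap=POWER_CAP, k=K):
        """Highest f (arbitrary units) such that k * activity * f**3 <= power_cap."""
        return (power_cap / (k * activity)) ** (1.0 / 3.0)

    print(max_frequency(activity=0.5))  # lighter workload -> higher sustainable clock
    print(max_frequency(activity=1.0))  # power-virus workload -> lower sustainable clock
    ```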
     
    PSman1700 and BRiT like this.
  5. Barrabas

    Regular Newcomer

    Joined:
    Jul 29, 2005
    Messages:
    315
    Likes Received:
    272
    Location:
    Norway
  6. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    43,488
    Likes Received:
    15,941
    Location:
    Under my bridge
    No-one (here) can answer that. It'd just be a complete guess (though I wouldn't be surprised to see some people guessing some pretty low figures. ;))

    We're told by the guy in charge of designing the thing that it maxes at 2.23 GHz and spends most of its time at or near that. 'Game clock' could be anything from 2.2 to 2.1 to 1.9 to 1.7, and there's absolutely no way of knowing short of having a devkit or being involved in the creation of the thing.
     
    sir doris, BRiT, PSman1700 and 2 others like this.
  7. Barrabas

    Regular Newcomer

    Joined:
    Jul 29, 2005
    Messages:
    315
    Likes Received:
    272
    Location:
    Norway
    The talk of giving developers power budgets for the PS5 feels a little bit weird. Isn't it likely they'll just go for the highest budget, and if they don't max it out they'll simply have the extra headroom?
     
  8. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    10,448
    Likes Received:
    10,119
    Location:
    The North
    Developers will just use what they need. They will program to stay under the power draw limits as much as they can, or at least where they can: moving away from simple loops to parallel random accesses of memory, changing the way we approach sorting, etc. A bunch of different techniques could be used to spread work over as many cores as possible, to keep the activity level low and thus the clocks high.

    Then you have headroom for times in which you need to light up all the silicon.
     
    sir doris, PSman1700 and Barrabas like this.
  9. Barrabas

    Regular Newcomer

    Joined:
    Jul 29, 2005
    Messages:
    315
    Likes Received:
    272
    Location:
    Norway
    When you keep the activity level low, doesn't that also mean keeping the clocks low?
     
  10. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    10,448
    Likes Received:
    10,119
    Location:
    The North
    No. Activity level is basically a measure of how often bits switch from 1 to 0 and so forth. Frequency is the clock speed.

    If you can target your code at specific areas of a chip, then the chip can shut unused gates off, or a value simply won't change (0 stays 0, for instance), thus lowering your activity level.

    A popular example is that AVX-256 and AVX-512 instructions have an extremely high activity level: they use every portion of the chip at once, so there is a lot of switching happening.
     
    #1450 iroboto, Jun 10, 2020
    Last edited: Jun 10, 2020
    sir doris and Barrabas like this.
  11. MrFox

    MrFox Deludedly Fantastic
    Legend Veteran

    Joined:
    Jan 7, 2012
    Messages:
    6,427
    Likes Received:
    5,836
    Yep. He also said the max clocks of the CPU and GPU were chosen to keep an identical thermal density across the die. So we have some rough ballpark comparison if we can get some numbers on Zen 2 at 3.5 GHz. We also know they limited thermal density against AVX-256, which raises a big question about the XBSX's thermal density and how they dealt with AVX-256 at a fixed clock. So it seems Cerny has taken every decision to keep thermal density as easy to deal with as possible, in addition to the dissipated wattage of the entire design being a known value that will not change. This is an engineer's favourite situation for designing a cooling system.

    It still doesn't tell us anything about what they decided on for the wattage limit, nor do we have any data about RDNA2's better efficiency, or how much further AMD pushed the frequency knee up. Are they close? Behind? Above?

    The voltage/frequency chosen also depends on the worst CU of the bunch. The more CUs they have, the worse it will be. They can either set it high to get really good yields immediately, or be aggressive and lose a bit more at launch, which would pay off as yields improve since the cooling/PSU would be less expensive for the next couple of years.
     
    egoless, disco_ and Barrabas like this.
  12. RDGoodla

    Regular Newcomer

    Joined:
    Aug 21, 2010
    Messages:
    496
    Likes Received:
    132
    I asked the same question in another thread.

    My understanding is that:

    1. PS5 has a quite efficient cooling system. It can sustain a high GPU load.

    2. Let's say PS5 can sustain 2.23 GHz at 60% GPU load (an arbitrary number). When the GPU load increases to 70% and the CPU can't give up any more power, the GPU must downclock to save 1/7 of the power per transistor.

    However, at 2.23 GHz the power curve is quite nonlinear. So if we want to decrease the per-transistor power by 1/7, we may only need to decrease the frequency by a very small amount. Maybe 5% or so.

    But there is a contradiction: since PS5 has a very efficient cooling solution that can sustain a high GPU load at 2.23 GHz, why couldn't the GPU reach a fixed 2 GHz clock?

    I guess there is some limitation at the circuit level. Maybe there would be problems if the GPU suddenly hit 80~90% load, even for a short period of time.
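    As a quick sanity check on that ~5% figure, using the cubic P ∝ f^3 approximation from earlier in the thread (a simplification, not measured data):

    ```python
    # If per-transistor power must drop to 6/7 of its current value and
    # power scales roughly as f**3 (V tracking f), the required clock drop is small.

    required_power_ratio = 6 / 7                      # save 1/7 of the power
    required_freq_ratio = required_power_ratio ** (1 / 3)

    print(required_freq_ratio)         # ~0.95 -> roughly a 5% frequency drop
    print(2230 * required_freq_ratio)  # ~2118 MHz from a 2230 MHz starting point
    ```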
     
    PSman1700 likes this.
  13. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    2,091
    Likes Received:
    952
    Location:
    Earth
    There wasn't enough power budget allocated to the GPU to do that. Statically allocating all that power to the GPU would have gimped the CPU, hence the dynamic, load-based allocation of power. Sony wanted a fixed max power consumption so they could design their cooling and power supply around it. It's much easier to design cooling if you know it's 200 W than if you are guessing it's 200 W and then some game comes out, pulls 250 W, and causes all kinds of problems.

    Of course, Sony could have added a beefier power supply, beefier cooling and likely a bigger box. Sony chose to make a compromise. Whether that compromise is good/bad/neutral we will see once some games are out. Of course the BOM is also likely important, and we have to wait a while until we see where BOM estimates land. There might also be some consideration of how the cost of the solution reduces over time.
     
    jgp, Xbat, DSoup and 2 others like this.
  14. zupallinere

    Regular Subscriber

    Joined:
    Sep 8, 2006
    Messages:
    736
    Likes Received:
    86
    One place where you can actually skimp a bit since going above a certain power level isn't an issue.
    Supposedly the difference in cost isn't that much but maybe every little bit helps.
     
  15. MrFox

    MrFox Deludedly Fantastic
    Legend Veteran

    Joined:
    Jan 7, 2012
    Messages:
    6,427
    Likes Received:
    5,836
    The reason given by Cerny was about designing for unpredictable future requirements as the generation ages, or devs changing their access patterns in ways that couldn't be predicted accurately. They don't need to overbuild for potentially unused margin, and they don't need to lower the clock for safety margins either. Fixed clocks required both in the past. It's right there in the presentation.
     
    DSoup likes this.
  16. DSoup

    DSoup meh
    Legend Veteran Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    12,477
    Likes Received:
    7,724
    Location:
    London, UK
    It will be interesting to see analysis comparing race-to-idle (ramping clocks down and then back up) with clocks scaling back and forth in response to relatively small workload shifts. Race-to-idle generally means dropping to zero work, while variable clocks mean savings whenever there are more fine-grained drops in demand on the CPU. Mark Cerny mentioned the power draw of more complex instructions.

    The whole shift to planning for maximum power draw will be interesting to review in four years. More complex instructions do consume more power, but they also require more CPU resources, so you're generally processing fewer of them. Will this balance out in power terms? ¯\_(ツ)_/¯
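    For a rough feel of that trade-off, here is a toy comparison under the same cubic power model used earlier in the thread, with a made-up idle-power term; none of the numbers come from Sony or AMD:

    ```python
    # Toy comparison: race-to-idle vs. running the whole frame at a lower clock.
    # Assumptions: dynamic power ~ f**3, a fixed amount of work per frame
    # (work ~ f * time), and a made-up constant idle/static power.

    IDLE_POWER = 0.05  # arbitrary units, placeholder
    WORK = 1.0         # arbitrary work units per frame
    FRAME_TIME = 1.0   # arbitrary time units

    def frame_energy(freq):
        """Energy for one frame: run at `freq` until the work is done, then idle."""
        busy_time = WORK / freq                        # time to finish the work
        assert busy_time <= FRAME_TIME                 # clock must be fast enough for the frame
        dynamic = freq ** 3 * busy_time                # dynamic energy while busy
        idle = IDLE_POWER * (FRAME_TIME - busy_time)   # static energy while idle
        return dynamic + idle

    print(frame_energy(2.0))  # race-to-idle: fast burst, then long idle
    print(frame_energy(1.0))  # just fast enough to fill the frame

    # Under this model the lower clock wins unless idle/static power dominates.
    ```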
     
    iroboto likes this.
  17. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    11,031
    Likes Received:
    5,576

    What does "60% GPU load" mean?

    60% occupancy rate in the CUs' registers? 60% of the CUs' ALUs working at the same time?
    The GPU isn't made of CUs only. It isn't a CPU.

    Even if you did mean 60% of total ALUs, that wouldn't represent total "load", nor power consumption, which is what actually drives the PS5's GPU clocks. Power consumption varies with the type of instruction the GPU is running, and it doesn't even have anything to do with visual quality.
    GPU power viruses like FurMark look super simple visually, but AFAIK they don't even exercise the full pipeline; they just force the same power- and heat-sensitive zones to cycle their activity non-stop.



    For example, in the Road to PS5 presentation, Cerny mentioned that geometry-intensive scenes were the ones that required less power, and that low-triangle scenes were the ones that pushed the most power out of a GPU. Power in watts is an objective measure, not an abstract one.

    Geometry-intensive scenes are in line with Epic's direction for Unreal Engine 5, and are probably a sign of what to expect from at least Sony's 1st-party games, BTW.
    My guess is the next-gen Decima engine will look a whole lot like Unreal Engine 5, considering Cerny's words about one triangle per pixel being both visually and power effective.


    I wonder if there is a game clock, or even so much as a base clock. From Cerny's words, developers could implement a power virus capable of driving the GPU clocks down to 1.5 GHz or less, and throw in lots of AVX-256 instructions that drive the CPU clocks down to 2.3 GHz or less. They just wouldn't get anything of value out of it.
     
    milk likes this.
  18. AlphaWolf

    AlphaWolf Specious Misanthrope
    Legend

    Joined:
    May 28, 2003
    Messages:
    9,048
    Likes Received:
    1,128
    Location:
    Treading Water


    Sony won't be dissipating the most heat.
     
    AzBat and Tkumpathenurpahl like this.
  19. AzBat

    AzBat Agent of the Bat
    Legend Veteran

    Joined:
    Apr 1, 2002
    Messages:
    6,519
    Likes Received:
    2,602
    Location:
    Alma, AR
    I'm hoping it comes with some games like the Burger King ones. Can't wait for Sneak Colonel.

    Tommy McClain
     
    egoless and AlphaWolf like this.
  20. AzBat

    AzBat Agent of the Bat
    Legend Veteran

    Joined:
    Apr 1, 2002
    Messages:
    6,519
    Likes Received:
    2,602
    Location:
    Alma, AR
    While you guys are debating black or white consoles, I'm over here thinking "why stop there"? Love the Xbox Series X design as it gives a huge canvas to make it your own. Why live with what the manufacturer thinks is best?

    https://twitter.com/XboxPope

    Post a few of your favorites.

    I love anything Batman related...





    Tommy McClain
     
    PSman1700 likes this.