Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

The context is in the transcript I posted.

The downclock is there to avoid designing the entire power delivery and heatsink around AVX256 running all the time, which would require a substantial margin (we don't know what MS does about this). The stated reason for SmartShift is to take "unused" power from the CPU, which statistically allows the GPU to peak more often, "to squeeze every last drop of power available".

This makes sense if we think about what happens without SmartShift. It's invariably better with SmartShift, because when the CPU is waiting on a bunch of cache misses or is in a less compute-intensive part of the pipeline, the GPU can be signalled to use more power than it would normally be limited to, so it stays at its peak more often. Hence "to squeeze every last drop".
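A rough way to picture the budget sharing (purely illustrative numbers, since Sony hasn't published the actual budgets or granularity):

```python
# Hypothetical illustration of SmartShift-style budget sharing.
# All numbers are invented; the real budgets and granularity are not public.
SOC_BUDGET_W = 200.0   # total SoC power budget (assumed)
CPU_CAP_W = 60.0       # nominal CPU allocation (assumed)
GPU_CAP_W = 140.0      # nominal GPU allocation (assumed)

def gpu_ceiling(cpu_draw_w: float) -> float:
    """GPU may spend its own cap plus whatever the CPU left unused."""
    unused_cpu = max(0.0, CPU_CAP_W - cpu_draw_w)
    return min(GPU_CAP_W + unused_cpu, SOC_BUDGET_W - cpu_draw_w)

# CPU stalled on cache misses and drawing little -> the GPU can peak higher.
print(gpu_ceiling(cpu_draw_w=35.0))   # 165.0 W available to the GPU
# CPU fully busy -> the GPU is held to its nominal cap.
print(gpu_ceiling(cpu_draw_w=60.0))   # 140.0 W
```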
I'm happy they are trying, and if you're right we can all benefit. I'm just not convinced the same cooling solution that was supposedly running into trouble with fixed 2.0/3.0 clocks can now do 98% of 2.23/3.5 "almost all of the time", unless the occasional hard drop below 2.0/3.0 buys you a lot of time at high clocks. That hasn't been my experience overclocking PC hardware.
 
That might be one way to make clocks more predictable; or perhaps, given what we know about the power delivery for the Series X, it means Sony's console may run at lower consumption.
There are also other ways to reduce consumption that act like throttling clocks, such as instruction-issue limits or warm-up periods. Duty cycling could keep the "fixed clock" even though it halts some activity on a routine basis.
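For duty cycling in particular, the reported clock never moves, but throughput and power do. A toy example with made-up numbers:

```python
# Toy duty-cycling model: the clock register still reads the fixed value,
# but instruction issue is gated for a fraction of cycles, so effective
# throughput (and dynamic power) scales with the duty factor.
def effective_rate_ghz(nominal_ghz: float, duty: float) -> float:
    return nominal_ghz * duty

print(effective_rate_ghz(2.23, duty=1.00))   # 2.23: full throughput
print(effective_rate_ghz(2.23, duty=0.85))   # ~1.90: clock still "fixed" at 2.23
```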


We'd need to know more about the hot-spot properties of the chip and process, and the ability of the chip and board to deliver adequate power. The power-based modeling and Vdroop compensation AMD has can make some of these problems more readily handled, especially since temperature-based tracking can lag too much at these scales, whereas activity counters or voltage drop can highlight high consumption in a handful of cycles.
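To illustrate why temperature tracking lags, here is a simplified first-order thermal response next to an instantaneous activity proxy; the time constant and power step are assumptions, not chip data:

```python
import math

# Simplified comparison: an activity/power proxy reacts within cycles,
# while die temperature follows a slow first-order response.
# The time constant and power step are assumptions for illustration.
TAU_THERMAL_S = 2.0      # assumed thermal time constant in seconds
POWER_STEP_W = 80.0      # sudden extra power from a heavy workload

def temp_rise(t_s: float) -> float:
    """First-order rise toward the new steady state (arbitrary units)."""
    return POWER_STEP_W * (1.0 - math.exp(-t_s / TAU_THERMAL_S))

# 1 ms after the spike the activity counters already show the full 80 W jump,
# but the temperature reading has barely moved.
print(round(temp_rise(0.001), 2))   # ~0.04 out of an eventual ~80
print(round(temp_rise(5.0), 1))     # ~73.5, seconds later
```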


It's possible that Sony pulled an AMD and left voltages higher than necessary for many of its chips. In that scenario, having a dynamic clock method like AMD's can enable higher consumption than would be possible otherwise, since some of the chips that need the higher voltage could have failed more stringent limits. I think Sony would want a better handle on power consumption, but then again if their choice in CU count forced them to higher clocks--which the clocking method makes possible--then they might be worse off in terms of overall power than they would have been otherwise.



Cerny outlined a scenario where the question was whether they would bind the platform clocks to what they estimated was the worst-case consumption scenario. The kinds of operations and the overall utilization can wildly change power consumption, and the designers would need to make a prediction about all the software the chip would ever run, and hope they guessed right.
Should a design cap its frequencies at a level that is safe for some worst case it cannot predict, and what can be done if the guess is wrong? A clocking method that instead catches the moments when a high-consumption scenario is hit, and allows higher clocks at all other times, would be tempting.
Could Sony be employing something similar to MS's Hovis Method, where due to the silicon lottery each chip has its power profile customized individually? So in theory one person's PS5 could offer slightly different performance than another's that requires slightly higher voltage to hit the same clock frequency. And going forward, as yields and processes advance, this gives Sony an opening where future PS5 chip revisions might be able to sustain that 2.23GHz in most, and eventually every, scenario?
Getting into conspiracy-theory territory, could Sony bin the most capable chips for the PS5s they send to pixel-counting reviewers like Digital Foundry?
 
That would be absolutely horrible for early adopters especially if a devkit offered better performance than retail units, so the devs think they're offering a great experience but a large subset of their consumers suffer.
 
@MrFox's post got me thinking.

XSX's PSU is 315 watts, which suggests its TDP is around 235W. This almost guarantees that it is the highest-TDP console in history, easily 35W more than the launch PS3s.

If you look at the TDPs that the highest-end GPUs draw, they're usually 300-350W.

For a MAX console someday, we'll see a 400-500mm2 SOC, 300-350 watt TDP, 400-500 watt PSU. One can dream.
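For reference, the 315W-to-~235W step is just a headroom assumption: consoles don't run their PSUs at the rated limit, so something like a 75% sustained-load figure gets you there (the exact margin MS designs to isn't public):

```python
# Back-of-envelope only: assume the system is designed to sit at roughly
# 75% of the PSU rating under sustained load (the real margin isn't public).
psu_rating_w = 315.0
assumed_sustained_fraction = 0.75
print(psu_rating_w * assumed_sustained_fraction)   # ~236 W system draw estimate
```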
 
That would be absolutely horrible for early adopters especially if a devkit offered better performance than retail units, so the devs think they're offering a great experience but a large subset of their consumers suffer.
Good point. Maybe with the increased industry use of VRS and dynamic resolution scaling, the difference might not show up in framerate. One person's PS5 running a game at 1800p while your neighbor's runs at 1700p might not be noticeable or problematic to anyone other than the very hardcore or Digital Foundry.
 
So you're suggesting that PS5 is going to render everything in greater than 4K native and scale down to 4K?

No, I'm suggesting that >4K assets are still needed even in 4K or lower final resolution.

I think sampler feedback is related to texel shading.

Yep. It uses texture space shading. I think it was said in every SFS description or manual.
 
@MrFox's post got me thinking.

XSX's PSU is 315 watts, which suggests its TDP is around 235W. This almost guarantees that it is the highest-TDP console in history, easily 35W more than the launch PS3s.

If you look at the TDPs that the highest-end GPUs draw, they're usually 300-350W.

For a MAX console someday, we'll see a 400-500mm2 SOC, 300-350 watt TDP, 400-500 watt PSU. One can dream.
I believe it to be a mistake from DF. The lower-current 5V rail is almost always shared, so it cannot be added to the total wattage; it supplies USB and other 5V devices. The USB power today needs to be subtracted from the total as a fixed provisioning.

Edit: just watched it again, it's two 12V rails actually. I thought it was 300W@12V and 5A@5V, but no, so I don't know. There's no way to figure it out without testing it. They split it in two but without any 5V. It's a weird split, why not a single rail? Maybe conducted emissions testing failed, so this isolates them?

The fat PS3 was higher in peak consumption; it's still the king of max operating watts and had the correct 480W PSU for it. There are many early tests of some games reaching 230W, and that would rise as the generation advanced. It's the 40GB/80GB models that were around 200W.

I know because I have a 60GB ps3 from launch. :runaway:
 
Sure, if the game doesn't use the CPU much. But is that really going to happen in graphically intensive games? What's going to happen if a game saturates the CPU at 90% most of the time? Because from what I gathered, ideally all 10GB of fast memory should be dedicated to the GPU.
The GPU is more bound by bandwidth and the CPU more bound by latency. I'm sure there might be some edge cases where the Series X's lower minimum bandwidth could be detrimental, but given its higher potential bandwidth, I think it will be on par with PS5 in the worst case.
 
No, I'm suggesting that >4K assets are still needed even in 4K or lower final resolution.
why?
Why not just front-load the >4K assets? Downsample them offline using the best possible downsampling techniques to 4K and have your game use those.

Why leave ultra-large textures to eat up room and waste compute and bandwidth downsampling in real time?
 
So many issues being conflated. Hopefully, some of you find this helpful.

1) There are other constraints that limit the maximum frequency a part can run at besides heat and power. Ultimately you'd be limited by the speed of the transistors and wiring delays. So the chip design and physical signalling components impose hard limits. Please don't think console manufacturers plan clocks around power supply and cooling solution budgets. Everything they do in that regard is within a predetermined window.

2) The PS5 implementation of variable clocks essentially ignores temperature. Sony has set a console standard, or "model", as they refer to it, where typical PS5's are placed in what I'd assume to be a sort of worst-case cooling environment and tested. They then measure the max power draw at which their cooling solution meets their reliability and acoustic thresholds. The result of this testing established a fixed maximum power rating for their SoC which is to be applied to all PS5's regardless of the actual temperature each might run at. So if person A is running their PS5 in a freezer and person B is in the desert, both will have the same power limit imposed.

3) Power draw changes with frequency and load. I think this is causing the most confusion. Processors (CPU and GPU), even when running at their maximum configured clock speed, will draw relatively little power if the majority of their execution units are idle. So even at fixed clock speeds, sitting in a menu doing nothing will draw less power, and in turn run cooler, than a busy game scenario. Processors are massively parallel, meaning they can execute lots of operations concurrently. For example, every CU in the GPU is technically able to carry out 128 concurrent operations each clock cycle. Applications, such as games, vary in how effectively they can utilize all the available compute components within a processor simultaneously. As programmers optimize their code and data so the processor can utilize greater and greater amounts of its available compute concurrently, the processing load, and in turn power draw, increases. At some computational utilization threshold (load), the application would reach the defined power limit. In order to be able to concurrently utilize all its available compute and still stay under Sony's previously defined power ceiling, the PS5 reduces clock speeds to compensate for that load.
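Point 3 is essentially the standard dynamic power relation: at a fixed voltage and frequency, dynamic power scales with switching activity, so utilization is what moves the needle. A rough sketch with invented coefficients (not PS5 numbers):

```python
# Rough CMOS-style power model: P = P_static + activity * C_eff * V^2 * f.
# All constants here are invented for illustration, not PS5 values.
def soc_power_w(activity: float, v: float = 1.0, f_ghz: float = 2.23,
                c_eff: float = 60.0, p_static_w: float = 30.0) -> float:
    return p_static_w + activity * c_eff * v * v * f_ghz

print(round(soc_power_w(activity=0.15)))   # menu / mostly idle: ~50 W
print(round(soc_power_w(activity=0.80)))   # well-optimized heavy scene: ~137 W
```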

4) Sony has said that when the CPU and GPU aren't being fully utilized, they can run up to 3.5GHz and 2.23GHz respectively, and that at higher loads they will run at lower (as of yet undisclosed) clock speeds. The only sense we have for clock speeds in a hypothetical 100% load scenario across the entire SoC are Cerny's comments that 3GHz on the CPU "was causing headaches" and 2GHz on the GPU "was looking like an unreachable target". Since 100% utilization across the entire SoC is not practically possible, due to the inherent inefficiencies of real-world code, data sets, and compute requirements, Sony's expectation is that they will run at or near the max clocks much of the time.
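And point 4 follows from the same model: when the activity-derived estimate exceeds the fixed SoC budget, the governor lowers frequency (and with it voltage) until the estimate fits. A minimal sketch, reusing the invented coefficients above and assuming voltage roughly tracks frequency near the top of the curve:

```python
# Minimal cap-enforcing governor on top of the same invented power model.
# Assumes V roughly tracks f near the top of the curve (an assumption).
POWER_CAP_W = 160.0
F_MAX_GHZ = 2.23

def pick_clock(activity: float, c_eff: float = 60.0,
               p_static_w: float = 30.0) -> float:
    f = F_MAX_GHZ
    while f > 0.1:
        v = f / F_MAX_GHZ                    # crude V-f proportionality
        p = p_static_w + activity * c_eff * v * v * f
        if p <= POWER_CAP_W:
            return round(f, 3)
        f -= 0.01                            # back off a small step at a time
    return round(f, 3)

print(pick_clock(activity=0.80))   # 2.23: stays at max clock under this cap
print(pick_clock(activity=1.00))   # 2.2: drops a couple of percent to fit
```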

OK, my impression was rather that the SmartShift solution was effectively a seesaw: you could hit 3.5GHz on the CPU or 2.23GHz on the GPU, but not both simultaneously.

I don't think the ratio of power transfer from one to the other is linear, or some simple step function, so there's no way to calculate what a frequency change in one means for the other.

But I also didn't get the impression that there was any circumstance where both peaks could be either met or sustained at the same time.
 
Could Sony be employing something similar to MS's Hovis Method, where due to the silicon lottery each chip has its power profile customized individually? So in theory one person's PS5 could offer slightly different performance than another's that requires slightly higher voltage to hit the same clock frequency. And going forward, as yields and processes advance, this gives Sony an opening where future PS5 chip revisions might be able to sustain that 2.23GHz in most, and eventually every, scenario?
Getting into conspiracy-theory territory, could Sony bin the most capable chips for the PS5s they send to pixel-counting reviewers like Digital Foundry?

Sony specifically stated that the PS5 chips would behave identically to an idealized SOC. Every chip as part of its validation testing would be profiled to get per-chip values for the properties that go into how they rate silicon quality. How the units react to various levels of activity at all the points in the voltage/clock curve would be studied and used to calculate power consumption.
In order to keep things consistent, Sony would settle on a fixed set of costs and force all chips manufactured to act as if they had the same silicon quality. Chips with better performance would behave as if they were average. Chips that could not meet that average would be discarded.
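In other words, the per-chip profiling feeds a pass/fail gate against the fixed "model" numbers rather than per-console tuning. Purely schematic, with an invented threshold:

```python
# Schematic of binning against a fixed "model" chip.
# The threshold and measurements are invented for illustration.
MODEL_POWER_AT_MAX_CLOCKS_W = 160.0   # power the idealized SoC is allowed to draw

def bin_chip(measured_power_at_max_clocks_w: float) -> str:
    if measured_power_at_max_clocks_w <= MODEL_POWER_AT_MAX_CLOCKS_W:
        # Better-than-model silicon still gets programmed with model parameters.
        return "ship, run with model parameters"
    return "discard"

print(bin_chip(148.0))   # better than the model: ships, behaves like the average
print(bin_chip(171.0))   # worse than the model: rejected
```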

Opting to vary PS5 performance based on the week of manufacture would damage its acceptance with developers and perhaps even customers. Developers couldn't trust whether their benchmarking was accurate, and might put in safety margins that negated anything Sony gained from the clocking, and customers would be incentivized to put off buying the console as long as possible, worrying that early revisions were inferior.
If a later PS5 chip could hold max clocks all the time, it would be programmed to act like a chip from week one.

As far as using the Hovis method, which from what little I've seen involves customizing board or package components to match the electrical variation of the chip:
Going by Sony's early discussion of wanting the PS5 to be a fast transition from the PS4 and a volume launch, tweaking every board and package is a hindrance and a cost-adder versus the more niche Xbox One X. While I do not know the exact details of Sony's method, given that it should be forcing its silicon to meet a constant target, I would be curious whether altering the mixture of electrical components per console would conflict with firmware trying to get a consistent result.
 
Aliasing. Imagine that the angle for your 4K texture to the camera is 10 degrees.
sin(10°) is about 0.17, which means one texel is now worth 0.17 of an on-screen pixel along that axis. You would need roughly 5-6x the resolution to keep ~4K worth of detail on screen.
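For anyone who wants to play with the numbers, the multiplier is just 1/sin(angle) along the foreshortened axis (a quick sketch; 10 degrees is only an example angle):

```python
import math

# A texel's on-screen footprint shrinks with the sine of the grazing angle,
# so the texture needs roughly 1/sin(angle) times the resolution to keep
# ~1 texel per pixel along that axis. The angles are just examples.
def resolution_multiplier(grazing_deg: float) -> float:
    return 1.0 / math.sin(math.radians(grazing_deg))

print(round(resolution_multiplier(10), 1))   # ~5.8x at 10 degrees
print(round(resolution_multiplier(30), 1))   # 2.0x at 30 degrees
```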
If you can't draw a triangle into that space, how would you texture that?
Seems like a pointless case to justify 5x the resolution of a texture.
 
If you can't draw a triangle into that space, how would you texture that?
Seems like a pointless case to justify 5x the resolution of a texture.

Err...okay. You draw a triangle, triangle is 4kx4kx4k pixels in object space.
With our 10 deg camera it's now 4kx0.7Kx0.7k but the u/v coordinates are still the same.
So, effect is the same.

But I also didn't get the impression that there was any circumstance where both peaks could be either met or sustained at the same time.

Obviously you can and will sustain these.
I'm not sure why it's so hard to understand though.
 
Err...okay. You draw a triangle, triangle is 4kx4kx4k pixels in object space.
With our 10 deg camera it's now 4kx0.7Kx0.7k but the u/v coordinates are still the same.
So, effect is the same.
Are you referring to a 4096x4096 texture as a 4K texture and 8192x8192 as an 8K texture?
Because I'm not referring to that. Though in retrospect I should have.

I was thinking that if you stood as close as you could to a texture with the camera and the texture still maintained 1:1 texel-to-pixel at native resolution with no stretching, then there would be no need to go higher. I don't actually know what texture size is needed for that to happen, though.

Mind you, as texture sizes get larger it's murderous on bandwidth with aniso.

You're suggesting PS5 run 8K and 16K textures respectively?
8192x8192 and 16384x16384?
 
I find it interesting that Sony is using the small triangle argument for having fewer CUs, especially given how the PS4 was theoretically more generous in its CU allocation than strictly necessary in order to give more room for compute. Even if the Series X has more challenges in filling CUs due to small geometry, wasn't that why AMD and Sony touted asynchronous compute and new methods of using compute to improve geometry processing? Then there's all the post-processing and non-graphics compute. Is Cerny's argument now that 36 is already more than enough, or did compute not really turn out to be that big a game changer for Sony?

Mesh shading should help improve saturation across a wider chip also.
 
Of course it is not just a specific PS5 thing, but it's different for PS5 compared to XSX because the latter has locked clocks.
For XSX you just need your chips to reach set clocks, they're locked there and total consumption will vary between consoles.
PS5 instead uses variable clocks, and according to Cerny it's power-determined, so each individual console would need to be calibrated relative to its specific power consumption instead of absolute power, or each console would clock differently under different loads.
They probably won't actually calibrate each console in any way. The clock speed will be set based on the measured activity level of the CPU and GPU, which in turn is based on the power draw of the "model" unit at those activity levels. Power draw and heat will vary between consoles due to environmental factors and chip variations, but their clock speed and performance will be identical given the same compute load.
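Put differently, the mapping can be a fixed table baked into firmware: measured activity, to the model unit's power at that activity, to a clock. Every console shares the same table, so identical workloads give identical clocks regardless of the individual chip or the room. A sketch of that idea with invented table values:

```python
# Invented activity->clock table standing in for the "model" unit's
# calibration data; every console would ship the same table.
ACTIVITY_TO_GPU_CLOCK_GHZ = [
    (0.70, 2.23),   # up to 70% activity: full clock
    (0.85, 2.20),
    (0.95, 2.15),
    (1.00, 2.10),
]

def gpu_clock_for(activity: float) -> float:
    for threshold, clock in ACTIVITY_TO_GPU_CLOCK_GHZ:
        if activity <= threshold:
            return clock
    return ACTIVITY_TO_GPU_CLOCK_GHZ[-1][1]

# Same load -> same clock on every console, hot room or cold.
print(gpu_clock_for(0.60))   # 2.23
print(gpu_clock_for(0.92))   # 2.15
```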
 
If that’s the case, then those are the types of workloads Sony is currently profiling with, which should increase the confidence in Cerny’s statement about the rarity of downclocking.
Sony's statements would include profiling data for all games across its platform, all 4000+. Out of that 4000+ there may be relatively few "Doom Eternal"-type games from a utilization perspective, though we don't have those statistics. But the "rarity of downclocking" statement could be skewed by the significantly greater number of indie and low-budget games released on the platform, versus actual usage or playtime in the games likely impacted by downclocking, which would be the more optimized, higher-load AAA titles.
 
OK, my impression was rather that the SmartShift solution was effectively a seesaw: you could hit 3.5GHz on the CPU or 2.23GHz on the GPU, but not both simultaneously.

I don't think the ratio of power transfer from one to the other is linear, or some simple step function, so there's no way to calculate what a frequency change in one means for the other.

But I also didn't get the impression that there was any circumstance where both peaks could be either met or sustained at the same time.

Unlikely. At 10% CPU and 10% GPU utilization, I would expect both to run at 3.5 and 2.23 respectively. At some utilization level, they would each start to downclock. If the CPU is at lower utilization and the GPU is reaching its threshold, then SmartShift can kick in and provide additional power to the GPU so it doesn't downclock as soon. I question Alex's notion of developers actually having to choose a speed setting. It seems much more straightforward to simply let the system adjust dynamically in real time and optimize performance in given scenes as normal. The only drawback is mostly on the CPU side, where certain game systems may expect a fixed clock; perhaps that's the element they let the developer define.
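To make that concrete, here is how I'd sketch it, with invented budgets and per-unit peak draws (none of these figures are public):

```python
# Illustrative only: whether the shared budget covers both at max clocks.
# The budget and per-unit peak draws are invented, not published figures.
SOC_BUDGET_W = 200.0
CPU_PEAK_W = 60.0     # hypothetical CPU draw at 100% load, 3.5 GHz
GPU_PEAK_W = 160.0    # hypothetical GPU draw at 100% load, 2.23 GHz

def clocks_at(cpu_util: float, gpu_util: float) -> str:
    total = cpu_util * CPU_PEAK_W + gpu_util * GPU_PEAK_W
    if total <= SOC_BUDGET_W:
        return "CPU 3.5 GHz, GPU 2.23 GHz (budget covers both)"
    return "one or both downclock slightly to fit %.0f W" % SOC_BUDGET_W

print(clocks_at(0.10, 0.10))   # light load: both at peak clocks
print(clocks_at(0.50, 0.95))   # CPU slack covers a near-peak GPU
print(clocks_at(0.95, 1.00))   # worst case: something has to give
```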
 