36 CU seems to be very tied to backwards compat.
I'm still unsure why they would have capped the CU count where they did. That said, backwards compatibility seems like a reasonable explanation for why they didn't go with fewer.
Honestly, adding this complex boost mode / throttling for some "small" clock adjustment wouldn't make any sense...
The PS5 sits right on the edge of 10 TF, and a more conservative approach would have dropped it below that. A GPU downclock of little more than 2.5% (about 2.7%) puts it into the 9.x range.
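For reference, the arithmetic behind that, using the publicly stated 36 CUs and 2.23 GHz cap (just a quick sanity-check sketch):

CUS = 36
ALUS_PER_CU = 64           # FP32 lanes per RDNA CU
OPS_PER_ALU = 2            # an FMA counts as two FP32 ops per clock
PEAK_CLOCK_GHZ = 2.23

tflops = CUS * ALUS_PER_CU * OPS_PER_ALU * PEAK_CLOCK_GHZ / 1000
print(f"{tflops:.2f} TF at {PEAK_CLOCK_GHZ} GHz")          # ~10.28 TF

# Clock at which it falls back to an even 10 TF, and the downclock that implies
clock_at_10tf = 10_000 / (CUS * ALUS_PER_CU * OPS_PER_ALU)
print(f"{clock_at_10tf:.3f} GHz ({(1 - clock_at_10tf / PEAK_CLOCK_GHZ) * 100:.1f}% below peak)")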
Primitive shaders are mesh shaders.
Vega's whitepaper proposed a number of future directions that primitive shaders could take, although in that generation they ended up being abandoned.
Navi has something like the automatically generated culling shaders, although the throughput numbers given fall short of what was claimed for Vega.
Some of what Cerny discussed might be along the lines of those future directions for Vega. Mesh and primitive shaders do sit in a roughly similar part of the overall pipeline, but without more details the amount of overlap can't really be determined.
By doing last minute chip revisions and clock increases.
Last-minute in this instance would really be last-month or last-quarter for chip revisions; that process has long lead times. Reaching for 10 TF might have been on their minds for longer than that.
What is disappointing is that only 100 PS4 games will be playable at launch (with no guarantee that other games will be available, only promises).
Seems like they haven't had the resources or the time to test broadly enough. This strikes me as a process that might have been waiting on final silicon or more recent steppings, plus any other components needed for full testing, like the SSD. These are the kinds of tests where real hardware is needed to rack up appreciable testing hours, and that takes time plus enough testers and units.
As far as sampling goes, going by the most-played titles isn't really a statistically random sample, so it may not be fully representative. A random sample of 30 games and their overall rate of issues might be a decent indicator of how many games could be expected to work out of the box. Some titles like Stardew Valley and Anthem (two games known to hard-lock PS4s and PS4 Pros) might also need to be profiled specifically.
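Back-of-the-envelope on how much a 30-game random sample would actually tell you (the pass count here is invented purely for illustration):

import math

n = 30          # hypothetical random sample of PS4 titles
passes = 27     # hypothetical number that run without issues

p = passes / n
# Normal-approximation 95% confidence interval on the overall compatibility rate
margin = 1.96 * math.sqrt(p * (1 - p) / n)
print(f"estimated compatibility rate: {p:.0%} +/- {margin:.0%}")   # 90% +/- 11%

With n that small the interval is still pretty wide, which is basically the "decent indicator, not fully representative" point.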
It's not; Cerny just said that to drop power by 10% you only need to drop the clocks by a couple (or was it a few) percent.
That sounds like it's at least a bit past the comfort zone for the hardware.
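For what it's worth, the shape of that tradeoff falls out of dynamic power scaling roughly as C*V^2*f, with voltage climbing steeply near the top of the frequency curve. A toy sketch (the V/f curve below is invented for illustration, not PS5 silicon data):

def voltage_for_clock(f_ghz):
    # Hypothetical V/f curve that gets steep near the 2.23 GHz ceiling
    return 0.75 + 0.35 * (f_ghz / 2.23) ** 6

def relative_power(f_ghz):
    # Dynamic power ~ V^2 * f (capacitance folded into the arbitrary units)
    return f_ghz * voltage_for_clock(f_ghz) ** 2

peak = relative_power(2.23)
for drop_pct in (1, 2, 3, 5):
    f = 2.23 * (1 - drop_pct / 100)
    print(f"{drop_pct}% clock drop -> {100 * (1 - relative_power(f) / peak):.0f}% power drop")
    # prints roughly 5%, 9%, 13%, 20% with this made-up curve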
What will likely happen is that devs will set clock speed targets and stick to them rather than relying on boost. That's ideal, since you want to deliver a consistent experience across the installed base and don't want support issues caused by variability.
While Cerny says they're using power as the limiter, it's not that simple. There's no hiding from heat: as the load builds and heat builds up, you end up drawing more power, and needing more voltage, to maintain the same frequency. That relationship never scales in your favor.
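The main mechanism there is leakage: it grows roughly exponentially with temperature, so the same workload at the same clock and voltage draws more total power as the chip heats up. A toy illustration with invented constants:

def leakage_w(temp_c, leak_25c_w=15.0, doubling_c=25.0):
    # Hypothetical: leakage doubles for every 25 C above 25 C
    return leak_25c_w * 2 ** ((temp_c - 25) / doubling_c)

for t in (50, 70, 90):
    print(f"{t} C -> ~{leakage_w(t):.0f} W of leakage on top of dynamic power")
    # roughly 30 W, 52 W, 91 W with these made-up numbers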
The boost algorithm probably drops the skin-temp or thermal-capacity calculations from AMD's turbo algorithm, which are a likely source of much of the variability. The current-based monitoring and activity counters AMD uses would be the most likely origin of Sony's idealized SoC, which would be a somewhat conservative model based on the physical characterization of produced chips.
There would be localized hot-spot modeling, but at that time scale overall temperature is likely of second-order importance to the algorithm versus the current draw and activity that cause a few units to experience accelerated heating in the space of a few milliseconds or microseconds.
I think Sony would need to specify a cooler with a thermal resistance and dissipation capability that is sufficiently over-engineered to let the boost algorithm neglect temps outside of thermal trip scenarios.
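Putting those pieces together, a deterministic activity-based loop could look something like the sketch below. Every counter name, coefficient, and budget here is invented just to illustrate the "idealized SoC" idea; it is not Sony's or AMD's actual algorithm.

POWER_BUDGET_W = 210.0                             # hypothetical SoC power budget
CLOCK_STEPS_MHZ = [2230, 2200, 2150, 2100, 2000]   # hypothetical GPU steps

# Per-event energy costs from (hypothetical) characterization of the silicon,
# deliberately pessimistic so every console behaves the same regardless of its bin.
COST_PER_EVENT_NJ = {"fp32_op": 0.020, "mem_access": 0.500, "tex_sample": 0.150}

def estimated_power_w(counters_per_s, clock_mhz):
    # Dynamic estimate from activity counters; deliberately has no temperature term
    dynamic = sum(counters_per_s[k] * COST_PER_EVENT_NJ[k] * 1e-9 for k in counters_per_s)
    return dynamic * (clock_mhz / 2230) + 40.0      # 40 W hypothetical static floor

def pick_clock(counters_per_s):
    # Highest step whose modeled power fits the budget; same answer on every unit
    for clock in CLOCK_STEPS_MHZ:
        if estimated_power_w(counters_per_s, clock) <= POWER_BUDGET_W:
            return clock
    return CLOCK_STEPS_MHZ[-1]

# Example: a heavy frame with lots of math and memory traffic
print(pick_clock({"fp32_op": 5e12, "mem_access": 1e11, "tex_sample": 2e11}))   # -> 2100 here

The key property is that the model, not the silicon lottery or the room temperature, decides the clock, so every unit lands on the same frequency for the same workload.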
Well it saves some money on storage too...not sure how much 175GB of NAND costs...
There would be 50% more NAND chips, though each of lower capacity. The downside for capacity is that the next increment might not be reachable without bumping the capacity of all of those chips together.
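That 50% lines up with the 12-channel flash interface Cerny described versus the 8 channels of a typical NVMe drive, and it's also where the odd 825 GB figure most likely comes from. A quick sketch, assuming 64 GiB of NAND behind each channel (the per-channel die size is my assumption):

GIB = 2**30

channels = 12
die_gib = 64                        # assumed 64 GiB of NAND per channel
capacity_gb = channels * die_gib * GIB / 1e9
print(f"{capacity_gb:.0f} GB")      # ~825 GB

# The next clean increment means doubling every die, not adding a chip or two
print(f"{channels * die_gib * 2 * GIB / 1e9:.0f} GB")    # ~1649 GB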
I do wonder if this will be annoying for a developer to juggle.
You're already having to fit as much as you can into your game, pushing the hardware to its maximum. Then, when you finally do get the most out of the available CPU or GPU cycles, you suddenly take a hit and you're playing a weird balancing game.
The algorithm should be more stable than the twitchy mobile or Radeon ones, but I'm curious whether there are complexities in characterizing performance that come down to instruction mix rather than total unit utilization, like certain events or assets suddenly kicking off a bunch more AVX code.