Just wanted to come back to this to suggest it could again simply be about power. 50%/33% less work being done should mean a roughly corresponding drop in power drawn and heat generated in those areas. It might make Sony's boost-clock strategy less likely to see huge frequency drops under AVX workloads.
Going by the instruction profiling, it's not just AVX. The FPU is half as effective on 128-bit and 256-bit code alike, hence the same performance drop showing up in SSE operations.
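For anyone curious what that kind of profiling boils down to, here's a minimal sketch of the usual approach: several independent FMA dependency chains at 128-bit and 256-bit widths, timed. This is my own illustration (iteration count, chain count, and the FMA choice are arbitrary), not the tool that produced the leaked numbers.

```cpp
// fpu_probe.cpp -- build: g++ -O2 -mavx2 -mfma fpu_probe.cpp
#include <immintrin.h>
#include <chrono>
#include <cstdio>

// Four independent FMA chains keep the test throughput-bound
// rather than latency-bound.
static double time_sse(long iters) {
    __m128 a0 = _mm_set1_ps(1.0f), a1 = a0, a2 = a0, a3 = a0;
    const __m128 m = _mm_set1_ps(0.999999f), c = _mm_set1_ps(1e-9f);
    auto t0 = std::chrono::steady_clock::now();
    for (long i = 0; i < iters; ++i) {
        a0 = _mm_fmadd_ps(a0, m, c);
        a1 = _mm_fmadd_ps(a1, m, c);
        a2 = _mm_fmadd_ps(a2, m, c);
        a3 = _mm_fmadd_ps(a3, m, c);
    }
    auto t1 = std::chrono::steady_clock::now();
    float sink[4];   // consume the results so the loop isn't elided
    _mm_storeu_ps(sink, _mm_add_ps(_mm_add_ps(a0, a1), _mm_add_ps(a2, a3)));
    std::printf("sse sink %f\n", sink[0]);
    return std::chrono::duration<double>(t1 - t0).count();
}

static double time_avx(long iters) {
    __m256 a0 = _mm256_set1_ps(1.0f), a1 = a0, a2 = a0, a3 = a0;
    const __m256 m = _mm256_set1_ps(0.999999f), c = _mm256_set1_ps(1e-9f);
    auto t0 = std::chrono::steady_clock::now();
    for (long i = 0; i < iters; ++i) {
        a0 = _mm256_fmadd_ps(a0, m, c);
        a1 = _mm256_fmadd_ps(a1, m, c);
        a2 = _mm256_fmadd_ps(a2, m, c);
        a3 = _mm256_fmadd_ps(a3, m, c);
    }
    auto t1 = std::chrono::steady_clock::now();
    float sink[8];
    _mm256_storeu_ps(sink, _mm256_add_ps(_mm256_add_ps(a0, a1), _mm256_add_ps(a2, a3)));
    std::printf("avx sink %f\n", sink[0]);
    return std::chrono::duration<double>(t1 - t0).count();
}

int main() {
    const long iters = 200000000;
    std::printf("SSE %.3fs  AVX %.3fs\n", time_sse(iters), time_avx(iters));
}
```

On a stock Zen 2, both loops should take roughly the same wall time (same FMA issue rate, twice the width for AVX); a halved FPU would show both loops running about 2x slower per element than a desktop part at the same clock, which matches the profiled drop.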
Vector loads tend to stress AMD's boost speeds the most on the desktop, so power reduction would seem to be the motivator. However, that it required such significant re-plumbing of the FPU points to a very tight constraint, like the GPU leaving an unusually limited amount of power for the CPU section.
Microsoft didn't resort to this, and promises consistent clocks from apparently standard Zen 2 FPUs, at higher clock speeds no less.
If there's ever a salvage SKU for that chip, perhaps we can get similar profiling to see whether it's really that consistent, or whether other, less drastic methods were used to limit power, like instruction-issue throttling or duty-cycling of the hardware.
Why measures like those wouldn't have been good enough, versus a thinned custom FPU, is a point of curiosity for me.
Perhaps AMD's methods aren't consistent enough with a fully featured vector FPU for what Sony wanted from its SoC, or the PS5's power ceiling is notably constrained even compared with another console APU.
There may even be hints in the die shots that this cut happened during development of the PS5 APU. I think another of Nemez's tweets perhaps shows this:
"The full featured Renoir CCXs would only be margin-of-error larger, they would probably fit without major issues or redesigns."
I think, quite possibly, that the PS5 started out with full-fat FPUs but moved to these skinnier units later, and the original footprint is still there. The PS5 was probably deep into development, with tons of layout work already done, by the time the change was made.
Maybe that's the case, since there may have been at least one notable revision in the PS5 validation-hardware leak, with no clear indication of what was changed.
Another possibility is that Sony only paid for a revamp of the FPU; if AMD kept the rest of the core and CCX at the same layout, there's going to be spare space.
Let's say Sony was at the point of trying to balance performance, area, and power with a given set of technologies. The cuts probably have nothing to do with area, and they actually cost performance (in some areas), so the gain would have to be in the peak power the cores can consume. And that could be the benefit: maintaining boost clocks across the rest of the system.
I'm holding out for more instruction analysis at some point. The cuts are pretty significant even outside the 256-bit realm Cerny mentioned.
Sounds plausible.
My guess would be that they cut out some of the 256-bit units, because they aren't used often but draw a lot of power when they are.
The 50% loss in SSE points to removing whole ports and the ALUs on them. However, doing this would require rebalancing the units on the remaining ports: going by AMD's published pipe diagrams, FMAs sit on two pipes and adds on the other two, so I don't think you can cut one or two ports from the Zen 2 FPU without moving some functionality onto the ports that remain; otherwise it would be lost entirely, or would lose more than 50%.
Vector division benchmarking so much slower is a sign of potentially other hardware changes inside the unit, since AMD's FPUs only have one port for it to begin with.
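If someone gets retail hardware to run code on, a divide probe is simple: a serial DIVPS chain, where each divide waits on the previous one, measures the divider's latency directly, which would show whether the unit itself changed rather than just the port count. A sketch under the same caveats as before (my own code, made-up iteration count):

```cpp
// div_probe.cpp -- build: g++ -O2 div_probe.cpp
#include <immintrin.h>
#include <chrono>
#include <cstdio>

int main() {
    const long iters = 50000000;
    __m128 d = _mm_set1_ps(1.7f);
    const __m128 q = _mm_set1_ps(1.000001f);
    auto t0 = std::chrono::steady_clock::now();
    for (long i = 0; i < iters; ++i)
        d = _mm_div_ps(d, q);   // serial chain: each divide depends on the last
    auto t1 = std::chrono::steady_clock::now();
    float sink[4];
    _mm_storeu_ps(sink, d);     // consume the result so the loop isn't elided
    double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    // ns-per-divide times the clock in GHz gives cycles per divide,
    // which can be compared against published Zen 2 DIVPS figures.
    std::printf("%.2f ns per divide (sink %f)\n", ns / iters, sink[0]);
}
```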
I think this is exactly it. With a 4-port FPU, too much power drawn in a short time could cause a frequency drop that would impact the whole CPU. So I think the idea is to force developers to do the same job, but slower, using 2 ports, ideally without dropping the frequency. As 3dilettante wrote, the very robust cooling should be enough to take care of heat density.
Which leaves me to wonder how much more generous the Series X power budget is for its Zen 2 FPUs, or whether they did something else to constrain consumption. They're promising constant, higher clocks without a liquid-metal TIM.
Well, whatever they're doing, I agree it has to be about power. Zen 2 is one 256-bit unit per core, so I don't think they could have cut out any of the FPUs as such, but limiting its capability in some other way would physically guarantee lower power demands. I like the port-reduction idea because I don't think it would force a complete redesign of the entire unit; it would be more like selectively removing duplicated elements. Plus you'd still be left with additional room for any small layout changes (I guess).
I think Zen 2 has more than one 256-bit unit. Depending on the instruction mix, it can go up to four 256-bit operations per clock. A 50% drop from that still leaves two 256-bit operations per clock. The 50% drop in SSE points to losing whole units, and probably to a rebalancing of what's left.
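Back-of-envelope on what that means for peak math, assuming the common reading of desktop Zen 2's FPU as two 256-bit FMA pipes plus two 256-bit FADD pipes (the 3.5 GHz clock below is purely illustrative, not a claimed console figure):

```cpp
#include <cstdio>

int main() {
    const double ghz      = 3.5;        // illustrative clock, not a console spec
    const double fma_ops  = 2 * 8 * 2;  // 2 pipes * 8 fp32 lanes * (mul+add)
    const double fadd_ops = 2 * 8;      // 2 pipes * 8 fp32 lanes
    const double full     = fma_ops + fadd_ops;  // 48 FP32 FLOPs/clock
    std::printf("full FPU: %.0f FLOPs/clk = %.0f GFLOPs/core\n", full, full * ghz);
    std::printf("halved  : %.0f FLOPs/clk = %.0f GFLOPs/core\n", full / 2, full / 2 * ghz);
}
```

So halving the pipes halves per-core peak FLOPs at any given clock, which is exactly the 50% the profiling shows.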
Could there be some kind of hardware decompression unit in the PS4 that's been removed or bypassed in the PS5 because it was superseded? Some kind of single-threaded CPU fallback on the PS5?
The PS5 has a superset of the PS4's compression support. Perhaps a conservative emulation of the low-level functionality or APIs is going through extra steps, or the backwards compatibility leads to a thicker container or worse data layout than native?
Correct me if I'm wrong, but does that mean that if developers start making heavy use of AVX instructions in game engines, the PS5 will potentially perform worse than the XBSS/XBSX/PC?
The raw numbers for non-AVX are substantially worse than those of similar Zen 2 CPUs, before even getting into other factors like higher memory latency and the smaller L3 cache. Zen 3 is another class entirely in terms of FP performance.
There are some indications of CPU-limited scenarios where there's a modest shortfall versus the Series X, but it doesn't show up as consistently as the FPU numbers would suggest.
There are other bottlenecks both consoles share, but we may need to keep an eye out for later games that push AVX or non-AVX vector throughput in a way that's more obvious than in early titles.
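For a concrete picture of what "pushing vector throughput" looks like in engine code, here's a hedged sketch of a structure-of-arrays particle integration step, the sort of hot loop that issues a steady stream of 256-bit uops; the data layout and names are mine, not from any shipping engine:

```cpp
#include <immintrin.h>
#include <cstddef>

// Hypothetical structure-of-arrays particle state; count is assumed
// to be a multiple of 8 to keep the example short.
struct Particles {
    float *px, *py, *pz;   // positions
    float *vx, *vy, *vz;   // velocities
    std::size_t count;
};

// One Euler step, eight particles per iteration. Loops like this lean
// heavily on the 256-bit FMA and load/store pipes; on a thinned FPU the
// math side retires at half the rate of a stock Zen 2 at the same clock.
void integrate(Particles &p, float dt, float gravity_y) {
    const __m256 vdt = _mm256_set1_ps(dt);
    const __m256 vg  = _mm256_set1_ps(gravity_y * dt);
    for (std::size_t i = 0; i < p.count; i += 8) {
        // apply gravity to the y velocity
        __m256 vy = _mm256_add_ps(_mm256_loadu_ps(p.vy + i), vg);
        _mm256_storeu_ps(p.vy + i, vy);
        // pos += vel * dt on each axis (one FMA per axis)
        _mm256_storeu_ps(p.px + i,
            _mm256_fmadd_ps(_mm256_loadu_ps(p.vx + i), vdt, _mm256_loadu_ps(p.px + i)));
        _mm256_storeu_ps(p.py + i,
            _mm256_fmadd_ps(vy, vdt, _mm256_loadu_ps(p.py + i)));
        _mm256_storeu_ps(p.pz + i,
            _mm256_fmadd_ps(_mm256_loadu_ps(p.vz + i), vdt, _mm256_loadu_ps(p.pz + i)));
    }
}
```

Physics, animation, and culling code full of loops like this is where the gap would surface, if it surfaces at all.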
"Intel AVX-512 can accelerate performance for workloads and use cases such as scientific simulations, financial analytics, artificial intelligence (AI)/deep learning, 3D modeling and analysis, image and audio/video processing, cryptography, and data compression."
It seems that while AVX(-512) hasn't been used in games all that much, it sure can assist in certain tasks that seem applicable to future games.
AVX-512 is unlikely to find much use in games because AMD flat-out doesn't support it and Intel doesn't consistently implement it in consumer hardware (or even its server hardware, for that matter).
If they started pushing 256-bit FP instructions a lot, then yes, it would, because the CPU cores would start throttling down heavily.
Realistically they won't, because Sony knows how often these instructions come up, and that's probably why they used density-optimized transistors on those blocks.
I'm not sure about the density-optimized transistor claim, or rather I'm not sure there was an additional tier of high-density transistor beyond the HD libraries AMD already used for Zen 2.
The math shortfall at both 256-bit and 128-bit points to wholesale removal of hardware, which saves ALU area, wiring (fewer ports to route), and register-file area, since the cells need fewer bit lines once the ports are cut.