Grouping draws by state on the CPU only lets you amortize the cost of the API call itself. Just look at the early days of explicit APIs, when the tech demos showed off pushing over a million draws to the GPU ... Is that sorting not possible CPU side today?
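To make the CPU-side case concrete, here is a minimal sketch of grouping draws by state before submission. The `Draw` struct, `psoId`, and both function names are made up for illustration and are not from any real API. All it shows is that sorting reduces the number of state changes you issue; it says nothing about what each change costs in the driver.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical draw record: psoId stands in for a full PSO / render-state handle.
struct Draw {
    std::uint32_t psoId;
    std::uint32_t meshId;
};

// Counts how many PSO binds are needed if draws are submitted in this order
// (the first bind counts as one).
std::size_t countPsoSwitches(const std::vector<Draw>& draws) {
    std::size_t switches = 0;
    for (std::size_t i = 0; i < draws.size(); ++i) {
        if (i == 0 || draws[i].psoId != draws[i - 1].psoId) {
            ++switches;
        }
    }
    return switches;
}

// Groups draws by PSO. stable_sort keeps the original relative order of
// draws that share the same PSO.
void sortByPso(std::vector<Draw>& draws) {
    std::stable_sort(draws.begin(), draws.end(),
                     [](const Draw& a, const Draw& b) { return a.psoId < b.psoId; });
}
```

Calling `sortByPso` and comparing `countPsoSwitches` before and after shows the number of binds dropping to the number of distinct PSOs, which is the amortization described above.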
Grouping draws according to render states on the GPU could potentially let the driver perform runtime shader linking optimizations, which would make "PSO switching" cheap. Instead of having to recompile the exact same set of shaders from previous PSOs, newly formed PSOs with render states similar to earlier ones can 'stitch' together shaders that were already compiled for different PSOs. Think of it as having an "implicit form" of Vulkan's graphics pipeline library extension functionality without the ability to explicitly define what render states to change ...
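A toy model of that 'stitching' idea, purely as a sketch: compiled stages are cached by a hash, and a new pipeline is linked from whatever is already in the cache. `StageCache`, `Pipeline`, `linkPipeline`, and the integer "blob ids" are all hypothetical names for illustration. This is an assumption about the general shape of the idea, not how any actual driver implements it.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical cache of compiled shader stages, keyed by a hash of the stage
// plus whatever render state affects its code generation.
struct StageCache {
    std::unordered_map<std::uint64_t, int> compiled; // stage hash -> compiled blob id
    std::size_t compileCount = 0;                    // number of real compiles done

    // Returns the cached blob id, compiling only on a cache miss.
    int getOrCompile(std::uint64_t stageHash) {
        auto it = compiled.find(stageHash);
        if (it != compiled.end()) {
            return it->second; // reuse a stage compiled for an earlier PSO
        }
        int id = static_cast<int>(compileCount++);
        compiled.emplace(stageHash, id);
        return id;
    }
};

// A "pipeline" here is just a pair of references to compiled stages.
struct Pipeline {
    int vs;
    int ps;
};

// Links a pipeline out of cached stages instead of recompiling everything.
Pipeline linkPipeline(StageCache& cache, std::uint64_t vsHash, std::uint64_t psHash) {
    return Pipeline{cache.getOrCompile(vsHash), cache.getOrCompile(psHash)};
}
```

Linking two pipelines that share a vertex stage but differ in the pixel stage costs three compiles rather than four in this model, and the saving grows with the number of PSOs that share stages.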
On Xbox, PSOs perfectly match the description of the hardware's render states and its native shader bytecode, which makes it trivial for them to implement GPU driven render state changes (an indirect command for PSO switching) via ExecuteIndirect ...