There's really only one person theorising that PS4 Pro has 64 ROPs... and from what I can tell it's based on looking at the DF findings and working backwards to fit the theory rather than on definitive facts or evidence about the hardware itself. I don't really have a problem with that, as it's a lot of what goes on in this subforum. But as far as I can tell no one from Sony has stated how many ROPs PS4 Pro has.
I'm not sure which specific findings would work back towards the ROP count, but when I've speculated on the Pro's hardware it's been from Sony's description of how it transitions between compatibility and Pro mode by activating or disabling half of the shader engines.
That pairs with GCN's hard-wired association of ROPs with their shader engine. A succession of GCN architectures, and some of the graphics driver patches for Vega's hardware, show a continued association between shader engines and their RBEs.
If the Pro has 32 ROPs with everything on, turning half of the GPU off wouldn't normally leave 32 in compatibility mode; it would leave 16, which I'd imagine would be noticeable for some legacy titles. There's no iron law saying the design couldn't take special measures to change this behavior, it's just a question of which choice Sony made from a range of possible solutions.
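To make that concrete, here's a minimal back-of-envelope sketch of the two hypotheses, assuming GCN's usual arrangement of RBEs attached per shader engine at 4 ROPs per RBE; the per-engine figures are my own assumptions for illustration, not confirmed Neo specifics.

```python
# Back-of-envelope: ROPs still visible when half of the shader engines are
# disabled for compatibility mode. Per-RBE/per-engine figures are assumptions.
ROPS_PER_RBE = 4  # typical for GCN render back-ends

def rops(shader_engines, rbes_per_engine):
    return shader_engines * rbes_per_engine * ROPS_PER_RBE

# Hypothesis A: Pro has 64 ROPs total (4 SEs x 4 RBEs)
print("64-ROP Pro, compat mode:", rops(shader_engines=2, rbes_per_engine=4))  # 32, matches base PS4

# Hypothesis B: Pro has 32 ROPs total (4 SEs x 2 RBEs)
print("32-ROP Pro, compat mode:", rops(shader_engines=2, rbes_per_engine=2))  # 16, short of the base PS4's 32
```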
It'd be straightforward to just go with doubled hardware, since Sony's gone ahead and doubled things anyway. It might be something of a waste, although even if bandwidth weren't changed there are some specific scenarios where the doubled hardware and additional caches could be used without immediately hitting the memory bus, or the additional paths could be re-purposed if Sony desired.
However, from what I've seen posted by others elsewhere ... it's from a single test in a specific scenario (discard mode) that processes at 64 pixels/clock. It's not able to hit those numbers in a write test; memory bandwidth bottlenecks it to lower performance.
I suppose the question is whether any tests got results at least substantially above 32 pixels/clock, as there tend to be other utilization barriers even in optimal cases.
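As a rough illustration of why a pure write test would fall short of the discard figure, here's a quick fill rate versus bandwidth estimate; the ~911 MHz clock and ~218 GB/s bandwidth are the commonly cited Pro figures rather than anything official, and 4 bytes/pixel assumes an uncompressed 32-bit color write with no blending.

```python
# Rough check: can 64 ROPs' worth of 32-bit color writes be fed by the Pro's bus?
# Clock and bandwidth figures are commonly cited approximations, not official.
gpu_clock_hz = 911e6          # ~911 MHz
mem_bandwidth_Bps = 218e9     # ~218 GB/s GDDR5
bytes_per_pixel = 4           # 32-bit color, no blending, no compression

def sustainable_pixels_per_clock(bandwidth, clock, bpp):
    return bandwidth / (clock * bpp)

print("Bandwidth-limited pixels/clock: %.1f"
      % sustainable_pixels_per_clock(mem_bandwidth_Bps, gpu_clock_hz, bytes_per_pixel))
# ~60 pixels/clock in the ideal case, so a write test sits below 64/clock even
# before blending read-modify-writes or other overheads drag it further down.
```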
There are ways of getting more out of the ROPs than memory might allow, some of which have been discussed in other architecture threads. Going back a ways, there were tests showing particle effects could be ramped up beyond memory bandwidth limits with careful tiling of the pass to fit the small color caches. The ROP caches would miss/evict less often, while their full internal bandwidth would be in play. It's a possible optimization that would be less sensitive to the modestly improved GDDR5 bus or the variable benefits of compression (DCC should interact with it at the miss-handling portion of the pipeline). Having more RBEs means more pixel tiles could be evaluated at once.
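A minimal sketch of why that tiling helps, assuming a particle pass with heavy overdraw where each blended layer would otherwise be a read-modify-write against memory; the resolution, overdraw factor, and the assumption that a tile fits entirely in the ROP color caches are illustrative numbers, not Neo measurements.

```python
# Illustrative model: memory traffic for a heavily blended particle pass,
# with and without tiling the pass to fit the ROP color caches.
# All figures below are assumptions for illustration only.
width, height = 1920, 1080
bytes_per_pixel = 4
overdraw = 20  # blended particle layers per pixel

frame_bytes = width * height * bytes_per_pixel

# Untiled: every blended layer is a read + write against memory.
untiled_traffic = frame_bytes * overdraw * 2

# Tiled to fit the color caches: one read (initial miss) and one write
# (final eviction) per pixel; the intermediate blends hit the ROP caches.
tiled_traffic = frame_bytes * 2

print("Untiled traffic: %.2f GB" % (untiled_traffic / 1e9))
print("Tiled traffic:   %.2f GB" % (tiled_traffic / 1e9))
print("Reduction: %dx" % (untiled_traffic // tiled_traffic))
```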
On the other hand, many ROP-based methods have lost ground to compute-based solutions, which would seemingly favor an L2 that can host more tiles or thrash less often. If my understanding of the Orbis GPU L2 is correct, that's where Neo would have a shortfall versus Scorpio. It would be an area where the two architectures might have different inflection points.
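For the compute side, a rough sketch of how L2 capacity translates into resident tiles; the two L2 sizes here are placeholder values purely to show the relationship, not claims about Neo's or Scorpio's actual configurations.

```python
# How many 64x64 tiles of 32-bit color fit in a GPU L2 of a given size?
# The L2 capacities are hypothetical placeholders, not actual Neo/Scorpio sizes.
tile_dim = 64
bytes_per_pixel = 4
tile_bytes = tile_dim * tile_dim * bytes_per_pixel  # 16 KiB per tile

for label, l2_bytes in [("smaller L2", 1 * 1024 * 1024), ("larger L2", 4 * 1024 * 1024)]:
    print("%s (%d KiB): %d tiles resident" % (label, l2_bytes // 1024, l2_bytes // tile_bytes))
# A larger L2 keeps more tiles resident for a compute-based pass, or thrashes
# less often on the same working set, which is the sort of inflection point
# where the two architectures could diverge.
```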