This confirms 48 ROPs. Either they are "uber" ROPs or a great leap in efficiency happened.
I'm not confident enough in my interpretation of the diagram to be sure which block would be the ROPs, but the count may still be 64 and consistent with the Linux patches.
The CUs seem certain enough, although the way the shapes are divided might be consistent with a changed orientation. The CU is one long rectangle book-ended by a pair of rectangles towards the center and three rectangles towards the outside.
The rectangles further out seem to be one for every two CUs.
One possible interpretation:
The three rectangles in the CU are the L1, filtering, and load/store blocks, and this time they are not arranged along the center line of the chip. The outer rectangles above and below the CU arrays could be the shared front ends, this time shared between two CUs.
The RBE sections are the long bars on the right and left. The ROP sections seem to be more variable in layout, possibly so they can be packed as efficiently as possible into the outer margins and around other miscellaneous units.
That would put the L2 as the two 8x3 arrays below the main portion of the GPU, and presumably above the blocks dedicated to the HBM interface and controllers.
The 8 columns in that case might pair up with the tile_pipe values listed in the Linux patches for Vega (the 8 marked with a ???). If the picture's 3 rows are accurate, that may mean a non-power-of-2 associativity, or perhaps some other change.
The possible HBM interfaces have 12 columns each, although it is uncertain how much of that comes from the picture oversimplifying the silicon around the PHY.
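For what it's worth, the counting behind that reading can be sketched in a few lines of Python. The numbers are only what I read off the picture (two arrays, 8 columns, 3 rows); treating each rectangle as one L2 tile is an assumption, and the names below are made up for illustration, not taken from the Linux patches.

```python
# Counts read off the diagram; treating each rectangle as one L2 tile is an assumption.
L2_ARRAYS = 2    # the two arrays below the main portion of the GPU
L2_COLUMNS = 8   # columns per array
L2_ROWS = 3      # rows per array, if the picture is accurate


def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


tiles_per_array = L2_COLUMNS * L2_ROWS
total_tiles = L2_ARRAYS * tiles_per_array

print("tiles per array:", tiles_per_array)
print("total tiles:", total_tiles)
print("columns power of two:", is_power_of_two(L2_COLUMNS))
print("rows power of two:", is_power_of_two(L2_ROWS))
```

The column count is a power of two and so could index cleanly by something like a tile_pipe value, while the row count of 3 is the part that does not fit the usual power-of-2 pattern.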
The more pronounced spacing that cuts the CU arrays into 4 quadrants might be consistent with the need for connectivity to the L2 at the bottom, given the changed layout of the CU caches and the ROPs on either side. Polaris' artistic rendition had a pronounced division along the center line, but that was before the DRAM and L2 were shunted off to one side and the ROPs hooked into the L2.