I think we should give up on the idea of ROVs altogether, and that's AMD's opinion on the matter as well ...
To get an even remotely acceptable level of performance on an immediate-mode GPU architecture, the hardware would have to store and track the state of the entire framebuffer/render target, which means dedicating a lot of on-chip memory to it. The other option is designing a tile-based GPU, which automatically comes with a small amount of tile memory, but I don't think the architects will find that to be an acceptable solution either, since it would mean executing duplicated vertex shader invocations or potentially starving the GPU's shader execution units of work. Tile-based GPUs died out a decade ago in the desktop space for very good reasons ...
Just to give you an idea, a pair of 1080p render targets, one colour+alpha (32 bits/4 bytes per pixel) and one depth (32 bits/4 bytes per pixel), would total about 16.59 MB of memory, which is roughly 4x the size of Navi 10's 4 MB L2 cache. That's not even counting stencil bits, MSAA, higher resolutions, multiple render targets, or more bits per pixel. You'd have to spend an enormous amount of die space to make a robust solution for ROVs, and that die space could be better used elsewhere ...
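
For anyone who wants to check the numbers, here is a rough back-of-the-envelope sketch of the arithmetic above. The helper name `render_target_bytes` is just made up for illustration, and the 4 MiB figure for Navi 10's L2 is an assumption on my part (the commonly quoted spec), so treat the ratio as approximate.

```python
# Back-of-the-envelope footprint of the render targets discussed above.
# Assumption: Navi 10 has a 4 MiB L2 cache (commonly quoted figure).
NAVI10_L2_BYTES = 4 * 1024 * 1024

WIDTH, HEIGHT = 1920, 1080   # 1080p
COLOR_BYTES = 4              # colour+alpha, 32 bits per pixel
DEPTH_BYTES = 4              # depth, 32 bits per pixel


def render_target_bytes(width, height, bytes_per_pixel, samples=1):
    """Size in bytes of one render target (hypothetical helper).

    `samples` models MSAA by multiplying the per-pixel storage.
    """
    return width * height * bytes_per_pixel * samples


# One colour target plus one depth target, no MSAA, no stencil.
total_bytes = (
    render_target_bytes(WIDTH, HEIGHT, COLOR_BYTES)
    + render_target_bytes(WIDTH, HEIGHT, DEPTH_BYTES)
)
l2_ratio = total_bytes / NAVI10_L2_BYTES

# Same pair of targets with 4x MSAA, to show how quickly it grows.
total_bytes_msaa4 = (
    render_target_bytes(WIDTH, HEIGHT, COLOR_BYTES, samples=4)
    + render_target_bytes(WIDTH, HEIGHT, DEPTH_BYTES, samples=4)
)

print(f"colour + depth at 1080p: {total_bytes / 1e6:.2f} MB")
print(f"ratio to assumed L2 size: {l2_ratio:.2f}x")
print(f"with 4x MSAA: {total_bytes_msaa4 / 1e6:.2f} MB")
```

The base case works out to 16,588,800 bytes. Changing `WIDTH`/`HEIGHT` to 3840x2160 or bumping `samples` shows why keeping all of this on-chip gets expensive fast.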