Yes, the current gen ones.
Kinda wish it was 8 ACEs and 64 ROPs, but I understand price points... and if you have to cut something, that makes a lot of sense.
The Fury diagrams initially had 8 ACE blocks, then showed 4 ACE blocks and 2 HWS blocks after AMD got around to changing them.
The number of queues in memory and the number in hardware do not need to match 1:1 if that is enabled.
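To illustrate the point about queue counts not needing to be 1:1, here is a rough sketch of many in-memory queues being time-sliced onto fewer hardware slots. This is not how AMD's scheduling hardware actually works internally; `schedule`, `hw_slots`, and the queue names are all made up for the example, and plain round-robin is just the simplest policy to show.

```python
from collections import deque

def schedule(memory_queues, hw_slots):
    """Round-robin many in-memory queues onto a few hardware slots.

    Hypothetical model: each round, up to `hw_slots` queues are bound to
    hardware and each issues one packet. Returns a list of rounds, where
    each round is a list of (slot, queue_name, packet) tuples.
    """
    # Only queues that actually hold work compete for a hardware slot.
    pending = deque(
        (name, deque(packets)) for name, packets in memory_queues.items() if packets
    )
    rounds = []
    while pending:
        # Bind as many waiting queues as there are hardware slots.
        active = [pending.popleft() for _ in range(min(hw_slots, len(pending)))]
        step = []
        for slot, (name, packets) in enumerate(active):
            step.append((slot, name, packets.popleft()))
            if packets:
                # Still has work: go to the back of the line for a later round.
                pending.append((name, packets))
        rounds.append(step)
    return rounds

# Four queues in memory, only two hardware slots (numbers are arbitrary).
queues = {
    "gfx": ["g0", "g1"],
    "compute_a": ["a0"],
    "compute_b": ["b0", "b1"],
    "audio": ["s0"],
}
rounds = schedule(queues, hw_slots=2)
for i, step in enumerate(rounds):
    print(i, step)
```

All four queues get drained even though only two are ever resident at once, which is the whole point of decoupling the two counts.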
Polaris at a high level seems to inherit a number of other items from Fiji (or Carrizo?) like adaptive clocking and enhanced caching of vertices for improved instancing.
Interestingly, TrueAudio's dedicated block appears to be gone, so AMD seems to be more confident in its latency and synchronization than it was when the block was first introduced. Being able to reserve CUs ahead of time to ensure they are free for rapid wavefront launch seems to be what provides the latency guarantee that was sorely lacking then.
It was something I spitballed as an idea for the PS4 a while back. If Polaris does turn out to be in the PS4 refresh, we can look forward to the return of the 14+4 CU speculation, although it would possibly be 28+8 this time.
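For what it's worth, the CU reservation idea can be sketched as a pool where a few units are held back for latency-critical launches. This is a toy model under my own assumptions, not AMD's mechanism: `CUPool` and `launch` are hypothetical names, and the 18/4 numbers just borrow the old 14+4 speculation for illustration.

```python
class CUPool:
    """Toy model of a CU pool with some units held back (hypothetical)."""

    def __init__(self, total, reserved):
        if not 0 <= reserved <= total:
            raise ValueError("reserved must be between 0 and total")
        self.free_general = total - reserved
        self.free_reserved = reserved

    def launch(self, latency_critical=False):
        """Try to place one wavefront; return which pool took it, or None."""
        # Latency-critical work gets first pick of the reserved units.
        if latency_critical and self.free_reserved > 0:
            self.free_reserved -= 1
            return "reserved"
        # Everything else (or critical work once the reserve is used up)
        # competes for the general units.
        if self.free_general > 0:
            self.free_general -= 1
            return "general"
        return None  # nothing free: the launch would have to wait

pool = CUPool(total=18, reserved=4)
# Ordinary work saturates the 14 general units; the 15th launch has to wait.
results = [pool.launch() for _ in range(15)]
# An audio-style task still launches right away on a reserved unit.
audio = pool.launch(latency_critical=True)
print(results.count("general"), results[-1], audio)
```

The trade-off is the obvious one: the reserved units sit idle whenever nothing latency-critical shows up, which is the price of the guarantee.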
As an aside, it does seem like AMD is indicating Fury's delta compression did not turn out to be as effective as Maxwell's, with Polaris apparently now in that range of effectiveness. Possibly, the sheer amount of unused bandwidth on Fury made it difficult to get the efficiency to improve.
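For anyone unfamiliar with the general idea behind delta compression, here is a generic sketch: store one base value per tile plus small differences, and fall back to raw storage when the differences do not fit. The actual hardware schemes from AMD and Nvidia are not public in this detail, so the tile size, bit widths, and function names below are purely my assumptions.

```python
def delta_compress(tile, delta_bits=4):
    """Encode a tile as base + signed deltas if every delta fits in delta_bits."""
    base = tile[0]
    deltas = [p - base for p in tile[1:]]
    # Range of a signed two's-complement value with delta_bits bits.
    lo, hi = -(1 << (delta_bits - 1)), (1 << (delta_bits - 1)) - 1
    if all(lo <= d <= hi for d in deltas):
        return ("delta", base, deltas)
    return ("raw", tile)  # too much variation: store uncompressed

def compressed_bits(encoded, pixel_bits=8, delta_bits=4):
    """Size of the encoded tile in bits (ignoring any metadata)."""
    if encoded[0] == "delta":
        return pixel_bits + delta_bits * len(encoded[2])
    return pixel_bits * len(encoded[1])

smooth = [100, 101, 102, 99, 100, 103, 98, 100]  # gentle gradient
noisy = [0, 255, 3, 200, 17, 90, 250, 1]         # high-contrast detail

smooth_bits = compressed_bits(delta_compress(smooth))
noisy_bits = compressed_bits(delta_compress(noisy))
print(smooth_bits, noisy_bits)  # 36 64
```

The smooth tile drops from 64 bits to 36, while the noisy one stays at 64. Savings like that only translate into performance when the chip is actually short on bandwidth, which fits with the guess above about Fury.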