The uop cache is effectively a cache for one trace. If anything, it validates the trace cache concept: massive issue width while consuming less power.
I don't see it as having any traces. The contents of a trace cache reflect the execution path taken by the processor. If code is stored contiguously in memory as AA if BB else CC, the trace cache may contain AABB and AACC or various combinations of trace fragments.
The uop cache is a post-decode cache that has a fixed relationship to the linearly addressed Icache, and it primarily validates a benefit for a chip running a complex ISA. To a limited extent, it gathers some low-hanging fruit for the original purpose of a trace cache--solving the problem of discontinuities in the fetch stream compromising superscalar issue.
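To make the distinction concrete, here is a toy sketch. The function names and the three-block layout are made up for illustration, and it ignores trace length limits, partial matches and everything else a real design has to deal with. It only shows that the trace cache ends up holding dynamic paths, while a linearly addressed post-decode cache holds each static block once.

```python
# Toy model: static code laid out contiguously as AA, BB (if), CC (else).
# All names here are hypothetical; this is not any real hardware structure.
LAYOUT = ["AA", "BB", "CC"]


def execute(branch_taken):
    """Return the dynamic sequence of blocks for one pass through the code."""
    return ["AA", "BB"] if branch_taken else ["AA", "CC"]


def fill_trace_cache(outcomes):
    """Trace cache: one entry per distinct dynamic path actually executed."""
    traces = set()
    for taken in outcomes:
        traces.add("".join(execute(taken)))
    return traces


def fill_uop_cache(outcomes):
    """Post-decode cache: one entry per static block, keyed by layout position."""
    entries = {}
    for taken in outcomes:
        for block in execute(taken):
            entries[LAYOUT.index(block)] = block
    return entries


if __name__ == "__main__":
    history = [True, False, True]
    print(sorted(fill_trace_cache(history)))  # AA appears in both traces
    print(fill_uop_cache(history))            # AA is stored exactly once
```

Note that AA is duplicated across the two traces, while the post-decode cache keeps a fixed one-to-one mapping back to the instruction addresses.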
The L1 instruction cache might be tightly coupled to the whole frontend, but I don't think this poses a fundamental problem for increasing cache size or associativity. Maybe they weren't able to increase associativity without also increasing latency, though.
The memory closest to the execution pipeline has less leeway in terms of how it impacts cycle time and the stages in the pipeline that deal with fetch and prediction.
The TLB and tag check logic would be altered if the ratio of tag and index bits changes.
One possible, if unlikely, change would be to significantly increase the associativity or reduce the capacity so that it matches the size/associativity ratio of Sandy Bridge.
This would eliminate the aliasing problem entirely, and discard a portion of the cache fill pipeline used to invalidate synonyms.
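The arithmetic behind the size/associativity ratio can be sketched as follows. This assumes 4 KiB pages and a virtually indexed, physically tagged L1, and it takes Sandy Bridge's L1I as 32 KB 8-way; the 32 KB figure and the helper name alias_bits are my additions, not something stated above. If the bytes per way exceed the page size, some index bits come from the virtual page number and synonyms become possible.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def alias_bits(size_bytes, ways, page_size=PAGE_SIZE):
    """Number of index bits that fall above the page offset.

    Zero means the index is taken entirely from the page offset, so two
    virtual addresses mapping to one physical line always pick the same set.
    Sizes are assumed to be powers of two.
    """
    way_size = size_bytes // ways
    if way_size <= page_size:
        return 0
    # log2(way_size / page_size) for power-of-two sizes
    return (way_size // page_size).bit_length() - 1


if __name__ == "__main__":
    # 64 KB, 2-way: 32 KB per way
    print(alias_bits(64 * 1024, 2))
    # 32 KB, 8-way: 4 KB per way
    print(alias_bits(32 * 1024, 8))
```

With 32 KB per way there are three index bits above the page offset; with 4 KB per way there are none, which is why matching that ratio would remove the need to hunt down synonyms on a fill.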
That said, are you suggesting AMD is going to stick with 2-way L1 instruction cache associativity for the next 5 years or so?
AMD promised little more than increases in some buffers and minor changes like new instructions for Piledriver coupled with improved clocks at a given power level, and that's what we got.
Some reports say that more change is in store with Steamroller, more so than was promised with Piledriver.
Since Steamroller is also meant to be on a new non-SOI node, more changes could be in the air because various parts of the pipeline will need to be adjusted anyway.
Because tweaked Bulldozer is all that's on the roadmap (well, apart from the low-power designs).
FWIW, L1I associativity was 8 on Core 2, 4 on Nehalem, and is now back to 8 on Sandy/Ivy Bridge. Clearly these things can be redesigned.
Sandy Bridge is a bit more than a tweaked Nehalem.
Bulldozer to Piledriver is something like the SB to IVB transition, without the node jump.
AMD, OTOH, has stuck with 2-way 64KB L1 instruction/data caches forever (since the K7 days; K6 also had two-way caches, but only 2x32KB).
The size and associativity haven't changed, but the BD L1 is physically different and it no longer serves as part of the branch predictor.