I agree with that and the rest of what you say; I just can't think of any significant factors.
The only other thing I can think of is that JuniperX2 will be the $200 product. I think a $200 AFR board would kill sales.
Overall board cost can't change substantially, as a 256-bit GDDR5 bus isn't going anywhere. So we're left with power regulation and cooling, which do promise to be cheaper than on HD4850, I guess. But would that be enough to offset such a significantly larger die?
Well, in theory, the smaller the chip, the more bloat the API improvements add as a percentage of the overall die. That holds if you assume that the ALUs carry little (if any) API-improvement cost, and that smaller dies devote a smaller percentage of their area to ALUs.
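To put rough numbers on that argument, here is a minimal sketch. It assumes the API improvements grow only the non-ALU part of the die by some fixed percentage. The function name and every figure in it are made up for illustration; none of them are measurements of any real chip.

```python
def api_bloat_fraction(alu_fraction, non_alu_growth):
    """Growth of the whole die, as a fraction, when only the non-ALU area grows.

    alu_fraction   -- share of the die taken by ALUs (hypothetical)
    non_alu_growth -- fractional growth of the non-ALU area due to the
                      new API features (hypothetical)
    """
    return (1.0 - alu_fraction) * non_alu_growth


# Hypothetical example: the big die is 40% ALUs, the small die only 25%,
# and the new API features grow the non-ALU area by 10% in both cases.
big_die = api_bloat_fraction(alu_fraction=0.40, non_alu_growth=0.10)
small_die = api_bloat_fraction(alu_fraction=0.25, non_alu_growth=0.10)

print(f"big die grows by   {big_die:.1%}")
print(f"small die grows by {small_die:.1%}")
```

Under those assumptions the die with the lower ALU share always grows by the larger percentage, which is all the argument needs.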
We know 32KB shared memory is coming, so that will add a few percent to the cost of a SIMD. A core concept of D3D11 is getting data into and out of the ALUs by non-TEX/RBE means (or, if you prefer, "these aren't texels coming in and they aren't pixels going out", they're gather/scatter operations). Some of that will scale proportionally with ALU count, some will scale with MC count and some will be just the "new feature bloat". Scheduling the additional kernel types, HS and DS, adds cost too, a cost that has a price of entry as well as a scaling element.
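The three kinds of cost described above (per SIMD, per MC, and a fixed price of entry) can be sketched as a simple linear model. All names, areas and unit counts below are hypothetical placeholders chosen to show the shape of the argument, not estimates for any actual GPU.

```python
def d3d11_overhead_mm2(simd_count, mc_count,
                       per_simd_mm2=0.5, per_mc_mm2=1.0, fixed_mm2=5.0):
    """Extra area from new features: ALU-scaled + MC-scaled + fixed entry cost.

    The default per-unit areas are invented for illustration only.
    """
    return simd_count * per_simd_mm2 + mc_count * per_mc_mm2 + fixed_mm2


def overhead_share(base_area_mm2, simd_count, mc_count):
    """Overhead as a fraction of the pre-D3D11 die area."""
    return d3d11_overhead_mm2(simd_count, mc_count) / base_area_mm2


# Hypothetical large and small dies (areas and unit counts are made up).
large_share = overhead_share(base_area_mm2=260.0, simd_count=10, mc_count=4)
small_share = overhead_share(base_area_mm2=70.0, simd_count=2, mc_count=1)

print(f"large die overhead: {large_share:.1%}")
print(f"small die overhead: {small_share:.1%}")
```

With these placeholder numbers the fixed term dominates on the small die, so its overhead share comes out higher even though its absolute overhead is smaller.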
Historically ATI GPUs offer reduced per-unit performance on the smallest GPUs, e.g. reduced capacity for hierarchical-Z, reduced MSAA performance/capability (e.g. no 8xMSAA), no double-precision, reduced ALU:TEX - in a bid to cut fat, and I guess offsetting API-bloat as well as general architectural entry-costs...
Jawed