I feel like my words are falling on deaf ears.
If there were no impact and doing this were fine, we'd see a lot more GTX 550 situations, and Microsoft would see no reason to explicitly say this at all.
If this were fine, there would simply be four banks at 56GB/s each and we would just add them up. The fact that they didn't is telling.
Hypothetical situation:
Four chips: A, B, C, and D.
A is 4GB, the other three are 2GB each.
Situation A: I left 1.25GB in each of A, B, C, and D.
That's 5GB of data I'm trying to read across 4 chips.
Assuming each chip reads 1GB per unit of time, the time used would be 1.25 units, and my effective bandwidth is 4GB/unit of time.
Situation B: I left 2GB of data in A, and 1GB in each of B, C, and D.
That's ALSO 5GB of data across 4 chips I'm trying to read.
Time used would be 2 units, because the read isn't done until A finishes, and my effective bandwidth is 2.5GB/unit of time.
My effective bandwidth here is 62.5% of what I would have in situation A.
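If you want to check the arithmetic yourself, here's a rough Python sketch of the two situations. The function name `effective_bandwidth` and the `chip_rate` parameter are made up for illustration, and it assumes every chip reads in parallel at the same rate (1GB per unit of time), so the whole read only finishes when the most heavily loaded chip is done. It's a toy model, not a description of how any real memory controller schedules things.

```python
def effective_bandwidth(gb_per_chip, chip_rate=1.0):
    """Return (time, effective bandwidth) for a parallel read across chips.

    gb_per_chip: how many GB sit on each chip.
    chip_rate:   assumed per-chip read rate in GB per unit of time.
    """
    total = sum(gb_per_chip)
    # All chips read at once, so the slowest (fullest) chip sets the time.
    time = max(gb_per_chip) / chip_rate
    return time, total / time


# Situation A: 5GB spread evenly over chips A, B, C, D.
time_a, bw_a = effective_bandwidth([1.25, 1.25, 1.25, 1.25])

# Situation B: same 5GB, but 2GB sits on chip A.
time_b, bw_b = effective_bandwidth([2.0, 1.0, 1.0, 1.0])

print(time_a, bw_a)   # 1.25 4.0
print(time_b, bw_b)   # 2.0 2.5
print(bw_b / bw_a)    # 0.625
```

Same amount of data, same chips, and the only thing that changed is where the data sits.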
Of course you can say that I could find work for the other chips to do while I read A, but that is, again, explicit planning you'd have to do.
Of course, the best solution is to avoid the 2GB at all costs.
Any use of the 2GB at the same time as the 8GB will lead to bandwidth contention, because physically the channels for chips B, C, and D can't access chip A, and an unbalanced workload across the 4 chips adds extra inefficiency on top of the usual situation.