Yeah, and it's really starting to frustrate me that so many people still believe there will be huge 4-billion-transistor GPUs with 8GB of memory, 700GB/s of bandwidth, and all these ludicrous specs.
I agree that a lot of the speculation is unrealistic.
1) The Wii will have an impact on both customers and stockholders.
2) R&D cost per mm² goes up as you cram more and more transistors into each mm².
________________________________________________________________
Going with your estimate (~400mm²) or mine (300-350mm²) on a 32nm process, manufacturers should have between 2 and 2.5 billion transistors to play with (I'm using a gross approximation based on the figure Alstrong provided: 150mm² ~ 1 billion transistors).
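To make that approximation explicit, here is a minimal sketch of the scaling. It assumes the transistor count grows linearly with die area at a fixed node, which is a simplification; the function and constant names are made up for illustration.

```python
# Rough transistor budget from die area, assuming transistor count scales
# linearly with area at a fixed process node (a gross approximation).

# Reference point quoted in the post: ~1 billion transistors in ~150 mm² at 32nm.
REF_AREA_MM2 = 150.0
REF_TRANSISTORS = 1.0e9


def transistor_budget(die_area_mm2: float) -> float:
    """Estimate how many transistors fit in a die of the given area."""
    density = REF_TRANSISTORS / REF_AREA_MM2  # transistors per mm²
    return die_area_mm2 * density


if __name__ == "__main__":
    for area in (300, 350, 400):
        billions = transistor_budget(area) / 1e9
        print(f"{area} mm² -> ~{billions:.2f} billion transistors")
```

With these inputs, 300mm² comes out at about 2 billion and 400mm² at about 2.7 billion, so the 2 to 2.5 billion range sits at the cautious end of this estimate.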
MS could use a derivative of Xenon; there is a lot of room for improvement, if I'm to believe some comments here.
I don't think MS would need more than four cores; the focus should be on making those cores better. There are a lot of opportunities here, as Xenon is not reputed to be a great performer...
This could be done through more cache, faster caches, and a better cache hierarchy (something closer to the cache organisation in Phenom or Nehalem).
Implementation of an OoO engine
Better branch predictor
Wider SIMD units.
Fix some broken implementations (I remember reading that some things were mostly broken here, like data coming from the AltiVec/FP pipeline having to go through cache to be available to other execution units; same for L2 cache thrashing).
________________________________________________________________
It could look like:
Xenon II:
4 cores @3.2 GHz
64KB L1 (data + instructions) (128? insight welcome)
4x256KB L2
2MB of L3
SMT support for two hardware threads per core, maybe in an improved form.
OoO execution
able to issue three instructions per cycle (versus two currently)
better branch predictor
256-bit wide reworked AltiVec units (would reintroducing integer support help for some tasks?)
This would come in under 500 million transistors (somewhat more than twice the current Xenon).
Xenon is ~170 million transistors; it's safe to assume that this CPU would be way better than Xenon while being pretty tiny and easy enough to cool.
It would actually look like a toy next to a 2009 Nehalem,
and lots will shout that this is a super conservative guesstimate.
But anyway, it would be good enough, not to mention really close to a PC design, and BC should be easy.
That would leave between 1.5 and 2 billion transistors for the GPU.
So, going by the RV770 figures, that should provide a theoretical peak performance of around 2 TFlops (if the shaders run at ~1GHz).
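For what it's worth, a peak figure like this is usually just ALUs × flops per ALU per cycle × shader clock. Below is a minimal sketch, assuming each ALU retires one multiply-add (2 flops) per cycle. The 800 ALUs at 750MHz are the publicly quoted Radeon HD 4870 (RV770) numbers, not something from this thread, and the 1000-ALU part at 1GHz is purely hypothetical, picked only because it lands on ~2 TFlops.

```python
def peak_gflops(alus: int, clock_ghz: float, flops_per_alu_per_cycle: int = 2) -> float:
    """Theoretical peak in GFlops: every ALU busy with a multiply-add every cycle."""
    return alus * flops_per_alu_per_cycle * clock_ghz


if __name__ == "__main__":
    # Assumption: publicly quoted Radeon HD 4870 (RV770) figures.
    print(f"RV770-like part:   {peak_gflops(800, 0.75):.0f} GFlops")
    # Hypothetical next-gen part, chosen only to illustrate the ~2 TFlops guess.
    print(f"Hypothetical part: {peak_gflops(1000, 1.0):.0f} GFlops")
```

Real workloads never keep every ALU fed on every cycle, so this is an upper bound, not a prediction.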
I think that:
This could dissipate slightly too much power
GPU manufacturers are likely to trade some ALUs for control logic to make them more flexible
So I would put the figure at around 2 TFlops for the system as a whole, as the most optimistic figure (obviously it's a "meaningless theoretical peak figure" anyway).
As for eDRAM, I don't know, as it can put constraints on non-standard uses of the GPU (for Xenos at least, tiling prevents some of the more exotic uses of the GPU).
And enough eDRAM to fit a 1080p frame buffer with AA would eat a lot of silicon and complicate the motherboard design.
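To put a rough number on "a lot of silicon", here is a back-of-the-envelope sketch of the frame buffer size. It assumes 1920x1080, 32-bit color plus 32-bit depth/stencil stored per sample, and no compression; those are my assumptions, and real hardware may do better. The 10MB constant is the publicly quoted Xenos eDRAM size, included only for comparison.

```python
XENOS_EDRAM_MIB = 10  # publicly quoted Xenos eDRAM size, for comparison only


def framebuffer_bytes(width: int, height: int, samples: int,
                      color_bytes: int = 4, depth_bytes: int = 4) -> int:
    """Uncompressed size of a multisampled color + depth/stencil buffer."""
    return width * height * samples * (color_bytes + depth_bytes)


if __name__ == "__main__":
    for samples in (1, 2, 4):
        mib = framebuffer_bytes(1920, 1080, samples) / 2**20
        print(f"1080p, {samples}x samples: {mib:.1f} MiB "
              f"(~{mib / XENOS_EDRAM_MIB:.1f}x the Xenos eDRAM)")
```

Under these assumptions, 1080p with 4x AA needs about 63 MiB to avoid tiling, roughly six times what Xenos carries.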
It depends I guess on the cost:
price of upcoming Rambus technology or fast GDDR5/6
vs
price of eDRAM
In fact, I think I would favor the most flexible design (no eDRAM), even if it means cutting corners elsewhere (a slightly smaller GPU).
_______________________________________________________________
Anyway, for those who might find 2 TFlops disappointing, well, I think they should consider other parts of the design:
Bandwidth
Amount of RAM
and that we're likely to be talking about "real programmable Flops"
By the time the next gen is pushed out the door, GPUs may well look like a pool of strange CPUs with very few fixed-function units.
An RV670 is made of four "processors", each a cluster of VLIW and SIMD units.
We could be looking at a sea of "processors", and thus a lot of useful power for non-graphical (or exotic graphical) work.
OK, I have to find a more interesting job... I have too much time to think about things I don't know well enough...