Actually, in the future, the biggest performance gap between Piledriver and Bulldozer might show up on FP-intensive code:
There is a lot of software that is never going to be recompiled to use Bulldozer's FMA4 instructions but will get recompiled to use Haswell's (and Piledriver's) FMA3.
So on many programs Bulldozer will be stuck issuing separate fadd and fmul instructions (halving its theoretical FP throughput), while Piledriver gets to use FMA instructions.
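Roughly what that looks like at the source level (my own sketch, not from any particular codebase; the function names are made up, and I'm assuming GCC/Clang-style intrinsics and flags like -mavx / -mfma / -mfma4):

#include <x86intrin.h>  /* pulls in the AVX/FMA3 and the AMD FMA4 intrinsics on GCC/Clang */

/* Plain AVX build: the compiler has to issue a separate multiply and add. */
__m256 madd_avx(__m256 a, __m256 b, __m256 c)
{
    return _mm256_add_ps(_mm256_mul_ps(a, b), c);
}

#ifdef __FMA__
/* FMA3 build (Piledriver, Haswell): one fused a*b + c instruction. */
__m256 madd_fma3(__m256 a, __m256 b, __m256 c)
{
    return _mm256_fmadd_ps(a, b, c);
}
#endif

#ifdef __FMA4__
/* FMA4 build (Bulldozer): same math, different non-destructive encoding. */
__m256 madd_fma4(__m256 a, __m256 b, __m256 c)
{
    return _mm256_macc_ps(a, b, c);
}
#endif

The math is identical in all three; the difference is whether the multiply and add retire as one instruction or two, which is where the "half the theoretical FLOPS" figure comes from, and the FMA3 version is the one mainstream builds will actually start shipping.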
First, I don't know how my first post that you quoted ended up here (someone moved it :smile: ). To me (a layman) this shouldn't be that hard a problem to solve: don't move threads 0 through 3 onto a core outside that range (repeat for N modules); use odd-numbered cores before even ones; and if there is a conflict between data locality and issuing onto an unused odd core, fall back to the first rule.
The devil is always in the detail, but how hard could it be? /Clarkson
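For what it's worth, here is a back-of-the-envelope userspace version of that placement rule (my own sketch, not the Windows or Linux scheduler's actual policy, and it only covers the "spread across modules first" part, not the data-locality fall-back). It assumes a Linux box where CPUs 2k and 2k+1 are the two cores of module k, which is worth checking against /sys/devices/system/cpu/*/topology/ on real hardware; whether the "first" core of each pair is the even or the odd one doesn't really matter, spreading across modules before sharing one is the point.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

#define NUM_MODULES 4                 /* e.g. an FX-8xxx: 4 modules / 8 cores (assumed) */

/* Software thread t -> core: one core per module first (0,2,4,6),
 * then the second core of each module (1,3,5,7). */
static int core_for_thread(int t)
{
    if (t < NUM_MODULES)
        return 2 * t;
    return 2 * (t - NUM_MODULES) + 1;
}

static void pin_self_to(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

static void *worker(void *arg)
{
    pin_self_to(core_for_thread((int)(long)arg));
    /* ... real work would go here ... */
    return NULL;
}

int main(void)
{
    pthread_t th[2 * NUM_MODULES];
    for (long t = 0; t < 2 * NUM_MODULES; t++)
        pthread_create(&th[t], NULL, worker, (void *)t);
    for (int t = 0; t < 2 * NUM_MODULES; t++)
        pthread_join(th[t], NULL);
    return 0;
}

Build with gcc -pthread; the same idea is what the Windows 7 Bulldozer scheduler hotfix and the kernel's compute-unit awareness were chasing, just done properly inside the scheduler instead of with hard pinning.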
But I don't see any advantage to the module system unless they make it wider; otherwise they are always going to be stuck with too little single-/lightly-threaded performance. Something like a 4-thread module with 3 integer ALUs per core and 3x 256-bit AVX2 would be a floating-point monster, but as I said, I like wishful thinking.
edit: this kind of slide gives me hope......lol
http://xtreview.com/images/opteron AMD Excavator architecture 01.gif
Steamroller to add SMT, Excavator to widen the module.
On the L1i, I just don't get what the logic behind it was. Did they run into trouble with a new cache design, have to make a judgement call, and take the less risky path? But yes, they need to fix the L1i. Also, add either separate decoders (this has been hinted at) or more decode width with the ability to decode for both cores in the same clock. Given the deepish pipeline of Bulldozer, an L0i/op cache/trace cache would likely help reduce the required decode work and power, and maybe even help performance.
Second, I was thinking more about current and legacy workloads. I agree that FMA4 is going nowhere; it's kinda funny how AMD didn't even want it in the first place.