Tim Sweeney argues for 1 byte of memory bandwidth per FLOP.
Intel stated that the goal for Larrabee was 2 bytes of read/write bandwidth per FLOP.
The point is that, with caches absorbing most of the traffic, achievable system memory bandwidth is enough to feed teraflop chips.
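To put rough numbers on those ratios (a quick sketch in Python; the 95% cache hit rate is an illustrative assumption, not a measured figure):

```python
# Bandwidth implied by the bytes-per-FLOP ratios above, for a 1 TFLOPS chip.
tflops = 1.0e12               # 1 TFLOPS sustained
sweeney_ratio = 1.0           # bytes per FLOP (Sweeney's figure)
larrabee_ratio = 2.0          # bytes per FLOP, read/write (Intel's Larrabee goal)

print(tflops * sweeney_ratio / 1e12, "TB/s total at 1 B/FLOP")   # 1.0 TB/s
print(tflops * larrabee_ratio / 1e12, "TB/s total at 2 B/FLOP")  # 2.0 TB/s

# If on-chip caches serve, say, 95% of that traffic (an assumption for
# illustration only), off-chip memory has to deliver just the remainder:
cache_hit_rate = 0.95
print(tflops * sweeney_ratio * (1 - cache_hit_rate) / 1e9, "GB/s off-chip")  # ~50 GB/s
```

At that kind of hit rate, the off-chip demand lands in a range real memory systems can actually deliver.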
16 FLOPS per clock x 16 cores x 4 GHz = a ~1 teraflop chip.
The 360 had 3 of its cores clocked at 3.2 GHz... on a modern process an 800 MHz bump to 4 GHz seems reachable.
Two of these chips would come in somewhat under the 2.5 teraflop goal Tim Sweeney stated, but they would be fully programmable.
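Checking that arithmetic (the 16 FLOPS/clock and 16-core figures are the speculation above, not a confirmed spec):

```python
# Back-of-the-envelope FLOPS for the speculated chip.
flops_per_clock_per_core = 16
cores = 16
clock_hz = 4.0e9                                # 4 GHz

chip_flops = flops_per_clock_per_core * cores * clock_hz
print(chip_flops / 1e12, "TFLOPS per chip")     # 1.024 TFLOPS
print(2 * chip_flops / 1e12, "TFLOPS for two")  # 2.048 TFLOPS, just under the 2.5 goal
```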
As far as the cost of making such a chip goes, Microsoft could just pay IBM royalties and not bother with ATI. Although the rumors constantly mention ATI, so I'm not sure. Heck, I'm not sure about the 16-core thing anyway, but that's the fun of speculation.
This is back from 2009.
graphics.cs.williams.edu/archive/SweeneyHPG2009/TimHPG2009.pdf
Hey Brimstone, I took the liberty of copying your post here so we can discuss it further.
I thought a bit more about it. The whole idea behind Larrabee, or a PowerPC Larrabee, is to run standard x86 or PowerPC code, right? It might not be the best idea, but anyway.
I was thinking about how to achieve better density than Larrabee. I considered packing more resources (SIMD units, texture units) into a single core, a more complex one, by no means lightweight, and building the core around them. After thinking about it a bit, you would need to support more than 4 hardware threads, have more resources, and so on.
Then I looked at the GPU world. For the sake of density, the L1 cache for four CUs is implemented as one block. Those 4 CUs are 4x4 16-wide SIMDs, and each CU supports 40 wavefronts in flight (out of a pool of 4x256?).
Then you have to track the number of pending memory requests, etc.
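For reference, here is the rough bookkeeping on a GCN-style CU as I understand it from AMD's public material (treat the exact figures as my assumptions):

```python
# Rough occupancy math for a GCN-style compute unit (CU).
simds_per_cu = 4           # four 16-wide vector SIMDs per CU
simd_width = 16
wavefront_size = 64        # one wavefront issues over 4 clocks on a SIMD-16
wavefronts_per_simd = 10   # max wavefronts in flight per SIMD

wavefronts_per_cu = simds_per_cu * wavefronts_per_simd     # 40 in flight per CU
work_items_per_cu = wavefronts_per_cu * wavefront_size     # 2560 resident work items
issue_clocks = wavefront_size // simd_width                # 4 clocks per wavefront issue

print(wavefronts_per_cu, work_items_per_cu, issue_clocks)
```

That is an enormous amount of in-flight state for hiding latency, which is exactly what a single fat CPU core would struggle to hold.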
I came to the conclusion that you may want to pull the SIMDs and texture units out of your CPU core and have the core just keep track of things. Then you specialize them, and so on.
That's pretty much reinventing the wheel and ending up with a GPU.
At no point in such a design has the scalar ISA really been much of a concern.
Even in Larrabee, the usefulness of the cores being x86-compliant is questionable relative to what the chip is intended to achieve.
I got a crazy idea: what if simple cores tied to a SIMD, or multiple SIMDs tied to a complex core, is the wrong way to build something based on an existing CPU ISA that is intended to do, among other things, graphics?
Actually, what I thought was: what if Intel, for that matter, had taken another road?
The idea I had is that the only way I can see for the ISA to have any relevance is to do what AMD did for a while with its VLIW designs: have MIMD units act in a vectorized fashion.
For Intel that would be simple CISC x86 cores; for IBM, simple RISC PowerPC cores.
Basically, it's making some sort of GPU out of CPUs, not bolting stuff onto CPU cores.
So Intel used P54C cores for Larrabee; for IBM it could have been POWER1 or POWER2. In both cases a new design would have been really likely anyway.
Still, those cores are super tiny, and you remove their front ends.
Then you create a new front end that forwards the same instructions, with different data, to those different "cores" (not the proper term, but anyway); pretty much what AMD GPUs do.
The challenge for IBM or Intel would be to make the front end more complex than the one in even today's CPUs, so that each cluster acts more like an autonomous CPU core than like an "assisted" AMD SIMD, let's say.
In a worst-case scenario (complex data dependencies? or I don't know what), the thing would act as a single CPU core. Depending on the number of "cores" in the array, that would be a far more severe drop than AMD's worst case (i.e. using 1 ALU out of five): with 16 cores in the array you would be down to 1/16 utilization instead of 1/5.
Put on a purely data-parallel problem, it would act as X x86 or PPC cores.
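To make the scheme concrete, here is a toy model of that execution style (purely illustrative; the lane and front-end names are mine, not anything from a shipped design):

```python
# Toy model: one shared front end issues a single instruction stream,
# and an array of scalar "lanes" (tiny x86/PPC-style cores in the real
# proposal) each applies it to its own data.

class ScalarLane:
    """One tiny scalar core: its own accumulator, its own data."""
    def __init__(self, data):
        self.acc = 0
        self.data = data

    def execute(self, op):
        if op == "load":
            self.acc = self.data
        elif op == "double":
            self.acc *= 2
        elif op == "inc":
            self.acc += 1

def front_end(program, lanes):
    """Fetch/decode each instruction once, broadcast it to every lane."""
    for op in program:         # single instruction stream...
        for lane in lanes:     # ...executed by all lanes (in parallel in hardware)
            lane.execute(op)

lanes = [ScalarLane(d) for d in range(16)]  # 16 lanes, same ISA, different data
front_end(["load", "double", "inc"], lanes)
print([lane.acc for lane in lanes])         # [1, 3, 5, ..., 31]
```

In the data-parallel case all 16 lanes do useful work on every issue; in the divergent worst case the front end effectively feeds one lane at a time, which is the 1/16 cliff mentioned above.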
It's a bit ridiculous, but that's the only way I can see the ISA being relevant in that kind of design while still having a chance to compete with GPUs (Larrabee may end up competing, but does that make x86 relevant to the picture?).
It's about the only way I can see traditional CPU cores and GPU/throughput ones sharing the same ISA. You may also want the ISA to cover the texture units as a newly integrated DSP.
So imagine you would have a SIMD (in AMD parlance) of MIMD x86 or PowerPC cores, texture units included, driven by a really complex CPU front end.
Anyway, people don't write in assembly anymore, and it looks like there will be languages that further hide the difference between the GPU and the CPU from the programmer.
Either way, what people might want is not something like Larrabee but more complex GPUs, and nobody cares about a GPU's ISA, even if one were to use x86 or PPC for it.
There might be a good reason for this not to happen, but I thought it would be fun to bring up the idea, as it is somehow what AMD has been doing until now: MIMD units acting in a vectorized fashion.
EDIT
It's a bit of an attempt to demonstrate, reductio ad absurdum, that the idea behind Larrabee may not be relevant.