Tim Sweeney argues for 1 byte of memory bandwidth per flop.
Intel stated the goal of Larrabee was 2 bytes of read/write bandwidth per flop.
The point is that, beyond cache, the system memory bandwidth needed to feed teraflop chips is achievable.
16 flops per clock x 16 cores x 4000 MHz ≈ a teraflop chip
The 360 had 3 of these cores clocked at 3.2 GHz... on a modern process an 800 MHz bump to 4 GHz seems reachable.
Two of these chips would come in a bit under the 2.5 teraflop goal Tim Sweeney stated, but they would be fully programmable.
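Just to sanity-check that arithmetic (all inputs are the speculative figures above, not confirmed specs), a minimal Python sketch:

```python
# Back-of-envelope peak-FLOPS check for the rumored chip.
# All inputs are speculative numbers from the thread, not confirmed specs.
flops_per_clock = 16   # assumed per-core SIMD throughput
cores = 16             # rumored core count
clock_hz = 4.0e9       # optimistic 4 GHz clock

per_chip_tflops = flops_per_clock * cores * clock_hz / 1e12
print(f"one chip:  {per_chip_tflops:.3f} TFLOPS")                  # ~1.024 TFLOPS
print(f"two chips: {2 * per_chip_tflops:.3f} TFLOPS vs 2.5 goal")  # ~2.048 TFLOPS
```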
As far as the cost to make such a chip, Microsoft could just offer IBM royalties and not bother with ATI. Although all the rumors constantly mention ATI, so I'm not sure. Heck, I'm not sure about the 16-core thing anyway, but that's the fun of speculation.
This is from back in 2009:
graphics.cs.williams.edu/archive/SweeneyHPG2009/TimHPG2009.pdf
That's a different calculation altogether from this one:
16 cores x 4 GHz x 4 threads x 2 bytes = 512 GB/s
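Spelling that out (same speculative inputs; I'm reading the "4 threads" term as one flop per hardware thread per clock, and the 2 bytes/flop is Intel's stated Larrabee target from the first post):

```python
# Bandwidth implied by a 2 bytes-per-flop target, using the thread's
# speculative chip figures. "4 threads" is read as 4 flops/clock/core.
cores = 16
clock_hz = 4.0e9
flops_per_clock_per_core = 4   # one flop per hardware thread per clock (assumption)
bytes_per_flop = 2             # Intel's stated Larrabee goal

bandwidth = cores * clock_hz * flops_per_clock_per_core * bytes_per_flop
print(f"{bandwidth / 1e9:.0f} GB/s")  # 512 GB/s
```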
Anyway, 4 GHz is unlikely; even Intel doesn't get there.
As for Sweeney, I think he speaks of plain FLOPS (i.e. 4 TFLOPS), not specifically FMA or FMAC.
He gives a ballpark figure assuming X operations per pixel/vertex, etc.
4 GHz won't happen, that's for sure.
For a throughput design, going this far (16-wide SIMD) I find unlikely. At this stage it may be better to do the calculations on the GPU.
The thing I don't get in all these rumors is that right now the heavy lifting for Kinect is done by the GPU.
I wonder why MS would move that to the CPU and sacrifice four cores.
It's possible that MS wants to avoid GPGPU programming as much as possible, as it's time-intensive?
That could be a reason to stick with a throughput design for the next generation. Still, 16-wide SIMDs sound like pushing it.
I'm surprised by this 16-core rumor; so far I don't know what to do with it or how to make sense of it.
Maybe it could be simple OoO cores?
I don't know, but it smells like throughput/simplistic cores. We need to learn more.
On Durango it's basically hard to figure out anything at this point; we hear everything and its opposite.
I'm close to giving up on all this talk and just waiting and seeing, as at this stage it smells a bit like the big sites are out for click hunting.
I take it as proof that, for the public, the PS3/360 are really getting old and people want to hear about new stuff.
----------------------------
There was a presentation from NVIDIA where they implemented a software rendering pipeline on a GPU.
It's significantly slower than the traditional pipeline, but it's proof that it can be done and that you don't need a Larrabee design to do so.
It's true the thing could have been faster, as some fixed-function hardware is not exposed in CUDA. With more time and access to all the hardware, software rendering on a GPU could be more competitive, but it's questionable whether it would be worth it, and that's on a GPU.
Now, going to something like Larrabee, and reading all the insiders' posts here (minus Nick), it sounds like a bad idea. (See the endless discussion between Nick and the other camp.)
As I get it, there is no way a many-core CPU can be competitive: way too much logic per ALU, and not enough threads to hide latencies.
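A rough way to see the "not enough threads" point is Little's law: in-flight work = latency x issue rate. A sketch with illustrative numbers (the 400-cycle latency and thread counts are my assumptions, not measured figures):

```python
# Little's law sketch: concurrency needed to hide memory latency.
# All numbers are illustrative assumptions, not measured figures.
memory_latency_cycles = 400   # assumed DRAM round trip
issues_per_cycle = 1          # one memory op issued per clock per core

needed_in_flight = memory_latency_cycles * issues_per_cycle  # ops in flight

cpu_threads_per_core = 4      # typical SMT depth on a many-core CPU design
gpu_wavefronts_per_cu = 40    # GCN tracks up to 40 wavefronts per CU

print(f"ops needed in flight to keep the pipe full: {needed_in_flight}")
print(f"a {cpu_threads_per_core}-thread core can cover only {cpu_threads_per_core} misses at once;")
print(f"a GCN CU can juggle {gpu_wavefronts_per_cu} wavefronts x 64 lanes each.")
```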
If MSFT and IBM could pull it off it would indeed be a tour de force, but I'm more than skeptical.
A Larrabee ties a cheap(ish) CPU to a 16-wide SIMD. In an AMD GCN Compute Unit you have four 16-wide SIMDs, each acting as a 64-wide one (issuing at 1/4 rate). The hardware keeps track of a lot of threads.
Each compute unit has access to a shitload of register files + LDS.
Each CU also has its own scalar unit, with its own resources (and groups of 4 CUs share the scalar and instruction caches).
What I mean is that a group of 4 CUs achieves a density in many regards (compute, registers, local store) that I can't see a Larrabee-like design matching (the same goes for what Nvidia offers).
Then GPUs have a lot of hardware that 'makes it happen', but it's pretty much centralized and specialized, so its cost is low (in power and silicon).
Knights Corner (on 22 nm? I don't remember) packs 40+ 16-wide SIMDs into a 500+ sq.mm package, most likely with really high power consumption. In lane count that's ~ an HD 7770. The clock speed is higher, but as a result power is on another scale.
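To put a rough number on that density argument, here's a back-of-envelope comparison; the die sizes, clocks and SIMD counts are ballpark figures from memory (and from this very post), so treat it as a sketch only:

```python
# Rough FLOPS-per-area comparison. Die sizes and clocks are my ballpark
# assumptions from memory; everything here is back-of-envelope only.

def peak_tflops(simd_units, lanes_per_simd, clock_ghz, flops_per_lane=2):
    # flops_per_lane=2 counts a fused multiply-add as two flops
    return simd_units * lanes_per_simd * clock_ghz * flops_per_lane / 1000

knc = peak_tflops(simd_units=40, lanes_per_simd=16, clock_ghz=1.1)      # "40+" SIMDs
hd7770 = peak_tflops(simd_units=40, lanes_per_simd=16, clock_ghz=1.0)   # 10 CUs x 4 SIMDs

print(f"Knights Corner: ~{knc:.2f} TFLOPS in ~500 mm^2 -> {knc*1000/500:.1f} GFLOPS/mm^2")
print(f"HD 7770:        ~{hd7770:.2f} TFLOPS in ~123 mm^2 -> {hd7770*1000/123:.1f} GFLOPS/mm^2")
```

Same lane count either way; the GPU just fits it into a fraction of the area, which is the point about logic per ALU.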
Members with more knowledge could give their opinions on the matter, but I find it really, really unlikely (and that comes from somebody who was dreaming about it some years ago, especially after reading the very paper you linked).
EDIT: I would like to talk about a possible Larrabee design; it's unlikely, but dreaming is cheap. I think there might be an existing thread in the tech section on the matter. It would be a bit of a necro topic, but it might be a better place to discuss this than here.