OK, so all this stuff about 'coding a software renderer as if it's x86' is pure horseshit.
Scheduling & the API are all that's 'in software' & the API is already in software for ATI/Nvidia anyway.
Well yes and no. The difference between ATi/nVidia and Intel isn't as large as Intel wants you to believe.
You could code a software renderer with, e.g., CUDA as well, if you so desire. In fact, nVidia has an offline renderer by the name of Gelato, which has been around since the GeForce 6.
It's not x86, but other than that it's a 'software' implementation of a renderer: it implements various features and rendering techniques that the hardware doesn't support natively, through software routines.
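To make 'software routines' concrete, here's a toy example of the kind of thing meant: bicubic texture filtering built out of plain texel fetches, a filter mode the samplers of that era didn't do natively. Plain C++ rather than CUDA for brevity, and all the names are made up:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Toy single-channel 'texture' with clamped nearest fetch.
struct Texture {
    int w, h;
    std::vector<float> texels;
    float fetch(int x, int y) const {
        x = std::clamp(x, 0, w - 1);
        y = std::clamp(y, 0, h - 1);
        return texels[y * w + x];
    }
};

// Catmull-Rom cubic weight. GPU samplers of that era only did
// bilinear/trilinear filtering in hardware.
static float cubic(float x) {
    x = std::fabs(x);
    if (x < 1.0f) return 1.5f * x * x * x - 2.5f * x * x + 1.0f;
    if (x < 2.0f) return -0.5f * x * x * x + 2.5f * x * x - 4.0f * x + 2.0f;
    return 0.0f;
}

// Bicubic sample built from 16 plain fetches: a 'software routine'
// layering a filter mode on top of hardware that lacks it.
float sampleBicubic(const Texture& t, float u, float v) {
    float fx = u * t.w - 0.5f, fy = v * t.h - 0.5f;
    int ix = (int)std::floor(fx), iy = (int)std::floor(fy);
    float dx = fx - ix, dy = fy - iy;
    float sum = 0.0f;
    for (int j = -1; j <= 2; ++j)
        for (int i = -1; i <= 2; ++i)
            sum += cubic(i - dx) * cubic(j - dy) * t.fetch(ix + i, iy + j);
    return sum;
}

int main() {
    Texture t{4, 4, std::vector<float>(16, 1.0f)};
    t.texels[5] = 4.0f;                      // one bright texel
    std::printf("%f\n", sampleBicubic(t, 0.4f, 0.4f));
}
```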
To write your own software renderer to make use of Larrabee, you'd need to write everything yourself, including a scheduler that can load-balance & hide latency across tens of cores & many, many threads!
Presumably Intel would provide libraries with those functions, but then you're still using a third-party API.
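For a sense of what 'write your own scheduler' implies, here's a deliberately naive sketch of the load-balancing half of it: worker threads pulling tile jobs from a shared counter. A real Larrabee-class scheduler would also need work stealing, priorities, and latency hiding via SMT; this only shows the core idea, with all names invented:

```cpp
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

// Naive dynamic load balancer: screen tiles as work items, one shared
// atomic counter as the 'queue'. Tiles vary wildly in shading cost,
// which is why a static split across cores wouldn't balance.
int main() {
    const int numTiles = 1024;               // hypothetical tile count
    std::atomic<int> nextTile{0};            // shared work counter
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) n = 4;
    std::vector<std::thread> workers;
    for (unsigned c = 0; c < n; ++c) {
        workers.emplace_back([&, c] {
            int done = 0;
            for (;;) {
                int tile = nextTile.fetch_add(1);  // grab the next tile
                if (tile >= numTiles) break;       // no work left
                // shadeTile(tile) would go here (hypothetical function);
                // its varying per-tile cost is what makes dynamic pulling win.
                ++done;
            }
            std::printf("worker %u shaded %d tiles\n", c, done);
        });
    }
    for (auto& w : workers) w.join();
}
```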
Yes, but the idea behind all that is that you can implement an alternative renderer, and not be bound by the limits of the Direct3D or OpenGL programming model.
So there will be virtually no limits on what kind of drawing primitives you use, what your shaders can and cannot do, what kind of textures you use, etc.
Intel has been hinting pretty obviously at raytracing.
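As a taste of that freedom, here's a minimal sketch of a drawing primitive that the Direct3D/OpenGL rasterization model simply doesn't offer: an analytic ray/sphere intersection, which becomes trivial once the renderer itself is software (toy code, not anyone's actual implementation):

```cpp
#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };
static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Analytic ray/sphere hit: a 'primitive' a rasterizer can't draw
// directly, but a software renderer can treat as first-class.
bool hitSphere(Vec3 orig, Vec3 dir, Vec3 center, float radius, float& t) {
    Vec3 oc = sub(orig, center);
    float b = dot(oc, dir);                 // dir assumed normalized
    float c = dot(oc, oc) - radius * radius;
    float disc = b * b - c;
    if (disc < 0.0f) return false;          // ray misses the sphere
    t = -b - std::sqrt(disc);               // nearest intersection
    return t > 0.0f;
}

int main() {
    float t;
    Vec3 eye{0, 0, 0}, dir{0, 0, 1}, center{0, 0, 5};
    if (hitSphere(eye, dir, center, 1.0f, t))
        std::printf("hit at t = %.2f\n", t); // expect t = 4.00
}
```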
So back to the hardware: clock for clock, ignoring special functions for the time being (emulated at reduced speed on the x86 integer cores?) & assuming that the Larrabee Vec16 only does MADDs:
RV770 is 10*(16*(1+1+1+1+1)) = 800 SPs, right?
A 48-core Larrabee would be 48*(16*1) = 768 SPs.
So one ATI 16*(1+1+1+1+1) VLIW SIMD = 5 Larrabee cores.
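Making that back-of-envelope arithmetic explicit (this counts ALU lanes only, and deliberately ignores clocks and efficiency):

```cpp
#include <cstdio>

// Lane counting only: 10 SIMDs x 16 VLIW units x 5 ALUs for RV770,
// 48 cores x one 16-wide vector unit for the rumored Larrabee part.
int main() {
    int rv770    = 10 * (16 * 5);   // = 800 SPs
    int larrabee = 48 * (16 * 1);   // = 768 lanes
    std::printf("RV770: %d SPs, 48-core Larrabee: %d lanes\n", rv770, larrabee);
    std::printf("One RV770 SIMD (%d ALUs) = %d Larrabee cores\n",
                16 * 5, (16 * 5) / 16);
}
```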
It doesn't quite work that way. If you apply this logic to ATi vs nVidia, then ATi would have the fastest GPU by far, yet it is nVidia that comes out on top.
The reason is that nVidia uses its processing units in a completely different way, which makes them far more efficient in practice than ATi's.
So aside from the number of processing units, one big unknown in this story is how efficient Intel's units will be in practice.
We now know that they will use tile-based rendering, which is quite a different approach from nVidia's and ATi's. That makes direct comparisons even harder: basically, we know neither the hardware nor the software that will be driving Direct3D/OpenGL applications on Larrabee.
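For reference, the first step of a tile-based renderer looks roughly like this: bin each triangle's screen-space bounding box into the tiles it overlaps, so that each tile can later be shaded independently out of on-chip memory. This is a generic sketch of the technique, not Intel's actual pipeline:

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// A triangle in screen space (coordinates assumed on-screen here).
struct Tri { float x0, y0, x1, y1, x2, y2; };

int main() {
    const int W = 256, H = 256, TILE = 64;       // 4x4 grid of tiles
    const int tx = W / TILE, ty = H / TILE;
    std::vector<std::vector<int>> bins(tx * ty); // triangle IDs per tile

    std::vector<Tri> tris = { {10, 10, 200, 40,  80, 220},
                              {130, 130, 250, 140, 200, 250} };
    for (int i = 0; i < (int)tris.size(); ++i) {
        const Tri& t = tris[i];
        // Conservative binning: every tile the bounding box touches.
        int minX = (int)std::min({t.x0, t.x1, t.x2}) / TILE;
        int maxX = (int)std::max({t.x0, t.x1, t.x2}) / TILE;
        int minY = (int)std::min({t.y0, t.y1, t.y2}) / TILE;
        int maxY = (int)std::max({t.y0, t.y1, t.y2}) / TILE;
        for (int y = minY; y <= maxY; ++y)
            for (int x = minX; x <= maxX; ++x)
                bins[y * tx + x].push_back(i);
    }
    // Each bin could now be rasterized and shaded on its own core.
    for (int b = 0; b < tx * ty; ++b)
        std::printf("tile %d: %zu triangles\n", b, bins[b].size());
}
```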
Also, don't forget that each Larrabee core will get 4-way SMT (HyperThreading). My suspicion is that they will use this 4-way SMT to 'multiplex' shading and 'fixed-function' operations in their rasterizer: the fixed-function work would mostly use the x86 integer units, while shading would use the SIMD unit, so you'd get nice parallelism.
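Purely to illustrate that suspicion (and it is speculation): two OS threads standing in for a core's SMT hardware threads, one doing 'fixed-function' setup work on the scalar side and handing spans to a sibling thread that 'shades' them, so both kinds of units stay busy:

```cpp
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>

// OS threads as stand-ins for SMT hardware threads: 'setup' plays the
// rasterizer feeding the x86 integer pipes, 'shade' plays the shading
// thread feeding the 16-wide vector unit. Real SMT interleaves them in
// hardware every cycle; this just shows the producer/consumer split.
int main() {
    std::queue<int> spans;              // work handed from setup to shading
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    std::thread setup([&] {
        for (int s = 0; s < 8; ++s) {   // 'rasterize' eight spans
            { std::lock_guard<std::mutex> lk(m); spans.push(s); }
            cv.notify_one();
        }
        { std::lock_guard<std::mutex> lk(m); done = true; }
        cv.notify_one();
    });

    std::thread shade([&] {
        for (;;) {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return !spans.empty() || done; });
            if (spans.empty()) break;   // setup finished, queue drained
            int s = spans.front(); spans.pop();
            lk.unlock();
            std::printf("shaded span %d on the vec16 unit\n", s);
        }
    });

    setup.join();
    shade.join();
}
```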