You're certainly allowed to make any assumptions you want.
The clock range was 1.6-2.5 GHz, at least back in 2007. Granted, it was for a product planned to have between 16 and 24 cores. The top end would have hit a planned 1 TFLOP DP, and that number was bandied about for a while.
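Back of the envelope, with the per-core rate being my assumption rather than anything from the announcements: 24 cores x 2.5 GHz x 16 DP FLOPs/clock (an 8-wide double-precision vector with FMA) works out to roughly 0.96 TFLOP, which lines up with that 1 TFLOP DP figure.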
I charitably assumed going to a full 32 would lead to a lower top clock.
What eventually came out was about half that fast, and we then have an Intel employee's statement that amounts to "I meant to do that".
Really? I give everyone a handful of TTL chips and have them write software for it.
I give them an 8086 and have them write software for it.
One led to the personal computer revolution. The other led to dead ends. Both had lots of programmers. Both had lots of code.
The 8086 had a patron in the form of IBM and a situation where the clone market was basically ceded to that one architecture. That led to an established platform and infrastructure, one that was marketed and guided by one of the dominant corporations of the day. Then there was the clone market itself with its widespread adoption, which had the advantages of much lower barriers to entry and a rather uniform target platform.
In the case of graphics, there are established platforms and companies willing to engage developers and users at every price segment and anywhere you can stick a piece of silicon.
In nature, examples of explosive speciation happen when something occurs to create an environment that is wide open and missing significant competition. The presence of established competition tends to tamp this flowering down very quickly.
So you are just reinforcing the idea that the current situation has thousands of devs doing the same thing in the same way, replicating each other's work instead of making and modifying pipelines.
The replication is at a higher level, and reduces the need to redo the bulk of the platform that is sufficient for a developer's particular needs.
No, there is a lot that is considered settled only because current hardware cannot support anything else. Take all the interesting alternatives for Z math (linear vs. log vs. cubic vs. curve vs. exponential, etc.). That is something pretty basic, and yet there are options we KNOW are better for so many things that we simply cannot use because of the limitations of current hardware.
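To make the Z example concrete, here is a minimal sketch of my own (the function names and near/far handling are just illustrative, not anything from the thread) contrasting the reciprocal Z that current rasterizers bake in with a logarithmic encoding:

    #include <cmath>

    // Standard perspective (reciprocal) depth: what fixed-function Z hardware assumes.
    // Precision is concentrated near the near plane and falls off rapidly with distance.
    float perspective_z(float view_z, float n, float f) {
        // Maps view-space depth in [n, f] to [0, 1], hyperbolically.
        return (f / (f - n)) * (1.0f - n / view_z);
    }

    // Logarithmic depth: precision spread far more evenly across [n, f],
    // but it does not interpolate linearly in screen space, which is why
    // fixed-function rasterizers and early-Z units cannot simply swap it in.
    float logarithmic_z(float view_z, float n, float f) {
        return std::log(view_z / n) / std::log(f / n);
    }

That non-linear interpolation issue is the kind of thing that stays "settled" only because the hardware pipeline was built around 1/z.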
This goes back to the question of the gain in utility.
*edit: brain fart here, I was thinking of exponential shadow maps when I should have gone for irregular Z.
Something like crummy shadows would be helped by exponential, for example. But how much of the total output is improved? It is fine that shadows are more accurate and aren't pre-baked, faked, or blocky, but then users like their frame rate.*
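For reference, the exponential shadow map idea the edit above alludes to boils down to storing a filterable exp(c * d) for occluder depth d; this is my own sketch of the test, assuming depths normalized to [0, 1] and a constant c:

    #include <algorithm>
    #include <cmath>

    // Exponential shadow map test (sketch). 'occluder_exp' is the filtered
    // exp(c * d) value fetched from the shadow map; 'receiver_z' is the
    // receiver's depth in the same space. 'c' trades light bleeding vs. sharpness.
    float esm_visibility(float occluder_exp, float receiver_z, float c) {
        // exp(c * d) * exp(-c * z) = exp(c * (d - z)): ~1 when lit, -> 0 when occluded.
        return std::min(1.0f, occluder_exp * std::exp(-c * receiver_z));
    }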
Obviously? We've had consumer native multi-core (> 2 cores) for less than three years. We're really only in the infancy of the multi-core CPU revolution. They haven't even added AVX, FMA, or gather/scatter yet.
For the bulk of the market, it has not and will not start for several generations yet. They're parking at 4 cores, and adding a GPU instead.
Consumers do not need massive numbers of cores, and relying on the software approach means counting on hardware that spends half its effort making Excel run well.
What would make gather/scatter particularly important in this context, given the granularity of fully generic DRAM bursts and generic cache lines?
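To spell out what I mean by granularity, here is an illustrative sketch (my own, with an AVX2-era intrinsic used purely as an example of the instruction class): a vector gather removes the instruction overhead of the scalar loop, but each scattered index that misses still drags in a full cache line or DRAM burst, so the memory system sees much the same traffic either way.

    #include <immintrin.h>

    // Scalar gather: eight dependent indexed loads.
    void gather_scalar(const float* base, const int* idx, float* out) {
        for (int i = 0; i < 8; ++i)
            out[i] = base[idx[i]];
    }

    // Vector gather. Fewer instructions, but if the eight indices land on
    // eight different cache lines, the same eight 64-byte lines get fetched
    // either way.
    __m256 gather_vector(const float* base, const int* idx) {
        __m256i vindex = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(idx));
        return _mm256_i32gather_ps(base, vindex, 4);  // scale = sizeof(float)
    }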
I'm sorry but that's a silly question. It's like asking what particularly innovative application was created after the first stored program computer appeared.
The punch-card guys saw a pretty immediate benefit.
Going with a stored-program approach was not an accident, and even at its conception it permitted a product that could do many things incredibly faster, reliably, and significantly different from what came before.
I may have to admit a lack of imagination, but I do not see a similar marketable gap in this current situation.
Dedicated hardware undoubtedly did the job faster, but you seriously have to look at the sum of all the applications.
There's a massive swath of software out there that finds the incumbent method useful, with sunk costs and established knowledge base included.
And then there's just speed. Quantity is a quality all its own, and one I think has been glossed over so far.
Creating fully generic hardware is simply the logical next step. Anyone questioning that might as well question whether shaders have any use at all.
Unless it's a transputer, a hardware device is going to pick winners and losers. If fully general, there are no winners because the peak is so low.
If one were to try a model like John Carmack's idea for a sparse voxel octree, for example, Larrabee should have been a chip that twiddled bits and performed billions of boolean ops in parallel.
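To give a flavour of the kind of bit work I mean, here is a toy sketch with an invented node layout (not Carmack's actual format): an SVO traversal spends its time on child-mask tests and popcounts, not on wide floating-point math.

    #include <cstdint>

    // Toy sparse-voxel-octree node: one byte flags which of the 8 children
    // exist, plus an index to the first child. Layout invented for illustration.
    struct SvoNode {
        uint8_t  child_mask;   // bit i set => child i is present
        uint32_t first_child;  // index of first existing child in a node array
    };

    // True if the child in octant 'i' (0..7) exists.
    inline bool has_child(const SvoNode& n, int i) {
        return (n.child_mask >> i) & 1u;
    }

    // Index of child 'i' in the node array: count the set bits below bit i.
    inline uint32_t child_index(const SvoNode& n, int i) {
        uint32_t below = n.child_mask & ((1u << i) - 1u);
        return n.first_child + __builtin_popcount(below);
    }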
Graphics card manufacturers added a bit of programmability, and developers instantly pushed it to its limits, asking for longer shaders, more complex instructions, higher precision, flow control, etc.
And when CPU manufacturers added multiple cores, developers immediately got their code to work well with two of them... after four years.
The multicore revolution is at least so far a reprise of the GHz wars.
The fact that a software implementation on the CPU can't compete with dedicated hardware implementations doesn't say a thing about the prospects of a software implementation on a massively multi-core, fully generic chip.
It still comes down to the hardware implementation. So far, at least until Larrabee III, the answer is still that it would not be successful.
Why would they work on the same thing over and over? People would just create libraries/renderers/engines and sell them so that others can spend their time on different layers of the application.
Perhaps it's just a coincidence that Tim Sweeney likes this possible future.
Besides, in the early days graphics APIs were simple and every game had its own engine, while today hardly any games use a custom engine. So with hardware becoming more programmable, developers actually spend less time working on the same thing over and over. I don't see any reason why this would halt or reverse with fully generic hardware.
So we eventually have a handful of middleware vendors that produce renderers everyone uses.
Will the market see it as being different from having a few graphics vendors?