You're certainly allowed to make any assumptions you want.
The clock range was 1.6-2.5 GHz, at least back in 2007. Granted, it was for a product planned to have between 16 and 24 cores. The top end would have hit a planned 1 TFLOP DP, and that number was bandied about for a while.
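Back of the envelope, with the per-core rate being my assumption rather than anything from the announcements: 24 cores x 2.5 GHz x 16 DP FLOPs/clock (an 8-wide double-precision vector with FMA) works out to roughly 0.96 TFLOP, which lines up with that 1 TFLOP DP figure.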
I charitably assumed going to a full 32 would lead to a lower top clock.
What eventually came out was about half that fast, and we then have an Intel employee's statement that amounts to "I meant to do that".
Really? I give everyone a handful of TTL chips and have them write software for it.
I give them an 8086 and have them write software for it.
One led to the personal computer revolution. The other led to dead ends. Both had lots of programmers. Both had lots of code.
The 8086 had a patron in the form of IBM and a situation where the clone market was basically ceded to that one architecture. That led to an established platform and infrastructure, one that was marketed and guided by one of the dominant corporations of the day. Then there was the clone market itself with its widespread adoption, which had the advantages of much lower barriers to entry and a rather uniform target platform.
In the case of graphics, there are established platforms and companies willing to engage developers and users at every price segment and anywhere you can stick a piece of silicon.
In nature, examples of explosive speciation happen when something occurs to create an environment that is wide open and missing significant competition. The presence of established competition tends to tamp this flowering down very quickly.
So you are just reinforcing the idea that the current situation has thousands of devs doing the same thing in the same way, replicating each other's work instead of making and modifying pipelines.
The replication is at a higher level, and reduces the need to redo the bulk of the platform that is sufficient for a developer's particular needs.
No, there is a lot that is considered settled only because current hardware cannot support anything else. Take all the interesting alternatives for Z math (linear vs. log vs. cubic vs. curve vs. exponential, etc.). That is something pretty basic, and yet there are options we KNOW are better for so many things that we simply cannot use because of the limitations of current hardware.
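To make the Z example concrete, here is a minimal sketch of my own (the function names and near/far handling are just illustrative, not anything from the thread) contrasting the reciprocal Z that current rasterizers bake in with a logarithmic encoding:

    #include <cmath>

    // Standard perspective (reciprocal) depth: what fixed-function Z hardware assumes.
    // Precision is concentrated near the near plane and falls off rapidly with distance.
    float perspective_z(float view_z, float n, float f) {
        // Maps view-space depth in [n, f] to [0, 1], hyperbolically.
        return (f / (f - n)) * (1.0f - n / view_z);
    }

    // Logarithmic depth: precision spread far more evenly across [n, f],
    // but it does not interpolate linearly in screen space, which is why
    // fixed-function rasterizers and early-Z units cannot simply swap it in.
    float logarithmic_z(float view_z, float n, float f) {
        return std::log(view_z / n) / std::log(f / n);
    }

That non-linear interpolation issue is the kind of thing that stays "settled" only because the hardware pipeline was built around 1/z.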
This goes back to the question of the gain in utility.
*edit: brain fart here, I was thinking of exponential shadow maps when I should have gone for irregular Z.
Something like crummy shadows would be helped by exponential, for example. But how much of the total output is improved? It is fine that shadows are more accurate and aren't pre-baked, faked, or blocky, but then users like their frame rate.*
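For reference, the exponential shadow map idea the edit above alludes to boils down to storing a filterable exp(c * d) for occluder depth d; this is my own sketch of the test, assuming depths normalized to [0, 1] and a constant c:

    #include <algorithm>
    #include <cmath>

    // Exponential shadow map test (sketch). 'occluder_exp' is the filtered
    // exp(c * d) value fetched from the shadow map; 'receiver_z' is the
    // receiver's depth in the same space. 'c' trades light bleeding vs. sharpness.
    float esm_visibility(float occluder_exp, float receiver_z, float c) {
        // exp(c * d) * exp(-c * z) = exp(c * (d - z)): ~1 when lit, -> 0 when occluded.
        return std::min(1.0f, occluder_exp * std::exp(-c * receiver_z));
    }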
Obviously? We've had consumer native multi-core (> 2 cores) for less than three years. We're really only in the infancy of the multi-core CPU revolution. They haven't even added AVX, FMA, or gather/scatter yet.
For the bulk of the market, it has not and will not start for several generations yet. They're parking at 4 cores, and adding a GPU instead.
Consumers do not need massive numbers of cores, and relying on the software approach means counting on hardware that spends half its effort making Excel run well.
What would make gather/scatter particularly important in this context, given the granularity of fully generic DRAM bursts and generic cache lines?
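To spell out what I mean by granularity, here is an illustrative sketch (my own, with an AVX2-era intrinsic used purely as an example of the instruction class): a vector gather removes the instruction overhead of the scalar loop, but each scattered index that misses still drags in a full cache line or DRAM burst, so the memory system sees much the same traffic either way.

    #include <immintrin.h>

    // Scalar gather: eight dependent indexed loads.
    void gather_scalar(const float* base, const int* idx, float* out) {
        for (int i = 0; i < 8; ++i)
            out[i] = base[idx[i]];
    }

    // Vector gather. Fewer instructions, but if the eight indices land on
    // eight different cache lines, the same eight 64-byte lines get fetched
    // either way.
    __m256 gather_vector(const float* base, const int* idx) {
        __m256i vindex = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(idx));
        return _mm256_i32gather_ps(base, vindex, 4);  // scale = sizeof(float)
    }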
I'm sorry but that's a silly question. It's like asking what particularly innovative application was created after the first stored program computer appeared.
The punch-card guys saw a pretty immediate benefit.
Going with a stored-program approach was not an accident, and even at its conception it permitted a product that could do many things incredibly faster, reliably, and significantly different from what came before.
I may have to admit a lack of imagination, but I do not see a similar marketable gap in this current situation.
Dedicated hardware undoubtedly did the job faster, but you seriously have to look at the sum of all the applications.
There's a massive swath of software out there that finds the incumbent method useful, with sunk costs and established knowledge base included.
And then there's just speed. Quantity is a quality all its own, and one I think has been glossed over so far.
Creating fully generic hardware is simply the logical next step. Anyone questioning that might as well question whether shaders have any use at all.
Unless it's a transputer, a hardware device is going to pick winners and losers. If fully general, there are no winners because the peak is so low.
If one were to try a model like John Carmack's idea for a sparse voxel octree, for example, Larrabee should have been a chip that twiddled bits and performed billions of boolean ops in parallel.
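To give a flavour of the kind of bit work I mean, here is a toy sketch with an invented node layout (not Carmack's actual format): an SVO traversal spends its time on child-mask tests and popcounts, not on wide floating-point math.

    #include <cstdint>

    // Toy sparse-voxel-octree node: one byte flags which of the 8 children
    // exist, plus an index to the first child. Layout invented for illustration.
    struct SvoNode {
        uint8_t  child_mask;   // bit i set => child i is present
        uint32_t first_child;  // index of first existing child in a node array
    };

    // True if the child in octant 'i' (0..7) exists.
    inline bool has_child(const SvoNode& n, int i) {
        return (n.child_mask >> i) & 1u;
    }

    // Index of child 'i' in the node array: count the set bits below bit i.
    inline uint32_t child_index(const SvoNode& n, int i) {
        uint32_t below = n.child_mask & ((1u << i) - 1u);
        return n.first_child + __builtin_popcount(below);
    }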
Graphics card manufacturers added a bit of programmability, and developers instantly pushed it to its limits, asking for longer shaders, more complex instructions, higher precision, flow control, etc.
And when CPU manufacturers added multiple cores, developers immediately got their code to work well with two of them... after four years.
The multicore revolution is at least so far a reprise of the GHz wars.
The fact that a software implementation on the CPU can't compete with dedicated hardware implementations doesn't say a thing about the prospects of a software implementation on a massively multi-core, fully generic chip.
It still comes down to the hardware implementation. So far, at least until Larrabee III, the answer is still that it would not be successful.
Why would they work on the same thing over and over? People would just create libraries/renderers/engines and sell them so that others can spend their time on different layers of the application.
Perhaps it's just a coincidence that Tim Sweeney likes this possible future.
Besides, in the early days graphics APIs were simple and every game had its own engine, while today hardly any games use a custom engine. So with hardware becoming more programmable, developers actually spend less time working on the same thing over and over. I don't see any reason why this would halt or reverse with fully generic hardware.
So we eventually have a handful of middleware vendors that produce renderers everyone uses.
Will the market see it as being different from having a few graphics vendors?