I'm 100% certain that ray tracing is NOT the path Intel is pushing towards.
I'm not sure I would say they're not pushing towards it, but I would definitely agree that I see no serious indication whatsoever that they are really pushing towards pure raytracing with the 2010 incarnation of the Larrabee architecture. As for when they really will be, who the hell knows. So overall I guess we pretty much agree here.
Let's take your shadow point further: Larrabee is 16-way SIMD, so the only efficient raytracing will be packet tracing with a multiple of 16 non-divergent rays at a time. Secondary rays simply don't have high enough computational locality to be worth doing; they are just too divergent. Re-sorting secondary rays into non-divergent ray packets is way too expensive (it doesn't map well to SIMD).
I think I'd agree 100% with you if I agreed that Larrabee is 16-way SIMD. Remember Larrabee is a CPU, so it has scalar MIMD units in each core too. If they're smart enough, they can benefit significantly from this hybrid MIMD-SIMD approach for both raytracing and physics in general. Whether they are smart enough is another question completely, but if I were managing a strategy team at NVIDIA or AMD, I certainly wouldn't want to underestimate Intel too much in that respect.
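To make the divergence point concrete, here's a minimal C++ sketch of the trade-off we're arguing about (the 16-lane width just matches the Larrabee rumours, and the traversal routines are hypothetical stubs): the packet keeps an active mask, coherent primary rays keep that mask dense, and once secondary rays thin it out a hybrid core could fall back to finishing the stragglers on its scalar side.

```cpp
#include <array>
#include <bitset>
#include <cstdint>

constexpr int kLanes = 16;  // Larrabee-style 16-wide SIMD packet (illustrative)

struct RayPacket {
    std::array<float, kLanes> ox, oy, oz;   // ray origins, SoA layout
    std::array<float, kLanes> dx, dy, dz;   // ray directions
    uint32_t active = 0xFFFFu;              // one "still traversing" bit per lane
};

// Stubs standing in for real traversal code.
bool traceScalar(float, float, float, float, float, float) {
    return false;                           // walk the BVH for a single ray
}
void tracePacketStep(RayPacket& p) {
    p.active = 0;                           // one SIMD-wide traversal step; retires finished lanes
}

void trace(RayPacket& p) {
    // Primary rays from one screen tile stay coherent, so the mask stays dense
    // and all 16 lanes do useful work on every SIMD step.
    while (p.active) {
        // Secondary rays (reflections, GI bounces) diverge: lanes want different
        // BVH branches, the mask thins out, and SIMD utilization collapses.
        if (std::bitset<kLanes>(p.active).count() < 4) {
            // The hybrid idea: peel off the stragglers and finish them one ray
            // at a time on the scalar (MIMD) side of the core.
            for (int lane = 0; lane < kLanes; ++lane)
                if (p.active & (1u << lane))
                    traceScalar(p.ox[lane], p.oy[lane], p.oz[lane],
                                p.dx[lane], p.dy[lane], p.dz[lane]);
            p.active = 0;
        } else {
            tracePacketStep(p);             // full-width SIMD step while coherence lasts
        }
    }
}
```

Whether a scalar fallback (or re-forming packets) ever beats just eating the divergence is exactly the open question, of course.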
Back before the G80 launched, I was a proponent of *not* going unified because I felt that flexibility (including GPGPU flexibility) would actually increase through a hybrid MIMD-SIMD solution rather than a plain SIMD one. Of course, ideally in terms of flexibility you'd go scalar MIMD all the way (with VLIW or other more exotic approaches), but that is another debate completely.
The true strength in rendering is that most computations are kept within good 2D computational locality, and you can use logarithmic reductions in complexity to simulate tough-to-compute phenomena.
Yes, even if Larrabee can benefit from a hybrid MIMD-SIMD approach, it won't magically be able to fix the locality problem in terms of memory bandwidth burst size; there simply is no magical solution to that. Certainly you can cache the top levels of the acceleration structure, but that doesn't fix the whole problem.
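As an aside on the "logarithmic reductions" point above: mipmapping is the canonical example. A full chain is built in log2(N) cheap 2x2 box-filter passes, after which arbitrarily wide filter footprints (or shadow-map-based soft shadow tricks) become roughly constant-cost lookups. A minimal sketch, assuming a square, power-of-two, single-channel image:

```cpp
#include <utility>
#include <vector>

// Build a mipmap chain for a square, power-of-two, single-channel image.
// Each level halves the resolution with a 2x2 box filter, so an N*N image
// collapses into a full pyramid in log2(N) cheap passes: the kind of
// logarithmic reduction that keeps wide filtering roughly constant cost
// at render time.
std::vector<std::vector<float>> buildMipChain(std::vector<float> base, int size) {
    std::vector<std::vector<float>> levels;
    levels.push_back(std::move(base));
    for (int s = size; s > 1; s /= 2) {
        const std::vector<float>& src = levels.back();
        const int half = s / 2;
        std::vector<float> dst(half * half);
        for (int y = 0; y < half; ++y)
            for (int x = 0; x < half; ++x)
                dst[y * half + x] = 0.25f * (src[(2 * y)     * s + 2 * x]     +
                                             src[(2 * y)     * s + 2 * x + 1] +
                                             src[(2 * y + 1) * s + 2 * x]     +
                                             src[(2 * y + 1) * s + 2 * x + 1]);
        levels.push_back(std::move(dst));
    }
    return levels;
}
```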
Getting back to Larrabee, I bet Intel is simply going to try and dominate the mid-range GPU market by getting Larrabee onto all motherboards and getting Larrabee II into the next XBOX.
Not a bad bet, I'd wager.
I wouldn't be surprised (Intel has the Project Offset team, TomF, and others) if Intel plans to release Larrabee at the same time as their own optimized in-house graphics engine for developers.
And that's an excellent bet, IMO. A few years ago, I started thinking that given the direction the console market was going and the rate at which development costs were going up, NVIDIA and ATI absolutely needed to create free or near-free AAA middleware to spur growth in the market. And now it looks like it's Intel that has adopted that strategy instead, at least up to a certain extent. I certainly can't help but congratulate them for doing what I thought their competitors should have been doing all along!
Reducing the need for external developers to tackle the tough parallel programming problems, and simply allowing them to keep the lazy, fine-grained-locking C++ style of parallel programming that most devs want for all the core game code (Larrabee is x86 after all). While x86 may be a drag on performance, it is great at making lazy programmers' code perform well.
Yeah, I agree with that. Although I'd like to point out that in terms of perf/mm²/(amount of effort), x86 is a complete and utter joke, and x86 many-core is even more of a joke. It's not the right direction for the industry to take, but this is a complex debate and I don't feel this is either the time or the place to have it.
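For reference, this is the sort of "lazy" fine-grained-locking C++ the quote is talking about, sketched here with standard threading primitives (the game-object names are just illustrative): grab a per-object lock around whatever you touch and let the coherent caches sort out the rest.

```cpp
#include <mutex>

// Illustrative only: per-object locks, the "grab the lock around whatever
// you touch" style that cache-coherent x86 makes cheap enough to live with.
struct GameObject {
    std::mutex lock;
    float x = 0.0f, y = 0.0f;
    float health = 100.0f;
};

void applyDamage(GameObject& target, float amount) {
    std::lock_guard<std::mutex> guard(target.lock);   // fine-grained: one object
    target.health -= amount;
}

void resolveCollision(GameObject& a, GameObject& b) {
    // Touching two objects: lock both at once (std::lock avoids deadlock
    // regardless of the order callers pass them in).
    std::unique_lock<std::mutex> la(a.lock, std::defer_lock);
    std::unique_lock<std::mutex> lb(b.lock, std::defer_lock);
    std::lock(la, lb);
    a.x -= 0.5f; b.x += 0.5f;   // toy "push apart" response
}
```

It's easy to write and x86's cache coherency makes it run acceptably, which is exactly the appeal; it just says nothing about how well the silicon is being used, which is the perf/mm² point.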
Personally I would rather manage my own cache/local store, and my ideal CPU+GPU machine would be something very similar to a mix of the Cell and CUDA, but that kind of thinking is so very rare these days!
Yeah, that'd be pretty cool. Personally, my ideal processor is a mix of PowerVR's SGX and Ambric's technology. That's a complex debate once again though, so if you want to have it we can create a new thread for that. Alternatively, I was thinking about writing an article on that kind of stuff...
1.) Ability to run more than one shader program at a time on one of the cores (meaning the core would have to schedule from at least a second instruction pointer). This would allow interleaving of programs, so you could pair a highly TEX- or ROP-bound shader with a highly ALU-bound shader and better keep the various pipelines fully saturated. Would solve lots of problems, such as filtering operations always being TEX bound, g-buffer creation being ROP bound, etc... (but might not be worth the extra hardware complexity).
Uhm, isn't this the case already? It *is* limited, but it does happen as far as I can tell. You can certainly have one PS program and one VS program running on the same multiprocessor on G8x or at the same time on R6xx.
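Either way, what point 1 is really asking for is more latency hiding from concurrent contexts. A toy issue model (all numbers and names invented) of why pairing a TEX-bound program with an ALU-bound one helps:

```cpp
#include <cstdio>

// Toy issue model: two resident program contexts on one core, each with its
// own instruction pointer; every cycle the core issues from a context that is
// not waiting on an outstanding texture fetch.
struct Context {
    int ip;          // instructions issued so far (stands in for an instruction pointer)
    int stallLeft;   // cycles still waiting on a texture fetch
    int texPeriod;   // this program issues a TEX (and stalls) every N instructions
};

int main() {
    const int kTexLatency = 20;                      // invented fetch latency, in cycles
    Context filterPass    {0, 0, /*texPeriod=*/2};   // TEX-bound, e.g. a big blur
    Context proceduralPass{0, 0, /*texPeriod=*/16};  // ALU-bound, e.g. procedural noise
    Context* resident[2] = {&filterPass, &proceduralPass};

    int issued = 0, idle = 0;
    for (int cycle = 0; cycle < 1000; ++cycle) {
        bool issuedThisCycle = false;
        for (Context* c : resident) {
            if (c->stallLeft > 0) { --c->stallLeft; continue; }
            if (!issuedThisCycle) {
                ++c->ip; ++issued; issuedThisCycle = true;
                if (c->ip % c->texPeriod == 0) c->stallLeft = kTexLatency;
            }
        }
        if (!issuedThisCycle) ++idle;
    }
    // Drop proceduralPass from 'resident' and 'issued' collapses: the TEX-bound
    // program alone leaves the ALU pipe idle for most of the texture latency.
    std::printf("issued %d instructions, %d idle cycles\n", issued, idle);
    return 0;
}
```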
2.) Ability to use a shader-shared local store (like CUDA). Shipping hardware already has support; it's just an API issue.
Agreed.
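To illustrate what point 2 buys, here's a CPU-side analogy written in plain C++ rather than actual CUDA (group size, filter width and names are all illustrative): a work-group stages a tile of data, apron included, into a small scratchpad once, and every lane's filter taps then come out of that shared tile instead of separate memory/texture fetches.

```cpp
#include <algorithm>
#include <vector>

// Software model of a work-group with a shared local store: 16 "lanes" filter
// 16 adjacent pixels of a row, staging the pixels they need (plus the filter
// apron) into one small shared tile instead of each lane fetching its own taps.
constexpr int kGroupSize = 16;
constexpr int kRadius    = 2;     // 5-tap box filter, purely illustrative

void blurRowGroup(const std::vector<float>& src, std::vector<float>& dst,
                  int width, int rowStart, int groupBase) {
    float tile[kGroupSize + 2 * kRadius];   // the "shared local store"

    // Cooperative load: the tile (apron included) is read from memory once.
    for (int i = 0; i < kGroupSize + 2 * kRadius; ++i) {
        int x = std::clamp(groupBase + i - kRadius, 0, width - 1);
        tile[i] = src[rowStart + x];
    }

    // Each lane now filters purely out of the shared tile.
    for (int lane = 0; lane < kGroupSize; ++lane) {
        float sum = 0.0f;
        for (int tap = -kRadius; tap <= kRadius; ++tap)
            sum += tile[lane + kRadius + tap];
        dst[rowStart + groupBase + lane] = sum / (2 * kRadius + 1);
    }
}
```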
3.) Ability to use a programmable read/write surface cache (useful for a programmable ROP, etc.). Probably already planned for future hardware; probably just a question of the API exposing the functionality.
There are problems with that, but I agree it's necessary, and I also agree part of the problem will be API exposure.
4.) Ability to do process control from the GPU instead of requiring the CPU to build a command buffer.
Agreed; having MIMD on the same chip as the graphics processing is very important in my book.
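For context on point 4, this is roughly the status quo being complained about, sketched as a minimal command ring (the op list and fields are made up): the CPU serialises every piece of GPU work, so any GPU result that should spawn more work has to round-trip through the CPU first.

```cpp
#include <atomic>
#include <cstdint>

// The status quo: the CPU serialises GPU work into a command ring that the
// GPU front-end consumes. Op list and fields are invented for the sketch.
enum class Op : uint32_t { Draw, Dispatch, CopyBuffer };

struct Command {
    Op       op;
    uint32_t arg0, arg1, arg2;   // e.g. vertex count, thread-group counts, ...
};

struct CommandRing {
    static constexpr uint32_t kSize = 1024;
    Command               cmds[kSize];
    std::atomic<uint32_t> head{0};   // written by the CPU producer
    std::atomic<uint32_t> tail{0};   // advanced by the GPU front-end as it consumes

    bool push(const Command& c) {    // CPU side
        uint32_t h = head.load(std::memory_order_relaxed);
        if (h - tail.load(std::memory_order_acquire) == kSize)
            return false;            // ring full, CPU has to wait
        cmds[h % kSize] = c;
        head.store(h + 1, std::memory_order_release);
        return true;
    }
};

// "Process control from the GPU" would mean a kernel could append to a queue
// like this itself (conditional dispatch, culling-driven draws, ...) instead
// of shipping its results back to the CPU just to get more work scheduled.
```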
5.) Double precision. Probably here in 2008 or 2009.
That's in GT200, see the related thread...
Sorry if this reply felt a bit critical or negative; I do believe your post was very good, although I obviously have a different opinion of the MIMD capabilities of Larrabee. For Intel's sake, I hope I'm right, but that obviously doesn't mean I am.