You only do a recompile when you need a specific piece of new functionality or a performance boost.
Well, if you don't care about the overall performance or the vector FPUs, why not just run your code on a vanilla ARM core?
I get your point, but you're falling into the same trap that plagued the Itanium design team, IMO. Binary compatibility is only appealing if it delivers 'good enough' performance. For a chip that is exclusively aimed at high-performance workloads (whether that is HPC or graphics), there is no such thing as 'good enough'.
As I said in my response to Rys, it's all about the x86 tool chain, and the relative painlessness of extending it to support new ISA features.
That is one thing I completely and utterly agree with. The tremendous investments in toolchains for x86 are certainly an advantage, although I'd also like to point out that they're not perfect yet for multithreaded applications, and that debugging programs with tens of threads can be a nightmare right now IMO. I would certainly hope and expect this to be much easier in 2009+ than it is today, however.
Ultimately, x86 becomes attractive in any given niche--whether it's HPC or ultra-mobile--at the exact moment that you're no longer at a real performance disadvantage for using it.
Which corresponds to the exact second when performance becomes 'good enough', because x86 can never be optimal. While this indeed makes it attractive for the ultra-mobile market in the long term, the very definition of HPC tends to be that there is no such thing as 'good enough'. The only reason (toolchains aside) why x86 is attractive in HPC today is that it has better economies of scale in terms of *production* and R&D. You know, the exact same ones GPUs also enjoy today...
So the moment that Moore's Law makes it possible to use x86 in an area without suffering too badly from a relative performance standpoint, then it becomes a compelling choice for these scale-based, ecosystem reasons.
Moore's Law implies nothing about relative performance penalties. If you are 50% less efficient, that won't magically change when you're talking about 32B transistors vs 16B rather than 32M vs 16M; the ratio stays the same. As such, x86 only becomes attractive when it either has economies of scale (for production + R&D) that other solutions do not enjoy, or when its performance has become 'good enough'. In the case of traditional GPUs vs x86, neither of these potential advantages exists.
I mean, nobody really /wants/ to use an intermediary ISA, or JIT, or anything like this, if they could just as easily use a product that natively implements the world's most popular ISA.
That is correct, but if and only if perf/$ and perf/watt are roughly similar.
I'd love to hear your arguments in favor of investing, say, a substantial portion of a large company's developer resources in a proprietary, intermediary ISA when there's an x86 solution that gets you, say, 80% there.
First, let me counter some of the negative aspects you're pointing out. It should be noted that PTX is not proprietary, so whether AMD and Intel support it is really up to them. In the end, there is nothing preventing interested parties from writing an efficient PTX-to-x86 converter. And if NVIDIA felt that would actually put their hardware in a good light, they could easily do it themselves.
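To make the 'anyone could translate it' point a bit more concrete, here's a toy sketch in C of the kind of lowering a PTX-to-x86 converter might perform for a trivial kernel. The PTX in the comment is paraphrased from memory rather than copied from real compiler output, and the function name is made up; consider it an illustration of the idea, not actual converter output.

```c
/* Input (paraphrased PTX for "c[i] = a[i] + b[i]" -- not exact syntax):
 *
 *     mov.u32    %r1, %ctaid.x;
 *     mov.u32    %r2, %ntid.x;
 *     mov.u32    %r3, %tid.x;
 *     mad.lo.u32 %r4, %r1, %r2, %r3;   // flat thread index
 *     ...load a[%r4] and b[%r4], add.f32, store c[%r4]
 *
 * Output a converter might emit: the per-thread body wrapped in a loop
 * over the whole logical grid, ready for an x86 compiler to vectorize
 * and spread across cores. */
void vecadd_translated(float *c, const float *a, const float *b,
                       unsigned nthreads)
{
    for (unsigned tid = 0; tid < nthreads; ++tid)
        c[tid] = a[tid] + b[tid];
}
```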
Really, your entire argument there is based around three points, so I'll answer them one by one: a) x86 has a much better toolchain today. b) x86 is easier than alternatives because everyone is used to it. c) JIT-like techniques have hardly ever worked before, so why would they suddenly make sense?
- A: NVIDIA and AMD have every interest in the world in investing aggressively to close that gap between now and 2009/2010. I doubt they'll get all the way there, but I don't think anyone can deny that it will be less of a problem (or advantage, from Intel's point of view) in that timeframe.
- B: Everyone is used to the latest architectures implementing x86, not the ISA itself. Optimizing for Larrabee and optimizing for Conroe are such fundamentally different tasks that you'll basically have to relearn everything, as far as I can tell (see the first sketch after this list). Abrash might be at an advantage here for various reasons, but I'm very skeptical about the rest of us.
- C: Traditional JIT languages only execute each code fragment a small number of times. Doing just the final stages of optimization and compilation before running the program on a GPU is not the same thing at all, because that exact same code will be run thousands, millions, or even billions of times. The overhead is pretty much negligible, and you can gain a lot from that extra bit of optimization (see the second sketch after this list).
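To illustrate point B, here's a rough C sketch of the same loop written the way you'd hand it to Conroe (plain scalar code, let the out-of-order core and the compiler sort it out) versus restructured around an explicit vector width. The 4-wide SSE intrinsics are purely a stand-in, since nobody outside Intel knows exactly what Larrabee will expose; the point is only that the wide-vector version forces you to think about data layout, strip lengths and remainders in a way the scalar one never does.

```c
#include <xmmintrin.h>  /* SSE intrinsics, standing in for a wider vector ISA */

/* Conroe-style: a plain scalar loop; the out-of-order core extracts the ILP. */
void saxpy_scalar(float *y, const float *x, float a, int n)
{
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Wide-vector style: the loop is strip-mined to the vector width, and in
 * practice you'd also keep many such strips in flight across hardware threads. */
void saxpy_vec4(float *y, const float *x, float a, int n)
{
    __m128 va = _mm_set1_ps(a);
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        _mm_storeu_ps(y + i, _mm_add_ps(_mm_mul_ps(va, vx), vy));
    }
    for (; i < n; ++i)  /* scalar remainder */
        y[i] = a * x[i] + y[i];
}
```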
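And to illustrate point C, a minimal sketch using the CUDA driver API: the driver translates the PTX into the installed GPU's native ISA once, at module-load time, and every subsequent launch reuses that code. The file name "scale.ptx", the kernel name "scale_kernel" and its parameter list are assumptions for the sake of the example, and error checking is omitted.

```c
#include <cuda.h>

int main(void)
{
    CUdevice    dev;
    CUcontext   ctx;
    CUmodule    mod;
    CUfunction  fn;
    CUdeviceptr d_y;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    /* One-off cost: the driver JIT-compiles the PTX text in scale.ptx
     * into native code for whatever GPU happens to be installed. */
    cuModuleLoad(&mod, "scale.ptx");
    cuModuleGetFunction(&fn, mod, "scale_kernel");

    size_t n = 1 << 20;
    cuMemAlloc(&d_y, n * sizeof(float));

    float a  = 2.0f;
    int   ni = (int)n;
    void *args[] = { &d_y, &a, &ni };  /* hypothetical kernel signature */

    /* Recurring part: the already-translated code is launched over and
     * over; no further compilation happens inside this loop, so the
     * one-off JIT cost is amortized across all of these launches. */
    for (int iter = 0; iter < 100000; ++iter)
        cuLaunchKernel(fn, (unsigned)((n + 255) / 256), 1, 1,
                       256, 1, 1, 0, NULL, args, NULL);
    cuCtxSynchronize();

    cuMemFree(d_y);
    cuModuleUnload(mod);
    cuCtxDestroy(ctx);
    return 0;
}
```

If the one-off translation costs on the order of milliseconds and the kernel is launched hundreds of thousands of times, the per-launch share of that cost is down in the noise, which is the whole point.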
In the end, I think many of your arguments pretty much fly out the window when you consider how large the GPGPU market will likely be by 2H09, because Intel won't have anything to compete with it before then. We'll see how fast that goes, though.