Intel has to deliver a GPU to create market share. There are two aspects to this: hardware and software. The hardware itself is finished, has demonstrated good synthetic benchmark results, and is ready to ship to the HPC market. So clearly they are confident about that aspect. My "belief" is that there must be a software issue.
maybe. then again, there may be hw-posed issues for which you may never find a satisfactory solution in sw. as you said, it's always the combination of hw and sw, so a potential design issue discovered late in the hw (due to late sw, etc) could amplify the burden on the final sw by orders of magnitude. but since you brought up Abrash's role in the project - if Mike could not deliver a satisfactory rasterizer running on this hw, then maybe the hw was not meeting its GPU-domain targets?.. just speculating here, of course.
Any other ISA would have made it much harder to deliver a part that would be well received by the software world. In the long run that's more critical than anything else. One year of delay is nothing compared to the competition's ongoing struggle to gain some acceptance.
i really don't understand your logic about ISA acceptance - ISAs today are mostly judged by how well they play with their compilers (from coders' perspective) and how well they meet their price/performance/wattage envelopes (from EE/business perspectives) - not by their proximity to their 40th anniversaries. look around you - chances are you'll find more devices hosting 'young' ISAs than ones hosting x86 or older. so what reception, and by whom, are you concerned about?
The scalar cores have poor power efficiency, but again, most work takes place in the vector units. The texture samplers and memory controllers are also no different from other architectures. So another ISA for the scalar cores isn't suddenly going to dramatically improve overall power efficiency.
dramatically - no. but then again LRB's scalar ISA did not have to copy any existing ISA verbatim - intel could have used that to their best advantage if they hadn't been so concerned with the central socket. heck, they could've used a modernized form of their dear x86 (say, x86-64) and shaved off the legacy, trimmed the pipeline, retooled the protection mechanism - any/all of the above, just to make LRB a better GPGPU core while staying in familiar waters. but nooo - it had to be the word97-compliant GPU.
Because Intel has done many projects like this before... ?
creating a quirk- and bug-free design off the bat and tracing such issues quickly in your own designs are two different things. yes, intel have done a few CPU designs, so one could at least expect them to identify an issue quickly once it has been registered and the symptoms understood. of course, spotting an issue and resolving it are yet another two different things - there are potential issues that could invalidate fundamental design decisions.. like a shit choice of housekeeping ISA (just pulling a wild example here).
It's a hardware company. They didn't hire Abrash and other software rendering gurus for nothing. It's a very complex task and even for these gurus it takes a lot of time to try different approaches and achieve high performance. Like I said before, the parameter space is massive and there isn't one straightforward way to do things.
i agree, the task could become prohibitively complex if they were going for that head shot at the GPU - 'behold, we're the new GPU masters!' - then again, i don't think anybody (clinically sane) expected LRB to dethrone any GPU heavyweights - if intel themselves had such expectations then maybe they were not familiar with the problem domain they were getting involved in. again, not saying they had such expectations - just carelessly speculating about the events we've witnessed lately.
You really think another ISA would have solved the abysmal performance of Microsoft's reference rasterizer? First and foremost it's a software issue. Other software renderers are over a hundred times faster. Although REF clearly wasn't written with performance in mind, they didn't make it slow on purpose either. So this illustrates that there's a massive difference between optimized and unoptimized code.
i'd venture to guess ms' ref performance issues are mostly algorithmic, and only secondarily a matter of insufficient clock-counting. IOW, one cannot say that had they 'coded to the metal' of any ISA they'd have achieved much better performance. how does that prove the (non-)fitness-to-a-task of an ISA, though?
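to make that concrete, here's a toy sketch of my own (nothing to do with ref's or LRB's actual code) - the same triangle coverage count written twice in plain C++: once re-evaluating every edge function for every pixel on the screen, once clipped to the bounding box with the edge functions stepped incrementally. the win is purely algorithmic - no intrinsics, no ISA-specific tricks anywhere.

// toy comparison: the same triangle coverage test written two ways.
// the difference is algorithmic; there are no intrinsics or ISA tricks here.
#include <algorithm>
#include <cstdio>

struct Vec2 { float x, y; };

// edge function: >= 0 when p lies on the inner side of the directed edge a->b
// (for a counter-clockwise triangle, "inside" means all three are >= 0)
static float edge(Vec2 a, Vec2 b, Vec2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// naive: walk the whole screen and re-evaluate all three edge functions
// from scratch for every pixel
static int coverage_naive(Vec2 v0, Vec2 v1, Vec2 v2, int w, int h) {
    int covered = 0;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            Vec2 p = { x + 0.5f, y + 0.5f };
            if (edge(v0, v1, p) >= 0 && edge(v1, v2, p) >= 0 && edge(v2, v0, p) >= 0)
                ++covered;
        }
    return covered;
}

// algorithmic version: clip to the triangle's bounding box and step the edge
// functions incrementally (each is linear in x and y), so the inner loop is
// three adds and three compares
static int coverage_incremental(Vec2 v0, Vec2 v1, Vec2 v2, int w, int h) {
    int minx = std::max(0, (int)std::min(std::min(v0.x, v1.x), v2.x));
    int miny = std::max(0, (int)std::min(std::min(v0.y, v1.y), v2.y));
    int maxx = std::min(w - 1, (int)std::max(std::max(v0.x, v1.x), v2.x));
    int maxy = std::min(h - 1, (int)std::max(std::max(v0.y, v1.y), v2.y));

    // per-pixel (A*) and per-row (B*) deltas of each edge function
    float A0 = v0.y - v1.y, B0 = v1.x - v0.x;
    float A1 = v1.y - v2.y, B1 = v2.x - v1.x;
    float A2 = v2.y - v0.y, B2 = v0.x - v2.x;

    Vec2 start = { minx + 0.5f, miny + 0.5f };
    float r0 = edge(v0, v1, start), r1 = edge(v1, v2, start), r2 = edge(v2, v0, start);

    int covered = 0;
    for (int y = miny; y <= maxy; ++y, r0 += B0, r1 += B1, r2 += B2) {
        float e0 = r0, e1 = r1, e2 = r2;
        for (int x = minx; x <= maxx; ++x, e0 += A0, e1 += A1, e2 += A2)
            if (e0 >= 0 && e1 >= 0 && e2 >= 0)
                ++covered;
    }
    return covered;
}

int main() {
    Vec2 v0 = {100, 100}, v1 = {300, 120}, v2 = {150, 400};  // counter-clockwise
    printf("naive: %d  incremental: %d\n",
           coverage_naive(v0, v1, v2, 1024, 1024),
           coverage_incremental(v0, v1, v2, 1024, 1024));
    return 0;
}

point being: that kind of gap dwarfs whatever you'd claw back by hand-tuning the naive loop for any particular ISA.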
You're starting with the wrong assumption. Any ISA would do. So it saves time and money to use the ISA and tools you already have. And where it will really save time and money is to create a software ecosystem.
any ISA? really? tell you what, let's pick the good old 65816, slap a non-braindead SIMD extension on it and plant a firm foot on the GPU turf overnight! as proof of our dedication to the software world, we'll throw in free binary-level apple IIgs emulation! what say you?
Who said anything about running it faster? It's about being able to run it at all with little or no changes. This is really going to motivate developers. Nobody likes spending months rewriting all code for GPGPU to get a first result, only to realize that their approach doesn't perform as expected and they'll have to rewrite some of it.
maybe i misunderstood you, but i believe you mentioned something about 'incremental performance improvements of present code' in your original post, ergo my performance reference. pardon me if i've erred.
There's another 90/10 rule here: half of a project's development time is spent on the first 90%, the other half on the last 10%. With x86 compatibility developers can skip rewriting 90% of the code and stay motivated to tackle the performance issues, which is the only thing they were really interested in anyway.
so let me know if i got you right here: developers would really like to spare themselves a trivial recompile of their existing scalar code for a new scalar ISA, but they would eagerly face the challenges of a brand-new VPU which, by the way, is the part meant to carry the bulk of the workload? hmm..
Memory management, string manipulation, data conversion, containers, system methods, search algorithms, math routines, etc. You want to run them on Larrabee because a round-trip to the CPU has far higher latency than executing them locally.
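As a rough back-of-the-envelope sketch (the latencies below are assumptions picked for illustration, not measurements of any real part), consider what happens if every one of those small helper calls has to bounce back to the CPU over the bus instead of running where the data already is:

// back-of-the-envelope only: the two latencies below are assumptions
// picked for illustration; only their ratio matters for the argument.
#include <cstdio>

int main() {
    const double round_trip_to_cpu = 10e-6; // assumed host round trip over the bus, ~10 microseconds
    const double local_helper_call = 50e-9; // assumed cost of running the same small routine
                                            // (compare, convert, search, ...) on the card itself
    const double items = 1e6;               // one million helper calls

    const double bounced  = items * round_trip_to_cpu;
    const double in_place = items * local_helper_call;

    printf("bouncing every call back to the CPU:  %8.3f s\n", bounced);
    printf("running the helper where the data is: %8.3f s\n", in_place);
    printf("ratio: ~%.0fx\n", bounced / in_place);
    return 0;
}

With assumptions like these the gap comes out around two orders of magnitude, which is the whole point: keep the scalar glue on the card.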
i wonder how often a developer uses those in binary form these days. i know for myself that i haven't used much of a container, a string, or anything of that caliber in binary form in my code for the past 10 years, if not more. the little that has still happened to be in binary has been deeply entrenched in OS services. oops, did i just lock my LRB development to some dated OS? sorry, clumsy me.
Where's the software for these parts? Can I open my C# compiler and start coding for it?
you can open your whatever compiler and start using the APIs these parts prudently offer. occasionally, you might have to resort to heresies like OCL, CUDA or native compilers *gasp*. regardless, it would still be a tad better than what you could do with an LRB today.
Larrabee will initially need experts to optimize various libraries and tools, but once those are available, application developers can use them to add new functionality and create end-user applications. To dominate the computing world you have to cater to every developer. That's only possible by allowing direct access to the hardware and building various layers of software on top of it. Any ISA can be used for that, but x86 accelerates it by providing a solid starting ground.
a solid ground of 99% HL-language-encoded libraries? the same libraries that run on every ISA and their dog today? those are indeed some fine 'solid starting grounds', for which anybody would gladly trade off whatever little ISA sanity they might have had.
What? A delay due to software issues?
what delay - the parts in question are on the market. the APIs are on the market. the native compilers are coming last, but you can ask intel how that generally goes. *wink*
LRBni is an extension, just like for instance SSE3. Did Intel face major issues when trying to introduce SSE3? No, the only thing of critical importance for those processors was the support of Pentium instructions. The existing ecosystem made it easy for developers to incrementally adapt their software for SSE3, where it mattered, instead of having to recreate everything from scratch.
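As an illustration of what incremental adaptation looks like (a generic sketch of my own, not Intel's code), an existing scalar routine keeps working as-is while an SSE3 path is added only where it matters, using the pmmintrin.h intrinsics (build with -msse3 or equivalent):

// illustrative only: the scalar routine everybody already has stays as-is,
// and an SSE3 path is added where it pays off.
#include <cstdio>
#include <pmmintrin.h>   // SSE3 intrinsics (_mm_hadd_ps lives here)

// the existing code, untouched
static float dot_scalar(const float* a, const float* b, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}

// the incremental SSE3 path: same interface, four elements per iteration,
// horizontal reduction done with SSE3's _mm_hadd_ps
static float dot_sse3(const float* a, const float* b, int n) {
    __m128 acc = _mm_setzero_ps();
    int i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    acc = _mm_hadd_ps(acc, acc);            // (x0+x1, x2+x3, x0+x1, x2+x3)
    acc = _mm_hadd_ps(acc, acc);            // full sum in every lane
    float sum = _mm_cvtss_f32(acc);
    for (; i < n; ++i) sum += a[i] * b[i];  // scalar tail, unchanged code
    return sum;
}

int main() {
    float a[6] = {1, 2, 3, 4, 5, 6}, b[6] = {6, 5, 4, 3, 2, 1};
    printf("scalar: %f  sse3: %f\n", dot_scalar(a, b, 6), dot_sse3(a, b, 6));
    return 0;
}

The rest of the program never notices; only the hot spot changes.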
issues with what - putting their next SIMD into their next CPU? no - they just shipped it out the door - no issues there. or do you mean issues for the developers - easy use of the new extension without having to (re)learn a new ISA? let me see - no auto-utilizing/auto-vectorizing compilers for generations upon generations of the ISA, just some rudimentary in-house libraries, and a clear delegation of the onus of making use of the new extension to the app developers. which, combined with a lack of basic capabilities commonly found in other SIMD ISAs, can only mean no issues for the app developers. *grin*
let me ask you something: why, in your sincere opinion, did developers embrace the GPU shader architectures, particularly after the advent of the HL shading languages? and why do you think intel was so determined to come up with a GPGPU of their own? i mean, after all, they had the, erm.. fine GMA line of GPUs (more than half of the pc market - that's what we call a developers' embrace, right?), and they held the key to the central socket. so why?
I can't see how that relates to Larrabee. In the mobile/embedded market something like a 300% difference in performance/power is disastrous. For Larrabee the ISA choice has an impact of about 10% but it's offset by a massive advantage on the software front.
the only reason i brought it up was to improve your awareness of the current state of the GPGPU market.
so, another Q from me: what, again in your sincere opinion, sank this project? maybe Abrash & co's inability to pull off a half-decent D3D/OGL implementation for it?
i mean, according to you, the performance/power ratio should've been ok (unless there was something deeply screwed up in LRBni, since the rest of the chip - namely the x86 cores and caches - was just fine), and the adoption of the programming model would've been fine (x86? - woot!). pretty much everything would've been roses. and yet, no LRB on the shelves after 3 years of focused effort (by some pretty smart individuals at that, where we totally agree). so why*?
* clearly it couldn't have been a delay in the native compiler /deep sarcasm