Again, it depends on what you want to do with the chip. If one decides to write an engine from the ground up, designed around the programmability Larrabee brings to the table, then x86-specific optimizations can't be ruled out as beneficial. How beneficial is certainly a valid question, and I don't have the answer to that.
I guess the point is that in a worst case scenario in which you gain nothing from x86 support, you also lose nothing and at least you have the option.
But what would those be for a very basic in-order chip?
It wouldn't take very long to exhaust them, and they're no advantage, because these optimizations have been well explored for every other chip of this type regardless of ISA.
So you think Larrabee would have been available by now and have optimized drivers if it wasn't x86? I really doubt that, but feel free to "demonstrate" otherwise.
New graphics product lines in various market segments have been conceived, deployed, and replaced in the time it has taken Larrabee to go from initial design to this latest delay, and a generation or two more will have elapsed by the time Larrabee III comes out.
All I need to demonstrate is that the rest of the graphics world has not taken its sweet time.
The scalar unit is dual-issue. To compilers it's a P54 with 64-bit support.
Intel slides and Larrabee dev statements have been consistently touting a scalar x86+VPU issue, with the possibility of performing a vector store using the scalar issue slot.
This question has been brought up before, and the answers so far have been pointing to a more restricted scalar pipe.
I've asked this question repeatedly, and while I admit I have no definitive way of determining if I've asked the right people, the answers so far have been consistent.
I'm quite willing to revise my estimate if an official source confirms dual-issue for the scalar side.
Hiroshige Goto's diagrams are not official Intel sources, as far as I know.
I wasn't comparing x86 decoders directly to RISC decoders. I was talking about x86 decoders compared to the entire Larrabee die, and how much you'd save when using ARM decoders. It's not like you'll be able to fit many more cores onto the same die.
The penalty for x86 applies to the entire x86 pipeline. There are other wrinkles to the design that bloat the rest of the chip.
As a result, a contemporaneous RISC chip was, in total, about a third smaller than the original Pentium. Conceivably, more obscure RISCs could have done the same in even less space.
Given that 2/3 of each core is not the vector unit, even if the vector pipe were in no way oversized by additional x86 complexity, close to half of Larrabee in total (assuming it devotes about 2/3 of its area to cores) is possibly a third larger than it needs to be.
I have to add a lot of "possibly" and "maybe" caveats because we don't have a physical exemplar to analyze and compare, which is why I am very disappointed about the latest Larrabee announcements.
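To make the back-of-the-envelope arithmetic behind that estimate explicit (all three factors are the rough assumptions stated above, not measured figures):

\[
\underbrace{\tfrac{2}{3}}_{\text{non-vector share of a core}} \times \underbrace{\tfrac{1}{3}}_{\text{possible shrink}} \approx 22\% \text{ per core},
\qquad
\underbrace{\tfrac{2}{3}}_{\text{die area in cores}} \times 22\% \approx 15\% \text{ of the whole die},
\]

which is in the same ballpark as the 10+ percent bloat figure mentioned further down.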
No it's not. The Direct3D API is allowing more control over the hardware with every new generation. OpenCL is also a relatively thin API, putting a lot of things into the hands of the developer. Yet at the same time we see many game studios using off-the-shelf engines. These act as middleware, saving the application programmer from the increasingly complex task of putting pixels on the screen.
This permits greater algorithmic control. So long as the abstraction layer exists, potentially behind a compiler layer, x86 itself doesn't affect the developer.
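As a purely hypothetical illustration of that point (a generic OpenCL sketch, nothing Larrabee-specific): a kernel like the one below is shipped as source text and compiled by whatever driver is installed, so the developer writes the algorithm once and never touches the target ISA, whether that turns out to be x86 vector code or a GPU's native instruction set.

/* Hypothetical OpenCL C kernel: a simple saxpy, written once.
   The vendor's runtime compiler lowers it to the device's native ISA
   (x86/LRBni-style code on a Larrabee-like device, GPU code elsewhere),
   so the ISA never leaks into the application developer's source. */
__kernel void saxpy(const float a,
                    __global const float *x,
                    __global float *y)
{
    size_t i = get_global_id(0);   /* one work-item per array element */
    y[i] = a * x[i] + y[i];
}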
With Larrabee the ability to develop custom middleware is boundless, offering application developers a very wide range of possibilities.
The market reality is that there is not a boundless need for custom middleware. Vast segments of the market get along fine, do not mind the lack of custom middleware, or content themselves with a few significant abstraction layers.
The complexity of direct access to the CPU never stopped anyone either.
That's because most don't care to go that far. The extra effort and the risk of tying the product to a single physical implementation yields too little reward.
There won't be room for another API/ISA. And even though x86 has some overhead, so does supporting Direct3D and OpenGL on a GPU. There are tons of features that are not commonly used but still take die space.
Larrabee's target performance put it on par with GPUs between 2/3 and 1/2 its size, and it failed even to hit that mark.
I think we know where the winning side of that argument lies at present.
If Larrabee fails it will be because the initial performance/price can't get them enough market share to create momentum in the software world. So getting that initial renderer software right is critical. It might of course also take some hardware revisions.
So the design decision that bloats a massive chip by 10+ percent and possibly caps its power-constrained performance by tens of percent right from the outset has no bearing on this?
I do believe x86 alone is not the primary contributor, though it probably contributed to the decision to have a 1:1 relationship between the number of fully featured cores and hardware cores. That decision probably bloated things by tens of percent over whatever x86 already contributed.