LRB - ditching x86?

rpg.314

Veteran
Here

One critical point we were told was that 1st and 2nd generation Larrabee GPUs will not be compatible with 3rd generation Larrabee.
According to the data, Intel's 3rd generation part will have an emulation mode for backwards compatibility.

and here
We were told that Larrabee is currently only capable of performance levels similar to Nvidia's GeForce GTX 285.

If it's the 600mm2 part, who the hell do they expect to pay for this spectacular piece of crap?

EDIT: Perhaps then, they are holding out to release it on 32 nm.
 
It might be the other way around.

LRB new instructions are not compatible with the main line of x86 ISA extensions.
Intel is possibly holding back from deploying Larrabee more widely to avoid fracturing the field with another x86 extension.

If the third generation of Larrabee has cores that can be used in consumer CPUs, then it would probably happen after Larrabee and the x86 main lines hit some convergent ISA extension.
 
Well, if they wanted to introduce LRBni to the desktop, surely they would have thought of it before they forked the x86 ISA. While they can emulate LRBni using AVX2 or some such thing, they may actually be trying to emulate the legacy x86 crap.
 
It might be the other way around.

LRB new instructions are not compatible with the main line of x86 ISA extensions.
Intel is possibly holding back from deploying Larrabee more widely to avoid fracturing the field with another x86 extension.

If the third generation of Larrabee has cores that can be used in consumer CPUs, then it would probably happen after Larrabee and the x86 main lines hit some convergent ISA extension.

Please do correct me... this might be very naive of me and I'd like feedback, but historically the x86 front-end tax became a non-issue against the major RISC players because, as CPUs got larger and transistor budgets ballooned, more and more of the chip's area went to cache, execution units, branch prediction, out-of-order issue and execution logic, etc. The x86 decoding part (essentially decoupled from the rest of the processor pipeline since the Pentium Pro days) became almost a non-factor in terms of CPU cost and chip real estate.

In the many-core era, doesn't that problem get worse and worse?

Say feature X costs you 0.01% of a single core's budget and you deploy 32 or more cores on a chip... that seemed small for one core, but over 32+ cores it does add up.

I like the idea of x86 compatibility, but I feel that, technology-wise, ARM (Intel still has a license) could have helped make the chip smaller without sacrificing performance.

What do you think?
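Just to put rough numbers on that, here's a back-of-the-envelope sketch. Both overhead fractions and the 10 mm2 core size are made-up illustration values, not anything from Intel or from this thread:

```c
/* Back-of-the-envelope sketch of the "it adds up" argument above.
 * The per-core overhead fractions and core size are assumptions for
 * illustration only; nothing here is measured Larrabee data. */
#include <stdio.h>

static void show(double frac, double core_mm2, int cores)
{
    double per_core  = core_mm2 * frac;
    double aggregate = per_core * cores;
    printf("%5.2f%% of a %4.1f mm2 core = %6.4f mm2/core, %7.3f mm2 over %d cores "
           "(still %.2f%% of total core area)\n",
           frac * 100.0, core_mm2, per_core, aggregate, cores, frac * 100.0);
}

int main(void)
{
    int cores = 32;

    show(0.0001, 10.0, cores);  /* the 0.01% "feature X" from the post above    */
    show(0.05,   10.0, cores);  /* a hypothetical 5% x86 decode/legacy overhead */

    /* Absolute cost scales linearly with core count, and -- unlike in the
     * big-single-core days -- the fraction never shrinks, because every extra
     * transistor spent is another replicated core carrying the same tax. */
    return 0;
}
```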
 
Well, if they wanted to introduce LRBni to the desktop, surely they would have thought of it before they forked the x86 ISA. While they can emulate LRBni using AVX2 or some such thing, they may actually be trying to emulate the legacy x86 crap.

Larrabee's somewhat of a bastard child of x86, conceived as a thing apart and not welcome even now.

Its support of x86 past the original Pentium is zero, and other elements of Intel do not want to expose it as a general programming target.

We'll also have to see where the opcode space for LRBni is allocated. Given the weak coordination between the main line and the Larrabee team, we don't know what LRB encodings might already be taken up by existing x86 extensions.
 
In the many-core era, doesn't that problem get worse and worse?
Many-core does force things back to how they were before bloat hid the cost of ISA complexity, and because its primary way of scaling is replicating cores, that proportion no longer shrinks as transistor budgets grow.

I like the idea of x86 compatibility, but I feel that, technology-wise, ARM (Intel still has a license) could have helped make the chip smaller without sacrificing performance.

What do you think?

A while back I tried to guess at what the cost could have been.
Comparing a Pentium to a roughly contemporaneous RISC led to an estimated 12-16% penalty in die area.
Some things have come to light that make the guess less applicable, such as the fact that Intel has narrowed the standard x86 issue capability to 1 instruction, with a possible commensurate reduction in the amount of hardware on the x86 side of the core.

edit:
Other confounding factors are that at the time I didn't know whether there would be texturing hardware, or what proportion of the die would be taken up by things other than cores, which reduces the proportion. It might be something like 10% in aggregate.
What the power penalty is, I don't have the data to calculate, and it would be dominated by the vector unit.

We wouldn't really know without Nvidia or somebody springing an ARM Larrabee on us.

edit edit:
Although, the die shots show that the L2 is smaller than 1/4 of the core+cache tile area, which means with the vector unit taking up 1/3, the area that is x86 is actually somewhat bigger than what I guessed at.
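Putting that into numbers (treating "smaller than 1/4" and "1/3" as if they were exact, which they aren't):

```c
/* Rough reading of the tile breakdown described above.  The remainder lumps
 * together everything that is neither L2 nor vector unit (scalar x86 pipeline,
 * L1s, ring stop, ...), so it's an upper bound on the pure x86 cost, not a
 * measurement. */
#include <stdio.h>

int main(void)
{
    double l2_frac  = 0.25;       /* "L2 is smaller than 1/4 of the tile" */
    double vpu_frac = 1.0 / 3.0;  /* "the vector unit taking up 1/3"      */
    double rest     = 1.0 - l2_frac - vpu_frac;

    printf("L2 + VPU:        ~%.0f%% of a core+cache tile\n",
           (l2_frac + vpu_frac) * 100.0);
    printf("everything else: >%.0f%% of the tile (vs. the earlier 12-16%% whole-die guess)\n",
           rest * 100.0);
    return 0;
}
```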
 
"One critical point we were told was that 1st and 2nd generation Larrabee GPUs will not be compatible with 3rd generation Larrabee."

I'm missing why (even if this is true) one wouldn't expect something like this anyway. C++ with vector intrinsics isn't exactly designed to be ideal even within the same architecture over generations; that's what DX11/OpenGL/OpenCL are for (and even those aren't ideal either). Given AMD's introduction of 64-bit extensions, different cache-line sizes, and all the changes to SSE over the years, x86 has never been future-safe. Even with x86's backwards compatibility, it has still been a good idea to recode performance-critical assembly or intrinsics code for different generations of x86 arch, so who cares if the Larrabee ISA changes between generations!
 
There are subtle differences between new and older x86 chips in various places that can cause problems, even with backwards compatibility.

To state that there is an actual break in compatibility is indicative of something more substantial than a tightened specification or differing instruction corner cases.
It means something more significant might happen at that point for Larrabee, possibly more widespread use.
 
If the third generation of Larrabee has cores that can be used in consumer CPUs, then it would probably happen after Larrabee and the x86 main lines hit some convergent ISA extension.
It would be a bit depressing to see the Larrabee ISA hobbled with all the superfluous legacy of umpteen generations of x86.
 
I like the idea of x86 compatibility, but I feel that, technology-wise, ARM (Intel still has a license) could have helped make the chip smaller without sacrificing performance.

I don't like the idea of x86 compatibility, but yes, an ARM LRB would be great. :D But even there, there are many questions. E.g., would you like to have Jazelle, Thumb, NEON, etc.?

Although, the die shots show that the L2 is smaller than 1/4 of the core+cache tile area, which means with the vector unit taking up 1/3, the area that is x86 is actually somewhat bigger than what I guessed at.

OK, 1/4 cache + 1/3 VPU ~ 58% is useful area. The rest, the x86 part, takes up 42%, which is massive. :oops: Minimizing this waste could help LRB catch up. Remember, they need (apparently) 600 mm2 on 45 nm to catch up with a ~480 mm2 chip on 55 nm.

And it would probably need 300W to stay alive. :runaway:
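And on the 600 mm2 vs. ~480 mm2 point, here's a naive process-normalized comparison. It assumes ideal 55 nm to 45 nm area scaling, which real chips never achieve, so treat it as a rough bound rather than a fact:

```c
/* Naive normalization of the die sizes quoted above to the same process node.
 * Ideal linear-dimension shrink is assumed; actual scaling is always worse. */
#include <stdio.h>

int main(void)
{
    double lrb_mm2     = 600.0;                         /* rumoured LRB die at 45 nm     */
    double rival_mm2   = 480.0;                         /* the ~480 mm2 chip at 55 nm    */
    double ideal_scale = (45.0 / 55.0) * (45.0 / 55.0); /* ideal area shrink 55 -> 45 nm */

    double rival_at_45 = rival_mm2 * ideal_scale;

    printf("raw ratio:                 %.2fx\n", lrb_mm2 / rival_mm2);
    printf("rival scaled to 45 nm:     ~%.0f mm2\n", rival_at_45);
    printf("process-normalized ratio:  ~%.1fx\n", lrb_mm2 / rival_at_45);
    return 0;
}
```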
 
It would be a bit depressing to see the Larrabee ISA hobbled with all the superfluous legacy of umpteen generations of x86.

Hmm, let's see. MMX (well, 3DNow!) for float2, SSEx for float4, AVX for float8 and LRBni for float16.

Spaghetti, anyone? ;)
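To make the spaghetti concrete, here's the same trivial add written at two of those widths with real SSE/AVX intrinsics. MMX never actually did floats (that was 3DNow!), and I can only guess at what the LRBni intrinsic names will look like, so those two are left as comments:

```c
/* Same operation, different vector widths -- one reason the ISA feels like
 * spaghetti.  Compile with something like: gcc -mavx widths.c */
#include <stdio.h>
#include <immintrin.h>   /* pulls in SSE (float4) and AVX (float8) intrinsics */

int main(void)
{
    float a4[4] = {1, 2, 3, 4},             b4[4] = {4, 3, 2, 1},             r4[4];
    float a8[8] = {1, 2, 3, 4, 5, 6, 7, 8}, b8[8] = {8, 7, 6, 5, 4, 3, 2, 1}, r8[8];

    /* float4: SSE */
    __m128 x4 = _mm_add_ps(_mm_loadu_ps(a4), _mm_loadu_ps(b4));
    _mm_storeu_ps(r4, x4);

    /* float8: AVX */
    __m256 x8 = _mm256_add_ps(_mm256_loadu_ps(a8), _mm256_loadu_ps(b8));
    _mm256_storeu_ps(r8, x8);

    /* float2 would be 3DNow! territory, and float16 LRBni's 512-bit registers,
     * presumably with yet another set of intrinsic names.  Every width is
     * another code path to write and tune. */

    printf("%g %g\n", r4[0], r8[0]);
    return 0;
}
```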
 
It would be a bit depressing to see the Larrabee ISA hobbled with all the superfluous legacy of umpteen generations of x86.

I am currently at a loss for alternative explanations that would fit the decision to make an incompatible x86 Larrabee chip.

Shifting the Larrabee instruction set so that it can be used more widely might justify breaking compatibility with the insular acceleration-board variants.
There hasn't been any comment about something frightfully wrong with the current instructions, so why remove compatibility on a whim?
 
And it would probably need 300W to stay alive. :runaway:

Now, now, this is pure speculation. Let's wait for the actual hardware before declaring winners.
I personally would be very impressed if they get a Larrabee implementation matching the GTX 285 - with a completely software DX10 driver, mind you.
If this kind of performance can be achieved while emulating a hardware oriented API, I can't wait to see what direct low level access can get you in terms of performance, quality and features.
 
Now, now, this is pure speculation. Let's wait for the actual hardware before declaring winners.
Fair enough.
I personally would be very impressed if they get a Larrabee implementation matching the GTX 285 - with a completely software DX10 driver, mind you.

To each his own. I'd be disappointed if they brought a product to market with ~1.3x the area on a smaller and more mature process. And let's face it, what can LRB provide you as a programmer that GT200 can't (from a programmability POV)? I mean, what the hell is the point of a super Core i7 with some nice vector ISA if it doesn't deliver kick-ass perf? :rolleyes: Do you really care if it is x86 compliant? Are you going to call mmap() or the BCD arithmetic instructions from a shader (or from a CUDA/OpenCL kernel)? It doesn't (or can't) do anything that its competitors can't.


To be fair though, I am really impressed with the automatic load balancing and the multi-LRB scaling possibility.
If this kind of performance can be achieved while emulating a hardware oriented API, I can't wait to see what direct low level access can get you in terms of performance, quality and features.

This perf is from direct low-level access. After all, Abrash and co. were involved in their graphics pipeline implementation.
 
Good for you then. I'd prefer to have something which has better perf/price. I really don't care if my GPU can run MS Word. :)
 