22 nm Larrabee

Voxilla · Dec 26, 2011

rpg.314 said:
So RISC, RISC and probably RISC ...

Not so much RISC as SISC: Simplified Instruction Set Computer.

MfA · Dec 26, 2011

Voxilla said:
Has anyone had a look at the latest ARM 64-bit ISA? At least twice as good as x64, I would say.

Meh, ARM has way too much deadweight for truly lightweight cores à la Cell SPUs ... and once you hang some massive-width SIMD unit off the side to compensate, the differences are rather academic.

denev2004 · Dec 26, 2011

I don't think the ISA has much to do with the register file, as long as they take the same approach through the decode unit.

Anyway, decode could be an area where ARM has a strength.

Voxilla · Dec 26, 2011

MfA said:
Meh, ARM has way too much deadweight for truly lightweight cores à la Cell SPUs ... and once you hang some massive-width SIMD unit off the side to compensate, the differences are rather academic.

Yes indeed, Cell is a dead end.

Voxilla · Dec 26, 2011

denev2004 said:
I don't think the ISA has much to do with the register file, as long as they take the same approach through the decode unit.

Anyway, decode could be an area where ARM has a strength.

Don't you have to encode an instruction's register numbers somewhere in a 32-bit word?

MfA · Dec 26, 2011

Voxilla said:
Yes indeed, Cell is a dead end.

If you think narrow (i.e. 4-way SIMD or, better, 4-way VLIW) cores are a dead end, then what use is there for ARM? A slight improvement in area efficiency in what is only a small part of the core doesn't do you a lot of good ... and that is what ARM and x86 are in a Larrabee-type architecture.

Voxilla · Dec 26, 2011

SIMD is an architectural feature; ARM and x64 are both examples of that, and the former has NEON.

denev2004 · Dec 27, 2011

Voxilla said:
Don't you have to encode an instruction's register numbers somewhere in a 32-bit word?

Isn't that stored in the middle part after decode?

Voxilla · Dec 27, 2011

Well, a fixed instruction length allows hardwiring, with no need to decode.

Nick · Dec 27, 2011

Having more than 16 architectural registers is a waste of encoding space. The only reason to desire more would be to perform unrolling for latency hiding... But there's a more efficient way to achieve that, and it's likely going to be supported by AVX: It can be extended to support 1024-bit operations, which can be executed on 256-bit units in four cycles. This only takes one more bit of encoding space, but offers four times more register space for implicit 'unrolling'.
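To put rough numbers on the encoding argument above, here is a minimal Python sketch. It assumes three-operand instructions and a 256-bit execution unit; the function names are made up for illustration and do not describe any actual AVX encoding.

```python
def register_field_bits(num_registers: int) -> int:
    """Bits needed to name one register out of num_registers."""
    return (num_registers - 1).bit_length()


def operand_encoding_bits(num_registers: int, operands: int = 3) -> int:
    """Total register-field bits for an instruction with `operands` operands (assumed 3)."""
    return operands * register_field_bits(num_registers)


def passes_per_operation(vector_bits: int, unit_bits: int = 256) -> int:
    """Passes needed to run one vector operation on a narrower unit (ceiling division)."""
    return -(-vector_bits // unit_bits)


def register_file_bits(num_registers: int, vector_bits: int) -> int:
    """Total architectural register storage in bits."""
    return num_registers * vector_bits


if __name__ == "__main__":
    # Doubling the register count costs one extra bit per operand field.
    extra = operand_encoding_bits(32) - operand_encoding_bits(16)
    print("extra encoding bits for 32 vs 16 registers:", extra)
    # Widening the registers instead multiplies storage without touching the fields.
    ratio = register_file_bits(16, 1024) // register_file_bits(16, 256)
    print("register space ratio, 1024-bit vs 256-bit:", ratio)
    print("passes for a 1024-bit op on a 256-bit unit:", passes_per_operation(1024))
```

Under these assumptions, going from 16 to 32 registers costs three register-field bits per instruction, while the wider-register route leaves the register fields unchanged.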

Voxilla · Dec 27, 2011

Nick said:
Having more than 16 architectural registers is a waste of encoding space. The only reason to desire more would be to perform unrolling for latency hiding... But there's a more efficient way to achieve that, and it's likely going to be supported by AVX: It can be extended to support 1024-bit operations, which can be executed on 256-bit units in four cycles. This only takes one more bit of encoding space, but offers four times more register space for implicit 'unrolling'.

You are smart Nick, but don't think you are smarter than ARM engineers. This new 64 bit ISA is exactly what I hoped for, and I have quite a bit of experience in ARM coding.
There are 32 scalar registers and 32 SIMD registers, even FMA with 4 registers in one fixed 32 bit instruction...
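As a sketch of how four register operands can fit into one fixed 32-bit word: with 32 registers each field needs 5 bits, so four fields take 20 bits and leave 12 for the opcode. The field layout below is hypothetical, chosen only for illustration, and is not the actual AArch64 encoding.

```python
REG_BITS = 5                        # 32 registers -> 5 bits per register field
REG_MASK = (1 << REG_BITS) - 1
OPCODE_BITS = 32 - 4 * REG_BITS     # 12 bits remain for the opcode


def encode_fma(opcode: int, rd: int, rn: int, rm: int, ra: int) -> int:
    """Pack a four-register FMA into one 32-bit word (hypothetical layout)."""
    for reg in (rd, rn, rm, ra):
        if not 0 <= reg <= REG_MASK:
            raise ValueError("register number out of range")
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode out of range")
    return (opcode << 20) | (ra << 15) | (rm << 10) | (rn << 5) | rd


def decode_fma(word: int) -> dict:
    """Extract the fields again; each sits at a fixed bit position."""
    return {
        "opcode": (word >> 20) & ((1 << OPCODE_BITS) - 1),
        "ra": (word >> 15) & REG_MASK,
        "rm": (word >> 10) & REG_MASK,
        "rn": (word >> 5) & REG_MASK,
        "rd": word & REG_MASK,
    }


if __name__ == "__main__":
    word = encode_fma(opcode=0x1F0, rd=3, rn=7, rm=12, ra=31)
    print(hex(word), decode_fma(word))
```

Because every field lives at a fixed offset, pulling out the register numbers is just shifts and masks, independent of which instruction it is.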

denev2004 · Dec 28, 2011

Voxilla said:
Well, a fixed instruction length allows hardwiring, with no need to decode.

Well... that's what I mean: having no decode unit can be a strength of ARM.

Nick · Jan 22, 2012

Could anyone help me interpret the following: http://www.indeed.com/r/b1b4922f298d8c33

"Graphics and Media cluster's micro-architecture validation for iLRB (Larabee-3 slice for Haswell Client product and Discrete Larabee-3)"

It makes no sense to me that Haswell would get a Larrabee-based IGP, considering the gather and FMA support for AVX2 and its future extendability to AVX-1024 which would lower the power consumption.

rpg.314 · Jan 22, 2012

Interesting. That schedule suggests a lot of confidence in LRB3.

I am hoping that Intel has added low level hooks for binning, rasterization and other hw and opened them up to compute.

Gipsel · Jan 22, 2012

Nick said:
It makes no sense to me that Haswell would get a Larrabee-based IGP, considering the gather and FMA support for AVX2 and its future extendability to AVX-1024 which would lower the power consumption.

I told you so.
SCNR.

dkanter · Jan 23, 2012

I got news for everyone. ARM has decoders, and they have ugly variable length instructions.

It's obviously cleaner than x86, but it's all a matter of degrees.

DK

3dilettante · Jan 23, 2012

Outside of old-timey VLIW architectures with instructions that were the control signals for the ALUs, some amount of decode hardware is necessary.

The iLRB3 entry in that profile is interesting. I have not seen much detail on what is planned for Haswell's graphics slice. The terminology for the various graphical grades is similar to Sandy Bridge and Ivy Bridge, but would not be indicative of what hardware falls beneath the labels.
Earlier rumors hinted that there was some back and forth between initiatives based on GMA or LRB for what would go on-die, and iLRB3 may have been the contender from the LRB side.

One possibly tempting reason to include it would be if LRB3 is aligned with Haswell's new instructions, or is part of the alignment process.
It might also lead to questions on where the most aggressive gather implementation would be on the Haswell die, if LRB found its way there.

Nick · Jan 23, 2012

Gipsel said:
I told you so.

Told me what, precisely? Hats off if you got it right but let's not sell the skin before the bear is caught. Larrabee got cancelled and the plans to integrate it into Haswell may have gone down with it. Or it could just be the codename for the next evolutionary step in the GMA product line, without x86 compatibility.

Even if it's truly Larrabee, possibly with a hardware rasterizer and such, it could still be an intermediate step toward homogeneous computing. Then I would go "I told you so" several years later... ;-)

Nick · Jan 23, 2012

dkanter said:
I got news for everyone. ARM has decoders, and they have ugly variable length instructions.

It's obviously cleaner than x86, but it's all a matter of degrees.

Are you referring to Thumb and AArch32/64?

Nick · Jan 23, 2012

3dilettante said:
One possibly tempting reason to include it would be if LRB3 is aligned with Haswell's new instructions, or is part of the alignment process.

AVX2 isn't power efficient enough yet. We need AVX-1024 for that. But yes it could be part of a longer term convergence.

3dilettante said:
It might also lead to questions on where the most aggressive gather implementation would be on the Haswell die, if LRB found its way there.

According to Tom Piazza the theoretical scatter/gather performance of Ivy Bridge's IGP is 32 times higher than Sandy Bridge's. He also reveals that in practice it's lower due to bank conflicts.

AVX2 doesn't support scatter, only gather. A one cacheline per clock implementation makes the most sense. It doesn't require any changes to the cache itself and doesn't complicate coherency.
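A rough way to picture a one-cacheline-per-clock gather: the cost is the number of distinct cache lines the element addresses touch. The sketch below assumes 64-byte lines and elements that do not straddle a line; it is a toy model, not a description of any actual Haswell implementation.

```python
CACHE_LINE_BYTES = 64  # assumed line size


def gather_cycles(addresses, line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Cycles for a gather that services one distinct cache line per clock.

    `addresses` are byte addresses of the gathered elements; each element is
    assumed to lie entirely within one cache line.
    """
    lines = {addr // line_bytes for addr in addresses}
    return len(lines)


if __name__ == "__main__":
    contiguous = [4 * i for i in range(8)]   # eight 4-byte elements, one line
    strided = [64 * i for i in range(8)]     # one element per line
    print("contiguous:", gather_cycles(contiguous))
    print("strided:", gather_cycles(strided))
```

In this model a gather over contiguous data costs the same as a plain vector load, and the worst case is one cycle per element.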
