Larrabee at Siggraph

A minor note: according to this thread on Ace's Hardware forums, Larrabee's vector ISA is called LRBni, and it's different from AVX, but I guess we already knew that.
 
A minor note: according to this thread on Ace's Hardware forums, Larrabee's vector ISA is called LRBni, and it's different from AVX, but I guess we already knew that.

That's not the final name; it's 'Larrabee New Instructions', similar to 'Prescott New Instructions', 'Katmai New Instructions', and so on.
 
The initials fit.

The forking of vector instruction sets is unfortunate. It sounds like Larrabee's vectorization capabilities are better than AVX's, while AVX has a niftier and possibly more extensible encoding.
 
What you say is all true. I just think a design from the early '90s would be more at home in a museum than in tomorrow's multi-teraflop C-GPU :p

More than anything, such an old chip would be an excellent starting point. How many transistors did the original Pentium have, like 3M? Now, how many of those can you fit into the G80's transistor budget? And those would be truly scalar units :oops:!!! Of course that's a poor comparison, because adding the vector extensions, MT, x86-64, and an L2 cache would leave it a little more bloated than 3M transistors per core, but it gets the ball rolling.
 
Going by 1.4 billion for Nvidia's chip, and sandpile.org's listing of 4 million for the P54, 350 original Pentiums would fit.

They wouldn't be too useful, since they lack the other 26 million transistors Larrabee's cores have, and there's no other logic or interconnect to actually talk to them.
 
LOL. I was hinting at the fact that caches are denser than logic, that's it!

True, but I was basing my numbers off transistor count, not area. So I still think the 350 cores would be a little useless, unless we say everything but the cores takes 0 transistors... ;)
 
True, but I was basing my numbers off transistor count, not area. So I still think the 350 cores would be a little useless, unless we say everything but the cores takes 0 transistors... ;)
Agreed. 32+ cores is more likely.
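The back-of-envelope core-count arithmetic in this thread can be checked directly. This is just a sketch of the figures quoted above (1.4B transistors for NVIDIA's chip, 4M for the P54, and the thread's guess of ~30M per full Larrabee core), not official numbers:

```python
# Figures quoted in the thread; none of these are official Intel/NVIDIA specs.
gpu_budget = 1_400_000_000      # ~1.4B transistors for NVIDIA's GT200-class chip
p54_core = 4_000_000            # sandpile.org's ~4M for the original P54 Pentium
larrabee_core = 30_000_000      # thread's guess: P54 + ~26M for vectors, cache, etc.

# Naive estimate: bare P54 cores, ignoring all uncore logic and interconnect.
naive_fit = gpu_budget // p54_core

# More realistic estimate using the guessed full-core figure.
realistic_fit = gpu_budget // larrabee_core

print(naive_fit)      # 350
print(realistic_fit)  # 46
```

So the naive 350-core figure collapses to a few dozen once each core carries its vector unit and caches, which is consistent with the "32+ cores" guess above.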
 
Going by 1.4 billion for Nvidia's chip, and sandpile.org's listing of 4 million for the P54, 350 original Pentiums would fit.

They wouldn't be too useful, since they lack the other 26 million transistors Larrabee's cores have, and there's no other logic or interconnect to actually talk to them.

Larrabee cores are 30M? Where did you hear this?
 
Where did the 30 million number come from?
It sounds reasonable, but I can't find the source of the figure.
I didn't see it in the Larrabee PDF, but I may have overlooked it.
 
They wouldn't be too useful, since they lack the other 26 million transistors Larrabee's cores have, and there's no other logic or interconnect to actually talk to them.

Thanks...;) Took the words right out of my mouth. I think that for the sake of "conceptual simplicity" some of us just might be emphasizing the "simplicity" notion a tad too much. Things like cache and the glue logic to make the whole shebang work probably would take at least--oh, at least a dozen or so transistors, I should think...;)
 
The hints of Larrabee's vector functionality and scatter/gather memory access capability seem to indicate it exceeds the limitations of current x86 SSE, so why expect current x86 tools to do it justice?

Not the current tools, but slightly updated versions of the current tools.

As Larrabee's vectors are easier to target than current x86 vectors (SSE), modifying the Intel C compiler (ICC) to support them shouldn't require a total rewrite of the compiler. In fact, the existing auto-vectorization should map well to Larrabee. With a bit more tweaking, ICC should be able to vectorize even more loops (for example, ones with conditionals) and create more efficient vector code. The non-vector aspects of the compiler's code generation would be basically unchanged.

So, not zero effort in updating the tools, but much easier than making an entire EPIC compiler for a new ISA.
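The "loops with conditionals" case mentioned above can be sketched with per-lane masks, the style of predicated vector execution Larrabee is described as supporting. This is purely an illustrative model in Python, not actual LRBni instructions or ICC output:

```python
# Illustrative sketch of masked vectorization (assumed 16-wide, 32-bit lanes,
# matching Larrabee's described 512-bit vectors). Not real LRBni code.
VLEN = 16

def scalar_loop(a, b):
    # Original scalar loop: only update elements where a[i] > 0.
    out = list(b)
    for i in range(len(a)):
        if a[i] > 0:
            out[i] = a[i] + b[i]
    return out

def masked_vector_loop(a, b):
    # Vectorized version: compute all lanes unconditionally, then use a
    # mask (one bit per lane) to control which results are written back.
    out = list(b)
    for base in range(0, len(a), VLEN):
        va = a[base:base + VLEN]
        vb = b[base:base + VLEN]
        mask = [x > 0 for x in va]               # vector compare -> mask
        vsum = [x + y for x, y in zip(va, vb)]   # unconditional vector add
        for lane, m in enumerate(mask):          # masked store
            if m:
                out[base + lane] = vsum[lane]
    return out
```

The point is that the conditional becomes a mask rather than a branch, so the whole loop body stays vectorized regardless of which lanes take the "then" path.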
 
I hope the Intel guys are working on a Larrabee implementation of the Ct language; unfortunately they don't mention it at all in their Siggraph paper.
 
I hope the Intel guys are working on a Larrabee implementation of the Ct language; unfortunately they don't mention it at all in their Siggraph paper.
They do mention Ct, but they don't mention working on an implementation, if that's what you meant.
Besides high throughput application programming, we anticipate that developers will also use Larrabee Native to implement higher level programming models that may automate some aspects of parallel programming or provide domain focus. Examples include Ct style programming models [Ghuloum et al. 2007], high level library APIs such as Intel® Math Kernel Library (Intel® MKL) [Chuvelev et al. 2007], and physics APIs. Existing GPGPU programming models can also be re-implemented via Larrabee Native if so desired [Buck et al. 2004; Nickolls et al. 2008].
 
Oh! Hm. Anyone know the transistor count for 256KB of cache, PCI-E bus and some memory controllers? :p Time to bust out the calculator..
 
Oh! Hm. Anyone know the transistor count for 256KB of cache, PCI-E bus and some memory controllers? :p Time to bust out the calculator..

Can't speak to the controllers, but 256KiB of 6T SRAM is about 12.5 million transistors.

edit: this doesn't count any cache tags, just data arrays
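The 12.5M figure for the data array follows directly from the cell count; a quick sketch of that arithmetic (6 transistors per bit of SRAM, data array only, as the post notes):

```python
# Back-of-envelope transistor count for 256KiB of 6T SRAM.
# Counts only the data array: no tags, decoders, or sense amps.
capacity_bytes = 256 * 1024
capacity_bits = capacity_bytes * 8
transistors = capacity_bits * 6   # 6 transistors per SRAM cell

print(transistors)  # 12582912, i.e. ~12.5 million
```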
 
edit: this doesn't count any cache tags, just data arrays

Roughly speaking, cache tags are around 10% or less of the size of the data array.

The details: For 64-byte blocks, the worst-case tag overhead is around 12.5%, which is a full 64-bit tag for a 64-byte block. But assuming a 48-bit physical address space and a 6-bit block offset, you're down to 42 bits or 8%. For the 256KB cache, the index would likely be around 10 bits, so now you're down to only 6% overhead for the tags.
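The overhead percentages in that breakdown can be verified by dividing tag bits by data bits per block. A sketch of the three cases, using the same assumptions as the post (64-byte blocks, 48-bit physical addresses, 10-bit index):

```python
# Tag overhead for a 256KB cache with 64-byte blocks, per the post above.
block_bits = 64 * 8           # 512 data bits per block

worst_tag = 64                # worst case: store a full 64-bit tag
tag_48bit = 48 - 6            # 48-bit physical address minus 6-bit block offset
tag_indexed = tag_48bit - 10  # minus a 10-bit index -> 32 tag bits

print(worst_tag / block_bits)    # 0.125  -> 12.5% overhead
print(tag_48bit / block_bits)    # ~0.082 -> ~8% overhead
print(tag_indexed / block_bits)  # 0.0625 -> ~6% overhead
```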
 