Larrabee at GDC 09

Jawed · Mar 31, 2009

Yep, agree with all that - nothing looks difficult in the context of Larrabee. It has similarities to the way in which a thread requests TEX operations, and receives status updates or results. Though I suspect a TU is dedicated to a core, which is simpler than the one:many gather:worker setup I'm contemplating. Still, message passing between threads across Larrabee seems pretty fundamental.

For added fun, the gather thread(s) could sort addresses from extant requests to achieve some degree of coalescing.
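A minimal sketch of that sorting idea, assuming requests are plain 64-bit byte addresses and that a fetch is one 64-byte line. `LINE_BYTES` and `coalesce_requests` are hypothetical names chosen for illustration, not anything from Intel's material. Sorting puts addresses that share a line next to each other, so a single pass can collapse them into one fetch per line:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed line size for illustration; must be a power of two. */
#define LINE_BYTES 64u

/* qsort comparator for unsigned 64-bit addresses. */
static int cmp_addr(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Sorts the pending request addresses in place, then writes one
 * line-aligned base address per distinct line into lines_out
 * (which must have room for n entries).
 * Returns the number of line fetches needed. */
size_t coalesce_requests(uint64_t *addrs, size_t n, uint64_t *lines_out)
{
    size_t count = 0;
    if (n == 0)
        return 0;

    qsort(addrs, n, sizeof addrs[0], cmp_addr);

    for (size_t i = 0; i < n; ++i) {
        /* Round the address down to the start of its line. */
        uint64_t line = addrs[i] & ~(uint64_t)(LINE_BYTES - 1);
        /* After sorting, duplicates of a line are adjacent. */
        if (count == 0 || lines_out[count - 1] != line)
            lines_out[count++] = line;
    }
    return count;
}
```

With five requests at 0x1040, 0x1004, 0x2000, 0x1008 and 0x1044, this yields three line fetches (0x1000, 0x1040, 0x2000) instead of five. Whether the sort pays for itself depends on how many requests are outstanding at once, which the thread doesn't pin down.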

Jawed

crystall · Mar 31, 2009

Jawed said:
True, one thing that's not clear yet is whether there's any LOD, bias and addressing computation in the TUs. The justification for making them dedicated was based upon decompression and filtering.

I don't think that the TUs will be just dumb samplers & filters (i.e. limited to bilinear) because it would force them to do anisotropic filtering in software. If they want to be competitive with GPUs they will need adaptive anisotropic filtering done on the TUs and thus they will have to accept more parameters than just UV coordinates.

Squilliam · Mar 31, 2009

So what does Larrabee do badly in terms of general purpose code? (Sorry for being noob and vague.) If one was comparing a general purpose CPU to one like this, what sacrifices in terms of general purpose computing have been made to speed it up in the way it has been, and how far could specific targeted programming overcome that shortfall?

pjbliverpool · Mar 31, 2009

Squilliam said:
So what does Larrabee do badly in terms of general purpose code? (Sorry for being noob and vague.) If one was comparing a general purpose CPU to one like this, what sacrifices in terms of general purpose computing have been made to speed it up in the way it has been, and how far could specific targeted programming overcome that shortfall?

I wonder if it could even be used as a general purpose CPU or if it's limited to graphics and GPGPU only?

I.e. could you run Windows on this thing as well as all the other regular PC software, and how would it compare to something like Core i7? I'm assuming not favourably, if it's possible at all?

Squilliam · Mar 31, 2009

Windows 7 requires a minimum of a 1 GHz processor to 'run'. So I'm wondering if it requires any instructions which may not be present in the P1/Larrabee architecture, as I'm assuming they haven't updated the general purpose instructions with SSE etc., or have they?

bowman · Mar 31, 2009

pjbliverpool said:
I wonder if it could even be used as a general purpose CPU or if it's limited to graphics and GPGPU only?

I.e. could you run Windows on this thing as well as all the other regular PC software, and how would it compare to something like Core i7? I'm assuming not favourably, if it's possible at all?

I think the very first Larrabee .pdf had some sort of comparison between a theoretical Larrabee core and a Conroe/Penryn core.

According to the slides from this presentation (here) Larrabee is 'fully capable of running operating systems'. (here)

Depending on the frequency of the actual chip, I guess it will compare best to an Intel Atom, given that the wide vector unit will be relatively useless for regular desktop use like web browsing, text editing and such.

Jawed · Mar 31, 2009

How will Larrabee cope with MMX/SSE?

That functionality seems to be completely missing and would require some kind of translation into LRBni - or it just won't work it seems.

Jawed

Panajev2001a · Mar 31, 2009

Jawed said:
How will Larrabee cope with MMX/SSE?

That functionality seems to be completely missing and would require some kind of translation into LRBni - or it just won't work it seems.

Jawed

Optimized JIT on the scalar core?

rpg.314 · Mar 31, 2009

Yes, but since it isn't there in hardware, LRB probably isn't backward compatible with code containing SSE/MMX instructions.

3dilettante · Mar 31, 2009

Intel at one time showed a potential design where Larrabee sat on a board as a system processor.

Later statements indicate that Intel does not currently want to allow Larrabee to be visible outside of an expansion slot.
Any physical implementation of Larrabee at this point might not have the necessary connections for interrupts and system signals present or enabled for actual use as a host.

This appears to be a largely artificial restriction. Intel might not want to hurt its margins in HPC, or to fragment the ISA even further with yet another incompatible extension set.

Jawed · Mar 31, 2009

The curious thing being, though, that LRBni appears to be the kind of destination that AVX is a step towards. I know so little about MMX/SSE though...

In the scalar core, what kind of instruction differences are there between Larrabee and Core-2 or Pentium 4?

Jawed

3dilettante · Mar 31, 2009

Wikipedia has a listing of x86 instructions.

The bulk of the scalar integer instructions are there.
CPUID might be there.

Things like conditional moves and certain fast system call instructions didn't appear until the Pentium Pro or Pentium MMX.
Conditional moves aren't necessary for code to run, but any software using them would have to be refactored to split each CMOV into a branch statement.
I don't know how often SYS type instructions are used in current code.

MMX and SSE are the bulk of the later instructions, which are of course not present in Larrabee.

compres · Mar 31, 2009

I had the idea it could run an OS but just slower than a regular CPU.

pjbliverpool · Mar 31, 2009

So I guess we won't be swapping out our quad/octo Sandybridges for this chip in the 2010 timeframe....

3dilettante · Mar 31, 2009

If the head of Intel's HPC division has any say, no.
Intel does not seem to want Larrabee exposed to the system, or at least not Larrabee I.

Intel's positions on Larrabee have shifted several times, and it has been hard to get a coherent position from the company.

Based on public quotes, Larrabee has transitioned from a mid-range solution, to a high end power-hungry enthusiast solution, then to a more modest solution with an emphasis on power-efficiency (an argument that isn't made unless there is no higher pinnacle to reach).
Let's not forget the schizophrenic ray-tracer/rasterizer PR stance back in the day.

The latest change to Larrabee's target bracket hints to me that somebody's bubble got popped, either by the economy or by tapeout results.

nutball · Mar 31, 2009

3dilettante said:
Based on public quotes, Larrabee has transitioned from a mid-range solution, to a high end power-hungry enthusiast solution, then to a more modest solution with an emphasis on power-efficiency (an argument that isn't made unless there is no higher pinnacle to reach).

Sounds spookily similar to their aims/goals/projections for Itanium a decade ago.

Humus · Mar 31, 2009

3dilettante said:
Conditional moves aren't necessary for code to run, but any software using them would have to be refactored to split each CMOV into a branch statement.

A better replacement is the SETcc instruction, which sets a byte register to 0 or 1 depending on a condition and has been available since the 386. It'll be a few more instructions than the CMOV, but at least it avoids the branch.
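To make the trade-off concrete, here is a rough C sketch of both forms of "pick x if a < b, else y". The function names are made up for illustration. The second version mirrors the SETcc approach: the comparison produces 0 or 1, that flag is stretched into an all-zeros or all-ones mask, and the mask selects between the two values. A compiler is free to emit something else entirely (a CMOV where available, or even a branch), so treat this as a picture of the instruction sequence rather than a guarantee of it:

```c
#include <stdint.h>

/* Branching form: one compare and a conditional jump. */
uint32_t select_branch(int32_t a, int32_t b, uint32_t x, uint32_t y)
{
    if (a < b)
        return x;
    return y;
}

/* Branch-free form in the spirit of SETcc:
 *   flag = (a < b)          0 or 1, as SETL would produce
 *   mask = 0 - flag         0x00000000 or 0xFFFFFFFF
 *   result = (x & mask) | (y & ~mask)
 */
uint32_t select_setcc(int32_t a, int32_t b, uint32_t x, uint32_t y)
{
    uint32_t flag = (uint32_t)(a < b);
    uint32_t mask = 0u - flag;
    return (x & mask) | (y & ~mask);
}
```

Both functions return the same result for every input. The branch-free one costs a compare, a set, a negate, two ANDs, a NOT and an OR, which is the "few more instructions" mentioned above.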

bowman · Apr 1, 2009

Presentations available here:

http://software.intel.com/en-us/articles/intel-at-gdc/

Larrabee section at intel.com:

http://software.intel.com/en-us/visual-computing/?larrabee=1

trinibwoy · Apr 2, 2009

I'm really curious to see how/if LRB scales downward. With all the talk of doing rasterization and triangle setup in software that's gotta slow things down tremendously when the architecture is stripped down to an entry level configuration. I'm assuming that those fixed function bits don't scale much in either direction in current GPU families.

Nick · Apr 2, 2009

trinibwoy said:
With all the talk of doing rasterization and triangle setup in software that's gotta slow things down tremendously when the architecture is stripped down to an entry level configuration.

Why? It simply rebalances itself. There are no bottlenecks, just hotspots. As long as there is very high utilization of the cores, that's more optimal than a GPU (where it's either a bottleneck or wasted silicon - often both).
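A toy model of that argument, under loudly stated assumptions: work per stage is measured in abstract "unit-seconds", a dedicated design gives each stage its own fixed pool of units, and a unified design lets any core run any stage with perfect load balancing and no scheduling overhead. It also assumes software stages cost the same per unit as fixed-function ones, which is exactly the point being questioned above, so it only illustrates the "hotspot vs. bottleneck" distinction. The function names are hypothetical:

```c
#include <stddef.h>

/* Dedicated units: stages run concurrently, so the slowest stage
 * (work divided by its own unit count) sets the frame time. */
double time_dedicated(const double *work, const double *units, size_t n)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double t = work[i] / units[i];
        if (t > worst)
            worst = t;
    }
    return worst;
}

/* Unified cores: all work shares one pool, so frame time is
 * total work divided by total cores (ideal balancing assumed). */
double time_unified(const double *work, size_t n, double cores)
{
    double total = 0.0;
    for (size_t i = 0; i < n; ++i)
        total += work[i];
    return total / cores;
}
```

For stage workloads of 2, 6 and 4 with four units per stage, the dedicated model gives 1.5 while twelve unified cores give 1.0. Scaling both down to one unit per stage and three cores gives 6.0 versus 4.0, so in this idealised model the ratio stays the same at the low end. Real hardware would differ by whatever the software stages cost relative to fixed-function ones.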
