Larrabee at GDC 09

Discussion in 'Architecture and Products' started by bowman, Feb 16, 2009.

  1. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    Yep, agree with all that - nothing looks difficult in the context of Larrabee. It has similarities to the way in which a thread requests TEX operations, and receives status updates or results. Though I suspect a TU is dedicated to a core, which is simpler than the one:many gather:worker setup I'm contemplating. Still, message passing between threads across Larrabee seems pretty fundamental.

    For added fun, the gather thread(s) could sort addresses from extant requests to achieve some degree of coalescing :razz:

    Jawed
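    A minimal sketch of that coalescing idea, in C: a gather thread takes the addresses of the requests it currently holds, sorts them, and collapses them into the distinct cache lines that actually need fetching. The 64-byte line size and the function name `coalesce_requests` are assumptions for illustration, not anything Intel has described for Larrabee.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed cache-line size for this sketch. */
#define LINE_BYTES 64u

/* qsort comparator for unsigned 64-bit addresses. */
static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: sorts the pending request addresses in place and
   writes each distinct cache-line base address to lines[] exactly once.
   Returns the number of lines that must be fetched.
   lines[] must have room for n entries. */
size_t coalesce_requests(uint64_t *addrs, size_t n, uint64_t *lines) {
    assert(n == 0 || (addrs != NULL && lines != NULL));
    if (n == 0)
        return 0;
    qsort(addrs, n, sizeof addrs[0], cmp_u64);
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Masking is monotonic, so sorted addresses give sorted lines
           and duplicates are always adjacent. */
        uint64_t line = addrs[i] & ~(uint64_t)(LINE_BYTES - 1);
        if (count == 0 || lines[count - 1] != line)
            lines[count++] = line;
    }
    return count;
}
```

    Sorting first puts requests to the same line next to each other, so one linear pass finds each line once; five scattered requests that touch three lines cost three fetches instead of five.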
     
  2. crystall

    Newcomer

    Joined:
    Jul 15, 2004
    Messages:
    149
    Likes Received:
    1
    Location:
    Amsterdam
    I don't think that the TUs will be just dumb samplers & filters (i.e. limited to bilinear) because it would force them to do anisotropic filtering in software. If they want to be competitive with GPUs they will need adaptive anisotropic filtering done on the TUs and thus they will have to accept more parameters than just UV coordinates.
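    To make the point about extra parameters concrete: an adaptive anisotropic filter needs the screen-space derivatives of the texture coordinates, not just UV. The sketch below computes the probe count using the approximation given in the OpenGL EXT_texture_filter_anisotropic specification, N = min(ceil(Pmax/Pmin), maxAniso). The function name and the assumption that the derivatives are already scaled to texel units are illustrative; nothing here describes how Larrabee's TUs actually work.

```c
#include <assert.h>

/* Hypothetical helper: returns N = min(ceil(Pmax / Pmin), max_aniso), the
   number of probes along the major axis of the pixel footprint.
   Derivatives are assumed to be in texel units already.
   Squared lengths are compared so no square root is needed. */
int aniso_probe_count(float dudx, float dvdx, float dudy, float dvdy,
                      int max_aniso) {
    assert(max_aniso >= 1);
    float px2 = dudx * dudx + dvdx * dvdx; /* footprint along screen x, squared */
    float py2 = dudy * dudy + dvdy * dvdy; /* footprint along screen y, squared */
    float pmax2 = px2 > py2 ? px2 : py2;
    float pmin2 = px2 > py2 ? py2 : px2;
    /* Smallest n with n * Pmin >= Pmax, i.e. ceil(Pmax / Pmin),
       tested in squared form and clamped to max_aniso. */
    for (int n = 1; n < max_aniso; ++n)
        if ((float)n * (float)n * pmin2 >= pmax2)
            return n;
    return max_aniso;
}
```

    The same spec then selects the mip level from log2(Pmax/N) and samples N times along the major axis, so a footprint stretched 4:1 costs four filtered probes where an isotropic one costs one. A TU that does this itself has to receive the derivatives (or something equivalent) with each request.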
     
  3. Squilliam

    Squilliam Beyond3d isn't defined yet
    Veteran

    Joined:
    Jan 11, 2008
    Messages:
    3,495
    Likes Received:
    114
    Location:
    New Zealand
    So what does Larrabee do badly in terms of general-purpose code? (Sorry for being a noob and vague.) If one were comparing a general-purpose CPU to one like this, what sacrifices in general-purpose computing have been made to speed it up the way it has, and how far could specifically targeted programming overcome that shortfall?
     
  4. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,722
    Likes Received:
    936
    Location:
    Guess...
    I wonder whether it could even be used as a general-purpose CPU, or if it's limited to graphics and GPGPU only?

    I.e. could you run Windows on this thing as well as all the other regular PC software, and how would it compare to something like Core i7? I'm assuming not favourably, if it's possible at all?
     
  5. Squilliam

    Squilliam Beyond3d isn't defined yet
    Veteran

    Joined:
    Jan 11, 2008
    Messages:
    3,495
    Likes Received:
    114
    Location:
    New Zealand
    Windows 7 requires a minimum of a 1 GHz processor to 'run'. So I'm wondering whether it requires any instructions that may not be present in the P1/Larrabee architecture, as I'm assuming they haven't updated the general-purpose instructions with SSE etc., or have they?
     
  6. bowman

    Newcomer

    Joined:
    Apr 24, 2008
    Messages:
    141
    Likes Received:
    0
    I think the very first Larrabee .pdf had some sort of comparison between a theoretical Larrabee core and a Conroe/Penryn core.

    According to the slides from this presentation, Larrabee is 'fully capable of running operating systems'.

    Depending on the frequency of the actual chip, I guess it will compare best to an Intel Atom, given that the wide vector unit will be relatively useless for regular desktop use like web browsing, text editing and such.
     
  7. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    How will Larrabee cope with MMX/SSE?

    That functionality seems to be completely missing and would require some kind of translation into LRBni - or it just won't work it seems.

    Jawed
     
  8. Panajev2001a

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,187
    Likes Received:
    8
    Optimized JIT on the scalar core?
     
  9. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Yes, but since it isn't there in hardware, LRB probably isn't backward compatible with code containing SSE/MMX instructions.
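    Roughly what a translator (JIT or otherwise) would have to emit for one MMX instruction on a core without MMX hardware: the sketch below emulates PADDUSB, which adds eight packed unsigned bytes with saturation, as a scalar loop over lanes. The function name is hypothetical and modelling the MMX register as a 64-bit integer is a simplification; a real translator might instead try to map such operations onto LRBni.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar emulation of MMX PADDUSB: eight packed unsigned
   bytes are added lane by lane, clamping each sum at 0xFF. */
uint64_t emulate_paddusb(uint64_t a, uint64_t b) {
    uint64_t result = 0;
    for (int lane = 0; lane < 8; ++lane) {
        unsigned shift = 8u * (unsigned)lane;
        unsigned x = (unsigned)((a >> shift) & 0xFFu); /* extract lane */
        unsigned y = (unsigned)((b >> shift) & 0xFFu);
        unsigned sum = x + y;
        if (sum > 0xFFu)
            sum = 0xFFu;                               /* saturate, don't wrap */
        result |= (uint64_t)sum << shift;              /* insert lane */
    }
    return result;
}
```

    One MMX instruction becomes eight extract/add/clamp/insert sequences here, which gives a feel for why running unmodified MMX/SSE code this way would be slow even where it works.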
     
  10. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,348
    Likes Received:
    3,879
    Location:
    Well within 3d
    Intel at one time showed a potential design where Larrabee sat on a board as a system processor.

    Later statements indicate that Intel does not currently want to allow Larrabee to be visible outside of an expansion slot.
    Any physical implementation of Larrabee at this point might not have the necessary connections for interrupts and system signals present or enabled for actual use as a host.

    This appears to be a largely artificial restriction. Intel might not want to hurt its margins in HPC, or to fragment the ISA even further with yet another incompatible extension set.
     
  11. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    The curious thing being, though, that LRBni appears to be the kind of destination that AVX is a step towards. I know so little about MMX/SSE though...

    In the scalar core, what kind of instruction differences are there between Larrabee and Core-2 or Pentium 4?

    Jawed
     
  12. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,348
    Likes Received:
    3,879
    Location:
    Well within 3d
    Wikipedia has a listing of x86 instructions.

    The bulk of the scalar integer instructions are there.
    CPUID might be there.

    Things like conditional moves and certain fast system call instructions didn't appear until the Pentium Pro or Pentium MMX.
    Conditional moves aren't necessary for code to run, but any software using them would have to be refactored to split each CMOV into a branch statement.
    I don't know how often SYS type instructions are used in current code.

    MMX and SSE are the bulk of the later instructions, which are of course not present in Larrabee.
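    Software that wants to know which of these extensions are present checks the feature bits returned in EDX by CPUID leaf 1. The sketch below decodes a few of them; the bit positions are from Intel's documentation, while the macro and helper names are illustrative, and whether Larrabee's scalar core reports any given bit is not something this thread establishes. On GCC or Clang targeting x86, the EDX value itself can be read with `__get_cpuid(1, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>`.

```c
#include <assert.h>
#include <stdint.h>

/* Selected feature flags from CPUID leaf 1, register EDX. */
#define FEAT_FPU  (UINT32_C(1) << 0)  /* on-chip x87 FPU */
#define FEAT_TSC  (UINT32_C(1) << 4)  /* RDTSC */
#define FEAT_CX8  (UINT32_C(1) << 8)  /* CMPXCHG8B */
#define FEAT_SEP  (UINT32_C(1) << 11) /* SYSENTER/SYSEXIT */
#define FEAT_CMOV (UINT32_C(1) << 15) /* CMOVcc; FCMOVcc/FCOMI too if FPU is set */
#define FEAT_MMX  (UINT32_C(1) << 23)
#define FEAT_SSE  (UINT32_C(1) << 25)
#define FEAT_SSE2 (UINT32_C(1) << 26)

/* Returns the subset of `required` flags that the CPU did not report. */
uint32_t missing_features(uint32_t edx, uint32_t required) {
    return required & ~edx;
}

/* Nonzero if the reported flags cover everything in `required`. */
int has_features(uint32_t edx, uint32_t required) {
    return missing_features(edx, required) == 0;
}
```

    On a P54C-class core that reports only the older bits, `missing_features(edx, FEAT_CMOV | FEAT_SSE2)` would return both flags, so a program built assuming them would have to refuse to run or fall back to another code path.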
     
  13. compres

    Regular

    Joined:
    Jun 16, 2003
    Messages:
    553
    Likes Received:
    3
    Location:
    Germany
    I had the idea it could run an OS but just slower than a regular CPU.
     
  14. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,722
    Likes Received:
    936
    Location:
    Guess...
    So I guess we won't be swapping out our quad/octo Sandybridges for this chip in the 2010 timeframe....
     
  15. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,348
    Likes Received:
    3,879
    Location:
    Well within 3d
    If the head of Intel's HPC division has any say, no.
    Intel does not seem to want Larrabee exposed to the system, or at least not Larrabee I.

    Intel's positions on Larrabee have shifted several times, and it has been hard to get a coherent position from the company.

    Based on public quotes, Larrabee has transitioned from a mid-range solution, to a high-end power-hungry enthusiast solution, then to a more modest solution with an emphasis on power efficiency (an argument that isn't made unless there is no higher pinnacle to reach).
    Let's not forget the schizophrenic ray-tracer/rasterizer PR stance back in the day.

    The latest change to Larrabee's target bracket hints to me that somebody's bubble got popped, either by the economy or by tapeout results.
     
  16. nutball

    Veteran Subscriber

    Joined:
    Jan 10, 2003
    Messages:
    2,222
    Likes Received:
    549
    Location:
    en.gb.uk
    Sounds spookily similar to their aims/goals/projections for Itanium a decade ago.
     
  17. Humus

    Humus Crazy coder
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,217
    Likes Received:
    77
    Location:
    Stockholm, Sweden
    A better replacement is the SET instruction which conditionally sets a register to 0 or 1, which is available since the 386. It'll be a few more instructions than the cmov, but at least avoids the branch.
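    The options discussed (CMOV, a branch, or SETcc) can be sketched in C for a simple select, `a < b ? x : y`. The branch-free version mirrors the SETcc approach: the comparison yields 0 or 1, negation turns that into a mask of all zeros or all ones, and AND/OR blend the two values. The function names are illustrative, and a C compiler is free to emit a CMOV or a branch for either function, so this shows the dataflow rather than guaranteed machine code.

```c
#include <assert.h>
#include <stdint.h>

/* Branching version: what replacing a CMOV with a jump amounts to. */
int32_t select_branch(int32_t a, int32_t b, int32_t x, int32_t y) {
    if (a < b)
        return x;
    return y;
}

/* Branch-free version in the SETcc style. */
int32_t select_setcc(int32_t a, int32_t b, int32_t x, int32_t y) {
    uint32_t flag = (uint32_t)(a < b); /* 0 or 1, like SETL */
    uint32_t mask = 0u - flag;         /* 0x00000000 or 0xFFFFFFFF, like NEG */
    uint32_t r = ((uint32_t)x & mask) | ((uint32_t)y & ~mask);
    return (int32_t)r;
}
```

    That is roughly a compare, SETcc, NEG and a handful of AND/NOT/OR operations in place of a single CMOV: a few more instructions, but no branch to mispredict.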
     
  18. bowman

    Newcomer

    Joined:
    Apr 24, 2008
    Messages:
    141
    Likes Received:
    0
  19. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,561
    Likes Received:
    601
    Location:
    New York
    I'm really curious to see how/if LRB scales downward. With all the talk of doing rasterization and triangle setup in software, that's gotta slow things down tremendously when the architecture is stripped down to an entry-level configuration. I'm assuming that those fixed-function bits don't scale much in either direction in current GPU families.
     
  20. Nick

    Veteran

    Joined:
    Jan 7, 2003
    Messages:
    1,881
    Likes Received:
    17
    Location:
    Montreal, Quebec
    Why? It simply rebalances itself. There are no bottlenecks, just hotspots. As long as there is very high utilization of the cores, that's more optimal than a GPU (where it's either a bottleneck or wasted silicon - often both).
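    An idealized model of the load-balancing argument: treat a frame as several concurrent stages, each with a fixed amount of work. With dedicated units per stage, the slowest stage sets the frame time; with a shared pool of the same total size, the work spreads over all units. This sketch assumes perfect balancing and that a general-purpose core gets through a stage's work as quickly as the dedicated unit would, which is exactly the assumption behind the concern about doing rasterization and setup in software. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Dedicated units: each stage runs only on its own units, stages run
   concurrently, so the slowest stage sets the pace. */
double time_fixed(const double *work, const double *units, size_t stages) {
    double worst = 0.0;
    for (size_t i = 0; i < stages; ++i) {
        assert(units[i] > 0.0);
        double t = work[i] / units[i];
        if (t > worst)
            worst = t;
    }
    return worst;
}

/* Unified pool: the same total number of units, usable by any stage. */
double time_unified(const double *work, const double *units, size_t stages) {
    double total_work = 0.0, total_units = 0.0;
    for (size_t i = 0; i < stages; ++i) {
        total_work += work[i];
        total_units += units[i];
    }
    assert(total_units > 0.0);
    return total_work / total_units;
}
```

    Under these assumptions the pooled time is never worse, because total work over total units cannot exceed the largest per-stage ratio. With work {30, 10} on units {2, 2}, the dedicated layout takes 15 time units against 10 for the pool, and the two only match when the load already fits the hardware split. The model says nothing about the area and power efficiency of fixed-function units, which is the other side of the trade-off.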
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.