Larrabee: 16 Cores, 2GHz, 150W, and more...

Discussion in 'Architecture and Products' started by B3D News, Jun 1, 2007.

  1. glw

    glw
    Newcomer

    Joined:
    Aug 29, 2003
    Messages:
    64
    Likes Received:
    0
    I believe that CSI uses PCIe 2.0 as its physical layer.

    So hypothetically Intel can use either bus as is appropriate.
     
  2. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    300
    Location:
    UK
    INKster: From the B3D news post: "Page 2 is much more interesting however, as they link to the presentation above and also uncover the hidden Larrabee PCB diagram on slide 16." - If you look closely, you'll notice that the PCB xtreview and tgdaily are pointing at is on that slide. It is perfectly legit.

    What the slides imply is that there are two chips based on the Larrabee architecture, one aimed at the GPU/Gaming market and one aimed at the GPGPU/HPC market. The former has dedicated texture samplers & other 3D-specific hardware, and sports 16 cores. The latter sports 24 cores, but doesn't have those fixed-function units.

    Unsurprisingly, the slides imply the GPU uses PCIe Gen2, while the HPC chip is based on CSI. There are other possible explanations, but this one seems to be the most likely to me... Also, Gesher is a traditional CPU, also known as Sandy Bridge, and it is Intel's next microarch after Nehalem.

    Gesher has absolutely nothing to do with a coprocessor. Larrabee could be considered a coprocessor, but it looks extremely general-purpose so that might not be a good way to represent it. I'm not so sure what's so confusing in the presentation, really... I guess if you're assuming Polaris (aka the 80-core chip, also mistakenly called TeraScale) has anything to do directly with Larrabee and Gesher (it might influence future architectures, but that's all), then you might manage to confuse yourself, heh.
     
  3. santyhammer

    Newcomer

    Joined:
    Apr 22, 2006
    Messages:
    85
    Likes Received:
    2
    Location:
    Behind you
    Ok, thx for the clarification Arun! Do you also know whether Larrabee is like Cell (some general-purpose cores plus simpler SPEs for SIMD math operations) or whether it is composed of full general-purpose x86/x64 cores?

    Does anybody know whether Larrabee could be programmed with C/OpenMP, or whether it will need DirectX/OpenGL shaders, dedicated assembly, or a special API? One of the slides I saw indirectly mentions that parallel programming should be easy to do (and also shows a DVD with the Fortran compiler cover), so I bet it could use OpenMP (which is very well supported by Intel compilers, btw), but that is just speculation!

    I think Intel hired the Quake4 raytracing guy to give them some feedback about it. If I could use C/OpenMP with enough cores and fast instructions, I could do raytracing (like the presentation says!) very fast on the TeraScale! (Spatial structures are currently a pain on the GPU due to stackless restrictions, although DX10/Evans increased the instruction count and allows jumbo textures.)

    Another question... Intel Geneseo (CSI?) is the Torrenza equivalent, isn't it? Can it be used like HT3.0 to interconnect all these coprocessor cards without CPU intervention? If so, why is Intel going to use PCIe 2 for that Larrabee graphics card? Is it better?

    And one more... The Larrabee graphics card slide shows two "BSI" connectors placed where graphics cards usually place the SLI connectors... So I assume they are for "SLI"... If the idea of HT3.0 and Geneseo is to plug multiple cards into the motherboard as coprocessors and to use the bus without CPU intervention... why do these connectors exist? Or perhaps they aren't for SLI after all...

    How have NVIDIA and AMD reacted to this? Larrabee/Polaris/whatever sounds "dangerous" for them (although some websites mentioned NVIDIA was going to cooperate in the Larrabee design lol!)

    I know, too many questions... let's use Mr. Sylar to open the Intel engineers' heads and see what's inside :lol:
     
    #23 santyhammer, Jun 3, 2007
    Last edited by a moderator: Jun 3, 2007
  4. Pancakin

    Banned

    Joined:
    Feb 11, 2007
    Messages:
    7
    Likes Received:
    0
    wow, 32k L1 with 3 cycle latency. that's enormous, compared with current gpgpus. is L3 expected to run at full frequency? Hopefully intel will stick with traditional aliasing logic, simply because of sheer strength.
     
  5. Geeforcer

    Geeforcer Harmlessly Evil
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,297
    Likes Received:
    464
    Is there even a need to point out that a 49.5mm x 49.5mm die would be the most insane thing ever?
     
  6. Nick

    Veteran

    Joined:
    Jan 7, 2003
    Messages:
    1,881
    Likes Received:
    17
    Location:
    Montreal, Quebec
    It's most likely the chip package and/or socket size.
     
  7. Geeforcer

    Geeforcer Harmlessly Evil
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,297
    Likes Received:
    464
    Got to be...
     
  8. caboosemoose

    Regular

    Joined:
    Jan 15, 2003
    Messages:
    294
    Likes Received:
    10
    Can anyone clarify precisely where the pdf hosted on xtreview.com came from, and maybe comment on its provenance?

    Arun mentions that it was "uploaded" in April...

    Was it a confidential briefing that has leaked or just one of those briefings that initially flew under the radar?
     
  9. Rys

    Rys PowerVR
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,163
    Likes Received:
    1,453
    Location:
    Beyond3D HQ
  10. caboosemoose

    Regular

    Joined:
    Jan 15, 2003
    Messages:
    294
    Likes Received:
    10
    That was some workshop.
     
  11. Megadrive1988

    Veteran

    Joined:
    May 30, 2002
    Messages:
    4,662
    Likes Received:
    182
    it should be of no surprise what Intel is aiming for, and will continue to aim for:

    circa 2+ years ago....

    http://www.intel.com/technology/magazine/computing/platform-2015-0305.htm
     
  12. AnarchX

    Veteran

    Joined:
    Apr 19, 2007
    Messages:
    1,559
    Likes Received:
    34
  13. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,278
    Likes Received:
    3,531
    Location:
    Well within 3d
    That's definitely on the early side for Larrabee, though the process tech should be ramping by then.

    At that price point, I'd hope it would be the lower-end variant of Larrabee (if that's what Intel is going to use so soon), because the theoretical specs on Larrabee's peak execution rates would make it a worse undershoot than the "meh" R600.

    If it did come out that early, it would be somewhat more favorable in comparison to GPUs that are likely to be out at the time, though it would need a process node advantage to hit a temporary parity.
     
  14. pelly

    Newcomer

    Joined:
    Sep 5, 2002
    Messages:
    159
    Likes Received:
    5
    Location:
    San Jose, CA
    With a die size that large, the idea of having poor yields is more than a bit scary...Then again, the architecture could lend itself well towards redundancy...so they'd be able to "recycle" die by allocating to lower-end parts...

    Now, I wonder how much TSMC or UMC would charge a fabless semiconductor company for a die like that...lol... :shock:
     
  15. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,278
    Likes Received:
    3,531
    Location:
    Well within 3d
    It's been noted that the size on that slide is for the entire package, not just the chip.

    I don't think Intel's standard fab lithography equipment has optical reticles that are wide enough for a chip that size.

    Itanium is the biggest they make, and it's nowhere near that die size.
     
  16. ebola

    Newcomer

    Joined:
    Dec 13, 2006
    Messages:
    99
    Likes Received:
    0
    This looks extremely intriguing, for something like this to be coming from Intel.

    is this a smarter, more-mass-market-supportable Cell ?

    How much do they lose for being x86 versus RISC?

    How much do they gain for apparently having more graphics support (texture samplers), and more lower-latency threads.... (arguably easier to utilize than unrolled loops/SOA)


    How will Cell stack up against this ( is Cell dead ? )

    could this find its way into xbox 720?

    will it have proper cache control instructions :)

    how many registers will they have with the in-order cores (i.e. no register renaming ??) ..

    I need to read through all this in more detail again.
     
  17. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,278
    Likes Received:
    3,531
    Location:
    Well within 3d
    Even if it's not, the rate of evolution is potentially higher for Larrabee.
    The volumes are likely higher, and Intel has more fab capacity to burn.

    There are a number of unknowns, such as how Larrabee will work out in silicon.

    No FMADD, though this isn't a big problem if the chip can sport a MUL and an ADD pipeline.
    The caches will be leaned on more heavily than they would be on a RISC design, thanks to the reg/mem operands and small register file.

    x86 at Larrabee's clock speed has already been done, so that's not a huge problem.
    Aside from having to hassle with register pressure more, much of x86's complications amount to little more than a few extra pipeline stages on a simple in-order core, some extra hardware, and slightly higher power draw.

    On that account, a few stages is not killer, Intel can manage larger dies, and the high-end Larrabee's target power draw is already declared to be equally high.

    There's other awkwardness to the ISA, but the vector extensions have not been discussed, and they may be very significant.

    The graphics hardware would most likely keep Larrabee well ahead of Cell for graphics, and is about the only reason why it would be mentioned in the same paragraph as dedicated GPUs.

    As a GPU, Cell is already a non-starter.

    At 90nm Cell is ~200 gflops.
    At 45nm Larrabee is ~1 tflop. (The range given in the slides is VERY wide, 0.2-1 tflop)
    In an ideal world a Cell design scaled without significant design changes would be around 800.
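
    As a rough sanity check on that 800 figure, here is the idealized arithmetic. This is only a sketch: it assumes the same die area and clock, and that throughput scales directly with transistor density, which goes with the square of the linear shrink. The function name `scaled_gflops` is made up for illustration.

```python
# Idealized process scaling: density grows with the square of the linear
# shrink, and (by assumption) throughput grows in step with density.
def scaled_gflops(base_gflops, base_node_nm, target_node_nm):
    density_gain = (base_node_nm / target_node_nm) ** 2
    return base_gflops * density_gain

# Cell at 90nm is quoted in the post as roughly 200 gflops.
cell_at_45nm = scaled_gflops(200.0, 90.0, 45.0)
print(cell_at_45nm)  # 800.0
```

    Real designs rarely scale this cleanly (power, clocks, and I/O don't shrink the same way), so treat the result as an upper-bound ballpark.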

    However, Larrabee seems to be listed as having that massive throughput at double precision (DP), which is more than what Cell can do right now.

    There are too many unknowns, given the wide range of possible clock speeds and core counts.
    A future Cell was stated to be in the same neighborhood, though I don't think that was DP.

    The cache looks to be a very important design element. It seems likely that greater control will be present for caches.
    With proper controlling instructions, Larrabee might negate many of the advantages that the LS offers Cell.

    I'm still unsure of the exact arrangement of the caches. Someone said the L1 was write-through, which would be painful for a shared L2 cache. I'm not clear how the L2 is distributed.

    What is still not mentioned is a DMA engine or other mechanisms for bringing in batches of data.

    If working from x86, it's 8 GP registers and 8 SSE.
    x86-64 is 16 of each.

    Larrabee has been characterized as having a 512 bit vector FPU, which is 4 times the width of current SSE.

    The number of registers, however, is still at most 16 unless they get Larrabee to run on a modified subset of x86.
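
    To put the vector width in perspective, a back-of-the-envelope sketch follows. The 16 cores and 2GHz come from the thread title; everything else is an assumption, in particular the 2 flops per lane per cycle, which presumes separate MUL and ADD pipelines that can both issue every cycle (not confirmed by the slides). The helper names are hypothetical.

```python
# Number of elements that fit in one vector register.
def lanes(vector_bits, element_bits):
    return vector_bits // element_bits

# Theoretical peak only; real code will land well below this.
def peak_gflops(cores, clock_ghz, vector_bits, element_bits, flops_per_lane_per_cycle):
    per_core = lanes(vector_bits, element_bits) * flops_per_lane_per_cycle
    return cores * clock_ghz * per_core

print(lanes(128, 32))  # 4 single-precision lanes in a 128-bit SSE register
print(lanes(512, 32))  # 16 single-precision lanes in a 512-bit register

# ASSUMPTION: 2 flops/lane/cycle (one MUL plus one ADD issued together).
print(peak_gflops(16, 2.0, 512, 32, 2))  # 1024.0 (single precision)
print(peak_gflops(16, 2.0, 512, 64, 2))  # 512.0 (double precision)
```

    Under those assumptions the single-precision figure lands near the ~1 tflop top of the range, but with this many unknowns it is only a plausibility check.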
     
  18. Geo

    Geo Mostly Harmless
    Legend

    Joined:
    Apr 22, 2002
    Messages:
    9,116
    Likes Received:
    213
    Location:
    Uffda-land
    What makes you think that Larrabee is necessarily the only arrow in their discrete quiver? Surely that time frame would indeed suggest otherwise.
     
  19. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,278
    Likes Received:
    3,531
    Location:
    Well within 3d
    I don't know if Larrabee will be the initial discrete product, and it does sound early.

    My intended statement was that Larrabee would be more impressive compared to GPUs of that time frame as opposed to GPUs that would be coming out later.

    If it came out later, it would be running up against more powerful GPUs, and it would have a harder time making an impact in the discrete GPU space.

    To complicate matters, if Intel's first product is very different from Larrabee, then Intel will have to deal with developmental whiplash when Larrabee is released.
     
  20. Geo

    Geo Mostly Harmless
    Legend

    Joined:
    Apr 22, 2002
    Messages:
    9,116
    Likes Received:
    213
    Location:
    Uffda-land
    Mmm, maybe. Depends on the characteristics of that product and how closely they match current competitors rather than a third model that is neither current IHVs nor Larrabee.

    But there might be other "goods" that they might see as more important. For instance, are they going to want AIB participation? If not in North America (and maybe there too), at least in Europe and Asia? AIBs probably have much better relationships for graphics products with retailers, both e- and b&m, for introducing a new graphics product. Also, if they are suddenly going to need oodles of graphics memory, as we've seen reported, wouldn't you want those relationships to get going and become solidified? How about PCB manufacture and assembly?

    I just look at that date and wonder if they might have a 'tweener discrete strategy in mind to help them ramp up in some of these areas before "the main show" with Larrabee debuts.
     