Larrabee at Siggraph

Discussion in 'Architecture and Products' started by nAo, Jun 2, 2008.

  1. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
    Meh, at least Intel gave a full breakdown of the terminology and are basing it off the actual hardware instead of some random terminology with no basis in their hardware.

    But I thought it was Intel Fiber = Nvidia thread.
     
  2. Geo

    Geo Mostly Harmless
    Legend

    Joined:
    Apr 22, 2002
    Messages:
    9,116
    Likes Received:
    215
    Location:
    Uffda-land
    Hmm, I thot that's what I said. :lol:

    But yeah, if you're willing to let performance have momentary dips below 60fps then your average fps would be much higher.

    But then maybe with all their flexibility they'd go off and do something else with those cores just then. :???:
     
  3. Titanio

    Legend

    Joined:
    Dec 1, 2004
    Messages:
    5,670
    Likes Received:
    51
    Well, unless there's a prob with it, here's my rough ruler-based estimate of the number of 1GHz Larrabee cores needed for each of the 25 sample frames for Gears of War, to render them at 60fps:

    23
    13
    12
    23
    12-13
    12
    12
    17
    17
    22
    22
    15
    12
    14
    15
    24
    20
    23
    14
    15
    17
    17
    16
    17
    15

    Sorry it's not precise, I didn't want to start guesstimating decimal points etc. Based on my numbers, for this set of 25 sample frames, some stats:

    Average: 16.78
    Min: 12
    Max: 24
    Median: 16

    Soz, yeah, for some reason I read maximum for minimum in your post! :|
     
  4. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,724
    Likes Received:
    195
    Location:
    Stateless
    Interesting Titanio :)
    But I guess you forgot to mention the resolution, no?

    EDIT
    Thanks Geo, and btw I'm stupid... I guess Titanio based his figures on the FEAR results (@1600x1200)... :oops: I should have thought twice before asking stupid things.
     
  5. Geo

    Geo Mostly Harmless
    Legend

    Joined:
    Apr 22, 2002
    Messages:
    9,116
    Likes Received:
    215
    Location:
    Uffda-land
    Of course, it is worth noting these are 1GHz cores. . . and most people seem to be expecting north of 2GHz by the time it is shipping in retail.
     
  6. Geo

    Geo Mostly Harmless
    Legend

    Joined:
    Apr 22, 2002
    Messages:
    9,116
    Likes Received:
    215
    Location:
    Uffda-land
    1600x1200
     
  7. Titanio

    Legend

    Joined:
    Dec 1, 2004
    Messages:
    5,670
    Likes Received:
    51
    Yeah, sorry, it's Gears at 1600x1200, no AA.
     
  8. Titanio

    Legend

    Joined:
    Dec 1, 2004
    Messages:
    5,670
    Likes Received:
    51
    F.E.A.R. 1600x1200 4x AA (I presume that's what 4 samples means).. again, estimates based on the chart:

    9.5
    11.5
    14
    14
    16.5
    19
    16
    13
    14
    12
    15
    15.5
    15
    14.5
    7
    16
    12.5
    10.5
    8
    9
    8
    7
    26
    15
    14

    (I used 0.5s here..)

    Average: 13.3
    Min: 7
    Max: 26
    Median: 14
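For anyone who wants to re-check the summary stats, here is a minimal Python sketch. The lists are the eyeballed values from the two posts above; treating the "12-13" Gears entry as 12.5 is an assumption, and the names (`gears`, `fear`, `summarize`) are just for illustration.

```python
from statistics import mean, median

# Eyeballed "1GHz cores needed for 60fps" values from the posts above.
# Assumption: the "12-13" Gears entry is taken as 12.5.
gears = [23, 13, 12, 23, 12.5, 12, 12, 17, 17, 22, 22, 15, 12,
         14, 15, 24, 20, 23, 14, 15, 17, 17, 16, 17, 15]
fear = [9.5, 11.5, 14, 14, 16.5, 19, 16, 13, 14, 12, 15, 15.5, 15,
        14.5, 7, 16, 12.5, 10.5, 8, 9, 8, 7, 26, 15, 14]


def summarize(samples):
    """Return (average, min, max, median) for a list of per-frame core counts."""
    return round(mean(samples), 2), min(samples), max(samples), median(samples)


for name, samples in (("Gears", gears), ("FEAR", fear)):
    avg, lo, hi, med = summarize(samples)
    print(f"{name}: average {avg}, min {lo}, max {hi}, median {med}")
```

With only 25 hand-read samples per game, these are rough numbers, not a proper benchmark distribution.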
     
  9. Abu85

    Newcomer

    Joined:
    Apr 29, 2008
    Messages:
    5
    Likes Received:
    0
    How did you calculate that?:?:
     
  10. Titanio

    Legend

    Joined:
    Dec 1, 2004
    Messages:
    5,670
    Likes Received:
    51
    Figure 10 in the paper has a data point for each of each game's sample frames, showing how many 1GHz cores would be required to render that frame in 1/60 of a second. The y-axis labels jump in increments of 5 cores, though, so I'm having to eyeball some of the numbers.

    I don't think I'll do HL2..its distribution is a bit less all over the place, ranges between maybe 6 and 10..
     
  11. bowman

    Newcomer

    Joined:
    Apr 24, 2008
    Messages:
    141
    Likes Received:
    0
    Given that Intel is confidently going to churn out a 700+ million transistor 45nm die (Nehalem) this year, that GPUs are seemingly always bigger than CPUs, and that any unsatisfying cores can be fused off the die and then sold off as a different SKU.. I think there will be a lot more than 32 cores on this thing. It reminds me of the tactic AMD used to make everyone think RV770 would be mildly more powerful than the RV670. :cool:
     
  12. Titanio

    Legend

    Joined:
    Dec 1, 2004
    Messages:
    5,670
    Likes Received:
    51
    Based on this limited data set of simulated results, anyone care to hazard a guess how it might compare to some GPUs we know of today? :lol:

    Say, a 32-core 2GHz variant? Assuming linear scaling with clock speed and core count? (Intel claims near linear scaling with cores in these games, at least..)
     
  13. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    Since I don't have a subscription, I have to ask: do they indicate how they pick their sample frames? Is this running on some kind of Larrabee simulator?

    I guess it's too early for a histogram or somesuch, but the hairs on the back of my neck raise up whenever I see the prospect of "selected" frames.
     
  14. bowman

    Newcomer

    Joined:
    Apr 24, 2008
    Messages:
    141
    Likes Received:
    0
    It's running the same simulator they use to simulate CPU archs under development. I don't know how kosher this is, but here goes.

     
  15. Titanio

    Legend

    Joined:
    Dec 1, 2004
    Messages:
    5,670
    Likes Received:
    51
    For HL2 they seem to say they took 1 in every 30. For FEAR, 1 in every 100 and for Gears, 1 in every 250. They say "frames are widely separated to catch different scene characteristics as the games progress". They also say they captured frames "while the game was played at normal speed".
     
  16. Titanio

    Legend

    Joined:
    Dec 1, 2004
    Messages:
    5,670
    Likes Received:
    51
    I'm only flicking on in the paper now, and I spy some data on real time ray tracing with Larrabee. The implementation is C++ with some assembly in key places.

    Shows a screenshot from a 1024x1024 frame with 234k triangles, 1 light source, 1 reflection level, and "typically" 4M rays per frame.

    They have a performance comparison for 1GHz Larrabee vs an 8-core 2.6GHz Xeon for varying numbers of Larrabee cores. It's not clear if this is for the same scene shown in the screenshot or not.

    But anyway, the Xeon gets something between 10 and 15fps. An 8-core Larrabee gets 21.92fps. 16 cores gets 41.16fps. 32 gets 71.63fps.

    They observe that a Core 2 Duo requires 4.67x more clock cycles than a single Larrabee core for this workload.

    edit - sorry, the numbers in the chart are for the scene shown in the screenshot.
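To put the core scaling in those ray tracing numbers into perspective, here is a rough Python sketch. It only uses the three fps figures quoted above; the function name and the choice of the 8-core result as the baseline are my own assumptions.

```python
# Larrabee ray tracing results quoted above: number of 1GHz cores -> fps.
fps_by_cores = {8: 21.92, 16: 41.16, 32: 71.63}


def scaling_efficiency(results, base_cores):
    """Measured speedup over the base result, divided by the ideal linear speedup."""
    base_fps = results[base_cores]
    return {
        cores: (fps / base_fps) / (cores / base_cores)
        for cores, fps in results.items()
    }


eff = scaling_efficiency(fps_by_cores, base_cores=8)
for cores, value in eff.items():
    print(f"{cores} cores: {value:.0%} of linear scaling vs 8 cores")
```

By this measure the 16-core result is roughly 94% of linear and the 32-core result roughly 82%, so scaling for this scene is good but not perfectly linear.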
     
  17. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    The way Larrabee's needed core numbers are derived makes it difficult (impossible?) to compare to any current GPU numbers, which are not tested in like manner.

    GT200 for FEAR at 1600x1200 with 4x AA had what, a 90-100 FPS average?
    It doesn't do a good job capturing a minimum (or what fraction of resources yield a given minimum), but I don't think the Larrabee measurements do, either.
     
  18. Titanio

    Legend

    Joined:
    Dec 1, 2004
    Messages:
    5,670
    Likes Received:
    51
    No, they don't..it's quite plausible the tiny set of frames used here doesn't include the most demanding frame you might come across in a comparable benchmark for the game..or even that it's necessarily representative of what's typical (although intel would probably say it is)...

    ..but..

    for this tiny set of frames, assuming linear clock and core scaling, if their simulation is accurate.. a 48-core 2GHz Larrabee would render that set of frames at a minimum of ~240fps.
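The arithmetic behind that kind of projection, as a minimal Python sketch. It assumes perfectly linear scaling with both core count and clock, which is the stated assumption and not a measured result; `projected_fps` and the "core-GHz" bookkeeping are hypothetical names for illustration. The worst-case values are the maximums from the eyeballed estimates earlier in the thread.

```python
def projected_fps(cores_needed_at_1ghz, cores, clock_ghz, target_fps=60.0):
    """Frame rate for one frame, assuming perfectly linear scaling.

    cores_needed_at_1ghz: 1GHz cores needed to hit target_fps (read off Figure 10).
    cores, clock_ghz: the hypothetical Larrabee configuration.
    """
    available = cores * clock_ghz  # "core-GHz" of throughput on offer
    return target_fps * available / cores_needed_at_1ghz


# Heaviest frame (max cores needed) per game, from the estimates earlier in the thread.
worst = {"Gears": 24, "FEAR": 26}

for game, needed in worst.items():
    for cores in (32, 48):
        fps = projected_fps(needed, cores, clock_ghz=2.0)
        print(f"{game}: {cores} cores @ 2GHz -> minimum of ~{fps:.0f}fps")
```

The ~240fps figure corresponds to the heaviest Gears frame (24 cores). The heaviest FEAR frame (26 cores) works out a little lower under the same assumptions.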
     
  19. Geo

    Geo Mostly Harmless
    Legend

    Joined:
    Apr 22, 2002
    Messages:
    9,116
    Likes Received:
    215
    Location:
    Uffda-land
    Here's what they say about their benching:

    I like the "aggressively pessimistic" part, but I'm not sure about how reliable the rest is. :smile:
     
  20. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    The paper is quite interesting, and it's impressive what they can do with software rendering.

    It's a shame that they don't have any information about storage requirements for binning. I guess we can infer that they must be less than half the BW per frame. Their method of rasterizing during binning is something I never thought of before because a 64x64 tile w/ 4xAA would potentially need a coverage mask of 16 kbits (!) per triangle, but I guess the subdivision-based rasterization would produce a more efficient coverage mask. Still seems like quite a bit of space per triangle, though, and you have a bunch of post-VS attributes to store per vertex, too.
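The 16 kbits figure is just the worst case for a flat mask with one bit per sample. A quick Python sketch of that arithmetic follows; the function name is made up for illustration, and this says nothing about the storage format the paper actually uses.

```python
def coverage_mask_bits(tile_width, tile_height, samples_per_pixel):
    """Bits in a naive full-tile coverage mask: one bit per sample."""
    return tile_width * tile_height * samples_per_pixel


bits = coverage_mask_bits(64, 64, 4)
print(f"{bits} bits = {bits // 1024} kbits = {bits // 8} bytes per triangle")
```

That is the upper bound for a triangle touching a full 64x64 tile at 4xAA, which is why a more compact, subdivision-based representation would matter.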

    There's an interesting blurb in there about rasterization:
    I guess increasing setup speed in GPUs beyond 1 per clock is indeed a tough task, as dealing with fragments from multiple rasterizers would be tough. I have a feeling this is one reason that Larrabee is doing well, as it can have each core work on a separate set of primitives (they divide it into chunks of 1,000). They even have a different subsection in each bin to store the results from each core.

    I hope ATI/NVidia tackle this issue. I think it's doable.
     