Larrabee delayed to 2011 ?

Discussion in 'Architecture and Products' started by rpg.314, Sep 22, 2009.

  1. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Au contraire, monsieur. Even downclocked versions of ATI's last-gen (desktop!) products are sufficient for the seventh-fastest supercomputer.
     
  2. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
    For peak performance at least, Cypress is faster in both SP and DP flops (680 vs 515 DP GFLOPS). So what it really comes down to is which one delivers better efficiency in which kernels.
     
  3. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Cypress in HD5870 is 544 DP GFLOPS theoretical. The server version, FireStream, is likely to be less.
     
  4. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
    So T pipe can't do DP?
     
  5. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    No, T is not used for double-precision. Though I suppose it can be used to seed initial values for evaluation of DP-RCP etc., since DP transcendentals aren't in the instruction set and so require a "macro".

    DP MUL and MAD are both single-cycle using XYZW cooperatively. DP ADD uses pairs of lanes cooperatively, meaning that ADD is also 544 GFLOPS. I presume DP ADD in GF100 is half the FLOPS of DP MAD - not sure.
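
    The figures in this exchange can be checked with some quick arithmetic. This is only a rough sketch: the unit count (320 five-lane VLIW units) and the 850 MHz clock are the commonly published HD 5870 specifications, not numbers stated in this thread, and `peak_gflops` is a made-up helper name. The last line is a guess at where the 680 figure may have come from (a quarter of the SP peak, i.e. counting T as well).

```python
# Peak-throughput arithmetic for Cypress (Radeon HD 5870).
# ASSUMPTIONS (public HD 5870 specs, not stated in this thread):
#   320 five-lane VLIW units (XYZW + T) running at 850 MHz.
VLIW_UNITS = 320
CLOCK_MHZ = 850


def peak_gflops(flops_per_unit_per_clock):
    """Theoretical peak in GFLOPS for a given per-unit, per-clock rate."""
    return VLIW_UNITS * flops_per_unit_per_clock * CLOCK_MHZ / 1000


# SP MAD: all five lanes issue a multiply-add (2 flops each).
sp_mad = peak_gflops(5 * 2)

# DP MAD: XYZW cooperate on one multiply-add per clock (2 flops); T is idle.
dp_mad = peak_gflops(1 * 2)

# DP ADD: two lane pairs each complete one add per clock (1 flop each).
dp_add = peak_gflops(2 * 1)

# Hypothetical: what DP would be if it were simply a quarter of SP peak.
dp_if_quarter_of_sp = sp_mad / 4

print(sp_mad, dp_mad, dp_add, dp_if_quarter_of_sp)
```

    Under those assumptions DP MAD and DP ADD both land on the same peak, which is the point being made about the lane pairing.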
     
  6. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
    DP Add for GF100 is 1/2 MAD flops.

    Though I don't think the MAD/MUL uses the XYZW cooperatively. Makes more sense for it to simply reduce each from x16 to x4 unless the data paths between XYZW are already interleaved.
     
  7. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    It's definitely cooperative. Simple inspection of a single DP-MUL instruction, as it is compiled, shows this:

    Code:
    kernel void doubletest( double X<>, double Y<>, out double Z<>)
    {
     Z = X * Y ;
    }
    compiles as:

    Code:
    ; --------  Disassembly --------------------
    00 TEX: ADDR(48) CNT(2) VALID_PIX 
          0  SAMPLE R1.xy__, R0.xyxx, t0, s0  UNNORM(XYZW) 
          1  SAMPLE R0.xy__, R0.xyxx, t1, s0  UNNORM(XYZW) 
    01 ALU: ADDR(32) CNT(5) 
          2  x: MUL_64      R0.x,  R1.y,  R0.y      
             y: MUL_64      R0.y,  R1.y,  R0.y      
             z: MUL_64      ____,  R1.y,  R0.y      
             w: MUL_64      ____,  R1.x,  R0.x      
             t: MOV         R0.z,  0.0f      
    02 EXP_DONE: PIX0, R0.xyzz
    END_OF_PROGRAM
    The ALU is already cooperative for dot product. Then there are the dependent instructions introduced by Evergreen that build upon the dot-product capability.
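
    For anyone who wants to repeat the "simple inspection" on other kernels, here is a small sketch that tallies which slots of an ALU bundle carry a given opcode. The bundle text is copied from the disassembly above with whitespace trimmed; `slot_opcodes` and the regular expression are illustrative names of my own, not part of any AMD tool.

```python
import re

# ALU bundle from the disassembly quoted above (whitespace trimmed).
ALU_CLAUSE = """\
2  x: MUL_64      R0.x,  R1.y,  R0.y
   y: MUL_64      R0.y,  R1.y,  R0.y
   z: MUL_64      ____,  R1.y,  R0.y
   w: MUL_64      ____,  R1.x,  R0.x
   t: MOV         R0.z,  0.0f
"""

# A slot letter, a colon, then the opcode name (hypothetical pattern
# that fits the listing format shown above).
SLOT_RE = re.compile(r"\b([xyzwt]):\s+(\w+)")


def slot_opcodes(clause):
    """Map each VLIW slot letter to the opcode issued in that slot."""
    return {slot: opcode for slot, opcode in SLOT_RE.findall(clause)}


ops = slot_opcodes(ALU_CLAUSE)
# Slots occupied by the single double-precision multiply.
mul64_slots = [slot for slot, opcode in ops.items() if opcode == "MUL_64"]
print(mul64_slots, ops["t"])
```

    In this bundle the one source-level multiply occupies all of x, y, z and w, while t only carries a MOV.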
     
  8. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,724
    Likes Received:
    195
    Location:
    Stateless
  9. Andrew Lauritzen

    Andrew Lauritzen Moderator
    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,632
    Likes Received:
    1,251
    Location:
    British Columbia, Canada
    Cool, but is there more than the first 6:30 minutes?
     
  10. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    From there,

     
  11. Andrew Lauritzen

    Andrew Lauritzen Moderator
    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,632
    Likes Received:
    1,251
    Location:
    British Columbia, Canada
    Right... I'm just confused. Not being a real YouTube user, how long does it take to upload the different parts? Or are they editing them for other reasons first? The first part seems to imply they're pretty raw, so I'm curious as to what the delay is about.
     
  12. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    "opted for picture quality" - in a fracking interview filmed with shaky cam (tm)? I rather think they're opting for visit-fishing…
     
  13. Lightman

    Veteran Subscriber

    Joined:
    Jun 9, 2008
    Messages:
    1,969
    Likes Received:
    963
    Location:
    Torquay, UK
    Correct!
    And yes, my mobile phone camera can shoot better video than they did :roll:
     
  14. larrabee

    Newcomer

    Joined:
    Dec 21, 2009
    Messages:
    29
    Likes Received:
    0
    It can take a while to upload depending on server load, but you can upload the different videos concurrently. The limit is 10 minutes and under 2GB.
     
  15. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    The real limit on YT is actually 10:59. ;)
     
  16. Space Giraffe

    Newcomer

    Joined:
    Jun 3, 2010
    Messages:
    16
    Likes Received:
    0
  17. Nick

    Veteran

    Joined:
    Jan 7, 2003
    Messages:
    1,881
    Likes Received:
    17
    Location:
    Montreal, Quebec
    Does someone have a transcription of this? The image quality isn't too bad but the sound quality is terrible.

    Anyway, trying to understand the best I can...
    While the facts appear to point in that direction, I believe Andrew fails to see some of the dynamics behind it.

    There are thousands of hardware engineers working on DX11 products, while the number of people working on a full DX11 software implementation can probably be counted on one hand. What is lacking to change this around is fully generic multi-core hardware (such as Larrabee). Once that's on the market it won't take very long before innovative new applications appear as a software implementation before any hardware implementation (if that would ever appear at all). Even today a lot of 'hardware' features are actually implemented using software in the firmware or driver.

    Also, while an optimized DX11 software implementation has yet to appear, I believe it would have been very straightforward for Microsoft to continue the development of WARP and make it available before any hardware. And that would have effectively been created by a handful of software engineers versus thousands of hardware engineers.

    Ironically, there are millions of software developers who can work on various aspects of graphics technology, while only several thousand people have a hand in designing hardware. So there is tremendous potential waiting to be unleashed. Huge companies such as Microsoft revolutionized the way we use computers not through hardware, but through a complete focus on software. The software revolution for computer graphics has yet to begin...
     
  18. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Wow, that's naive. Hardware engineers had the bright idea to implement a software pipeline, in hardware, decades ago.
     
  19. Nick

    Veteran

    Joined:
    Jan 7, 2003
    Messages:
    1,881
    Likes Received:
    17
    Location:
    Montreal, Quebec
    What's naive exactly?
    Sure, but what's your point? It's just one pipeline. What developers (such as Sweeney) want is to be able to implement any pipeline. And this will require a true software implementation on fully generic hardware.
     
  20. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    Is this a hint that there is a massive untapped market for a software DX11 implementation or...?

    Fully generic and performant multicore hardware.
    Obviously, we've had multicore CPUs for years, they are just insufficient.
    Larrabee I apparently was not the one to break that trend.

    I'd like an analysis of this. What particularly innovative things would have a significant material impact on the market?
    There are a number of weaknesses in the standard pipeline that could potentially be corrected with a different implementation.
    However, how much would this amount to externally for the consumer?
    A number of algorithms promise to correct one weakness or another in software, and they often do, but the gains are often incremental (better transparency pre-DX11, a lot of chrome spheres) and not sufficient to counter the reduced performance in the bulk of the workload, or they wind up being capped by other restrictions (asset creation, memory, art pipeline, etc.)

    Does this creative flowering of software renderers offer significantly greater utility to the market, or is it searching for a problem?

    What is the economic incentive for Microsoft for doing so?

    So it's better if millions of people work on the same thing over and over versus having a few thousand work on the same thing once?
     
    #700 3dilettante, Jun 16, 2010
    Last edited by a moderator: Jun 16, 2010