Larrabee at GDC 09

Discussion in 'Architecture and Products' started by bowman, Feb 16, 2009.

  1. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,505
    Likes Received:
    424
    Location:
    Varna, Bulgaria
Huh? Well, it looks like there's no LRB silicon in the wild to photograph, then. :???:
     
  2. crystall

    Newcomer

    Joined:
    Jul 15, 2004
    Messages:
    149
    Likes Received:
    1
    Location:
    Amsterdam
I don't think so. Considering the abysmal quality of Intel's OpenGL drivers, I don't see them making inroads into the professional market any time soon, irrespective of how good the hardware is.
     
  3. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,433
    Likes Received:
    181
    Location:
    Chania
I'm under the impression that the LRB team has nothing to do with the chipset team.
     
  4. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
  5. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,345
    Likes Received:
    3,851
    Location:
    Well within 3d
The cost estimates for Itaniums at that die size already excluded the amortized fab overhead. These are wafer, processing, packaging, and validation costs only.
     
  6. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,558
    Likes Received:
    600
    Location:
    New York

Any clue how strands and fibers are actually represented? Are they just concepts to guide developers, or are they actually part of the programming model? Can I take a single hardware thread and do anything I want, or does everything have to be broken down into fibers? What are we calling a "group of 16 strands" on Larrabee anyway? Is there an equivalent to Nvidia's warp?

I'm also curious as to how hardware and software switching are going to work together. Software switching is basically knowing that you just asked for something that's going to take a long time, so you switch to some other data item. But is it up to the developer to keep track of which fibers are waiting on what? And how does the hardware know that it has encountered an unpredictable short latency and that it should invoke a context switch instead of letting the thread switch to another fiber? Sorry for the barrage of questions, but it all seems like magic at the moment...
     
  7. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,345
    Likes Received:
    3,851
    Location:
    Well within 3d
Fibers and strands are handy abstractions for gathering work into batches large enough to allow for efficient SIMD processing.

If the programmer does not rely on the abstractions provided by the software stack that uses fibers and strands as base units, then tracking execution and stalls is up to the programmer.

The hardware is an x86 processor with 64-byte vector registers. What a programmer wants to do within that context doesn't need to be defined in terms of strands or fibers.
Software switching is something the framework provides, or something a programmer working directly on the hardware can implement.

Unpredictable short-latency events are one reason why there are multiple hardware threads per core.
It's too much work for too little gain to try to predict small stalls like that. Context switches only happen on more obvious long-latency events.
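For what it's worth, the software-switching idea can be sketched with Python generators standing in for fibers; the scheduler and every name here are invented for illustration, nothing Larrabee-specific:

```python
# Hypothetical sketch: cooperative "fibers" hide long latency by yielding
# at each long-latency request instead of stalling the hardware thread.

def fiber(fid, requests):
    """Each yield models the fiber parking itself while a texture
    request is outstanding; the scheduler runs other fibers meanwhile."""
    for r in requests:
        yield ("issue", fid, r)  # issue a long-latency request, then yield
    # computation on the returned data would happen here

def run(fibers):
    """Round-robin software scheduler: switch fibers rather than stall."""
    order = []
    live = list(fibers)
    while live:
        for f in live[:]:
            try:
                order.append(next(f))
            except StopIteration:
                live.remove(f)
    return order

trace = run([fiber(0, ["texA", "texB"]), fiber(1, ["texC"])])
# Requests interleave across fibers: 0, 1, 0 -- no fiber sits blocked
# waiting on its own outstanding texture fetch.
```

The point of the sketch is only that this switching is a software policy layered on top of the hardware threads, which independently cover the short, unpredictable stalls.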
     
  8. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,558
    Likes Received:
    600
    Location:
    New York
That seems to be the basis for this Nvidia patent. Page misses on texture requests are serviced for subsequent frames, while the current frame tries to find the next-best texel that's available in local memory.
     
  9. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,558
    Likes Received:
    600
    Location:
    New York
Is there any example code out there yet that uses those abstractions? Still can't wrap my head around the concepts :oops:

So there are two levels of latency hiding? Switching between fibers within a thread, and then switching between hardware threads? Guess I don't understand how the hardware knows when to do the latter.
     
  10. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,345
    Likes Received:
    3,851
    Location:
    Well within 3d
    Low-level detail on a lot of this hasn't been disclosed.

    Software can try to hide latency as long as it remains on-chip. The static allocation of qquads to a strand (correction: fiber) to hide best-case texture latency is an example.

    Anything that goes to memory is probably going to be too long for most types of software-based latency hiding.
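A back-of-the-envelope version of that static-allocation idea, with purely made-up numbers (neither figure is a Larrabee spec):

```python
import math

# Sketch of statically sizing a fiber's qquad allocation so that cycling
# through the qquads covers best-case texture latency. Both constants
# are illustrative assumptions.
TEX_LATENCY = 32   # assumed best-case texture latency, in cycles
QQUAD_COST = 8     # assumed ALU work per qquad between fetches, in cycles

qquads_per_fiber = math.ceil(TEX_LATENCY / QQUAD_COST)
# With 4 qquads in flight, the first fetch has returned by the time the
# fiber cycles back to it, so best-case latency is fully hidden with no
# dynamic scheduling decisions at all.
```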
     
    #170 3dilettante, Apr 10, 2009
    Last edited by a moderator: Apr 10, 2009
  11. MrBelmontvedere

    Newcomer

    Joined:
    Nov 26, 2006
    Messages:
    40
    Likes Received:
    0
Was the actual Larrabee wafer ever posted? I must have missed it. :oops:
     
  12. TimothyFarrar

    Regular

    Joined:
    Nov 7, 2007
    Messages:
    427
    Likes Received:
    0
    Location:
    Santa Clara, CA
That is exactly what I personally would want from the hardware (from a developer's perspective). I don't want to be checking whether TEX instructions fail in the shader (that is insane, IMO), and I also don't want my shader to have to be restarted with different MIP clamps per "subtile" to ensure efficient computation after a page fault (to ensure either that I don't have to check, or that I don't get repeated page faults on future shader accesses).

    Found an older thread on LRB and final post from TomF on the Molly Rocket Forums,

    "-In the second pass (which is now the only pass), you don't need 14(?) shader instructions and 2 texture reads, you just read the texture with a perfectly normal texture instruction. If it works, it gives the shader back filtered RGBA data just like a non-SVT system. If it doesn't (which we hope is rare), it gives the shader back the list of faulting addresses, and the shader puts them in a queue for later and does whatever backup plan it wants (usually resampling at a lower mipmap level)."

Note this says "texture instruction" (talking about doing mega textures on Larrabee compared to current GPUs). The question is: how exactly is the texture unit giving the shader a list of faulting addresses?

    And later he writes,

    "... My understanding is that Rage's lighting model is severely constrained because every texture sample costs them so much performance, they can only read a diffuse and a normal map - they just can't afford anything fancier..."

I just don't buy this latter comment. I'd bet that if they are limited to diffuse and normal, it's probably because of the lack of ability to store all the data (DVDs for the 360), or to decompress and recompress enough of it to service the real-time streaming requirements. Or the lack of ability to do high enough quality re-compression to pack more into two DXT5s. They should be able to get diffuse+monospec into one, and a two-channel normal with two channels for something else in the other (the Insomniac trick for detail maps)...

I'll be more clear on my original point. Going back to the texture size limitation: DX11's 16384x16384 max isn't enough, even with virtual texturing, to do mega textures with a single texture. And with just two DXT5s, that's 4GB of data, i.e. the full 32-bit address space. So you'd likely still need a level of indirection in the shader to get around this problem (beyond optionally dealing with software page faults). And this is exactly why I'm not sold on virtual paging for mega texturing, unless the card supports 64-bit addressing for texture memory, in which case I could split my megatexture up into many tiny mega textures and more draw calls.

In light of the above problem, I think LRB's virtual texture paging would make more sense with a more classical engine "material system" like, say, Unreal's, where you still use tiled textures + lightmap, or Uncharted's, with its dual tiled textures + blend mask. But in this case, if my shader on LRB has to deal with page faults, I'd likely want to factor all that work out into a texture streamer and manually stream textures so I don't eat any unnecessary costs (i.e. page faults) during shading... even if just for the reason that I want a consistent rendering cost when textures applied to surfaces become un-occluded.

    But who knows, I might be singing another song if I was actually playing with the real hardware :wink:
     
  13. bowman

    Newcomer

    Joined:
    Apr 24, 2008
    Messages:
    141
    Likes Received:
    0
  14. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,008
    Likes Received:
    535
    If it really gives all the addresses of individual texels I'd assume it pushes them on the stack, letting you add a conditional branch for the exception (not a big deal if it's a rare occurrence) and some scalar code to deal with the faults.
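A hedged sketch of what that could look like on the shader side, following the quoted description; every name here is invented for illustration, not an actual Larrabee or TomF API:

```python
# Illustrative only: a texture fetch that either returns resident texels
# or reports faulting addresses for the shader's backup plan.

def lower_mip(addr):
    return addr // 2  # stand-in for resampling at a coarser mip level

def sample(texture, addrs, resident):
    """Return (texels, faults); the fault path is meant to be rare."""
    texels, faults = [], []
    for a in addrs:  # conditional branch per address; faults are the rare case
        if a in resident:
            texels.append(texture[a])
        else:
            texels.append(texture[lower_mip(a)])  # backup plan
            faults.append(a)  # scalar code queues the fault for later
    return texels, faults

texture = {i: i * 10 for i in range(8)}   # toy texture
resident = {0, 1, 2, 3}                   # pages currently in memory
texels, faults = sample(texture, [1, 6, 2], resident)
refill_queue = list(faults)  # serviced later, e.g. for subsequent frames
```

The shape matches MfA's guess: the common path is a normal fetch, with one branch guarding a scalar fault-handling loop.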
     
  15. crystall

    Newcomer

    Joined:
    Jul 15, 2004
    Messages:
    149
    Likes Received:
    1
    Location:
    Amsterdam
    It's a purely software concept to indicate a bit of parallel code working on unrelated data. As such it has no hardware equivalent and there is no such thing as 'switching between strands' in hardware. Think of each strand as an iteration of a loop working on data which is already known to be in the L1 - for example - or doing purely computational work and thus having a predictable execution latency (minus the variability introduced by hardware thread switching).
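One way to picture that, sketched in plain Python rather than vector intrinsics: the 16 strands are simply the 16 lanes of one 512-bit vector operation, advancing in lockstep.

```python
VECTOR_WIDTH = 16  # 64-byte registers / 4-byte floats = 16 lanes

def vec_madd(a, b, c):
    # One vector multiply-add advances all 16 "strands" at once;
    # there is no per-strand scheduling, just lockstep lanes.
    return [a[i] * b[i] + c[i] for i in range(VECTOR_WIDTH)]

a = [float(i) for i in range(VECTOR_WIDTH)]
out = vec_madd(a, a, a)  # strand i computes a[i]*a[i] + a[i]
```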
     
  16. bowman

    Newcomer

    Joined:
    Apr 24, 2008
    Messages:
    141
    Likes Received:
    0
  17. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,505
    Likes Received:
    424
    Location:
    Varna, Bulgaria
Well, that photo is just big enough to count the whole dies exactly, at least, but there's no detail on the surface to be sure it's LRB and not something else... Anyway, there are 85 whole dies on that wafer, which works out to 625 mm² per die.
     
  18. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,898
    Likes Received:
    2,220
    Location:
    Germany
    What else could be in the ballpark? Dunnington, Itanium...?
     
  19. bowman

    Newcomer

    Joined:
    Apr 24, 2008
    Messages:
    141
    Likes Received:
    0
He only held up two wafers during the keynote, so it's either one or the other. The one on the bench behind him does look like Jasper Forest, so it's pretty much settled in my book.
     
  20. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
For what it's worth: 22.8 × 29.6 ≈ 675 mm², assuming the wafer is 300 mm in diameter.
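The arithmetic is easy to check; the 22.8 mm × 29.6 mm die estimate is the thread's guess from the photo, not an official figure:

```python
import math

die_w, die_h = 22.8, 29.6              # estimated die dimensions, mm
die_area = die_w * die_h               # ~675 mm^2
wafer_area = math.pi * (300 / 2) ** 2  # ~70,686 mm^2 for a 300 mm wafer
upper_bound = wafer_area / die_area    # ~104 dies, ignoring edge loss
# 85 whole dies on the wafer is consistent with this bound once the
# partial dies at the wafer edge are discarded.
```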

    Jawed
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.