Larrabee at GDC 09

Discussion in 'Architecture and Products' started by bowman, Feb 16, 2009.

  1. TimothyFarrar

    Regular

    Joined:
    Nov 7, 2007
    Messages:
    427
    Likes Received:
    0
    Location:
    Santa Clara, CA
    As a side note, only now does the CUDA 2.2 beta provide direct host memory access: GT200 goes through the PCIe bus, while MCP79 has direct access. It's limited by PCIe bandwidth (~6 GB/s, low) and latency (high). I'm not sure whether newer NV hardware has device/host page granularity (i.e. a texture could have parts on the device and parts on the host). Regardless, I think the usefulness of paged texture memory is marginalized by the need for software to page to and from disk, the high latency, and the limited PCIe bandwidth. Perhaps there is a good reason WDDM v2 has gone quiet.
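
    For reference, a minimal sketch of that zero-copy path using the public CUDA runtime API names (error checking omitted; the kernel is just an illustration): the host buffer is pinned and mapped, and the kernel reads it directly across PCIe.

        #include <cstdio>
        #include <cuda_runtime.h>

        __global__ void scale(const float* hostData, float* out, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n)
                out[i] = hostData[i] * 2.0f;   // every read crosses the PCIe bus
        }

        int main()
        {
            const int n = 1 << 20;

            // Must be set before mapped host memory can be used.
            cudaSetDeviceFlags(cudaDeviceMapHost);

            float *hostBuf = 0, *devView = 0, *devOut = 0;
            cudaHostAlloc((void**)&hostBuf, n * sizeof(float), cudaHostAllocMapped);
            cudaHostGetDevicePointer((void**)&devView, hostBuf, 0);  // device-visible alias
            cudaMalloc((void**)&devOut, n * sizeof(float));

            for (int i = 0; i < n; ++i) hostBuf[i] = float(i);

            scale<<<(n + 255) / 256, 256>>>(devView, devOut, n);
            cudaDeviceSynchronize();

            cudaFree(devOut);
            cudaFreeHost(hostBuf);
            return 0;
        }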
     
  2. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,009
    Likes Received:
    536
    Loading on page misses is not how you would use virtual memory on the GPU in games; that's more something for time-multiplexed independent applications trying to share the GPU. In games/rendering you would use virtual memory to efficiently store sparsely populated textures, the obvious example being megatextures. An extra layer of indirection in the shader is never going to be as efficient as a TLB.
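
    To make that "extra layer of indirection" concrete, here's a toy sketch (all sizes and names made up) of what a software virtual-texture lookup has to do: every sample first reads a page-table entry, then redirects into a physical page atlas. A hardware page table/TLB would do that first step implicitly.

        #include <cstdio>
        #include <cstdint>

        const int PAGE        = 128;   // texels per page side (assumed)
        const int VPAGES      = 64;    // virtual texture is VPAGES x VPAGES pages
        const int ATLAS_PAGES = 8;     // physical atlas is ATLAS_PAGES x ATLAS_PAGES pages

        struct PageEntry { uint16_t atlasX, atlasY; bool resident; };

        PageEntry pageTable[VPAGES][VPAGES];                      // the indirection layer
        uint8_t   atlas[ATLAS_PAGES * PAGE][ATLAS_PAGES * PAGE];  // physical page cache

        // One "texture fetch" in software: two dependent reads instead of one.
        uint8_t sampleVirtual(int u, int v)
        {
            const PageEntry& e = pageTable[v / PAGE][u / PAGE];   // indirection lookup
            if (!e.resident)
                return 0;                                         // fall back (e.g. lower mip)
            int px = e.atlasX * PAGE + (u % PAGE);                // translate to atlas coords
            int py = e.atlasY * PAGE + (v % PAGE);
            return atlas[py][px];
        }

        int main()
        {
            pageTable[3][5] = { 2, 1, true };                     // map one virtual page
            atlas[1 * PAGE + 7][2 * PAGE + 9] = 42;
            printf("%d\n", sampleVirtual(5 * PAGE + 9, 3 * PAGE + 7));   // 42
            return 0;
        }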
     
  3. TimothyFarrar

    Regular

    Joined:
    Nov 7, 2007
    Messages:
    427
    Likes Received:
    0
    Location:
    Santa Clara, CA
    As for megatextures, the idea is one huge virtual texture, which isn't doable with virtual paging alone because of texture size limitations (filtering), so the level of indirection would still be necessary. I guess one might be able to get around this with texture arrays.

    I'm not sure exactly how LRB handles texture page misses, but aren't you still going to need a level of indirection to choose to sample lower mips when a page fails? If a texture page miss is an interrupt, that is going to get awfully scary for real-time...
     
  4. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,348
    Likes Received:
    3,879
    Location:
    Well within 3d
    A page miss on the x86 cores will behave like any other memory access.

    The texture units have TLBs, but they do not handle page misses. They have to fall back to the core to load the page.
     
  5. TimothyFarrar

    Regular

    Joined:
    Nov 7, 2007
    Messages:
    427
    Likes Received:
    0
    Location:
    Santa Clara, CA
    So for LRB, then, the texture fetch instruction would throw an exception, which would fault the shader and end up in an interrupt handler?

    If so, I'd speculate that in the case of virtual texturing, the interrupt handler would toss the page onto a page-fault list (to be handled later), and then, for the rest of the "subtile", force a mip-level cap for the faulted texture to ensure the page fault didn't happen again. This would have to be at some "subtile" granularity so that future fetches didn't kill performance with interrupts...
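
    Something like this, perhaps (every structure and name here is invented for illustration): the handler only records the missing page for a background loader and clamps the mip for the faulting subtile, so nothing is loaded on the fault itself.

        #include <cstdint>
        #include <vector>
        #include <algorithm>

        struct PageId  { uint32_t texture, mip, x, y; };
        struct Subtile { uint8_t mipCap; };              // per-subtile clamp (assumed)

        std::vector<PageId> faultList;                   // drained later, off the critical path
        Subtile subtiles[64][64];                        // screen-space subtile grid (assumed)

        // Hypothetical handler body: no page is loaded here.
        void onTexturePageFault(const PageId& page, int subtileX, int subtileY)
        {
            faultList.push_back(page);                   // toss it on the page-fault list

            // Cap this subtile at a mip that is known to be resident, so the rest of
            // the subtile keeps sampling without re-faulting.
            Subtile& st = subtiles[subtileY][subtileX];
            st.mipCap = std::max<uint8_t>(st.mipCap, uint8_t(page.mip + 1));
        }

        int main()
        {
            onTexturePageFault({ 0, 2, 10, 4 }, 3, 7);
            return int(faultList.size());                // 1
        }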
     
  6. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,332
    Likes Received:
    119
    Location:
    San Francisco
    From Larrabee: A Many-Core x86 Architecture for Visual Computing
     
  7. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,348
    Likes Received:
    3,879
    Location:
    Well within 3d
    I haven't seen an outline of how the cores control the texturing units. I didn't see any instruction on the Larrabee vector ISA list that would match such an event.

    The exact control method is not mentioned, but perhaps a texturing section is a software loop that sends commands to the texture unit and checks a structure for access problems.
    If there is an unsuccessful access, the code then tells the core itself to perform a memory read that would lead to a page miss, which would then set off a standard x86 handler.
    After that was done, the texture loop would resend the request to the texture unit.
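
    Roughly like this, perhaps (the texture-unit interface is entirely invented, and the miss is simulated so the sketch runs standalone): on a reported miss the core touches the page itself, takes an ordinary page fault that the standard handler services, and then resends.

        #include <cstdint>
        #include <cstdio>

        struct TexRequest { const uint8_t* pageAddr; };
        struct TexStatus  { bool pageMiss; const uint8_t* missAddr; };

        static uint8_t fakePage[4096];

        // Stand-in for the real command interface: reports a miss on the first attempt only.
        TexStatus submitAndPoll(const TexRequest& r)
        {
            static bool firstTry = true;
            TexStatus s = { firstTry, r.pageAddr };
            firstTry = false;
            return s;
        }

        void textureLoop(const TexRequest& req)
        {
            for (;;)
            {
                TexStatus s = submitAndPoll(req);
                if (!s.pageMiss)
                    break;                              // texels delivered, carry on

                // Touch the faulting address from the core: this takes a normal page
                // fault that the standard x86/OS handler services. Then resend.
                volatile uint8_t touch = *s.missAddr;
                (void)touch;
            }
        }

        int main()
        {
            textureLoop({ fakePage });
            puts("request completed after one simulated miss");
            return 0;
        }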
     
  8. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,009
    Likes Received:
    536
    Prefetching has to take care of it 99% of the time; loading on a page miss has to be a rare corner case or it would just get too slow.
     
  9. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    The hardware thread on the core that generated the texture instruction is, generally speaking, running a loop over multiple fibres (qquads). So when one fibre generates a texture request it's likely that the next fibre in the loop will generate a texture request within a very short time ... and so on for all the fibres in that hardware thread.

    Once the hardware thread has issued all its texture requests and exhausted all "shader ALU instructions" that could run in the shadow of those requests, it has to relinquish control of the core to the other hardware threads, i.e. go to sleep.

    When the texture unit reports that it's encountering misses, I guess the relevant core marks some kind of "waiting for slow texture results" status on the thread that generated the requests. This prevents the core from returning the dozing thread to context until there are results it can use.

    If the texture unit had texels already in cache and was able to return results to the originating thread without the originating thread having to go to sleep, I guess it raises a simple "texture results ready" flag.

    Alternatively, I suppose, the originating thread could request that a simple watchdog with sleep intervals is set up to poll the status of texturing. The core could then adjust its polling interval when told that page misses have occurred. Polling would run on the "control thread" that runs on each core, independent of the shader threads. This control thread is the same thread that generates qquads, performs interpolation, etc.
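
    A toy, purely illustrative simulation of that fibre loop (none of this reflects real Larrabee code, and the latency numbers are made up): one hardware thread walks its fibres, runs what it can in the shadow of outstanding fetches, and "yields" once every fibre is waiting on texels.

        #include <cstdio>

        const int NUM_FIBRES    = 4;
        const int FETCH_LATENCY = 3;   // assumed texture latency, in loop trips

        struct Fibre { int pendingCycles = 0; int fetchesLeft = 2; };

        int main()
        {
            Fibre fibres[NUM_FIBRES];
            bool work = true;
            while (work)
            {
                work = false;
                bool ranSomething = false;
                for (Fibre& f : fibres)
                {
                    if (f.fetchesLeft == 0) continue;          // this fibre is done
                    work = true;
                    if (f.pendingCycles > 0) { --f.pendingCycles; continue; }  // texels not back yet

                    // Texels arrived (or first pass): run ALU work, then issue the next fetch.
                    --f.fetchesLeft;
                    if (f.fetchesLeft > 0) f.pendingCycles = FETCH_LATENCY;
                    ranSomething = true;
                }
                if (work && !ranSomething)
                    puts("all fibres waiting on texture -> yield the hardware thread");
            }
            puts("shader done");
            return 0;
        }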

    Jawed
     
  10. TimothyFarrar

    Regular

    Joined:
    Nov 7, 2007
    Messages:
    427
    Likes Received:
    0
    Location:
    Santa Clara, CA
    First, I don't see any way to fill missing pages in real time from disk to service one frame (the latency in the draw call would be horrid and would stall future dependent calls); I'm speaking from experience with the texture streamer I work on professionally. So either we are talking about decompressing/recompressing on the fly from a more highly compressed format in memory to a lower compression usable by the texture units, or about procedurally generated textures. In any case, the idea of pure virtual textures requires the assumption that some lower mip level is resident, and switching to that lower mip level to continue the draw call.

    One wouldn't want to actually load the missed page in that draw call. IMO this would have to be a background process, and it would even be a good idea with procedurally generated content, to amortize the generation cost over a few frames.
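
    A minimal sketch of that background process, assuming a per-frame page budget (all names and structures invented): faults recorded during rendering go onto a queue, and each frame only a few pages are produced and uploaded, so no draw call ever waits on the source data.

        #include <cstdint>
        #include <cstdio>
        #include <deque>

        struct PageId { uint32_t mip, x, y; };

        std::deque<PageId> faultQueue;           // filled by the fault path during rendering
        const int PAGES_PER_FRAME = 4;           // assumed per-frame budget

        // Stand-in for transcoding/generating a page and uploading it to the atlas.
        void producePage(const PageId& p)
        {
            printf("uploaded page mip=%u (%u,%u)\n", p.mip, p.x, p.y);
        }

        // Called once per frame, well off the draw-call critical path.
        void streamTextures()
        {
            for (int i = 0; i < PAGES_PER_FRAME && !faultQueue.empty(); ++i)
            {
                producePage(faultQueue.front());
                faultQueue.pop_front();
                // Once the page is resident the per-subtile mip cap can be relaxed.
            }
        }

        int main()
        {
            for (uint32_t i = 0; i < 10; ++i) faultQueue.push_back({ 0, i, 0 });
            for (int frame = 0; frame < 3; ++frame) { printf("frame %d:\n", frame); streamTextures(); }
            return 0;
        }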

    Having to produce the page on a texture fetch fault just seems like a bad idea to me. If that is the case, and a page fault has to be a rare event, then from the developer's perspective pure page-based virtual texturing wouldn't seem like a good idea.

    So if texture requests go through L2, then we are talking about locked L2 cache lines and effectively memory-mapped IO to the texture unit?

    It wouldn't seem like a good idea to have to check for texture fetch failures by looking at L2 results in a shader. So perhaps the texture unit itself throws the exception, which in turn faults one of the cores?
     
  11. bowman

    Newcomer

    Joined:
    Apr 24, 2008
    Messages:
    141
    Likes Received:
    0
  12. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,009
    Likes Received:
    536
    With or without hardware support, megatextures have pages ... with hardware support they are accessed efficiently, without hardware support less efficiently. Megatexturing is hardly the only application of sparse textures, though; for instance, you don't want to store a photon map in a 3D texture at uniform resolution in memory ... you want to store it sparsely.

    Being able to store sparse arrays without having to pull tricks with extra layers of indirection in the shader is just a nice ability to have.
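
    A back-of-the-envelope sketch of the photon-map case (sizes assumed): a dense 1024^3 grid of 16-byte cells would be ~16 GB, while a hash map pays only for occupied cells. Hardware sparse/virtual textures give a similar saving without the explicit indirection through the map.

        #include <cstdint>
        #include <cstdio>
        #include <unordered_map>

        struct Photon { float power[3]; uint32_t count; };

        // Pack a 10-bit-per-axis cell coordinate into one key (a 1024^3 virtual grid).
        uint64_t cellKey(uint32_t x, uint32_t y, uint32_t z)
        {
            return (uint64_t(x) << 20) | (uint64_t(y) << 10) | uint64_t(z);
        }

        int main()
        {
            std::unordered_map<uint64_t, Photon> grid;   // only occupied cells are stored

            grid[cellKey(10, 200, 900)] = { { 0.5f, 0.5f, 0.5f }, 1 };

            if (grid.count(cellKey(10, 200, 900)))
                printf("cell occupied, %u photon(s)\n", grid[cellKey(10, 200, 900)].count);

            printf("cells stored: %zu of %llu virtual cells\n",
                   grid.size(), 1024ull * 1024ull * 1024ull);
            return 0;
        }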
     
  13. Tridam

    Regular Subscriber

    Joined:
    Apr 14, 2003
    Messages:
    541
    Likes Received:
    47
    Location:
    Louvain-la-Neuve, Belgium
    This is actually Jasper Forest; Computerbase got the wrong picture, hehe.
    I was at IDF, though, and I can say that Gelsinger also briefly showed a Larrabee wafer. I'll post the pic I got (sadly bad quality) when I'm done with my report on the event. Larrabee is much bigger than that ;)
     
  14. bowman

    Newcomer

    Joined:
    Apr 24, 2008
    Messages:
    141
    Likes Received:
    0
    Yeah, I thought there was something off - even if it's at 45 nm you could fit three to four of those on a GT200. :lol:

    I can't believe there's an IDF going on and NONE of the big sites are covering it. Not even posting third party info. It's like they've all signed contracts to pretend it doesn't exist. :???:
     
  15. Tridam

    Regular Subscriber

    Joined:
    Apr 14, 2003
    Messages:
    541
    Likes Received:
    47
    Location:
    Louvain-la-Neuve, Belgium
    Cost reduction, so Intel didn't invite the international press to this IDF.
     
  16. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,348
    Likes Received:
    3,879
    Location:
    Well within 3d
    Any quick estimates on the number of dies per wafer or a ballpark figure on the die area?
     
  17. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    LRB may be a wonderful DX10 GPU, but do you think it will work well as a DX11 GPU? I am referring to tessellation specifically. I can't see how they will be able to run tessellation efficiently in software (though many said the same about rasterization :) ). I guess they'll add some new instructions to the ISA.
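
    For what it's worth, a very naive sketch of what running it in software would mean (integer factor, quad domain only): generate the domain points and connectivity on the cores and feed them to the usual shading pipeline. Real DX11 tessellation, with fractional modes and per-edge factors, is considerably more involved.

        #include <cstdio>
        #include <vector>

        struct UV { float u, v; };

        // Uniform tessellation of a quad domain: emit (factor+1)^2 domain points
        // and 2*factor^2 triangles; a domain shader would then displace the points.
        void tessellateQuad(int factor, std::vector<UV>& verts, std::vector<int>& indices)
        {
            for (int j = 0; j <= factor; ++j)
                for (int i = 0; i <= factor; ++i)
                    verts.push_back({ float(i) / factor, float(j) / factor });

            for (int j = 0; j < factor; ++j)
                for (int i = 0; i < factor; ++i)
                {
                    int v0 = j * (factor + 1) + i;
                    int v1 = v0 + 1, v2 = v0 + factor + 1, v3 = v2 + 1;
                    indices.insert(indices.end(), { v0, v1, v2, v1, v3, v2 });
                }
        }

        int main()
        {
            std::vector<UV> verts;
            std::vector<int> idx;
            tessellateQuad(8, verts, idx);
            printf("%zu vertices, %zu triangles\n", verts.size(), idx.size() / 3);
            return 0;
        }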
     
  18. PatrickL

    Veteran

    Joined:
    Mar 3, 2003
    Messages:
    1,315
    Likes Received:
    13
  19. bowman

    Newcomer

    Joined:
    Apr 24, 2008
    Messages:
    141
    Likes Received:
    0
    I still can't believe they censored the die with a PowerPoint slide in the webcast. Black Project!

    It had better be amazing, with this amount of skulking.
     
  20. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    This is when you wished you'd brought a 50MP digicam with serious zoom, eh?

    If that's really Larrabee then, ahem, 32nm can't come soon enough, eh? Maybe that's a prototype built on 90nm :razz:

    Jawed
     