Larrabee includes texture filter logic because this operation cannot be performed efficiently in software on the cores. Our analysis shows that software texture filtering on our cores would take 12x to 40x longer than our fixed-function logic, depending on whether decompression is required. There are four basic reasons:
- Texture filtering still most commonly uses 8-bit color components, which can be filtered more efficiently in dedicated logic than in the 32-bit wide VPU lanes.
- Efficiently selecting unaligned 2x2 quads to filter requires a specialized kind of pipelined gather logic.
- Loading texture data into the VPU for filtering requires an impractical amount of register file bandwidth.
- On-the-fly texture decompression is dramatically more efficient in dedicated hardware than in CPU code.
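To make the cost concrete, the following is a minimal sketch of what a software path has to do for a single bilinear sample of an RGBA8 texture. It is illustrative only and is not Larrabee code: the function name `bilinear_sample`, the flat list-of-tuples texture layout, and the clamp-to-edge addressing are all assumptions made for this example. Note that the 2x2 quad starts at an arbitrary texel position (the gather problem from the second point) and that each 8-bit component is widened before being blended (the first point).

```python
import math


def bilinear_sample(texture, width, height, u, v):
    """Bilinearly filter one RGBA8 sample at normalized coordinates (u, v).

    `texture` is a hypothetical row-major list of (r, g, b, a) tuples with
    8-bit components; this layout is an assumption for illustration.
    """
    # Map normalized coordinates to texel space, with texel centers at
    # half-integer positions.
    x = u * width - 0.5
    y = v * height - 0.5
    x0 = math.floor(x)
    y0 = math.floor(y)
    fx = x - x0  # horizontal blend weight
    fy = y - y0  # vertical blend weight

    def fetch(ix, iy):
        # Clamp-to-edge addressing (assumed; other wrap modes exist).
        ix = min(max(ix, 0), width - 1)
        iy = min(max(iy, 0), height - 1)
        return texture[iy * width + ix]

    # Gather the 2x2 quad; (x0, y0) is generally not aligned to anything.
    t00 = fetch(x0, y0)
    t10 = fetch(x0 + 1, y0)
    t01 = fetch(x0, y0 + 1)
    t11 = fetch(x0 + 1, y0 + 1)

    result = []
    for c in range(4):
        # Each 8-bit component is widened to a float before blending.
        top = t00[c] * (1 - fx) + t10[c] * fx
        bottom = t01[c] * (1 - fx) + t11[c] * fx
        result.append(int(round(top * (1 - fy) + bottom * fy)))
    return tuple(result)
```

Even this simplest case needs four dependent loads and twelve multiply-adds across the four components for one sample, before any mipmapping, anisotropic filtering, or decompression is added.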
The Larrabee texture filter logic is internally quite similar to typical GPU texture logic. It provides 32KB of texture cache per core and supports the usual operations, including DirectX 10 compressed texture formats, mipmapping, and anisotropic filtering. Cores pass commands to the texture units through the L2 cache and receive results the same way. The texture units perform virtual-to-physical page translation and report any page misses to the core, which retries the texture filter command after the page is in memory. Larrabee can also perform texture operations directly on the cores when software performance is sufficient.
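The page-miss handling described above can be modeled as a simple retry loop. The sketch below is a toy model under stated assumptions, not the actual hardware interface: `TextureUnit`, `issue_filter_command`, the `page_in` callback, the 4KB page size, and the use of a physical address as a stand-in for the filtered result are all hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class TextureUnit:
    """Toy model of a texture unit that translates addresses itself."""

    def __init__(self, page_table):
        # Hypothetical mapping: virtual page number -> physical page number.
        self.page_table = page_table

    def filter(self, virtual_address):
        """Return ("ok", value) on success or ("page_miss", page) on a miss."""
        vpage = virtual_address // PAGE_SIZE
        if vpage not in self.page_table:
            # Report the miss to the core rather than resolving it here.
            return ("page_miss", vpage)
        physical = self.page_table[vpage] * PAGE_SIZE + virtual_address % PAGE_SIZE
        # The physical address stands in for the filtered texel result.
        return ("ok", physical)


def issue_filter_command(unit, virtual_address, page_in):
    """Core-side loop: retry the command after each reported page miss."""
    while True:
        status, value = unit.filter(virtual_address)
        if status == "ok":
            return value
        page_in(value)  # bring the missing page into memory, then retry
```

In this model the core owns page-fault handling, so the texture unit only needs to detect a miss and report which page was missing.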