Jawed
Legend
This comes back to the programmer. E.g. they've designed a way to construct a D3D pipeline that sizes a tile to fit within cache alongside the other stuff that's also going to use cache. So the "driver" must assess the pixel shader for its register payload versus the amount of texture latency it needs to hide, and trade those off against tile size. ATI and NVidia don't have a tile size to worry about, but they do have to worry about cache thrashing caused by the raggedness of the progress of the batches, i.e. what's the greatest difference in program counter amongst the extant batches and what effect that has on cache thrashing.
I was making a distinction between cache and RAM. I think Larrabee would do best to not force an upcoming quad's state to memory hundreds of cycles away.
Perhaps there is a way the core or compiler can ensure the furthest it can go is the L2.
So in Larrabee the programmer is supposed to configure L2 cache lines to suit the types of fibres running. Once a core gets under way with a phase of rendering I get the impression that the cache lines are pretty much static - e.g. in pixel shading a block of lines for the tile data, another set of lines for texture results (parameters too) and some lines for general scheduling.
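To put rough numbers on that trade-off, here's a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the function name, the 256 KiB per-core L2 budget, the 8 bytes per pixel (32-bit colour plus 32-bit depth) and the sizes reserved for texture results and scheduling. It's not how any real Larrabee driver sizes tiles, it just shows that the more L2 you reserve for latency hiding, the smaller the tile has to be.

```python
def largest_tile_side(l2_bytes, texture_bytes, scheduling_bytes, bytes_per_pixel):
    """Largest power-of-two square tile side whose pixels fit in the L2
    space left over after the texture-result and scheduling lines.
    Hypothetical helper; all sizes are illustrative assumptions."""
    tile_budget = l2_bytes - texture_bytes - scheduling_bytes
    if tile_budget < bytes_per_pixel:
        return 0  # nothing left for tile data at all
    side = 1
    # Keep doubling while the next size up still fits in the budget.
    while (2 * side) ** 2 * bytes_per_pixel <= tile_budget:
        side *= 2
    return side


KIB = 1024
L2 = 256 * KIB      # assumed per-core L2 budget
SCHED = 16 * KIB    # assumed lines kept for general scheduling
BPP = 8             # assumed 32-bit colour + 32-bit depth

# Light shader: little texture latency to hide, so few lines for results.
print(largest_tile_side(L2, 64 * KIB, SCHED, BPP))    # 128
# Heavy shader: much more in flight, so the tile has to shrink.
print(largest_tile_side(L2, 160 * KIB, SCHED, BPP))   # 64
```

Restricting to power-of-two square tiles is just to keep the sketch short; a real pipeline could pick rectangular tiles or other granularities.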
One thing that's occurred to me is that Larrabee's circular fibre scheduling could lead to under-utilisation of the texture units - this is the average versus worst-case latency hiding that Mintmaster was alluding to earlier, I think. Not sure, need to think about it more.
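Here's a toy model of what I mean, in Python. It's purely my own assumption about how strict circular scheduling might behave, not a description of Larrabee's actual scheduler: each fibre does a fixed chunk of ALU work, issues a texture fetch, and the core moves on to the next fibre in fixed order. If the core comes back round to a fibre whose fetch hasn't returned, it stalls, and while it's stalled it isn't issuing new texture requests either. The function name and all the cycle counts are made up.

```python
def round_robin_utilisation(latencies, fibres, alu_cycles):
    """Fraction of cycles the core spends doing work when fibres are
    visited in strict circular order. latencies[i] is the texture
    latency of the i-th fetch issued. Toy model, assumed behaviour."""
    ready_at = [0] * fibres   # cycle at which each fibre's fetch returns
    now = 0
    busy = 0
    for i, latency in enumerate(latencies):
        f = i % fibres
        if ready_at[f] > now:
            now = ready_at[f]         # stall: nothing else may run here
        now += alu_cycles             # fibre f does its ALU work
        busy += alu_cycles
        ready_at[f] = now + latency   # then issues its next fetch
    return busy / now


# Four fibres, 10 cycles of ALU each: enough to hide a 30-cycle fetch.
uniform = [30] * 8
# Same average latency (30), but one fetch in four takes 90 cycles.
ragged = [10, 10, 10, 90] * 2

print(round_robin_utilisation(uniform, fibres=4, alu_cycles=10))
print(round_robin_utilisation(ragged, fibres=4, alu_cycles=10))
```

With the uniform latencies the core never stalls; with the ragged ones it does, even though the average latency is identical. A scheduler that could skip a fibre that isn't ready would presumably do better, which is the average versus worst-case point.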
Depends on the interval between starting the move and the other unit consuming the data - i.e. whether this mostly stays within L1 or often ends up going to L2.
That's reminiscent of Xenon, where moving data between pipes requires a similar trip. The latency from that is pretty significant in the Xbox implementation.
Perhaps that is something we can expect to improve with Larrabee II.
---
So, what happens on interrupts? I've got no idea what happens to x86 SSE registers in this situation, so I'm not sure what to expect in Larrabee or what the effect on the VPU would be. Is it likely that Larrabee will turn off interrupts on most cores, e.g. leaving just one core able to accept them?
Jawed