Maybe there are other candidates that profit more from a cache, where latency hiding is harder? If there were no L2 caches in previous generations (?), are there features in DX10 that make a cache more useful?
Older GPUs have other kinds of caches besides texture caches: e.g. ATI GPUs have a colour buffer (render target) cache and a z/stencil buffer cache.
There's also the post-transform vertex cache, which is titchy: tens of vertices in size, so effectively a few hundred bytes, perhaps as much as 1KB.
R600 seems to be ATI's first GPU with an L2 cache, for what it's worth. NVidia has had L2 since at least NV40. Anything earlier than that is too far back for me to say.
The screen-space tiled rasterisation of triangles in R300 and later meant there was little need for an L2. Obviously texel footprints overlap a bit at tile boundaries, but if the tiles are reasonably big, e.g. 256 pixels (say a 16x16 block), then I suppose the bulk of the expensive texel traffic tends to stay within a single tile, with only the texels along the tile edges being re-fetched by neighbouring tiles.
At least in ATI's old texturing system, anyway. Clearly the R600 texturing system is vastly more complex.
Really, you could say that D3D10 introduces more sources of latency into the GPU pipeline. A nice example is constants: you can define multiple fairly complex struct arrays, each arbitrarily addressed at unpredictable points during shading.
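To make that concrete, here's a minimal HLSL sketch (the struct contents, buffer names and array sizes are all invented) of the kind of thing D3D10 constant buffers allow: several struct arrays bound at once, each indexed by a value that's only known while the shader runs.

struct Light
{
    float4 position;
    float4 colour;
};

struct Material
{
    float4 diffuse;
    float4 specular;
};

cbuffer Lights    : register(b0) { Light    lights[64];    };
cbuffer Materials : register(b1) { Material materials[32]; };

float4 ShadePixel(float4 pos : SV_Position,
                  nointerpolation uint lightIndex    : LIGHT_INDEX,
                  nointerpolation uint materialIndex : MATERIAL_INDEX) : SV_Target
{
    // Both reads are data-dependent, effectively random accesses into
    // constant storage; each one is a potential latency hit unless the
    // values happen to be sitting in a cache already.
    Light    l = lights[lightIndex];
    Material m = materials[materialIndex];
    return l.colour * m.diffuse;
}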
And then there's the explicit looping that 3dilletante refers to.
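A similarly hand-waved sketch of that (I'm assuming 3dilletante means dynamic flow control; the light count and the 64-entry array below are made up), where the trip count itself comes out of a constant buffer:

cbuffer Scene   { uint   lightCount;      };
cbuffer Colours { float4 lightColour[64]; };

float4 AccumulateLighting(float4 pos : SV_Position) : SV_Target
{
    float4 result = 0;

    // The trip count comes from a constant, not the compiler, so the
    // hardware can't simply unroll and prefetch everything up front;
    // each iteration issues another dynamically indexed constant read.
    for (uint i = 0; i < lightCount; ++i)
        result += lightColour[i];

    return result;
}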
Jawed