Our virtual texturing system renders the scene into a 32-bit-per-pixel frame buffer that is 4x smaller than the screen in each dimension. This pass needs nothing but the texture coordinates and the mip level for each pixel, so the pixel shader is very simple (and very fast). The vertex shader input can be minimal too: the position and a low-precision texture coordinate. The vertex shader transforms the position and passes the texture coordinate through to the pixel shader.
In the pixel shader, you can calculate the mip level either from gradient instructions (the screen-space derivatives of the texture coordinate) plus a few math instructions, or with the CalculateLevelOfDetail method of DX10/DX11 texture objects (which requires DX10.1-class hardware or better). Both methods are very fast.
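The gradient-based approach boils down to measuring how many texels one screen pixel covers and taking the log2 of that. The sketch below shows the math in C++ rather than shader code, so the derivative inputs stand in for what ddx/ddy would return. The function name, the `lodBias` parameter and the use of the larger of the two footprint lengths (a common approximation, not necessarily what any given GPU does) are assumptions for illustration. Also note, as our own assumption: because the feedback buffer is 4x smaller per axis, its derivatives are roughly 4x larger than at full resolution, so a bias of about -2 brings the result back in line with the full-resolution frame.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: approximate mip selection from screen-space UV derivatives.
// dudx/dvdx/dudy/dvdy are derivatives of normalized [0,1] UVs per screen pixel.
// texWidth/texHeight are the virtual texture size in texels at mip 0.
float ComputeMipLevel(float dudx, float dvdx, float dudy, float dvdy,
                      float texWidth, float texHeight,
                      float maxMip, float lodBias)
{
    // Convert the derivatives from UV units to texel units.
    const float dxU = dudx * texWidth, dxV = dvdx * texHeight;
    const float dyU = dudy * texWidth, dyV = dvdy * texHeight;

    // Squared length of the texel footprint along screen x and screen y.
    const float lenSqX = dxU * dxU + dxV * dxV;
    const float lenSqY = dyU * dyU + dyV * dyV;
    const float maxLenSq = std::max(lenSqX, lenSqY);

    // log2(sqrt(x)) == 0.5 * log2(x); guard against log2(0) for degenerate input.
    const float lod = 0.5f * std::log2(std::max(maxLenSq, 1e-8f)) + lodBias;
    return std::clamp(lod, 0.0f, maxMip);
}
```

For example, with a 1024x1024 texture where one pixel steps 4 texels in each direction, the function returns mip 2 with no bias, or mip 0 with a bias of -2.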
We convert the texture coordinates directly to page IDs in the pixel shader. This saves a lot of CPU cycles later in the pipeline and makes it easy to quickly remove duplicate pages when the CPU processes the data. Our page ID layout is, from most significant to least significant bits: 4-bit mip level, 11-bit page Y, 11-bit page X (26 bits in total). The pixel shader outputs this packed page ID.
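The bit layout can be sketched as a pair of pack/unpack helpers. This is C++ rather than the actual shader, and the helper names, the `pagesAtMip0` parameter (assumed here to be up to 2048, the most 11 bits can address) and the UV-to-page mapping in `PageIdFromUv` are hypothetical illustrations of the idea, not our shipping code.

```cpp
#include <algorithm>
#include <cstdint>

// Layout (MSB to LSB): 4-bit mip | 11-bit page Y | 11-bit page X.
// Only 26 of the 32 bits are used; the top 6 bits stay zero.
constexpr uint32_t kPageCoordBits = 11;
constexpr uint32_t kPageCoordMask = (1u << kPageCoordBits) - 1; // 0x7FF
constexpr uint32_t kMipMask = 0xF;

constexpr uint32_t PackPageId(uint32_t mip, uint32_t pageY, uint32_t pageX) {
    return ((mip & kMipMask) << (2 * kPageCoordBits)) |
           ((pageY & kPageCoordMask) << kPageCoordBits) |
           (pageX & kPageCoordMask);
}

constexpr uint32_t PageIdMip(uint32_t id) { return (id >> (2 * kPageCoordBits)) & kMipMask; }
constexpr uint32_t PageIdY(uint32_t id)   { return (id >> kPageCoordBits) & kPageCoordMask; }
constexpr uint32_t PageIdX(uint32_t id)   { return id & kPageCoordMask; }

// Hypothetical: map a normalized UV at a given mip to the page that contains it.
// pagesAtMip0 is the number of pages per axis at mip 0 (assumed <= 2048).
inline uint32_t PageIdFromUv(float u, float v, uint32_t mip, uint32_t pagesAtMip0) {
    const uint32_t pages = std::max(pagesAtMip0 >> mip, 1u);
    u = std::clamp(u, 0.0f, 1.0f);
    v = std::clamp(v, 0.0f, 1.0f);
    // Clamp so that u == 1.0 maps to the last page, not one past the end.
    const uint32_t x = std::min(static_cast<uint32_t>(u * pages), pages - 1);
    const uint32_t y = std::min(static_cast<uint32_t>(v * pages), pages - 1);
    return PackPageId(mip, y, x);
}
```

Putting the mip in the top bits means IDs of the same mip level sort together, and because the whole ID is a single 32-bit integer, duplicate checks on the CPU are plain integer compares.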
On the CPU side, you then lock the page ID buffer (on DX10/11, copy it to a staging resource and map that). Each 32-bit value in the buffer represents one visible page. The buffer usually contains many duplicate page IDs, so I recommend a quick and dirty duplicate removal pass before feeding the data to your cache (*). We use an LRU replacement policy in our cache, and it has worked really well. When designing the cache, focus on query speed: every visible page must be queried on every update (you do not want to load data that is already in the cache). Adding and removing pages can be slower, since those operations happen far less often.
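One common way to get a fast query with LRU eviction is a hash map from page ID to a node in a recency list. The sketch below assumes that structure; `PageCache`, `Touch` and `Insert` are hypothetical names, it assumes a capacity of at least 1, and it tracks only page IDs, not the physical texture slots a real cache would also manage.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <optional>
#include <unordered_map>

// Hypothetical LRU page cache keyed by packed page ID (capacity assumed >= 1).
class PageCache {
public:
    explicit PageCache(std::size_t capacity) : capacity_(capacity) {}

    // Fast path, called for every visible page on every update.
    // Returns true if the page is cached and marks it most recently used.
    bool Touch(uint32_t pageId) {
        auto it = index_.find(pageId);
        if (it == index_.end()) return false;
        // Move the node to the front without reallocating; iterators stay valid.
        lru_.splice(lru_.begin(), lru_, it->second);
        return true;
    }

    // Slower path: add a newly loaded page, evicting the least recently used one
    // when full. Returns the evicted ID so the caller can reuse its physical slot.
    std::optional<uint32_t> Insert(uint32_t pageId) {
        if (Touch(pageId)) return std::nullopt;
        std::optional<uint32_t> evicted;
        if (lru_.size() >= capacity_ && !lru_.empty()) {
            evicted = lru_.back();
            index_.erase(lru_.back());
            lru_.pop_back();
        }
        lru_.push_front(pageId);
        index_[pageId] = lru_.begin();
        return evicted;
    }

    std::size_t Size() const { return lru_.size(); }

private:
    std::size_t capacity_;
    std::list<uint32_t> lru_;  // front = most recently used, back = eviction candidate
    std::unordered_map<uint32_t, std::list<uint32_t>::iterator> index_;
};
```

In an update loop you would call `Touch` for each deduplicated visible page, queue a load for every page where it returns false, and call `Insert` only once that page's data has actually arrived.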
(*) You can iterate over the pixel buffer and skip any pixel whose value equals the last added one. This alone removes a huge number of duplicates. On consoles, it is also very efficient to do the duplicate check per cache tile: the frame buffer is stored in a tiled format rather than as linear scanlines, so each rectangular region of the image occupies a contiguous range of memory addresses. This kind of local duplicate check removes almost all duplicates at a very low cost.
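Both duplicate checks are sketched below. The function names and the `pixelsPerTile` parameter are hypothetical; the tiled version assumes the buffer is already in a tiled layout where each tile's pixels are contiguous (as on the consoles mentioned above) and that `pixelsPerTile` is greater than zero. It uses a linear search per tile on the assumption that a single tile touches only a handful of distinct pages. Duplicates across tiles can remain, which is harmless here because querying the cache twice for the same page is idempotent.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Skip any pixel equal to the last added page ID (catches runs along a scanline).
void CollectVisiblePages(const uint32_t* pixels, std::size_t count,
                         std::vector<uint32_t>& visible)
{
    bool hasLast = false;
    uint32_t last = 0;
    for (std::size_t i = 0; i < count; ++i) {
        const uint32_t id = pixels[i];
        if (hasLast && id == last) continue;
        visible.push_back(id);
        last = id;
        hasLast = true;
    }
}

// Per-tile duplicate removal for a buffer stored in tiled order.
void CollectVisiblePagesTiled(const uint32_t* pixels, std::size_t count,
                              std::size_t pixelsPerTile, std::vector<uint32_t>& visible)
{
    std::vector<uint32_t> tileUnique;  // distinct IDs seen in the current tile
    for (std::size_t tileStart = 0; tileStart < count; tileStart += pixelsPerTile) {
        const std::size_t tileEnd = std::min(count, tileStart + pixelsPerTile);
        tileUnique.clear();
        bool hasLast = false;
        uint32_t last = 0;
        for (std::size_t i = tileStart; i < tileEnd; ++i) {
            const uint32_t id = pixels[i];
            if (hasLast && id == last) continue;  // cheap run check first
            last = id;
            hasLast = true;
            // Small linear search: a tile usually contains very few distinct pages.
            if (std::find(tileUnique.begin(), tileUnique.end(), id) == tileUnique.end())
                tileUnique.push_back(id);
        }
        visible.insert(visible.end(), tileUnique.begin(), tileUnique.end());
    }
}
```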