Pages could be striped across multiple channels at a granularity smaller than a virtual memory page. OpenCL optimization guidance for some GCN GPUs indicated that a kernel should align its data assuming 256 bytes per channel.

So, I just had a random thought about this. If SFS lets you stream in textures per mip level, perhaps loading only part of a texture, and given the way memory works (pages connected to channels), might fast and slow memory end up performing essentially the same? For example, if the texture you are streaming in is only 4 pages, and those pages are served by different channels, there would be no speed difference between the slow and fast areas of memory, because you are only using 4 of the 10 memory channels anyway.
Even if SFS reduces the bandwidth needed for a given texture, anything of at least several KB would still span enough channels for the difference to matter. And even if the accesses for a given slice of the texture happened to hit only a handful of channels, that is one resource access out of likely hundreds. The global behavior of all accesses would average out the behavior of specific ones.
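
To put rough numbers on the striping argument, here is a minimal Python sketch. It assumes plain round-robin interleaving at 256 bytes per channel (the figure from the GCN OpenCL guidance) across 10 channels (the count used in the question). Real memory controllers typically hash address bits rather than striping this simply, so the function names and the mapping itself are illustrative assumptions, not a description of any actual console or GPU.

```python
# Assumed values for illustration only; real hardware mappings are more complex.
INTERLEAVE_BYTES = 256   # assumed stripe size per channel (GCN OpenCL guidance)
NUM_CHANNELS = 10        # assumed channel count, matching the "4/10" example


def channel_for(addr):
    """Map a byte address to a channel using simple round-robin striping."""
    return (addr // INTERLEAVE_BYTES) % NUM_CHANNELS


def channels_touched(base, size):
    """Return the set of channels hit by a contiguous access of `size` bytes."""
    first_stripe = base // INTERLEAVE_BYTES
    last_stripe = (base + size - 1) // INTERLEAVE_BYTES
    return {s % NUM_CHANNELS for s in range(first_stripe, last_stripe + 1)}


if __name__ == "__main__":
    # Compare a single stripe, 1 KB, 4 KB, and a 64 KB region.
    for size in (256, 1024, 4096, 65536):
        hit = channels_touched(0, size)
        print(f"{size:>6} bytes -> {len(hit)} of {NUM_CHANNELS} channels")
```

Under these assumptions, a contiguous access needs only 10 stripes (2,560 bytes) to reach every channel, so even a 4 KB piece of a texture is spread across all of them rather than sitting behind a single channel.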