What this means is that unless you are fetching multiple textures every other clock cycle, or writing to the frame buffer every other clock cycle (not the case for very long shaders), on average half of your bandwidth is wasted.
On each clock cycle, I can fetch one 128-bit instruction and still have another 128 bits of bandwidth left over to do with as I please. On a long shader (say, 500 instructions) with very few texture fetches (say, 80% color ops), 40% of my bandwidth is simply wasted. The color ops don't use the spare 128 bits, and instruction streaming alone fills only half the bus, so 80% of the cycles each leave 50% of the bus idle: 0.8 × 0.5 = 0.4.
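The 40% figure can be checked with a small back-of-the-envelope model. This is a simplified sketch, not a description of any real GPU. It assumes a 256-bit-per-clock bus, one 128-bit instruction fetched per clock, no use of the spare 128 bits by color ops, and full use of those spare bits by every non-color op (texture fetch or frame-buffer write). The function name `wasted_fraction` is hypothetical.

```python
def wasted_fraction(num_instructions: int, num_color_ops: int,
                    instr_bits: int = 128, spare_bits: int = 128) -> float:
    """Fraction of total bus bandwidth left idle over a whole shader.

    Simplified model (assumed): one instruction issues per clock, color ops
    leave the spare bits unused, and every other op fills them completely.
    """
    cycles = num_instructions                      # one instruction per clock
    capacity_bits = cycles * (instr_bits + spare_bits)
    idle_bits = num_color_ops * spare_bits         # only color-op cycles idle
    return idle_bits / capacity_bits


# 500-instruction shader with 80% color ops (400 of 500)
print(wasted_fraction(500, 400))   # 0.4
# A shader made entirely of color ops wastes half the bus
print(wasted_fraction(500, 500))   # 0.5
```

The all-color-ops case is the upper bound: with nothing to fill the spare 128 bits, exactly half of the bandwidth goes unused on every cycle.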