This is the case with the GeForce 3 as well. The vertex shader latency seems to be twice the throughput, so yes, in the case of vertex shaders, we do see pipelining.
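To make the latency versus throughput point concrete, here is a rough sketch of the arithmetic. It assumes "twice the throughput" means the latency of one instruction is about twice the interval between successive issues. The function name and the cycle counts are made up for illustration; they are not measured GeForce 3 figures.

```python
def total_cycles(n_instructions, latency, issue_interval):
    """Cycles needed to finish n independent instructions.

    latency: cycles from issue to result for a single instruction.
    issue_interval: cycles between successive issues (inverse of throughput).
    Both values are hypothetical, chosen only to illustrate the ratio.
    """
    if n_instructions == 0:
        return 0
    # The first instruction pays the full latency; each later one
    # completes one issue interval after the previous one.
    return latency + (n_instructions - 1) * issue_interval


# Pipelined: a new instruction issues every cycle while the previous
# one is still in flight (latency is twice the issue interval).
pipelined = total_cycles(8, latency=2, issue_interval=1)

# Not pipelined: the next instruction waits for the previous result,
# so the issue interval equals the latency.
unpipelined = total_cycles(8, latency=2, issue_interval=2)

print(pipelined, unpipelined)
```

If latency and issue interval were equal, there would be no overlap between instructions. A latency larger than the issue interval is what suggests pipelining.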
Pixel shaders seem to cycle through a batch of fragments and wait for the instruction to complete on the first one before...