16-wide vector units are efficient for graphics but not for much else. That's why most systems (including x86 chips) are 4-wide.
It doesn't matter that much. In the end you need to use both for graphics. Whether you use 1 vector or 4 vectors, in most cases you'll want 4 because of the latency of ALU instructions.
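To make the latency point concrete, here is a rough sketch in plain C. It assumes a 4-wide vector modeled as a struct (`vec4`, `vec4_add` and `sum_vec4` are made-up names for illustration, not any real API). With a single accumulator every add has to wait for the previous one to finish; with four independent accumulators the hardware can have four adds in flight at once.

```c
#include <stddef.h>

/* Assumed 4-wide vector; a stand-in for a hardware SIMD register. */
typedef struct { float lane[4]; } vec4;

/* Lane-wise add of two vectors. */
static vec4 vec4_add(vec4 a, vec4 b)
{
    vec4 r;
    for (int i = 0; i < 4; ++i)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

/* Hypothetical helper: sums n vectors using four independent accumulators,
   so consecutive adds do not depend on each other's results. */
vec4 sum_vec4(const vec4 *x, size_t n)
{
    vec4 zero = {{0.0f, 0.0f, 0.0f, 0.0f}};
    vec4 a0 = zero, a1 = zero, a2 = zero, a3 = zero;
    size_t i = 0;

    /* Main loop: four independent dependency chains. */
    for (; i + 4 <= n; i += 4) {
        a0 = vec4_add(a0, x[i]);
        a1 = vec4_add(a1, x[i + 1]);
        a2 = vec4_add(a2, x[i + 2]);
        a3 = vec4_add(a3, x[i + 3]);
    }

    /* Leftover vectors when n is not a multiple of 4. */
    for (; i < n; ++i)
        a0 = vec4_add(a0, x[i]);

    /* Combine the four partial sums at the end. */
    return vec4_add(vec4_add(a0, a1), vec4_add(a2, a3));
}
```

How many accumulators you actually want depends on the ALU latency and issue rate of the chip in question; 4 here just mirrors the number in the post. Note that splitting a float sum this way changes the order of the additions, so results can differ slightly from a single-accumulator loop.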
I can see Larrabee being used as a GPU but I can't see it beating Nvidia or ATI. It might beat Cell, but only on highly parallel tasks.
Larrabee is quite similar to modern GPU designs (read: NV/AMD). A couple of round-robins with data passed on a circular bus, and throughput increased by alternating load/store with ALU operations.