I posted this a long time ago about supersampling (SS) and multisampling (MS). I don't know why this topic keeps getting revisited.
Supersampling is just that: "super" sampling, which means calculating more samples. 2x2 supersampling means calculating 2x2, or 4, samples per pixel, and each of those 4 samples is calculated independently. For each sample, and for every rendered polygon that covers that sample point, the texture is sampled and filtered (bilinear, trilinear, or anisotropic, as requested), shading is calculated, and Z and stencil are calculated. The frame buffer holds separate color, alpha, Z, and stencil values for every sample. So 4X supersampling takes 4X the texture bandwidth, 4X the pipeline computation, and 4X the frame buffer bandwidth and space.
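To make the cost accounting concrete, here's a toy sketch of 2x2 supersampling for a single pixel. It's a simplified model, not any real GPU's pipeline: each polygon is a hypothetical dict with a coverage test `covers(x, y)`, a per-point depth `z(x, y)`, and a `shade(x, y)` function standing in for texture fetch/filtering plus lighting. Colors are grayscale floats and stencil is left out for brevity. The point is that `shade` runs once per covered *sample*, and color and Z are stored per sample.

```python
# Toy model of 2x2 (4X) supersampling for one pixel (illustration only).
# Hypothetical polygon format: {"covers": f(x, y) -> bool,
#                              "z": f(x, y) -> float,
#                              "shade": f(x, y) -> float}

def sample_positions(px, py, n=2):
    """Ordered n x n grid of sample points inside pixel (px, py)."""
    return [(px + (i + 0.5) / n, py + (j + 0.5) / n)
            for j in range(n) for i in range(n)]

def supersample_pixel(px, py, polygons, background=0.0, n=2):
    positions = sample_positions(px, py, n)
    color = [background] * len(positions)    # one color entry per sample
    depth = [float("inf")] * len(positions)  # one Z entry per sample
    shade_calls = 0
    for poly in polygons:
        for k, (x, y) in enumerate(positions):
            if not poly["covers"](x, y):
                continue
            c = poly["shade"](x, y)           # texture + shading for THIS sample
            shade_calls += 1
            z = poly["z"](x, y)               # Z evaluated at the sample point
            if z < depth[k]:                  # per-sample depth test
                depth[k] = z
                color[k] = c
    resolved = sum(color) / len(color)        # box-filter resolve to one pixel
    return resolved, shade_calls, len(color)
```

For a polygon that fully covers the pixel, this reports 4 shading calls and 4 stored samples, which is the "4X pipeline computation, 4X frame buffer" cost described above.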
Multisampling is an innovation that SGI introduced some years ago. The observation is that texture operations (like alpha masking), texture filtering, and shading can be calculated correctly, or at least very nearly so, by calculating them once per pixel. That's what level of detail (LOD) is for: choosing the appropriate level of texture filtering for the area the pixel covers. So, in true SGI-style multisampling, a single textured, shaded color is produced per pixel, while independent Z and stencil values are produced per sample. The color/alpha value is replicated to all of the samples within the pixel that the polygon covers, but each sample has its own Z and stencil. There are still unique, separate frame buffer entries for every sample: color, alpha, Z, and stencil, as before. So, for 2x2 (4X) multisampling, you need 1X texture bandwidth and 1X pipeline computation, but still 4X frame buffer bandwidth and space.
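Here's the same toy model switched to multisampling, again just a sketch with a hypothetical polygon format (`covers`, `z`, `shade`) and no stencil. Coverage and Z are still evaluated per sample and stored per sample, but `shade` runs only once per pixel per polygon. Evaluating that single shade at the pixel center is an assumption made for simplicity; real hardware may pick a different point within the pixel.

```python
# Toy model of 2x2 (4X) multisampling for one pixel (illustration only).
# Hypothetical polygon format: {"covers": f(x, y) -> bool,
#                              "z": f(x, y) -> float,
#                              "shade": f(x, y) -> float}

def sample_positions(px, py, n=2):
    """Ordered n x n grid of sample points inside pixel (px, py)."""
    return [(px + (i + 0.5) / n, py + (j + 0.5) / n)
            for j in range(n) for i in range(n)]

def multisample_pixel(px, py, polygons, background=0.0, n=2):
    positions = sample_positions(px, py, n)
    color = [background] * len(positions)    # still one color entry per sample
    depth = [float("inf")] * len(positions)  # still one Z entry per sample
    shade_calls = 0
    for poly in polygons:
        covered = [k for k, (x, y) in enumerate(positions) if poly["covers"](x, y)]
        if not covered:
            continue
        # Texture + shading run ONCE for this pixel (at the center, by assumption).
        c = poly["shade"](px + 0.5, py + 0.5)
        shade_calls += 1
        for k in covered:
            x, y = positions[k]
            z = poly["z"](x, y)               # Z is still evaluated per sample
            if z < depth[k]:                  # per-sample depth test
                depth[k] = z
                color[k] = c                  # shaded color replicated to the sample
    resolved = sum(color) / len(color)        # box-filter resolve to one pixel
    return resolved, shade_calls, len(color)
```

With a polygon edge crossing the pixel, this gives the same edge antialiasing as the supersampling sketch (half coverage resolves to 0.5), but with 1 shading call instead of 2 or 4, while still storing 4 samples. That's the "1X computation, 4X frame buffer" trade-off.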