Multisampling can be performed in several ways.
In the case of tilers, the pixel color can be computed once and applied to the visible samples in the pixel, with visibility determined by each sample's z value. When the tile is done, the filter can be applied to the tile and the results written to the frame buffer.
On IMRs, multisampling is best done using some version of the A buffer. In this case, triangle fragments (those portions of a triangle within a pixel) set the bits of a coverage mask with one bit per sample. A single color and z value is stored for each fragment. Since there are usually only one or two fragments in a pixel, this allows significant compression of the color and z data. For 8x AA with two fragments per pixel, it allows lossless compression of up to 4 to 1.
It does assume, however, that a fragment's z value is the same at every sample it covers, which is usually not the case. Z3 handles this by including the z slopes along with the z value at the center. Other problems occur when the number of fragments in a pixel exceeds the number allocated (usually two, since most pixels have one or two fragments).
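The coverage-mask storage described above can be sketched as follows. This is a minimal illustration, not a hardware design: the names `Fragment` and `resolve` are hypothetical, a smaller z is assumed to be nearer the viewer, and the filter is assumed to be a simple box filter over the samples.

```python
from dataclasses import dataclass

SAMPLES = 8  # assumption: 8x AA, so each coverage mask has 8 bits


@dataclass
class Fragment:
    mask: int     # coverage mask, one bit per sample
    color: tuple  # single (r, g, b) color shared by all covered samples
    z: float      # single depth shared by all covered samples


def resolve(fragments, background=(0.0, 0.0, 0.0), samples=SAMPLES):
    """Box-filter a pixel: each sample takes the nearest covering fragment."""
    total = [0.0, 0.0, 0.0]
    for s in range(samples):
        bit = 1 << s
        covering = [f for f in fragments if f.mask & bit]
        # Assumption: smaller z is nearer. Uncovered samples show the background.
        color = min(covering, key=lambda f: f.z).color if covering else background
        for i in range(3):
            total[i] += color[i]
    return tuple(c / samples for c in total)
```

For example, a near red fragment covering four of the eight samples in front of a far blue fragment covering all eight resolves to an even mix of the two colors. Because each fragment carries one z, the sketch shares the limitation noted above: it cannot represent a depth that varies across the samples of a fragment.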
There are a number of sample patterns.
Ordered grids - standard row-column pattern
Staggered grids - every other row offset half a column (close packed)
Rotated grids - ordered grid rotated by a fixed angle
Sparsely sampled grids - NxN grid for N samples, one sample per row and one sample per column
Jittered grids - a uniform grid with each sample randomly jittered in +/- x and y up to half a sample step. Requires a large number of samples to work well. Similar to Poisson distribution.
Poisson pattern - a uniformly sampled random pattern. Requires a large number of samples to work well. Difficult to implement in hardware.
Others
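Three of the patterns listed above can be generated with a few lines each. The sketch below is illustrative only: the function names are hypothetical, sample positions are expressed in a unit pixel from 0 to 1, and the sparse pattern uses a random permutation, which is just one way of placing one sample per row and column.

```python
import random


def ordered_grid(n):
    """n x n samples at the cell centers of a row-column grid."""
    return [((i + 0.5) / n, (j + 0.5) / n) for j in range(n) for i in range(n)]


def jittered_grid(n, rng):
    """Ordered grid with each sample moved by up to half a sample step in x and y."""
    step = 1.0 / n
    return [((i + 0.5) * step + rng.uniform(-0.5, 0.5) * step,
             (j + 0.5) * step + rng.uniform(-0.5, 0.5) * step)
            for j in range(n) for i in range(n)]


def sparse_grid(n, rng):
    """n samples on an n x n grid, one sample per row and one per column."""
    cols = list(range(n))
    rng.shuffle(cols)  # column used by each row; a permutation, so no column repeats
    return [((cols[row] + 0.5) / n, (row + 0.5) / n) for row in range(n)]
```

A purely random permutation in `sparse_grid` can place samples along a diagonal or in clumps, so in practice the permutation would be chosen or screened for uniform coverage.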
The important aspects of a good sample pattern are:
For small numbers of samples (say, fewer than 32), it is of primary importance that the pattern provide approximately as many intensity gradations as there are samples. Ideally this would hold for edges at all angles, but it matters most for near-horizontal and near-vertical edges. Sparsely sampled grids are best, followed by rotated grids.
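The gradation argument can be checked by sweeping a vertical edge across the pixel and counting how many distinct coverage values appear. The sketch below is a rough illustration under stated assumptions: `gradations` is a hypothetical helper, the edge is exactly vertical, and the two 4-sample patterns are example positions chosen for this illustration.

```python
def gradations(samples, steps=1000):
    """Count distinct nonzero coverage levels as a vertical edge sweeps the pixel."""
    levels = set()
    for k in range(steps + 1):
        edge_x = k / steps
        # Samples to the left of the edge are covered by the triangle.
        covered = sum(1 for (x, _) in samples if x < edge_x)
        levels.add(covered)
    return len(levels) - 1  # do not count zero coverage


# Example 4-sample patterns in a unit pixel (assumed positions).
ORDERED_2X2 = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
SPARSE_4 = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]

print(gradations(ORDERED_2X2))  # 2: samples share columns, so coverage jumps by two
print(gradations(SPARSE_4))     # 4: every sample has its own column
```

With the same four samples, the ordered grid gives only two gradations on a vertical edge, while the pattern with one sample per column gives four.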
The next most important factor (especially for a small number of samples) is uniform sampling. This prevents pixel popping. Care must be taken with all the above patterns to make sure the samples are fairly uniform in their coverage of the pixel and do not aggregate in clumps or along characteristic lines.
To break up aliasing across pixels, a pattern library is often used. These patterns are interleaved across several pixels (say 4x4 pixels). To avoid pixel popping, care must be taken that the pattern boundaries do not cause non-uniform sampling. Ideally, the whole 4x4 pattern looks like one uniform pattern much like seamless texture tiles.
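One way to picture a pattern library is as a list of per-pixel patterns indexed by the pixel's position within a repeating tile. The sketch below assumes a 4x4 tile stored in row-major order. The helper names are hypothetical, and the wrap-around spacing check is only one possible test for clumping at pattern boundaries.

```python
import math


def pattern_for_pixel(library, px, py, tile=4):
    """Pick the pattern for pixel (px, py); the library repeats every `tile` pixels."""
    return library[(py % tile) * tile + (px % tile)]


def tile_samples(library, tile=4):
    """All sample positions of one tile, in pixel units from 0 to `tile`."""
    points = []
    for py in range(tile):
        for px in range(tile):
            for (sx, sy) in pattern_for_pixel(library, px, py, tile):
                points.append((px + sx, py + sy))
    return points


def min_spacing(points, tile=4):
    """Smallest distance between two samples, wrapping around the tile edges."""
    best = math.inf
    for i, (ax, ay) in enumerate(points):
        for (bx, by) in points[i + 1:]:
            dx = abs(ax - bx)
            dy = abs(ay - by)
            dx = min(dx, tile - dx)  # wrap so neighboring tiles are included
            dy = min(dy, tile - dy)
            best = min(best, math.hypot(dx, dy))
    return best
```

A small `min_spacing` relative to the average sample spacing suggests that samples clump somewhere in the tile, possibly across a pixel or tile boundary.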
Of course, the more samples the better. For IMRs, coverage mask (A buffer) techniques easily allow 8x, 16x, or even 32x AA without much extra memory bandwidth, since they simply require a larger bit mask. Tilers can also easily increase the number of samples without incurring extra bandwidth.
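A rough storage count shows why the coverage mask scales well. The figures below are assumptions for illustration only: 32-bit color, 32-bit z, and two fragments allocated per pixel. The function names are hypothetical.

```python
def per_sample_bits(samples, color_bits=32, z_bits=32):
    """Bits per pixel when every sample stores its own color and z."""
    return samples * (color_bits + z_bits)


def coverage_mask_bits(samples, fragments=2, color_bits=32, z_bits=32):
    """Bits per pixel when each fragment stores one color, one z, and a mask."""
    return fragments * (samples + color_bits + z_bits)


for n in (8, 16, 32):
    print(n, per_sample_bits(n), coverage_mask_bits(n))
# 8 512 144
# 16 1024 160
# 32 2048 192
```

Under these assumptions, going from 8x to 32x quadruples the per-sample storage but adds only 48 bits per pixel to the coverage-mask storage.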