Think about it this way:
The z-buffer stores depth information. 3D scenes are made out of triangles (and will likely be made out of curved surfaces in the future). Depth varies linearly across each triangle in screen space, so some sort of linear, plane-based compression is the natural fit for depth data. Curved surfaces have different properties, but may still compress well enough (though this time it would be somewhat lossy...it will be interesting to see if it can be made to work...) through polynomial compression techniques (e.g. cubic).
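A minimal sketch of the plane-compression idea (illustrative only, not any vendor's actual scheme): because depth is linear across a triangle in screen space, a z-buffer tile fully covered by one triangle can be stored as three plane coefficients instead of one depth per pixel. Integer (fixed-point) depths keep the round trip lossless; the 4x4 tile size is an assumption.

```python
TILE = 4  # 4x4-pixel tile; tile size is an illustrative assumption

def compress_tile(depths):
    """Return (z0, dz/dx, dz/dy) if the whole tile fits one plane, else None."""
    z0 = depths[0][0]
    dzdx = depths[0][1] - z0
    dzdy = depths[1][0] - z0
    for y in range(TILE):
        for x in range(TILE):
            if depths[y][x] != z0 + x * dzdx + y * dzdy:
                return None  # more than one triangle in the tile: store raw
    return (z0, dzdx, dzdy)  # 3 values instead of 16

def decompress_tile(plane):
    """Rebuild all 16 depth values from the plane coefficients."""
    z0, dzdx, dzdy = plane
    return [[z0 + x * dzdx + y * dzdy for x in range(TILE)] for y in range(TILE)]
```

A tile covered by a single triangle round-trips losslessly through just three numbers; a tile straddling a triangle edge simply falls back to uncompressed storage.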
Color information is much more chaotic: neighboring pixels often have little to do with one another. Since framebuffer compression schemes need to work at as small a scale as possible, not even texture compression schemes like DXTC will work well (especially when you include the lossy nature of such schemes...). The primary thing that allows framebuffer compression to work is multisampling. Quite simply, multisampling FSAA shares one color between all of a triangle's samples within a pixel. Why write the color four times when you can just write it once?
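That "write it once" idea can be sketched in a few lines (a hypothetical scheme; the tags and 4x sample count are illustrative). A pixel in a triangle's interior has the same color at every sample, so it can be stored as one color plus a flag; only edge pixels need the full per-sample storage.

```python
SAMPLES = 4  # 4x multisampling, as in the example above

def compress_pixel(samples):
    """Store one color when all samples agree; keep them all otherwise."""
    if all(s == samples[0] for s in samples):
        return ('one', samples[0])    # interior pixel: the color is written once
    return ('raw', list(samples))     # edge pixel: every sample kept

def decompress_pixel(entry):
    """Expand a compressed entry back to the full list of sample colors."""
    tag, data = entry
    return [data] * SAMPLES if tag == 'one' else data
```

Since the vast majority of pixels lie in triangle interiors rather than on edges, most of the framebuffer compresses 4:1 under this scheme.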
It will be interesting to see if coverage mask information can be used in the future for further compression of complex scenes (e.g. 16x FSAA with an average of two triangles per pixel: why store an average of 8 samples per triangle per pixel when you can store just one color plus a mask?).
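One way that speculation could look (purely hypothetical, not a shipping technique): instead of 16 color slots per pixel, store one (color, 16-bit coverage mask) pair per triangle touching the pixel. With an average of two triangles per pixel, that's two colors and two masks instead of 16 samples.

```python
def compress_coverage(samples):
    """Collapse 16 per-sample colors into (color, coverage-mask) fragments."""
    frags = {}
    for i, color in enumerate(samples):
        frags[color] = frags.get(color, 0) | (1 << i)  # set this sample's bit
    return list(frags.items())

def resolve(frags):
    """Reconstruct the 16 per-sample colors from (color, mask) pairs."""
    out = [None] * 16
    for color, mask in frags:
        for i in range(16):
            if mask & (1 << i):
                out[i] = color
    return out
```

For the two-triangles-per-pixel case above this is lossless; pixels where many tiny triangles meet would still need a fallback to raw per-sample storage.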