There are also downsides to doing tone mapping in a custom resolve step right after rendering. If you write the tone mapped colors to the render target, the data is no longer linear, so you can't use hardware alpha blending for transparencies or particles anymore. You can sidestep this by rendering the transparencies to off-screen buffers and combining the results, but that costs extra. Also, in post process shaders you need to first convert the color value back to linear space and, at the end of the shader, convert it back to tone mapped space. The extra ALU cost can be quite big (especially if you tone map luminance only and need to separate/rebuild the chroma every time).
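That round trip in every post-process shader looks roughly like this (a minimal CPU-side sketch; the Reinhard curve here is just an assumed example of an invertible tone map, and the exposure scale is a stand-in for whatever linear-space work the shader does):

```python
def tonemap(x):
    # Reinhard-style operator: maps linear [0, inf) into display range [0, 1)
    return x / (1.0 + x)

def inverse_tonemap(y):
    # Exact inverse of the Reinhard curve above
    return y / (1.0 - y)

def post_process(tonemapped_color):
    # 1) Extra ALU: convert back to linear before doing any math
    linear = inverse_tonemap(tonemapped_color)
    # 2) The actual post-process work happens in linear space
    #    (stand-in here: a simple exposure scale)
    linear *= 1.5
    # 3) Extra ALU: convert back to tone mapped space before writing out
    return tonemap(linear)
```

Every post-process pass that reads and writes the tone mapped target pays for steps 1 and 3 on top of its real work.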
Writing linear values to the render target is of course also possible (and I expect this is how you handle it). Custom resolve: tone map all MSAA samples, average the tone mapped values, inverse tone map the average, and write that to the render target. This is less correct, but it avoids the repeated tone mapping / inverse tone mapping steps later. I suspect this looks good in most cases (especially if the post processing pipeline is simple), but when you start adding lots of atmospheric effects (fog, volumetrics, etc.) the error gets worse (since the linear assumption doesn't hold for the MSAA edge pixels). But it shouldn't be that bad, since the absolute worst case is that an object edge gets one pixel narrower or wider (and you lose antialiasing for that edge). Not a biggie, unless it happens often. I would be more worried about losing the antialiased edges to the transparencies (and volume fog rendering). If you have lots of big soft fog particles flying around, the image pretty much loses all antialiasing.
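A sketch of that custom resolve (again assuming an invertible Reinhard-style operator; the sample values are made up to represent an HDR edge pixel with one very bright and one dark sample):

```python
def tonemap(x):
    # Reinhard-style operator, chosen because it is cheaply invertible
    return x / (1.0 + x)

def inverse_tonemap(y):
    return y / (1.0 - y)

def resolve_linear_output(samples):
    # Tone map each MSAA sample, average in tone mapped space,
    # then invert so the render target stays (approximately) linear.
    avg = sum(tonemap(s) for s in samples) / len(samples)
    return inverse_tonemap(avg)

# Edge pixel: one very bright sample, one dark sample.
samples = [10.0, 0.1]
naive = sum(samples) / len(samples)          # 5.05: the bright sample dominates
resolved = resolve_linear_output(samples)    # ~1.0: keeps the antialiased edge visible
```

The naive linear average lets the bright sample blow out the edge pixel; averaging in tone mapped space is what preserves the antialiasing, which is exactly where the "linear assumption" error lives once later passes treat the inverse tone mapped value as true linear radiance.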
I don't personally like modern forward rendering techniques (Forward+ descendants) in general, because you need to do a depth pre-pass. This doubles your geometry cost. Our new (64 bpp, full fill rate) g-buffer rendering pipeline is able to render the whole g-buffer in approximately the time a depth pre-pass alone would take (it's primitive setup bound). This kind of deferred rendering is very difficult to beat with forward techniques that need to render their geometry twice.
2xMSAA is not enough by itself. If you resolve the MSAA at the beginning of the pipeline, you no longer have separate (sample precision) values for the edges, so you cannot use that data to improve the PPAA quality. Deferred pipelines can combine MSAA and PPAA better. Pure 2xMSAA just isn't enough anymore (it has too many problem cases).