The way I remember it, tiles were always rendered in 32-bit internally, and the front buffer was usually 16-bit to save VRAM (a precious resource shared with display geometry and textures, so a 16-bit FB was likely common).
Because the downsample to 16-bit only happens when resolving to the front buffer, color-loss artifacts are minimal (alpha-blended stuff doesn't get screwed up and so forth), but yes, there will still be dithering, of course.
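To illustrate why quantizing only at the resolve step matters, here is a rough sketch in Python. It is not how any particular console's hardware works. It is a toy model with one color channel, hypothetical helper names (`quantize`, `blend`, `render`), 5 bits standing in for a 16-bit format's channel depth, and no dithering at all. It stacks ten faint alpha-blended layers and compares keeping 8 bits per channel until the final resolve against rounding to 5 bits after every blend.

```python
def quantize(value, bits):
    """Round an 8-bit channel value (0..255) to `bits` bits, then expand back to 0..255."""
    levels = (1 << bits) - 1
    q = (value * levels + 127) // 255          # nearest representable level
    return (q * 255 + levels // 2) // levels   # back to the 0..255 range


def blend(dst, src, alpha):
    """Integer alpha blend of two 8-bit channel values, alpha in 0..255."""
    return (src * alpha + dst * (255 - alpha) + 127) // 255


def render(layers, background, internal_bits, output_bits=5):
    """Blend layers in order, storing each intermediate result at `internal_bits`,
    then resolve once to `output_bits` (the toy "front buffer")."""
    colour = background
    for src, alpha in layers:
        colour = quantize(blend(colour, src, alpha), internal_bits)
    return quantize(colour, output_bits)


# Ten faint white layers (alpha 3/255 each) over a black background.
layers = [(255, 3)] * 10

resolved_late = render(layers, 0, internal_bits=8)   # full precision until resolve
resolved_early = render(layers, 0, internal_bits=5)  # rounded after every blend

print("8-bit internal, 5-bit resolve:", resolved_late)
print("5-bit internal throughout:   ", resolved_early)
```

In this toy setup the all-5-bit path never leaves black, because each faint layer's contribution is rounded away before the next one lands, while the resolve-at-the-end path ends up visibly brighter. Real hardware would also dither on the resolve, which this sketch leaves out.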
To be fair, 16-bit looks virtually perfect on SDTVs when done this way (it only gets ugly if you do the entire rendering in 16-bit, which was more common with GC/Wii and PSP games).
But on fixed-pixel HDTVs, things tend to look ugly any time you use dithering and a pixel resolution that isn't display-native.