In your argument you've missed one vital fact: the sampling offset can change between frames. You render every odd pixel on every odd field.
You render every even pixel on every even field. If the odd and even fields come from the same image, you perfectly reconstruct the alternating odd and even pixel data, exactly as if you'd sampled every pixel of the image in a single pass.
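A minimal sketch of that idea in Python, assuming a static scene stored as a list of rows (0 = black, 1 = white). The function names `render_field` and `weave` are hypothetical, for illustration only. Each field renders only the rows matching its parity, and weaving the two fields back together reproduces the full-resolution image exactly, because between them the two fields sample every row once.

```python
def render_field(scene, parity):
    """Render only the rows whose index has the given parity (0 = even, 1 = odd).

    Rows this field does not cover are left as None, so no work is spent on them.
    """
    return [row[:] if y % 2 == parity else None for y, row in enumerate(scene)]


def weave(even_field, odd_field):
    """Interleave two fields: take each row from whichever field rendered it."""
    return [e if e is not None else o for e, o in zip(even_field, odd_field)]


scene = [
    [0, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
]

# Field N samples the even rows; field N+1 shifts the offset to the odd rows.
even_field = render_field(scene, parity=0)
odd_field = render_field(scene, parity=1)

frame = weave(even_field, odd_field)
print(frame == scene)  # True: a static scene is reconstructed exactly
```

If the scene moves between the two fields, the woven rows come from different moments in time (the familiar combing artifact); the sketch only models the same-image case described above.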
Going back to my earlier visual representation, you said,
I was showing what the source data at 1080p should look like, and what the data rendered using either trick would look like. You wouldn't render all the pixels in a field and then replace half of them; that'd be a complete waste of time!
Why would you average the two values when sticking the fields back together? Just interleave them: black line, white line, black line, white line. An upscale (rendering or sampling at half resolution) would produce all black, because it only ever samples every other line.
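A minimal Python sketch of the three options on the black/white line pattern, assuming 8-bit grey values and an 8-line image standing in for 1080 lines. The helper names `weave`, `average` and `half_res_upscale` are hypothetical, and the upscale is modelled as simple line doubling (an assumption: a real scaler would filter, but it would still only ever see black lines).

```python
BLACK, WHITE = 0, 255

# Test pattern shrunk to 8 lines: black, white, black, white...
source = [BLACK if y % 2 == 0 else WHITE for y in range(8)]

# Each field renders only the lines of its own parity.
even_lines = source[0::2]  # every even line: all black
odd_lines = source[1::2]   # every odd line: all white


def weave(even, odd):
    """Interleave the two fields line by line, with no blending."""
    out = []
    for e, o in zip(even, odd):
        out.extend([e, o])
    return out


def average(even, odd):
    """Blend each pair of lines into one grey value (the averaging questioned above)."""
    out = []
    for e, o in zip(even, odd):
        mean = (e + o) // 2
        out.extend([mean, mean])
    return out


def half_res_upscale(src):
    """Sample every other line, then line-double back to full height."""
    out = []
    for value in src[0::2]:
        out.extend([value, value])
    return out


print(weave(even_lines, odd_lines) == source)  # True: stripes preserved
print(average(even_lines, odd_lines))          # [127, 127, 127, 127, 127, 127, 127, 127]
print(half_res_upscale(source))                # [0, 0, 0, 0, 0, 0, 0, 0]
```

Weaving keeps the stripes, averaging flattens them to grey, and the half-resolution path never sees a white line at all.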
In short, you've got it all wrong.
You've misunderstood the interlacing method and what it's doing, so your original analysis of the image quality was incorrect and the arguments built on it don't hold.