Um, no. Depends on what you're doing. You're correct when you don't have to do any magic decoding and can decode just a frame or two at a time, but if you have to do anything special (like use the GPU to accelerate decoding) then you often have to have multiple frames in flight at the same time. The 360 sometimes had upwards of 12 frames in flight simultaneously when decoding 1080p H.264 for HD DVD. This was so we could keep all the shader pipelines doing the inverse DCT busy (at the start and end of a frame you can only do a few IDCTs simultaneously). On top of that, the bitrate could be up to 36 Mbit/s, which is 4.5 megabytes per second. I don't know what the buffer size was on the 360, but it was probably a couple of seconds at least. For Netflix-style streaming, they tend to buffer quite a bit more, upwards of 15 seconds of video, which is around 12MB of RAM.
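To put rough numbers on that, here's a back-of-the-envelope sketch. The function names are made up for illustration, and the decoded-frame figure assumes 8-bit 4:2:0 frames at exactly 1920x1080, which is my assumption and not necessarily what the 360 player actually stored.

```python
def stream_buffer_bytes(bitrate_mbit_s: float, seconds: float) -> float:
    """Bytes needed to hold `seconds` of compressed video at the given bitrate."""
    return bitrate_mbit_s * 1_000_000 / 8 * seconds


def implied_bitrate_mbit_s(buffer_mb: float, seconds: float) -> float:
    """Average bitrate implied by a buffer of `buffer_mb` megabytes holding `seconds` of video."""
    return buffer_mb * 8 / seconds


def decoded_frame_bytes(width: int, height: int) -> int:
    """Size of one decoded frame, ASSUMING 8-bit 4:2:0 (1.5 bytes per pixel)."""
    return width * height * 3 // 2


if __name__ == "__main__":
    # 36 Mbit/s peak is 4.5 MB per second of compressed data.
    print(stream_buffer_bytes(36, 1) / 1e6, "MB per second at 36 Mbit/s")
    # "A couple of seconds" of buffer at that peak rate.
    print(stream_buffer_bytes(36, 2) / 1e6, "MB for 2 seconds")
    # 12 MB holding 15 seconds implies a much lower streaming bitrate.
    print(implied_bitrate_mbit_s(12, 15), "Mbit/s implied by 12 MB / 15 s")
    # 12 decoded 1080p frames in flight, under the 4:2:0 assumption.
    print(12 * decoded_frame_bytes(1920, 1080) / 1e6, "MB for 12 decoded frames")
```

So the compressed buffer is small next to the decoded frames: under those assumptions, a dozen frames in flight costs several times more memory than a couple of seconds of bitstream.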
We routinely ran out of memory while writing the Xbox HD DVD player and had to go back and re-engineer some things to save memory.
The Apple TV can get away with less memory because it has hardware decoding for all its supported codecs.