Scalers that scale the resolution, such as the ones found in high-end DVD players/TVs/consoles (PS3 & Xbox 360 Hana). How do they work? Are they fundamentally different from GPUs?
High-end scalers do a lot more than just stretch the image and apply some kind of filter to it: they're intended to display a low resolution, low frame rate, sometimes interlaced image on a high-frame rate, high resolution screen.
When your original source is interlaced at 30Hz (basically, a standard SD NTSC signal) and you have a 100Hz 1080p display, you'll first need a deinterlacer. Since there is a time delta between the A field and the B field, you can't just merge two fields together to get a double-resolution image.
Well, you can, but then you get horrible artifacts that are unacceptable for a high end TV.
High-end deinterlacers are smarter and, sometimes, WAY smarter than that: the smarter ones will do motion detection on the image. For static parts of the image, they'll merge the A field and B field, effectively doubling the resolution. For moving parts, they'll do so-called 'bobbing', whereby they just repeat the previous line: this is not as nice as merging, but you avoid the ugly artifacts. Because that part of the image is moving anyway, the lower quality will be less noticeable.
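As a rough sketch of that motion-adaptive idea, here's a minimal Python/NumPy version; the field layout, the crude pixel-difference motion detector, and the threshold are my own illustration, not how any particular chip actually does it:

```python
import numpy as np

def motion_adaptive_deinterlace(field_a, field_b, prev_frame, threshold=10):
    """Combine two fields into one progressive frame.

    field_a: even lines (H/2 x W), field_b: odd lines (H/2 x W),
    prev_frame: previous progressive output (H x W), used for motion detection.
    All names and the threshold are illustrative only.
    """
    h, w = field_a.shape[0] * 2, field_a.shape[1]
    out = np.empty((h, w), dtype=field_a.dtype)
    out[0::2] = field_a                      # even lines come straight from field A

    weave = field_b                          # merge candidate: full vertical resolution
    bob = field_a                            # bob candidate: repeat the line above

    # Crude per-pixel motion detector: compare field B against the same
    # odd lines of the previous output frame.
    motion = np.abs(field_b.astype(np.int16) -
                    prev_frame[1::2].astype(np.int16)) > threshold

    # Static pixels get the full-resolution weave, moving pixels get bob.
    out[1::2] = np.where(motion, bob, weave)
    return out
```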
After the de-interlace step, you can then use regular upscaling techniques to get to the final resolution.
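A minimal sketch of that final upscaling step, assuming plain bilinear interpolation (real scalers use longer polyphase filters, but the basic idea of mapping every output pixel back to a fractional source coordinate is the same):

```python
import numpy as np

def bilinear_upscale(img, out_h, out_w):
    """Resample a grayscale frame to (out_h, out_w) with bilinear interpolation."""
    in_h, in_w = img.shape
    # Map every output pixel back to a (fractional) source coordinate.
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, in_h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]

    img = img.astype(np.float32)
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return (top * (1 - wy) + bot * wy).astype(np.uint8)
```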
The way smarter de-interlacers use motion compensation instead of just motion detection: not only do they detect whether there is motion, they detect which pixel is moving where over the course of multiple frames and use this to interpolate both spatially and temporally: inventing completely new intermediate frames (because the scan rate changes) and inventing new pixels within an A or B field during de-interlacing.
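To make the temporal part of that concrete, here's a hedged sketch of how an intermediate frame could be built once you already have per-block motion vectors. It uses integer-pixel vectors only and ignores the occlusion and hole-filling problems a real frame-rate converter has to deal with:

```python
import numpy as np

def interpolate_midframe(prev, nxt, motion_vectors, block=16):
    """Build a frame halfway in time between 'prev' and 'nxt'.

    motion_vectors[by, bx] = (dy, dx) says where the block at (by, bx) in
    'prev' ends up in 'nxt'. Illustrative only: no sub-pixel precision,
    no handling of overlapping blocks or uncovered areas.
    """
    h, w = prev.shape
    mid = np.zeros_like(prev)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            dy, dx = motion_vectors[by // block, bx // block]
            # Place the block halfway along its motion trajectory,
            # averaging the two references to hide small vector errors.
            ty = min(max(by + dy // 2, 0), h - block)
            tx = min(max(bx + dx // 2, 0), w - block)
            sy = min(max(by + dy, 0), h - block)
            sx = min(max(bx + dx, 0), w - block)
            src_a = prev[by:by + block, bx:bx + block]
            src_b = nxt[sy:sy + block, sx:sx + block]
            mid[ty:ty + block, tx:tx + block] = (
                src_a.astype(np.uint16) + src_b.astype(np.uint16)) // 2
    return mid
```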
I don't know to what extent GPU scalers are able to do this kind of stuff. But AVIVO and PureVideo are getting increasingly higher HQV scores, so they must be getting a lot of it right.
Which brands are there today (ICs, not the DVD players using them, etc.)?
Silicon Optix, Trident, Gennum, ATI. There must be a lot more. Some large CE companies may have their own in-house developed stuff. (Sony being a likely candidate, they are famous for their NIH tendencies.)
What is in there? CPU/GPU/cache?
CPU: very likely, to manage the dataflow in the chip. An ARM, ARC, Tensilica, MIPS, or some other embedded processor.
Cache: I$ and D$ for the CPU.
GPU: Why?
Those components are standard stuff these days in pretty much any complex embedded chip and probably only a very small part of it. Without in-depth knowledge of the inner workings of the other blocks, area and complexity estimation is pretty much meaningless.
Can't a GPU do the same job as a scaler?
For motion estimation, you start with a 16x16 (or smaller) pixel block and then move it around a region of, say, 32x32 in steps of 1/4 pixel, trying to find the location with the closest match. (It's very similar to what a video encoder is doing.) I don't think GPU shaders can be efficient for that kind of memory access pattern.
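A sketch of what that search looks like, at integer-pixel precision only (the quarter-pixel steps mentioned above would require interpolating the reference between these positions):

```python
import numpy as np

def block_match(block, ref, top, left, search=8):
    """Full-search block matching: slide 'block' around (top, left) in 'ref'
    and return the displacement with the smallest sum of absolute differences."""
    bh, bw = block.shape
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bh > ref.shape[0] or x + bw > ref.shape[1]:
                continue
            cand = ref[y:y + bh, x:x + bw]
            sad = np.abs(block.astype(np.int16) - cand.astype(np.int16)).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv
```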
Simple upscaling is a small piece of dedicated hardware. Probably not worth putting on a shader.
I am trying to understand them but can find few to no references at all.
The Google is your friend. Any permutation of the words 'motion estimation compensation frame rate conversion scaling' will give you way too much to read.