The Nature of Scaling (Signal Processing Edition)

This is arguing the definition of the word "hard" though, and that was already settled earlier: I was more specific than the general statement. "Hard lines" refer to high-frequency signals in the Fourier domain. So the question became not the definition of a hard line, but whether or not upscaling suppresses high frequencies in the Fourier domain. Let's not turn this back into a game of semantics.
That's not my intention, but this is important. You mention high frequencies, but "high" can be understood as being close to the highest representable frequency, which is directly proportional to the sample rate.

A 10kHz sound sampled at 22.05 kHz is "high" relative to the representable frequency range. The same sound sampled at 44.1 kHz is in the lower half of representable frequencies. Yet in absolute terms it's the same frequency.
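To put quick numbers on that (an illustrative sketch only, using the 10 kHz tone and the two sample rates mentioned above):

Code:
# The same absolute frequency is "high" or "low" only relative to the
# Nyquist limit, which is half the sample rate.
f_signal = 10_000.0  # 10 kHz tone

for sample_rate in (22_050.0, 44_100.0):
    nyquist = sample_rate / 2.0
    print(f"sampled at {sample_rate/1000:.2f} kHz: 10 kHz sits at "
          f"{100 * f_signal / nyquist:.0f}% of the representable range")
# sampled at 22.05 kHz: 10 kHz sits at 91% of the representable range
# sampled at 44.10 kHz: 10 kHz sits at 45% of the representable range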

It does make a statement about distance though. It is actually very precise - it defines distance based on the number of samples.
So what exactly is the unit distance in your definition, and does it change when you upscale an image? To define spatial frequency we first need a common definition of space.

If you are claiming the frequency content is not changing, why do their amplitudes decrease?
I never claimed the frequency content doesn't change with linear interpolation. Upscaling with linear interpolation creates frequencies above the maximum representable frequencies in the source image.
 
A 10kHz sound sampled at 22.05 kHz is "high" relative to the representable frequency range. The same sound sampled at 44.1 kHz is in the lower half of representable frequencies. Yet in absolute terms it's the same frequency.

Once again though, this shifts the argument to semantics. We can argue back and forth over what high means, but it won't help. So the definition that has been used in this thread is that "high" means those frequencies near the limit in the original image. If you look at my comparisons, you will find that this is the way they have all been done - not relative to the frequency limits of the upscaled image, but always with respect to the frequency limit of the lower-resolution original.

So what exactly is the unit distance in your definition, and does it change when you upscale an image? To define spatial frequency we first need a common definition of space.

The DFT samples frequencies at f=k/N, where N is the number of samples and k runs from 0 to N-1. So the DFT defines a unit of distance as 1/N. In other words, the space between adjacent pixels.

This DOES change when upscaling. You will notice that I already touched on this. Here is the quote:

Xalion said:
Now, the Fourier transform itself samples frequencies at 2*Pi*k/N - so the number of samples determines the spacing of the sampled frequencies. In my previous post I loosely did the same thing you have tried to do, so it had better be explained. The actual frequencies sampled in a 5-sample picture and a 10-sample picture would be:
2*Pi/5*{0,1,2}
and
2*Pi/10*{0,1,2,3,4,5}

Note that the actual frequency correspondence would be the first with the first, the second with the third, and the third with the fifth. We, on the other hand, have compared the first with the first, the second with the second, and so on. It is a bit disingenuous. There are many ways used to justify it, but for now we will just accept that it can generally be done as long as both functions are properly normalized.

The justification I mentioned but did not give is that when comparing the frequency 1/5 with 1/10, the sampled "distance" goes from N to 2N, so 1/10 becomes 1/(2*5), and comparing 1/5 with 1/10 is comparing equal-"distance" points.

However, it is worth mentioning that it doesn't matter if you actually do this remapping or if you compare 1/5 with 1/5 in both cases - you will find that the effect still happens.
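For anyone following along, here is a small numpy sketch (my own, using the 5-sample and 10-sample figures quoted above) that lists the DFT bin frequencies 2*Pi*k/N and shows the bin-k-to-bin-2k correspondence being described:

Code:
import numpy as np

# List the angular frequencies 2*pi*k/N sampled by the DFT for N = 5 and
# N = 10 (non-negative bins only), to show that bin k of the 5-sample case
# coincides with bin 2k of the 10-sample case.
for N in (5, 10):
    k = np.arange(N // 2 + 1)
    freqs = 2 * np.pi * k / N
    print(f"N = {N:2d}: " + ", ".join(f"{f:.3f}" for f in freqs))

# N =  5: 0.000, 1.257, 2.513
# N = 10: 0.000, 0.628, 1.257, 1.885, 2.513, 3.142
# The N = 5 values reappear at every second bin of the N = 10 list.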


I never claimed the frequency content doesn't change with linear interpolation. Upscaling with linear interpolation creates frequencies above the maximum representable frequencies in the source image.

It also depresses frequencies that are near the maximum of the non-upscaled image. Note that the frequencies listed in the post I quoted are equal "distance" in the sense of the N to 2N mapping above (with the appropriate factor in front of N for the number of pixels I added, of course). So they cannot be frequencies above the maximum representable frequency of the source image; they are the frequencies right at the edge of the maximum representable frequency.
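A minimal numeric illustration of that suppression claim (my own construction, not the exact comparisons posted earlier): a cosine with 7 cycles across a 16-sample row sits just below the source limit of 8 cycles; upscale it 2x with linear interpolation and compare the normalized DFT amplitude of that same component.

Code:
import numpy as np

N = 16
n = np.arange(N)
x = np.cos(2 * np.pi * 7 * n / N)            # content just below the source limit

y = np.empty(2 * N)
y[0::2] = x                                  # keep the original samples
y[1::2] = (x + np.roll(x, -1)) / 2           # insert linear midpoints (circular)

amp_src = np.abs(np.fft.fft(x))[7] / N       # ~0.50
amp_up = np.abs(np.fft.fft(y))[7] / (2 * N)  # ~0.30 - same component, clearly depressed
print(amp_src, amp_up)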
 
The DFT samples frequencies at f=k/N, where N is the number of samples and k runs from 0 to N-1. So the DFT defines a unit of distance as 1/N. In other words, the space between adjacent pixels.

This DOES change when upscaling.
Well, that's where our definitions differ. I consider the dimensions of the image unit distance, and this does not change when upscaling. Therefore the upscaled image can contain higher frequencies than the source image.
 
Well, that's where our definitions differ. I consider the dimensions of the image unit distance, and this does not change when upscaling. Therefore the upscaled image can contain higher frequencies than the source image.

This seems like a red herring to me - so let me ask you a few questions so you can clarify.

First, how do you extract frequencies from an image?

Second - your definition is nothing more than normalization. I've already normalized all of the values that I've given you as examples. So how exactly does this definition change anything that I've claimed? Just to remind you, the process I described here:

Xalion said:
The justification I mentioned but did not give is that when comparing the frequency 1/5 with 1/10, the sampled "distance" goes from N to 2N, so 1/10 becomes 1/(2*5), and comparing 1/5 with 1/10 is comparing equal-"distance" points.

is just another way of "normalizing" the image so that "distance" doesn't change. In other words, every one of my examples has shown what happens under your definition.

Note, I never said an upscaled image couldn't contain higher frequencies than the source image. It can. The issue is what happens to the existing frequencies when you upscale - whether or not they are suppressed.
 
It explains why I don't agree with the statement that upscaling destroys hard lines.

No - it really doesn't. A line is the same whether you define the range as going from 0 to 1 or from 0 to 100. All you need for a line is a set of points that can be connected and an obvious reason to connect them (like the fact that they are all the same color and differ sharply in color from the lines right next to them). The "distance" between them is irrelevant for all intents and purposes, especially because remapping the coordinate space is a linear operation - which preserves lines.

This also ignores the definition of frequency suppression with upscaling. Hence once again you need to answer the question that you've been avoiding.

How do you claim to extract frequencies from a picture?
 
No - it really doesn't.
If you think so, I should probably stop right here instead of wasting more of my time trying to explain why I believe that upscaling doesn't generally destroy hard lines, but can in some cases even make them "harder".
 
If you think so, I should probably stop right here instead of wasting more of my time trying to explain why I believe that upscaling doesn't generally destroy hard lines, but can in some cases even make them "harder".

I'm just a casual observer, but insofar as what little I know on the subject, I'd agree with Xalion -- I don't see how your statement is relevant or even possible. The only way you're going to upscale and keep "hard lines" (as has been previously defined several times) is by simple pixel-doubling. And since that's not what you appear to be talking about, I don't see your point either.

Of course, again, I'm relatively naive on this subject -- but it's interesting to follow along. Either way, I'm still not seeing your side of things yet, but that doesn't mean I don't want to hear more about it.
 
The point is you don't know if a hard line in an image (i.e. post-sampling) is a hard line in reality (i.e. pre-sampling). You're obviously going to kill post-sampling lines, but those might not exist pre-sampling so you might actually make things more correct; alternatively, the line might exist pre-sampling and upscaling might 'destroy' its hardness, thus reducing the quality of the image, which is Xalion's point.

AFAICT you two are mostly arguing semantics and not agreeing on definitions, let alone what is being discussed - unless I'm misunderstanding the discussion's subject myself, which would make the whole thing even more confusing... :)
 
The point is you don't know if a hard line in an image (i.e. post-sampling) is a hard line in reality (i.e. pre-sampling). You're obviously going to kill post-sampling lines, but those might not exist pre-sampling so you might actually make things more correct; alternatively, the line might exist pre-sampling and upscaling might 'destroy' its hardness, thus reducing the quality of the image, which is Xalion's point.

I do recall Xalion mentioning that we're not targeting the correctness of the original source art -- which generally makes sense when you put it this way:

Here's what I'm thinking:
The display device cannot tell what the original picture was, so trying to understand how the upconverted image relates to the original source art is pretty much worthless; it really never comes into the "picture" (har har!)

We have the original digitized image A, and we have the upscaled image B. B cannot possibly contain any more image data than A, it's physically impossible. You can't recreate detail that wasn't there, especially since nowhere between A and B is the true original source.

Thus, one way or another, image B will have to be "stretched". And in nearly all circumstances that I'm aware of, this results in some amount of image degradation. It's like playing 320x240 rez on a 50" screen -- at its native size (about the format of a postcard, let's say) the image will be well defined. At its 50" size (about 12x the native rez) it's going to be a wreck damn near no matter what.

The only place I can conceive of where this isn't the case would be corner cases where the entire frame is a single color or something thereabouts. I cannot conceive of any other way it can be scaled and NOT look like ass at some point.
 
The display device cannot tell what the original picture was, so trying to understand how the upconverted image relates to the original source art is pretty much worthless; it really never comes into the "picture" (har har!)
I'm gonna chuck in here some more context. This thread originated in a game-scaling discussion, and the idea that scaling a game render introduces blurring, from which came this argument about what blurring is etc. (if I'm remembering this right!). In these cases, the rendering is a very imperfect representation of the infinite edge fidelity of our mathematical polygons. The purpose of scaling is to take that limited amount of information and stretch it over the screen so it looks good. Rendering at native resolution accomplishes this, one sample of image for one pixel of display, within the limits of the rendering engine! Rendering below native resolution and upscaling has to worry about two issues: either duplicated pixels creating jaggy graphics with hard edges, or interpolated pixels producing soft 'blurred' edges - blurred being a potential technical subject for discussion, but for the purposes of description by onlookers, they'd call the images blurry.

The nature of adding information is perhaps the meat of this discussion's origins. Sure, we can't ever know what the real information was. But is it possible to fabricate information to approximate the missing information, or even just effectively disguise the missing information? In some cases an algorithm could make an excellent guess. Consider a perspective chequered floor. An algorithm could analyse that image, determine what the original source was, and upscale by rendering a chequered floor to whatever resolution you require. There also exist algorithms that use 'fractals' to introduce...information of a sort, that help disguise upscaled samples. Toshiba have showcased a TV that upscales SD images with what look like very good results, not by interpolating pixels, but by doing something clever to guesstimate content that would fit those samples missing from the original source. And we all know, thanks to Hollywood, the CIA have computers that can take a fuzzy CCTV image and zoom in x100 and read the signature of a document that exists in only a few pixels of source... ;)

All this discussion on signal theory should, hopefully, lead back to the nature of scaling and how developers can produce software that creates data, such that upscaled images serve effectively to portray the original artistic ideal. Or at least determine that such intentions are impossible, and we either have jaggies or fuzzies!
 
The nature of adding information is perhaps the meat of this discussion's origins. Sure, we can't ever know what the real information was. But is it possible to fabricate information to approximate the missing information, or even just effectively disguise the missing information? In some cases an algorithm could make an excellent guess.
While I agree with what you're saying, that's treading on some pretty thin ice... My concern would be "mis-guesses" that would end up making it look even worse than a bilinear-filtered resize. I'm not sure that we have the technology (yet, if ever?) to effectively guess right enough of the time to make it worth it.

Of course, that too depends on the image being upscaled. Old Nintendo games generally get GREAT upscaling with those really fancy effects you find in most emulators -- mostly because with those kinds of games it's incredibly easy to make the correct "guess" as to the original art. But if I went out and filmed my own mini-movie about the comparative merits of Roman vs Greek architecture on my SD handicam and wanted an upscaler to convert it to 1080P -- I can only imagine the mess that might be possible if it were trying too hard to guess what I was filming.

Consider a perspective chequered floor. An algorithm could analyse that image, determine what the original source was, and upscale by rendering a chequered floor to whatever resolution you require.
Always possible, surely. But imagine the overhead involved to properly calculate the plane, then the lighting changes / gradients in a color palette that matches the original art, and then shadows, reflections? Will the reflections be upconverted properly -- or will they be "blocky" against a smoothly generated tile floor?

it's my opinion that the technology doesn't currently exist (unless I'm missing something BIG) to effectively upscale anything but the most trivial or basic scenes without losing image fidelity. And while I see some people in here talking about fringe cases where XYZ might happen, those cases just don't really appear that often In Real Life -- at least in my experience.

But hey, I'm always up for being proven wrong on these sorts of things. I'm way outside of my depth being able to technically describe the processes involved, but I can understand the math that's going back and forth. Someone prove me wrong please, so that I can finally start endorsing those losers who buy 1080P upconverting DVD players ;) :p
 
But if I went out and filmed my own mini-movie about the comparative merits of Roman vs Greek architecture on my SD handicam and wanted an upscaler to convert it to 1080P -- I can only imagine the mess that might be possible if it were trying too hard to guess what I was filming.

it's my opinion that the technology doesn't currently exist (unless I'm missing something BIG)
The best showcase so far has been the Toshiba TV tech demo, but I don't think they've released solid media. A quick Google... we only appear to have low-quality/small reference materials. I know from the 2D image processing space that upscalers there are pretty uninspired, even the fancy ones, producing clear artefacts and interpolated sample boundaries. I've no idea what algorithm Toshiba are using, or the theoretical basis. And TBH, without a clear comparison pic I don't really have a reference point to know if they're actually doing anything impressive or not! It'd be nice if those with the theory, or some good imagination, could contribute ideas based around scaling theory and signal processing.
 
I'm just a casual observer, but insofar as what little I know on the subject, I'd agree with Xalion -- I don't see how your statement is relevant or even possible. The only way you're going to upscale and keep "hard lines" (as has been previously defined several times) is by simple pixel-doubling. And since that's not what you appear to be talking about, I don't see your point either.
I'll try it from another angle:
Imagine instead of just doubling pixels you scale by 100,000x using nearest neighbour sampling. The result contains clearly defined, hard-edged rectangles for every pixel in the source image.

Did those hard edges exist in the source image? No, since the lower sample rate of the source image couldn't even represent such high frequencies. Thus upscaling, depending on the algorithm used, can even make lines "harder".
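A quick numpy sketch of that claim (my own example, a 1-D row for simplicity): a smooth 16-sample cosine can contain at most 8 cycles per row, yet its nearest-neighbour blow-up carries energy well above that, i.e. the repeated-pixel edges are genuinely new high-frequency content.

Code:
import numpy as np

N, scale = 16, 8
x = np.cos(2 * np.pi * 3 * np.arange(N) / N)        # 3 cycles per row, no hard edges
y = np.repeat(x, scale)                             # nearest-neighbour upscale to 128 samples

spectrum = np.abs(np.fft.fft(y)) / (N * scale)
new_content = spectrum[9 : N * scale // 2]          # 9..63 cycles per row: unrepresentable at N = 16
print(new_content.max())                            # clearly non-zero (images of the block edges)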


We have the original digitized image A, and we have the upscaled image B. B cannot possibly contain any more image data than A, it's physically impossible. You can't recreate detail that wasn't there, especially since nowhere between A and B is the true original source.

Thus, one way or another, image B will have to be "stretched". And in nearly all circumstances that I'm aware of, this results in some amount of image degradation. It's like playing 320x240 rez on a 50" screen -- at its native size (about the format of a postcard, let's say) the image will be well defined. At its 50" size (about 12x the native rez) it's going to be a wreck damn near no matter what.
In one way or another, image A will have to be "stretched", too! ;)

To make a digital image visible it needs to be reconstructed – the process of generating a continuous signal from a discrete one. Every monitor that shows a digital image performs reconstruction. It gives pixels an area and shape to generate a continuous* signal of light from discrete samples in the framebuffer.

The example I gave above may help in understanding that reconstruction can also be viewed as infinite upscaling. And if you put these two facts together, you'll see that every monitor effectively performs upscaling. Some generate rather hard edges (e.g. LCDs), others tend to be softer (CRTs). Algorithmic upscaling isn't really all that different from reconstruction using the physical shape and properties of pixels. None of them is inherently more correct - the information is just missing from the signal. But depending on the context and personal preference, any of them may be perceived as better.


If you watched the image on the 50" screen from 12x the distance it would probably look ok, too.


* macroscopically speaking.
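A rough sketch of that reconstruction idea (my own, with guessed kernel widths): expand one scanline of 8 framebuffer samples onto a 100x finer grid, once with a box-shaped pixel (LCD-like hard edges) and once with a Gaussian spot (CRT-like softness).

Code:
import numpy as np

rng = np.random.default_rng(0)
samples = rng.random(8)                    # one scanline of the framebuffer
fine = 100                                 # sub-positions per pixel
grid = np.arange(8 * fine) / fine          # position in units of source pixels

# Box reconstruction: every sample fills its whole pixel area (hard edges).
box = samples[grid.astype(int)]

# Gaussian reconstruction: each sample is a soft spot centred on its pixel.
centres = np.arange(8) + 0.5
sigma = 0.4                                # assumed spot size, in pixels
weights = np.exp(-0.5 * ((grid[:, None] - centres[None, :]) / sigma) ** 2)
gaussian = weights @ samples / weights.sum(axis=1)

# Both are continuous-looking signals built from the same 8 samples;
# neither adds information, they just reconstruct it differently.
print(box.shape, gaussian.shape)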
 
Shifty, multi-frame superresolution takes a whole lotta horsepower ... so maybe they are doing that.
 
I'll try it from another angle:
Imagine instead of just doubling pixels you scale by 100,000x using nearest neighbour sampling. The result contains clearly defined, hard-edged rectangles for every pixel in the source image.

I don't see how you would/could get the result that you describe... You aren't taking a picture of the pixels, unless you're doing something ridiculous. You are stretching the digital image itself, not the output device. You're not comparing sensible things -- a 320x200 image on a 72DPI viewport will be sized something like a 4x3 postcard... A 32,000,000x20,000,000 image on a 72DPI viewport will be sized something like seven miles by four miles. But the latter image will be made up of a hojillion pixels and will be displaying a blurry mass unless you're viewing it from orbit...

Your suggestion is nothing more than a red-herring.

In one way or another, image A will have to be "stretched", too! ;)
I suppose you can argue semantics and say that an image is made of discrete points of data that are then "stretched" to fit pixels. But you're again arguing semantics -- there's no other display alternative that I'm aware of; care to enlighten me on what the other options are? Rendering does indeed take into consideration the pixel format of modern display devices; hence the entire reason why we have field of view for varying resolutions.

Image "A" will be no more stretched than a rendering at "native" resolution. So again, another red-herring.

If your argument is based around nothing but pixel size, then I cannot agree. One is a requirement for display device technology and, so far as I know, has no workaround. Software upscaling is not the same, especially the way you described it in your first paragraph.
 
Albuquerquue, I don't think you've understood what Xmas has been saying.
I suppose you can argue semantics and say that an image is made of discrete points of data...
That is exactly what a digital image is. A rectilinear set of "samples" (aka pixels).
that are then "stretched" to fit pixels
I wouldn't use the word "stretched".

For want of better terms, you have to do reconstruction to go from the discrete digital representation to the visible, analogue image that you see. In the case of an LCD this reconstruction is basically done with box filters (though I suspect that the glass (and your eyes) will do some further optical low pass filtering), while with a CRT it's more like Gaussians.
 
I'll try it from another angle:
Imagine instead of just doubling pixels you scale by 100,000x using nearest neighbour sampling. The result contains clearly defined, hard-edged rectangles for every pixel in the source image.

Very early in the thread great care was taken to define what I meant by the statement you keep crediting to me. In light of that, I want you to do exactly what you have just suggested and post the frequencies of the image before and after. Make sure you explain how you are getting your frequencies so everyone can follow along, and then explain whether or not what I have said about frequencies and upscaling holds true.

I am serious about this. Working through something yourself is the best way to learn, and this is something you should really work through. I think you will be surprised by the result. This is why the definitions of hard line and blurring were so important in the discussion. Things are really going in circles at this point. The best way for you to explain your point is to post an example with real, concrete math.
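For what it's worth, here is one way that exercise could actually be run (a sketch only, using a 1-D row and numpy; the 2-D case is the same math applied per axis): print the normalized DFT magnitudes of a small signal before and after pixel repetition.

Code:
import numpy as np

N, scale = 16, 2
n = np.arange(N)
x = np.cos(2 * np.pi * 3 * n / N) + np.cos(2 * np.pi * 7 * n / N)  # low + near-limit content
y = np.repeat(x, scale)                                            # pixel doubling

X = np.abs(np.fft.fft(x)) / N
Y = np.abs(np.fft.fft(y)) / (N * scale)
for k in range(N // 2 + 1):
    print(f"{k:2d} cycles/row: source {X[k]:.3f}  upscaled {Y[k]:.3f}")
# The 7-cycle component (near the source limit) comes out attenuated in the
# upscaled spectrum, while bins above 8 (not printed) pick up new energy.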
 
I don't see how you would/could get the result that you describe... You aren't taking a picture of the pixels, unless you're doing something ridiculous. You are stretching the digital image itself, not the output device. You're not comparing sensible things -- a 320x200 image on a 72DPI viewport will be sized something like a 4x3 postcard... A 32,000,000x20,000,000 image on a 72DPI viewport will be sized something like seven miles by four miles. But the latter image will be made up of a hojillion pixels and will be displaying a blurry mass unless you're viewing it from orbit...
Why are you assuming the size of the output device scales with the number of samples in the image, or that the viewing distance doesn't?

I suppose you can argue semantics and say that an image is made of discrete points of data that are then "stretched" to fit pixels. But you're again arguing semantics -- there's no other display alternative that I'm aware of; care to enlighten me on what the other options are?
That is precisely my point. There are no other options. Every monitor performs "upscaling".

Rendering does indeed take into consideration the pixel format of modern display devices; hence the entire reason why we have field of view for varying resolutions.
It takes into account the aspect ratio, but not the signal shape in which a pixel is reconstructed.
 
That is precisely my point. There are no other options. Every monitor performs "upscaling".
You've lost me. If an image is 1920x1080 discrete samples, and the display is 1920x1080 discrete pixels, how is the image data being upscaled? Is it that the image 'pixels' are in essence singularities while the display pixels have substantial area, and that singularity data is being spread across this area to form the coloured pixel?

Okay, that's starting to make sense. The rendered image is created without an absolute size. Producing an image of any real size means reconstructing that data over an area. In a rendered front-buffer with pixels of black - white - black, where the natural way of thinking of the front-buffer data is black pixel - white pixel - black pixel, they are in reality just samples. On an LCD display, one sample corresponds to one pixel. Plotted on an analogue display like an oscilloscope, expanding those samples into a visible area would require deciding how to address the transition from sample to sample. Do you render black and white areas of equal proportion, or do you interpolate intensities between uniformly placed peaks of black and white?

Does thinking about it this way help with the issue of scaling images to non-native resolution displays? The intrinsic nature of the upscale from singularity sample to area of light of an LCD doesn't have any obvious bearing on the conversion from an area of samples to a larger array of samples. The LCD upscale is out of our hands. We can only work with the array of samples that will be transmitted to the display device.
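To make that black - white - black example concrete (a tiny sketch of the two choices just described, nothing more):

Code:
import numpy as np

samples = np.array([0.0, 1.0, 0.0])        # black, white, black
fine = 300                                 # a much finer "display" grid

# Choice 1: equal-proportion areas (nearest neighbour) - three flat bands.
nearest = np.repeat(samples, fine // 3)

# Choice 2: interpolate intensities between uniformly placed sample centres.
centres = (np.arange(3) + 0.5) / 3
grid = (np.arange(fine) + 0.5) / fine
linear = np.interp(grid, centres, samples)

print(np.round(nearest[97:103], 2))        # 0 0 0 1 1 1 - an abrupt edge
print(np.round(linear[97:103], 2))         # values climbing gradually through ~0.5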
 