Texture Compression and Quality Metrics (OpenGL 4.3 thread spinoff)

Bitrate needed to reach a particular quality level? or are there other efficiency aspects that you would like to highlight? (encode time? HW area/power cost? memory bandwidth? compressibility with things like Rich Geldreich's "crunch" tool?)
I suppose I'm primarily concerned about power/area because ASTC is huge compared to most TC schemes (with the possible exception of BC6/7, though I'm not sure about that)

In our testing, when comparing ASTC to "old" formats like PVRTC1, ETC1, S3TC and BC5 (but not BC6/7), ASTC has typically been able to achieve equal quality at about 2/3 of the bitrate, as measured by PSNR
And that is another concern. PSNR is a flawed metric. Olson et al.'s paper said that a PSNR difference of 0.25 dB "is visible to most observers" (where higher is better) and that differences of 1.5~1.9 dB "are very significant differences", but that is debatable.

Here is an example of one mode of failure (my apologies for converting the source image to JPG to meet forum limits; the A and B images are PNG):

source image:
[attachment: lena-256.jpg]

with two approximate representations:
A:
[attachment: lena-a.png]
& B:
[attachment: lena-b.png]


One of these images scores 2.25 dB better than the other and so should look significantly better. Does it?

Now there are a number of other metrics that seem to do a better job but, AFAWCS, they still leave something to be desired.

A is 2.25 dB better... apparently. Draw your own conclusions about the reliability of PSNR.
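Since PSNR keeps coming up: it is just mean-squared error on a log scale, which is worth keeping in mind when reading "X dB better" claims. A minimal sketch (generic, not tied to any particular codec):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB (higher nominally means better)."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak * peak / mse)
```

Note that a 2.25 dB gap corresponds to roughly a 1.7x difference in MSE (10^(2.25/10) ≈ 1.68), and it says nothing about *where* in the image that error lives, which is exactly the failure mode shown above.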
 

One of these images scores 2.25 dB better than the other and so should look significantly better. Does it?

Now there are a number of other metrics that seem to do a better job but, AFAWCS, they still leave something to be desired.

A is 2.25 dB better... apparently.

Yikes, I had guessed that B was the better one since, despite being grainier, it retains more of the detail from the original source image.

Regards,
SB
 
Couldn't you upload .png files to a remote host and link to them with tags?

Like [url]http://www.hfr-rehost.net/[/url] or [url]http://pix.toile-libre.org/[/url], and before that imageshack.us, before they decided to become pricks like the other pix hosters with javascript shit to prevent you from getting to your image and pop-up crap to piss you off.

Nevertheless, this is greatly interesting! I wonder if there are similar situations in audio where a high SNR hides similar crap.
In the same vein, with pictures you can just bilinear-filter everything so that it doesn't look pixellated, but this can lead to crap. That's something I experienced very recently: if you run a Doom port which bilinear-filters everything and there's no easy way to disable it, then Doom Guy's face really looks like crap (it's a set of 42 frames of 24x24 pixels). It's very dramatic. The original face still looks nice if you just triple-pixel it. We could probably build an algorithm to determine which is less pixellated relative to the surface area, and it might say the bilinear-filtered version is tons better.

(PS: someone made those faces at 120 pixels wide to deal with it [url]http://www.youtube.com/watch?v=5YfO02WIY6w[/url])
 
I suppose I'm primarily concerned about power/area because ASTC is huge compared to most TC schemes (with the possible exception of BC6/7, though I'm not sure about that)
It is noticeably bigger than BPTC, but shouldn't be *that* huge (about 2% of total GPU area in Mali-T624). As for power, ASTC should be able to reduce external memory traffic and thus save a fair bit of power that way; if you manage to burn more power in the decoder block than what you save in memory traffic, I'd be a bit surprised.

And that is another concern. PSNR is a flawed metric. Olson et al.'s paper said that a PSNR difference of 0.25 dB "is visible to most observers" (where higher is better) and that differences of 1.5~1.9 dB "are very significant differences", but that is debatable.

We did a handful of comparisons using SSIM and subjective evaluations; the results we got from these comparisons were largely the same as we got from PSNR (generally-equal quality at 2/3 bitrate); as such, we concluded that PSNR, while clearly not flawless, worked well enough for practical purposes.

Using PSNR to compare the PSNR-optimized output from one codec to the PSNR-optimized output from another codec is, as I see it, mostly fair.

The big case where PSNR will give seriously misleading results is when comparing outputs that have been optimized towards other targets than PSNR, i.e. psychovisual optimization. Your Lena images are actually a very good example of this: the first one is color-quantized in a PSNR-optimized way, while the second one is color-quantized with a very simple psychovisual optimization: dither.
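To make the dither point concrete, here's a toy Python sketch (hypothetical, not any real codec's quantizer): plain rounding is MSE/PSNR-optimal by construction, while ordered dithering deliberately accepts larger per-pixel error to preserve the look of gradients:

```python
import numpy as np

# Classic 4x4 Bayer threshold matrix, normalized to [0, 1)
BAYER_4x4 = np.array([[ 0,  8,  2, 10],
                      [12,  4, 14,  6],
                      [ 3, 11,  1,  9],
                      [15,  7, 13,  5]]) / 16.0

def quantize(img, levels=4):
    """PSNR-optimal quantization: round each pixel to the nearest level."""
    step = 255.0 / (levels - 1)
    return np.round(img / step) * step

def quantize_dithered(img, levels=4):
    """Psychovisually nicer: add an ordered-dither offset before
    truncating; worse per-pixel error, smoother-looking gradients."""
    step = 255.0 / (levels - 1)
    h, w = img.shape
    thresh = np.tile(BAYER_4x4, (h // 4 + 1, w // 4 + 1))[:h, :w]
    return np.floor(img / step + thresh) * step
```

On a horizontal gradient, `quantize` produces flat bands (low MSE, visible banding) while `quantize_dithered` trades MSE for the appearance of a smooth ramp, which is the same trade the dithered Lena image makes.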

It is possible with the current ASTC codec to get some level of psy-optimization: it has a ton of switches to tweak its error-weighting functions; properly used, these switches can often give subjectively improved quality at the expense of PSNR. The improvement is often visible but modest for LDR RGB textures, but sometimes dramatic for normal maps and HDR textures. It probably doesn't count as a true psy-model like the ones you can find in LAME and x264, but it is a start.

Psy-models for texture compression is IMO a badly understudied area, and I would expect it to be possible to get quite a bit of quality improvement out of them, especially for ASTC but also for other formats (yes, including PVRTC!)
 
I'll state the obvious: a 512x512 S3TC texture looked a ton better than a 256x256 uncompressed texture.
Dunno about psycho optimizations: I guess they were studied to create JPEG, a bit over twenty years ago?

ASTC is simply a significantly more efficient texture compression format than S3TC/DXTC, regardless of API. That said, I believe both ETC2 and ASTC are royalty free only for use in Khronos APIs.

I was going to ask whether ASTC is patent encumbered, it's nice they dealt with it.
 
[quote]Couldn't you upload .png files to a remote host and link to them with tags?[/quote]
It was only the source image that was JPG; the others were PNG. If you want to recreate it, I just used GIMP on the standard 512x512 Lena with cubic downscale.
[quote]If you run a Doom port which bilinear-filters everything and there's no easy way to disable it, then Doom Guy's face really looks like crap (it's a set of 42 frames of 24x24 pixels). It's very dramatic. The original face still looks nice if you just triple-pixel it. We could probably build an algorithm to determine which is less pixellated relative to the surface area, and it might say the bilinear-filtered version is tons better.[/quote]
Bilinear is not a great filter - try using some simple form of bicubic or [URL="http://www.cs.utexas.edu/~fussell/courses/cs384g/lectures/mitchell/Mitchell.pdf"]Mitchell-Netravali[/URL] filter instead. (Of course, there are more sophisticated filters but making those realtime might be rather challenging)
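For reference, the Mitchell-Netravali kernel is the two-parameter BC cubic family from the linked paper; B = C = 1/3 is the setting the authors recommend. A sketch in Python:

```python
def mitchell(x, b=1.0 / 3.0, c=1.0 / 3.0):
    """Mitchell-Netravali cubic kernel; support is |x| < 2."""
    x = abs(x)
    if x < 1.0:
        return ((12 - 9 * b - 6 * c) * x ** 3
                + (-18 + 12 * b + 6 * c) * x ** 2
                + (6 - 2 * b)) / 6.0
    if x < 2.0:
        return ((-b - 6 * c) * x ** 3
                + (6 * b + 30 * c) * x ** 2
                + (-12 * b - 48 * c) * x
                + (8 * b + 24 * c)) / 6.0
    return 0.0
```

B = 1, C = 0 gives the cubic B-spline (very soft); B = 0, C = 0.5 gives Catmull-Rom (sharper, but can ring). At any sample phase the integer-shifted kernel weights sum to 1, so flat regions pass through unchanged.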

[quote="arjan de lumens, post: 1721158"]It is noticeably bigger than BPTC, but shouldn't be *that* huge (about 2% of total GPU area in Mali-T624).[/quote] I suppose I just worry about people taking that attitude with lots of parts of the chip (that are active at the same time). <shrug>
[quote]
As for power, ASTC should be able to reduce external memory traffic and thus save a fair bit of power that way; if you manage to burn more power in the decoder block than what you save in memory traffic, I'd be a bit surprised.[/quote] True, but I have seen some odd use cases where compression's been involved and unexpected things have happened.
[quote]Using PSNR to compare the PSNR-optimized output from one codec to the PSNR-optimized output from another codec is, as I see it, mostly fair.[/quote]
I'd say it's useful for comparisons within the same codec, but I'm still not so sure about comparisons between different codecs <shrug>.
 
I'd say it's useful for comparisons within the same codec, but I'm still not so sure about comparisons between different codecs <shrug>.

Well, in K. R. Rao's "Digital Video Image Quality and Perceptual Coding", it's clear that PSNR is crap. At least use BD-PSNR/BD-Rate, PSNR-HVS-M or DSSIM.

CZD (Czekanowski distance), like PSNR, is a per-pixel quality metric (it estimates quality by measuring differences between pixels). Described in the literature as being "useful for comparing vectors with strictly non-negative elements", it measures the similarity among different samples. This approach correlates better with subjective quality assessment than PSNR, mostly because noise in darker areas of the picture has a bigger impact on the metric's value than noise in brighter areas (see Weber's law of just-noticeable differences). PSNR and CZD are more sensitive to noise than other metrics; the HVS (human visual system) behaves in a similar way, being more sensitive to changes in the darker areas of a picture.
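Since no formula is given above, here is a sketch of the per-pixel Czekanowski distance as it is usually defined for images (my reading of the literature; treat as illustrative):

```python
import numpy as np

def czekanowski_distance(a, b, eps=1e-12):
    """Per-pixel Czekanowski distance between two (H, W, 3) images
    with non-negative values; 0 means identical, 1 means disjoint."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    num = 2.0 * np.minimum(a, b).sum(axis=-1)   # overlap per pixel
    den = (a + b).sum(axis=-1) + eps            # total magnitude per pixel
    return np.mean(1.0 - num / den)
```

The min/sum ratio means a fixed absolute error is a larger fraction of a dark pixel's magnitude than of a bright one's, which is where the Weber-law-like behaviour described above comes from.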

Good automated way to test: PerceptualDiff

Even so, all objective metrics fail.
MSU comparison 1

MSU comparison 3
 
Overview of compression formats:
http://squish.paradice-insight.us/index.php?id=67

ASTC vs. BPTC vs. S3TC vs. ETC1 vs. ETC2 vs. PVRTC1 vs. PVRTC2 vs. FXT1 (8bit)
Colour only data:
http://squish.paradice-insight.us/index.php?id=26
Colour with 1bit alpha:
http://squish.paradice-insight.us/index.php?id=73
Colour with full alpha:
http://squish.paradice-insight.us/index.php?id=71
Gray only data:
http://squish.paradice-insight.us/index.php?id=72
Gray with full alpha:
http://squish.paradice-insight.us/index.php?id=27

Getting a good ASTC encoding takes a while, but it's certainly competitive. The site is heavily under construction, but I can't hold back anymore. :p
 
Well, in K. R. Rao's "Digital Video Image Quality and Perceptual Coding", it's clear that PSNR is crap.
Yes. I knew PSNR (which is just RMSE in sheep's clothing) was less than stellar when I was doing texture compression research back at, err, the turn of the century, but I didn't have time to look for anything better.
At least use BD-PSNR/BD-Rate, PSNR-HVS-M or DSSIM.
Thanks for these. I'll forward them on to a colleague who's been busy researching quality metrics. He may have already seen all of these, but it can't hurt to check.
 
Overview of compression formats:
http://squish.paradice-insight.us/index.php?id=67

ASTC vs. BPTC vs. S3TC vs. ETC1 vs. ETC2 vs. PVRTC1 vs. PVRTC2 vs. FXT1 (8bit)
Colour only data:
http://squish.paradice-insight.us/index.php?id=26
Colour with 1bit alpha:
http://squish.paradice-insight.us/index.php?id=73
Colour with full alpha:
http://squish.paradice-insight.us/index.php?id=71
Gray only data:
http://squish.paradice-insight.us/index.php?id=72
Gray with full alpha:
http://squish.paradice-insight.us/index.php?id=27

Getting a good ASTC encoding takes a while, but it's certainly competitive. The site is heavily under construction, but I can't hold back anymore. :p

I'll take a proper look at these when I get a chance.

I take it that the source image data comes from here? If so... it looks like one should optimise one's codec for blue :) FWIW I don't like the "Reduction algorithm" section on that page. Box filter == pretty poor.

I have just a couple of quick comments/requests:
  • Any chance that the tables can list the bit rate?
  • The "4 Slow PVRTC-compression takes around 1 to 3 hours per texture" footnote seems to relate to ETC results. Which is it meant to be?
 
Nice web site and numbers! Now we just need to get prunedtree's BC7 compressor released :smile:

Thank you.
I'll take closed-source competitors if he trusts me enough; ISO board members haven't complained yet. :)

I take it that the source image data comes from here? If so... it looks like one should optimise one's codec for blue :) FWIW I don't like the "Reduction algorithm" section on that page. Box filter == pretty poor.

Right.
Well, better than tuning your coder to the 12-bit PCD crap of the Kodak set. The Fruit tables are awesome for testing! And anyway, I even think bluish is kind of good in the face of 0.21/0.7/0.09 perceptual optimizers; otherwise the ones which kill (the small amount of) blue win.
BTW, you can get individual results when you click a thumbnail in the corpus gallery (on my site, not theirs), which you reach by clicking on the corpus name beside the plus; the right grey one is the link to stats about them.

I have just a couple of quick comments/requests:
  • Any chance that the tables can list the bit rate?
  • The "4 Slow PVRTC-compression takes around 1 to 3 hours per texture" footnote seems to relate to ETC results. Which is it meant to be?

Bit rate is constant over most of the results; it's the subversive background image in the middle of the groups of results... oh, you want 4bpp instead of 1:6? Is that more helpful? When I start adding 16-bit results it may be quite nice... 1:12... wow.
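For what it's worth, bit rates for these fixed-rate formats fall straight out of the block footprint (128 bits per ASTC block regardless of footprint, 64 bits per 4x4 DXT1/ETC1 block), so a table column would just be this arithmetic:

```python
def bpp(block_bits, block_w, block_h):
    """Bits per pixel of a fixed-rate block format."""
    return block_bits / (block_w * block_h)

dxt1 = bpp(64, 4, 4)        # 4.0 bpp (also ETC1)
astc_4x4 = bpp(128, 4, 4)   # 8.0 bpp
astc_6x6 = bpp(128, 6, 6)   # ~3.56 bpp
astc_8x6 = bpp(128, 8, 6)   # ~2.67 bpp, i.e. 2/3 of DXT1's rate
```

This also makes the earlier "equal quality at 2/3 of the bitrate" claim concrete: 2/3 of DXT1's 4 bpp is the ASTC 8x6 footprint.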

It refers to the ETC as well as the PVRTC coder of the PVR SDK, but I think primarily the PVRTC coder (which is even absent sometimes, seems it takes too long :LOL: ); hmm, maybe I'll add another word or two. :) Anyway, slow is slow like hell!
 
Well, better than tuning your coder to the 12-bit PCD crap of the Kodak set.
Sorry but what's the problem with the Kodak set?
Bit rate is constant over most of the results; it's the subversive background image in the middle of the groups of results...
Oh I see. That's far too subtle for me. You need to put it in garish, flashing neon lighting. ;) Seriously though, I think it needs to be a bit more obvious; even breaks in the table would be useful.
BTW, you can get individual results when you click a thumbnail in the corpus gallery (on my site, not theirs), which you reach by clicking on the corpus name beside the plus; the right grey one is the link to stats about them.
You're "hiding your light under a bushel" :) A great feature that needs to be advertised!

But now I am really confused. How are you measuring the RMSE? For example, take image #30. Your table says the (R)MSE for S3TC with the AMD Compressonator is 4.52, but when I tried it (with version 1.30.1084) I got 7.80, which I double-checked against our own differencing tool. Something is definitely wrong.

It refers to the ETC as well as the PVRTC coder of the PVR SDK, but I think primarily the PVRTC coder (which is even absent sometimes, seems it takes too long :LOL: ); hmm, maybe I'll add another word or two. :) Anyway, slow is slow like hell!
ETC I'd understand because it's probably using the ETC1 reference encoder source, but for PVRTC, what version of the encoder and what sort of system are you running it on?

I know the PVRTC research compressor still needs work but, for example, with image 030 of the 1200x1200 colour set, it took around 154 CPU-seconds (~20 s elapsed on a shared, multi-core Linux machine) on a "higher than PVRTexTool's high" quality setting. Mind you, this version is slightly newer than the public SDK's.
 
Sorry but what's the problem with the Kodak set?

http://en.wikipedia.org/wiki/Photo_CD#Image_format
The Kodak images are not raw camera photos or diapositive scans. They have been back-converted from PCD; you can definitely see the quantization noise in the contrast with the naked eye.

Just see here where the pictures come from!:
http://r0k.us/graphics/kodak/
http://www.math.purdue.edu/~lucier/PHOTO_CD/
http://www.math.purdue.edu/~lucier/PHOTO_CD/TIFF_IMAGES/README

More later, being at work. :)
Request/Suggestion: Maybe we spawn the posts about compression into a new thread, I think we're going to go on a while about it. :) Thanks.
 
http://en.wikipedia.org/wiki/Photo_CD#Image_format
The Kodak images are not raw camera photos or diapositive scans.
But at some stage they (or at least some of them) were scans from Ektachrome, weren't they?
They have been back-converted from PCD; you can definitely see the quantization noise in the contrast with the naked eye.
Read through some of the links. I see some loss will have occurred in all the transformations. Still, I quite like the images as a test set <shrug>.

Request/Suggestion: Maybe we spawn the posts about compression into a new thread, I think we're going to go on a while about it.
I'll see if I can do that now.
 
But at some stage they (or at least some of them) were scans from Ektachrome, weren't they?

I suppose so; it's Kodak marketing material, after all. And they seem to be from long before digital cameras. But then, I wouldn't be so sure that old scanner technology holds up to our current standards.

Read through some of the links. I see some loss will have occurred in all the transformations. Still, I quite like the images as a test set <shrug>.

I like Lena too, but I'd never use her picture to prove something to someone. The Kodak images are also too small to put into the same group as a "common representable subset" of 2000+.
I don't want to convert anyone, I'm just saying. :)

I'll see if I can do that now.

Cool, thanks.
 
I suppose so; it's Kodak marketing material, after all. And they seem to be from long before digital cameras. But then, I wouldn't be so sure that old scanner technology holds up to our current standards.
Oops, I misread: it's Ektar (film), not Ektachrome (slide), though I guess it doesn't make a lot of difference if they scan off the negative. It was also 25-speed film, so it shouldn't have much in the way of grain.

I like Lena too, but I'd never use her picture to prove something to someone. The Kodak images are also too small to put into the same group as a "common representable subset" of 2000+. I don't want to convert anyone, I'm just saying. :)
I think the fact that 'Lena' is a noisy image makes things interesting. Not all textures are photographs, or at least not photographs taken with top-of-the-line DSLRs.
Cool, thanks.
No worries.
 
I like Lena too, but I'd never use her picture to prove something to someone. The Kodak images are also too small to put into the same group as a "common representable subset" of 2000+.
I don't want to convert anyone, I'm just saying. :)

The main effect of using larger pictures is that you, in general, get less detail per pixel - which makes things easier for every encoder and reduces the differences between them (the range between the highest- and lowest-quality results in your RGB tests - excluding obviously broken stuff - seems to be about 7 dB as measured by PSNR, while for the old Kodak images the difference is on the order of 15 dB).

If you want something representative of 2000+, then I would suggest the Ryzom Asset Repository (http://media.ryzom.com/) which contains a large collection of actual (uncompressed) textures developed for an actual commercial game - which I would expect to be a lot more representative of textures than random photography collections. (Ryzom is an MMORPG that was originally released as a commercial game in 2004; unlike, say, World of Warcraft, it never really took off, and as a result, its art assets - including several thousand textures - eventually ended up being released under a creative-commons licence).

As for the blue stuff that you and SimonF discussed, I note it as an example of psychovisuals in action: in the case of the blue colour channel, the human eye is very sensitive to hue but quite insensitive to detail. As a result, getting blue right for any individual pixel is rather unimportant, especially in high-detail regions, but getting the regional average right for any sufficiently large region of the image is very important. The so-called "perceptual" 0.21/0.7/0.09 weighting captures the per-pixel unimportance well but completely fails to capture the importance of the regional average, with occasional bad results.
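The regional-average point can be shown with a toy experiment (hypothetical numbers, Python): zero-mean blue noise, which the eye tolerates, gets hammered by per-pixel weighting, while a small hue shift, which the eye notices, sails through; comparing block averages reverses the verdict.

```python
import numpy as np

rng = np.random.default_rng(0)
blue = np.full((64, 64), 128.0)          # flat blue channel

# Failure mode 1: zero-mean noise -- detail scrambled, hue preserved
noisy = blue + rng.choice([-32.0, 32.0], size=blue.shape)
# Failure mode 2: small uniform shift -- detail preserved, hue off
shifted = blue + 8.0

def weighted_mse(ref, test, w=0.09):
    """Per-pixel 'perceptual' blue weighting."""
    return w * np.mean((ref - test) ** 2)

def regional_error(ref, test, block=16):
    """Compare block averages (regional hue) instead of pixels."""
    h, w = ref.shape
    r = ref.reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    t = test.reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    return np.mean(np.abs(r - t))
```

The per-pixel metric rates the hue shift as far better than the noise, while the block-average metric prefers the noise, matching the eye's preference described above.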
 