Why can't I do this? I know it's impossible, but I'm not sure why.

Take 1 Gb; that is about one billion bits (rounding off).


EDIT: corrected calculations, thanks to ConayR's observation!

Now we split the large one-billion-digit base-2 number into 100 million 10-digit base-2 chunks, in linear order. We take a base-1000 system and convert each 10-digit chunk into a single base-1000 symbol (rounded off from 1024, but think of it as base 1024 if you'd like). We end up with a linear sequence of 100 million 'base 1000' digits.

We take an HD picture, which is about 500 KB, send this 500 KB pic, and have it transformed back into the 1 Gb base-2 number.

Obviously it's impossible, but I'm not sure why.
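For what it's worth, here is a minimal Python sketch of the rebasing idea above, with 1,000 random bits standing in for the 1 Gb input and base 1024 standing in for the rounded-off "base 1000". The symbol count drops by a factor of ten, but each symbol still needs ten bits to store, so the total is unchanged.

import math
import random

bits = [random.randint(0, 1) for _ in range(1_000)]         # stand-in for the 1 Gb input

chunks = [bits[i:i + 10] for i in range(0, len(bits), 10)]  # 10-bit chunks, in linear order
symbols = [int("".join(map(str, c)), 2) for c in chunks]    # one 0..1023 symbol per chunk

bits_per_symbol = math.ceil(math.log2(1024))                # = 10
print(len(bits), "input bits")
print(len(symbols), "symbols, each needing", bits_per_symbol, "bits to store")
print("total storage:", len(symbols) * bits_per_symbol, "bits")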
 
And why is it supposed to be impossible, other than that 1000 base-2 digits can hold larger values than one base-1000 digit?

Just to make it easier to understand: 1000 base-2 digits can hold values from 0 to 2^1000 - 1. One base-1000 digit can hold values from 0 to 999. A base-1000 digit can be represented in 10 bits (0..1023); 1000 base-2 digits need 1000 bits. Obviously you cannot squeeze 1000 bits into 10.
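A two-line sanity check of that arithmetic (Python, purely illustrative):

import math

print(math.ceil(math.log2(1000)))   # 10: bits needed to write down one base-1000 digit
print(2 ** 1000 > 1000 ** 100)      # True: even 100 base-1000 digits cannot cover all 1000-bit values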
 
True, the base needs to be higher or we need smaller chunks; I'll recalculate in a moment.
 
Good luck. To squeeze 1000 bits into one digit you need a base-2^1000 digit. 2^100 = 1267650600228229401496703205376, and that's already a silly base for any number. But sure, go for 2^1000 if you want. :)
 
The problem is that you can put only 9 bits of binary data into a base-1000 digit, but you need 10 bits to store a base-1000 digit. You will end up using 11.(1)% more (file) space, so you should work on your math a little, starting with multiplication, division and exponentiation.

And I'm not sure what the point of this topic is. No, you're not going to compress data by changing its base.
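The 11.(1)% figure is just the ratio of storage bits to payload bits per digit; a quick check, assuming whole bits per digit:

import math

payload_bits = math.floor(math.log2(1000))        # 9 full bits of data fit in one base-1000 digit
storage_bits = math.ceil(math.log2(1000))         # 10 bits are needed to store that digit
print(f"{storage_bits / payload_bits - 1:.1%}")   # 11.1% overhead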
 

I think it could be done in the real world by exploiting the limit on the number of colors possible in reality itself, and using that number as the base. How high does it go?

But I'll change the opening post to reflect an alternative approach, since higher bases can't properly be transferred digitally, because the number of colors is stored in binary representations, unlike in reality.

The other way I've heard of is that, for example, a compressed program's binary sequence can equal some large prime number. You'd only have to send "look up prime number X in a publicly available source, or calculate it!", and with that small sentence, which is insignificant in terms of bytes, you'd be sending the whole program! A more than 1000:1 lossless compression ratio.
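As an aside on the "program as a number" idea: turning a file into one big integer and back is trivial, but the integer is exactly as large as the file, so nothing is saved unless the number happens to have a short description. A toy sketch (the byte string is just a placeholder):

data = b"hello, world"                              # hypothetical program bytes
as_number = int.from_bytes(data, "big")             # the whole file viewed as one integer
restored = as_number.to_bytes(len(data), "big")     # the trivial "decompression" step

print(restored == data)                             # True
print(as_number.bit_length(), "vs", len(data) * 8)  # ~96 bits either way: no saving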
 
Color is a representation of light frequency. It's all in your mind. Light is a wave. There is a range of visible lightwave frequencies. Lightwave frequency is not discrete. Therefore there is an infinite number of colors (lightwave frequencies) in nature. Does that answer your question?

As for the second part: once again you're trying to represent more with less. Try to represent every pair of integers from 0 to 1 (including 0 and 1) using only three different sentences. You cannot.

And I'm sorry to ask, but how old are you and where are you from? I remember learning all this basic stuff in public school when I was around 12.
 

I'm not sure colors are infinite; the quantum nature of interactions with matter makes me think it should quantize the colors too. I'm not certain, but given that the idea that the universe might be discrete and deterministic at the bottom is still alive and kicking, I would guess there may be limits.

Uhmmm, as for the second one, you can't dispel it. Search for illegal numbers; it has been done. ;)
 

The basis of compression is that you have to choose some inputs to expand in order to be able to compress others.

As for 'illegal numbers', that is about showing that a random program can violate patents. I'm fairly confident they didn't find an existing program that happened to be a prime when represented in binary.

Despite being mystical and useful in other situations, prime numbers aren't more useful in this context than any other arbitrary sequence of numbers.
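A quick empirical illustration of that trade-off, using zlib only as an example of a general-purpose compressor: it shrinks repetitive input but slightly expands random input.

import os
import zlib

random_data = os.urandom(100_000)            # incompressible input
repetitive_data = b"abc" * 33_000            # ~99 KB of obvious structure

print(len(zlib.compress(random_data)))       # a little more than 100,000 bytes
print(len(zlib.compress(repetitive_data)))   # a few hundred bytes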
 
The basis of compression is that you have to choose some inputs to expand in order to be able to compress others.

I should say I mean in terms of physical space compression, assuming you can discern all the colors.

As for 'illegal numbers', that is about showing that a random program can violate patents. I'm fairly confident they didn't find an existing program that happened to be a prime when represented in binary.

Despite being mystical and useful in other situations, prime numbers aren't more useful in this context than any other arbitrary sequence of numbers.


Yet you can send the whole program by merely sending an email stating: "write prime number X in base 2 and open it with this app."

Obviously the intention was not to achieve massive compression, but in a way it was achieved anyway.

I'm glad you talked about numerical sequences, because that reminded me of another way to get ridiculous levels of lossless compression. Though it is ridiculously computationally intensive to compress, it works for purely random data of any size and is very easy to 'decompress'.
 

Yes. My point was that the program had not been written previously. Someone went through all the prime numbers trying to find one that corresponded to some runnable program that violated a patent. The resulting program was most likely completely useless. The prime number sequence is no more useful than the sequence 1, 2, 4, 8, ..., for example.

Edit: on second thought, the program could just as easily have been found by other methods. The point was that the method was likely not simply applied to an existing program.

As for your second claim, as I tried to explain in my previous post, compression of random data is impossible. For every bit you save on one possible input, you lose at least one bit on another possible input.

This is regardless of what algorithm you are using. You can google the counting argument, for example.
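For reference, the counting argument can be spelled out by brute force for tiny sizes; a sketch:

from itertools import product

N = 4
inputs = ["".join(p) for p in product("01", repeat=N)]                               # all N-bit strings
shorter = [""] + ["".join(p) for k in range(1, N) for p in product("01", repeat=k)]  # all shorter strings

print(len(inputs), "inputs vs", len(shorter), "possible shorter outputs")
# 16 vs 15: any scheme mapping every 4-bit string to a shorter string must
# send two different inputs to the same output, so it cannot be reversed.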
 
Yes. My point was that the program had not been written previously. Someone went through all the prime numbers trying to find one that corresponded to some runnable program that violated a patent. The resulting program was most likely completely useless. The prime number sequence is no more useful than the sequence 1, 2, 4, 8, ..., for example.

Edit: on second thought, the program could just as easily have been found by other methods. The point was that the method was likely not simply applied to an existing program.
That one example is not to say this can be done with all programs, but that it may be done with some. There's no disproving the fact that a particular program can be sent with such a small footprint.

A more general alternative would be to have highly modular programs that use a standard, very large common set of functions and algorithms (a larger, more diverse standard library), and simply send the higher-level organization with the appropriate code. This way, once a series of programs was transformed and coded appropriately, you could send hundreds, thousands or millions of different programs with a very small memory footprint, given that most of the code already resides on the recipient's computer (just send parameter adjustments, hierarchy info, and the code necessary to tie the algorithms and functions together).
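For what it's worth, that last idea is roughly how shared libraries and dictionary-style schemes already work; here is a toy sketch (every name in it is made up for illustration). The transmitted "recipe" is tiny only because the library was shipped in advance, so the size has moved rather than disappeared.

SHARED_LIBRARY = {                       # assumed to already exist on both machines
    "load":   lambda path: f"<data from {path}>",
    "filter": lambda data, keyword: f"{data}, filtered by {keyword}",
    "save":   lambda data, path: f"saved ({data}) to {path}",
}

recipe = [                               # this small list is all that gets transmitted
    ("load",   ["input.txt"]),
    ("filter", [None, "error"]),         # None means "use the previous step's result"
    ("save",   [None, "output.txt"]),
]

result = None
for name, args in recipe:
    args = [result if a is None else a for a in args]
    result = SHARED_LIBRARY[name](*args)
print(result)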


As for your second claim, as I tried to explain in my previous post, compression of random data is impossible. For every bit you save on one possible input, you lose at least one bit on another possible input.

This is regardless of what algorithm you are using. You can google the counting argument, for example.

I googled it but couldn't find the relevant info; maybe it addresses the following.

There are infinitely many deterministic 'random' sequences of numbers available. Their statistical properties mean that any specific sequence of digits will be found within such a sequence an infinite number of times; the only catch is that the longer the sequence, the rarer it becomes.

Now, it will definitely be computationally prohibitive at present, but it stands to reason that once the sequence is found, you only have to say: start at digit number Xxxxx and end, say, ~1 trillion digits to the right. Presto, you've just sent one terabit.

Now the problem, obviously, is that given a certain numerical sequence with these properties, say pi, it is very likely that the first digit of the desired sequence is very, very far in. If it is, say, exactly digit number one quadrillion, it's easy to process and extract with the digit-extraction formula. If the first occurrence lands in a rougher numerical spot, it might be tough.

But given that the sequence repeats infinitely often, it is a certainty that it will eventually begin at an easily summarizable position, say number 1 quadrillion quadrillion quadrillion. How that would affect the performance of the formula for calculating individual digits of pi, I don't know. But it seems very possible to send an arbitrarily large sequence of numbers in a very small number of bits; it is only computationally extremely demanding to do so.

I've heard quantum computers can be exponentially faster at search problems, but I'm not sure whether that would help here. If it does, you'd search pi for the sequence you're after, then simply give the starting digit and 1M or 1B or 1T as the number of digits to acquire after it.
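A rough experiment along these lines, with a pseudo-random digit stream standing in for pi (whose digits are only conjectured to behave this way): find where an n-digit target first appears, and compare the bits needed to write down that position against the bits of information in the target itself. On average they come out roughly equal.

import random

random.seed(0)
stream = "".join(random.choice("0123456789") for _ in range(2_000_000))

for n in range(1, 6):
    target = "".join(random.choice("0123456789") for _ in range(n))
    pos = stream.find(target)
    if pos < 0:
        print(f"{n}-digit target not found in the first {len(stream)} digits")
        continue
    print(f"{n}-digit target first appears at index {pos}: "
          f"~{max(pos, 1).bit_length()} bits to name the index, "
          f"~{round(n * 3.32)} bits of information in the target")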
 
The counting argument indeed addresses any and all methods of compression. It is impossible, for example, to design an algorithm that will compress all strings of length N to lengths smaller than N.

The reasoning is simple: you start out with 2^N distinct strings. There are 2^N - 1 distinct strings of length less than N. Any mapping from 2^N strings to 2^N - 1 strings will not be reversible.

There are also stronger versions of this argument, but they are more difficult to prove.

Because of this, it is mathematically impossible for your scheme to do what it claims, regardless of how it works.


Now, for the sake of argument, the problem with your method is that the "index" into the sequence of random numbers will not, on average, be smaller than the number you wish to compress.

See http://www.dogma.net/markn/FAQ.html#Q19 for a more thorough explanation of why it doesn't work.
 
Oh, but you said it exactly: on AVERAGE!
If I intended to send the first 1 quadrillion digits of pi in base 2, about a petabit of data, all I have to send is the small formula and a specification of when to stop the count, which occupies a few KB at most.

I can likewise give any sequence that starts at any power of any number, or has other properties that allow you to compress the starting point to a small figure (for example: start exactly at digit 1 billion, or 2^10^10^10, or 7^45^3, or 3^23^4 of pi, all easy-to-express starting points, and proceed to obtain the next 1 million digits).

If we had a modifiable universal formula for obtaining digits, similar to the one available for pi, and an enumerable infinite list of transcendental numbers, we'd only have to say: look at transcendental number X, start at digit Y, and obtain the following Z digits.

This greatly increases the odds of finding the desired sequence starting at an acceptably round number, or at a power of something.

PS

Further clarification on pi: if we had enough computing power, we could keep moving along, say, powers of ten or of two or of five, etc., and search a ridiculous chunk of pi. The desired sequence is found an infinite number of times within the digits of pi; we only have to find one occurrence that lands on an easy-to-shrink position, say 2^1000^1000^1000 (that's a nice number, instead of some long sequence of digits).
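For completeness, the "small formula" really does exist for pi: Gibbons' unbounded spigot algorithm streams out decimal digits of pi one at a time from a few lines of code. The catch is that it only ever produces pi, never arbitrary data.

def pi_digits():
    # Gibbons' unbounded spigot algorithm for the decimal digits of pi.
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n
            q, r, t, k, n, l = (10 * q, 10 * (r - n * t), t, k,
                                (10 * (3 * q + r)) // t - 10 * n, l)
        else:
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

gen = pi_digits()
print("".join(str(next(gen)) for _ in range(30)))   # 314159265358979323846264338327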
 
Sure, on average. What's the point of "compression" if it can deal only with a selected set of streams? If you know upfront what this set contains, sure, you can use a lookup table. But prime numbers are as good for that as any other set of numbers. The point is that it is NOT a compression scheme, because you are not compressing data; you're representing some known set of data using another set of the same size (size as in quantity, not complexity). How are you going to deal with data you have never seen before?

I see no point in explaining the basics to you any further. Basic math is available to everyone willing to understand it. Go and read something on compression too. Your claims are not revolutionary; there is a "smart" pseudo-engineer like you doing this magic every day. All of them failed, but if you think you're better, go for it. Prove your thing works, sell it, become a gazillionaire.
 

As I explained, it is not computationally viable to search something like pi for large numerical sequences. That's a pretty good reason why this sort of thing wouldn't be used. And the pi example shows how you can deal with any arbitrary data once you know it.
 
Oh, but you said it exactly: on AVERAGE!
If I intended to send the first 1 quadrillion digits of pi in base 2, about a petabit of data, all I have to send is the small formula and a specification of when to stop the count, which occupies a few KB at most.
This would be an excellent compression method if all you want is the digits of pi up to a certain number.

I can likewise give any sequence that starts at any power of any number, or has other properties that allow you to compress the starting point to a small figure (for example: start exactly at digit 1 billion, or 2^10^10^10, or 7^45^3, or 3^23^4 of pi, all easy-to-express starting points, and proceed to obtain the next 1 million digits).
Unless all you want are the exact digits of pi from digit N to digit M, this isn't doing much.

If we had a modifiable universal formula for obtaining digits, similar to the one available for pi, and an enumerable infinite list of transcendental numbers, we'd only have to say: look at transcendental number X, start at digit Y, and obtain the following Z digits.

This greatly increases the odds of finding the desired sequence starting at an acceptably round number, or at a power of something.

If you are still on a compression kick: this method is not capable of compressing all desired data.

The test of whether compression is successful is whether the size of the compressed data + the size of the data used by the compressor < the size of the original data.

The infinite list of transcendental numbers you are basing your compressor on is obviously not of size 0. The data needed to delineate the infinite list must be infinite in size as well.

The next problem is that the value that selects the transcendental number for the compressor's basis and the stop and start values are of nonzero size.

You'll have to prove somehow that no arbitrary bit combination of arbitrary size can make the combined size of the selection value, the compressed data, and the start and stop values exceed the size of the original.

Further clarification on pi: if we had enough computing power, we could keep moving along, say, powers of ten or of two or of five, etc., and search a ridiculous chunk of pi. The desired sequence is found an infinite number of times within the digits of pi; we only have to find one occurrence that lands on an easy-to-shrink position, say 2^1000^1000^1000 (that's a nice number, instead of some long sequence of digits).

You'll have to prove the sequence can always be addressed with start and stop counts whose bit size does not exceed the size of the sequence in the first place.
 
K, I lied. Here's a simple proof.

If you can represent any sequence of bits using a start/stop index into the digits of pi, with size(start_index) + size(stop_index) < size(data), then how about doing this next:
Represent the start and stop indices using this "compression" scheme! You'll end up using even less space. Do it again. And again. And again. Voila! You can compress anything into 1 bit! A perfect compression scheme; all credit obviously goes to you.
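That reductio can also be checked against any real compressor: feeding a compressor its own output stops shrinking after the first pass and starts growing slightly, because the first pass already removed the exploitable redundancy (zlib is used here only as an example).

import zlib

data = b"the quick brown fox jumps over the lazy dog. " * 500
for i in range(5):
    print(f"round {i}: {len(data)} bytes")
    data = zlib.compress(data)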

@MfA below: Yes Sir! ;]
 