Thinking as a Hobby

I'm definitely not a mathematician, but I'm interested in structure and regularity, and have been long before I wrote the section of the SVS based on it as a value.

But I need some help here. I was reading Eric Baum's What is Thought?, a really interesting book. He talks about cognition as basically data compression and extrapolation. We discern regularity in the world and represent those regularities with the most compact representation that best fits the data (Occam's Razor). If the data closely fit a straight line, that's great. Most real-world data fits non-linear functions, like an S-curve, which are more complicated. If data points are scattered randomly, then they don't fit any function, so it is impossible to discern a pattern and encode it. Once you have a representation encoded, you can then make predictions about data points that you haven't actually observed yet. They'll either fit to the curve and reinforce your representation, or they'll fall somewhere outside the representation, in which case you'll either need to revise your representation or ignore the new data.

This all sounds good, and jives with my current thinking about thinking. It nicely explains a lot of things, such as us vs. them kinds of dichotomies, simple linearly-separable views of the world, like prejudices, that are more easily represented in cognitive systems.

But I'm having a bit of problem with randomness and something called minimum description length. It's the mathematical concept of the amount of regularity in a pattern, and the maximum compression of the pattern by the most minimal description.

Wikipedia says:

The length of the shortest program that outputs the data is called the Kolmogorov complexity of the data.

And here's the entry from Kolmogorov complexity:

In computer science, the Kolmogorov complexity (also known as descriptive complexity, Kolmogorov-Chaitin complexity, stochastic complexity, algorithmic entropy, or program-size complexity) of an object such as a piece of text is a measure of the computational resources needed to specify the object. For example consider the following two strings of length 64

0101010101010101010101010101010101010101010101010101010101010101

1100100001100001110111101110110011111010010000100101011110010110

The first string admits a short English language description, namely "32 repetitions of '01'", which consists of 20 characters. The second one has no obvious simple description other than writing down the string itself, which has 64 characters.

But I'm confused. I understand graphically how a random scattering of points doesn't fit to any linear or non-linear function. But I'm having a hard time with the bit-string example. If the second string could be produced by an algorithm allowing for an equal probability of each bit being either 0 or 1, which is the definition of random, then why can't we describe the second bit string as "random", which has a smaller description length than the one for the top bit string.

And algorithmically, would the length of the program that outputs the top string really be much shorter than the one that outputs the second one? Are algorithms that generate random strings that complex? Or is it condition that it has to output that particular string every time?

Is any of this making any sense?




	Thinking as a Hobby Home Get Email Updates LINKS JournalScan Email Me Admin Password Remember Me 3478325 Curiosities served Share on Facebook				2007-08-08 1:06 PM Minimum Description Length Previous Entry :: Next Entry Read/Post Comments (4) I'm definitely not a mathematician, but I'm interested in structure and regularity, and have been long before I wrote the section of the SVS based on it as a value. But I need some help here. I was reading Eric Baum's What is Thought?, a really interesting book. He talks about cognition as basically data compression and extrapolation. We discern regularity in the world and represent those regularities with the most compact representation that best fits the data (Occam's Razor). If the data closely fit a straight line, that's great. Most real-world data fits non-linear functions, like an S-curve, which are more complicated. If data points are scattered randomly, then they don't fit any function, so it is impossible to discern a pattern and encode it. Once you have a representation encoded, you can then make predictions about data points that you haven't actually observed yet. They'll either fit to the curve and reinforce your representation, or they'll fall somewhere outside the representation, in which case you'll either need to revise your representation or ignore the new data. This all sounds good, and jives with my current thinking about thinking. It nicely explains a lot of things, such as us vs. them kinds of dichotomies, simple linearly-separable views of the world, like prejudices, that are more easily represented in cognitive systems. But I'm having a bit of problem with randomness and something called minimum description length. It's the mathematical concept of the amount of regularity in a pattern, and the maximum compression of the pattern by the most minimal description. Wikipedia says: The length of the shortest program that outputs the data is called the Kolmogorov complexity of the data. And here's the entry from Kolmogorov complexity: In computer science, the Kolmogorov complexity (also known as descriptive complexity, Kolmogorov-Chaitin complexity, stochastic complexity, algorithmic entropy, or program-size complexity) of an object such as a piece of text is a measure of the computational resources needed to specify the object. For example consider the following two strings of length 64 0101010101010101010101010101010101010101010101010101010101010101 1100100001100001110111101110110011111010010000100101011110010110 The first string admits a short English language description, namely "32 repetitions of '01'", which consists of 20 characters. The second one has no obvious simple description other than writing down the string itself, which has 64 characters. But I'm confused. I understand graphically how a random scattering of points doesn't fit to any linear or non-linear function. But I'm having a hard time with the bit-string example. If the second string could be produced by an algorithm allowing for an equal probability of each bit being either 0 or 1, which is the definition of random, then why can't we describe the second bit string as "random", which has a smaller description length than the one for the top bit string. And algorithmically, would the length of the program that outputs the top string really be much shorter than the one that outputs the second one? Are algorithms that generate random strings that complex? Or is it condition that it has to output that particular string every time? Is any of this making any sense? Read/Post Comments (4) Previous Entry :: Next Entry Back to Top