Thinking as a Hobby

In my research, I've been thinking a lot more about causation and correlation in the context of learning.

Basically, correlation is when two things tend to occur together, though such a relationship on its own doesn't convey much information. It's hot when the sun's out, so there's a correlation between hot days and sunshine, but we haven't said anything about cause and effect. There are four possibilities:

1) Heat on earth causes the sun to shine.
2) The sun causes heat on earth.
3) Some third factor (e.g. the moon) could cause both the sun to shine and the heat on earth at the same time.
4) It could just be a coincidence.

When we describe a causal relationship, we're being much more specific, so when we say the answer is #2, we're eliminating the other options and providing more information about the specific relationship between the sun shining and hot days.

A common mistake in science (and everyday life) is to confuse correlation and causation: to assume that just because two things occur together, there is a causal relationship. A famous example of this is the correlation between the decline of worldwide piracy and global warming. Over the past 200 years, piracy has decreased, while average global temperatures have risen. This doesn't mean there's a causal relationship between these two things. It's a pretty fun game to play, actually. The human population has increased while the average size of computers has decreased.
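To make the piracy example concrete, here is a minimal sketch that computes a Pearson correlation between two series that merely trend in opposite directions. The numbers are made up for illustration (they are not real piracy or temperature data), and the name `pearson` is just a helper defined here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative series, NOT real data: one falls, one rises.
years = list(range(10))
pirates = [100 - 9 * t for t in years]          # steadily decreasing
temperature = [14.0 + 0.1 * t for t in years]   # steadily increasing

r = pearson(pirates, temperature)
print(r)  # very close to -1: a strong correlation, with no causal link built in
```

Any two quantities that drift steadily over the same period will score like this, which is why a high correlation by itself says nothing about which of the four possibilities holds.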

So how do we distinguish broad correlation from specific causation? Well, that's one of the main things scientists do. In a good experiment, there are two groups, which are roughly equal in terms of all factors except the one you're interested in (e.g. a drug). So you have a control group, where everything is held constant, and the experimental group, where everything is the same except for the introduction of the variable of interest. If you see a difference between the two groups, and you've set up the experiment reasonably well, you can chalk up some evidence that the variable you tweaked had a causal role in the difference.

Following the drug example, if you had two groups of cancer patients, who were similar in their distribution of other factors like age, gender, etc., and you gave drug A to the experimental group and a placebo to the control group, then if there was a difference between the survival rates of the two groups over time (especially a large difference), you would be justified in inferring that the drug probably increases survival rates. But you have to be careful, and make sure the experiment is done right: that it is properly randomized, that you have tried to reduce bias as much as possible, that there is not some other uncontrolled variable that is really the cause, etc. The best check on all of this is having lots of other scientists try to repeat the experiment and see if they find the same thing.
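The logic of the drug experiment can be sketched as a small simulation. Everything here is hypothetical: the function name `run_trial`, the baseline survival probability, and the size of the drug effect are assumptions chosen for illustration, not figures from any real trial.

```python
import random

def run_trial(n_patients, base_survival, drug_effect, seed=0):
    """Randomly split patients into two equal groups and simulate survival.

    Returns (treated_survival_rate, control_survival_rate).
    base_survival and drug_effect are hypothetical probabilities.
    """
    rng = random.Random(seed)
    patients = list(range(n_patients))
    rng.shuffle(patients)                 # random assignment to groups
    half = n_patients // 2
    treated = set(patients[:half])

    survived = {True: 0, False: 0}
    for p in range(n_patients):
        is_treated = p in treated
        # The drug is the only thing that differs between the groups.
        prob = base_survival + (drug_effect if is_treated else 0.0)
        if rng.random() < prob:
            survived[is_treated] += 1
    return survived[True] / half, survived[False] / (n_patients - half)

treated_rate, control_rate = run_trial(20000, base_survival=0.5, drug_effect=0.2)
print(treated_rate - control_rate)  # difference attributable to the drug
```

Because assignment is random, other factors should balance out across the two groups on average, so a large difference in survival rates is evidence for the drug's causal role rather than for some hidden third factor.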

So what does this have to do with learning and neural networks?

In my model, I'm using a form of learning known as Hebbian learning, named after Donald Hebb. It is a form of unsupervised learning, which just means that there is no error signal feeding back to let the system know if it's right or wrong. The basic idea is that if one neuron that is connected to another neuron repeatedly causes it to fire, then the effectiveness of the synapse is increased. Colloquially this is described as "neurons that fire together wire together".
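As a rough sketch, the "fire together, wire together" version of the rule is often written as a weight change proportional to the product of the two neurons' activities. The function name `hebb_update` and the learning rate are assumptions made for this illustration, not details of my model.

```python
def hebb_update(weight, pre, post, learning_rate=0.1):
    """Plain Hebbian update: strengthen the weight when both units are active.

    pre and post are activity levels (e.g. 0.0 to 1.0); learning_rate is an
    assumed value chosen only for illustration.
    """
    return weight + learning_rate * pre * post

w = 0.5
for _ in range(3):
    w = hebb_update(w, pre=1.0, post=1.0)  # repeated co-activity
print(w)  # the weight has grown from its starting value of 0.5

# No error signal is involved, and the product form is symmetric:
# swapping the roles of pre and post gives the same change.
same = hebb_update(0.5, 0.2, 0.9) == hebb_update(0.5, 0.9, 0.2)
```

Note that nothing in this form records which neuron was active first; it only measures how active the two were together.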

Ah, but there's the problem. The way Hebb described it was as a causal relationship. The colloquialism broadens the relationship out to a simple correlation.

I've been thinking about this distinction because I think it bears on my decision about what type of artificial neurons to use. You may or may not know that a real neuron actually transmits action potentials, or "spikes". Whether or not a neuron "fires" at a given point in time depends on the incoming information to that neuron. So, if neuron A's axon terminated on neuron B, in what cases would it make sense to strengthen their connection? The most widely used artificial neurons are not spiking models, but represent the average firing rate of a neuron using a function like the logistic sigmoid, an S-shaped curve.
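For readers who haven't seen one, a non-spiking "rate" neuron of the kind described above can be sketched in a few lines. The names `logistic` and `rate_neuron` and the example weights are assumptions for illustration only.

```python
import math

def logistic(x):
    """Logistic sigmoid: an S-shaped curve mapping any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def rate_neuron(inputs, weights, bias=0.0):
    """Non-spiking unit: output stands in for an average firing rate."""
    net = sum(i * w for i, w in zip(inputs, weights)) + bias
    return logistic(net)

# Hypothetical inputs and weights, chosen so excitation and inhibition cancel.
print(rate_neuron([1.0, 1.0], [2.0, -2.0]))  # → 0.5
```

The output is a single smooth number per unit, so the exact timing of individual spikes is not represented at all.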

Using a simple Hebb rule, with these simpler, non-spiking neurons, we would increase the connection weight between two neurons if they happen to be active at the same time. But this is learning a correlation, not necessarily causation. To more precisely learn causal relationships, it may be necessary to use spiking models, and only increase the connection weight when neuron A fires just before neuron B. This model of learning is called spike-timing-dependent plasticity (STDP).
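Here is a minimal sketch of one common way to write a timing-dependent rule, using an exponential window over the gap between the two spikes. The parameter values (`a_plus`, `a_minus`, `tau`) are assumptions for illustration, and weakening the connection when B fires before A is part of this common formulation rather than something stated above.

```python
import math

def stdp_weight_change(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pair-based spike-timing rule (times in milliseconds).

    Positive change if the presynaptic spike (neuron A) comes before the
    postsynaptic spike (neuron B); negative change if it comes after.
    All parameter values are assumed, illustrative defaults.
    """
    dt = t_post - t_pre
    if dt > 0:                                  # A fired before B: strengthen
        return a_plus * math.exp(-dt / tau)
    if dt < 0:                                  # A fired after B: weaken
        return -a_minus * math.exp(dt / tau)
    return 0.0                                  # simultaneous: no change here

print(stdp_weight_change(t_pre=10.0, t_post=15.0) > 0)   # → True
print(stdp_weight_change(t_pre=15.0, t_post=10.0) < 0)   # → True
```

Unlike the product form of the Hebb rule, this function is not symmetric: swapping the order of the two spikes flips the sign of the change, and the effect fades as the spikes get further apart in time.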

This may all seem a bit arcane, especially if you're not familiar with the concepts, but what I'm basically saying is that when we use a simpler neural network model composed of non-spiking neurons, we may be using models that are more prone to mistaking correlation for causation.

For example, I remember reading about how birds will make incorrect causal inferences about feeding times. A bird that sits in different places in its cage may just happen to be sitting in a particular corner when its owner puts food in the bin. It may then make an incorrect association between sitting in that particular corner and feeding time, so that it will tend to sit more in that particular corner when it's hungry, hoping that its behavior will cause the act of feeding.

We see this in people all the time as well, and one good example is streaks in sports. A pitcher who has won five games straight may continue to wear the same pair of socks for the sixth game, thinking perhaps that the socks are "lucky" and play some sort of causal role in the continuing streak.

In a more discrete temporal example, if two billiard balls (A and B) are rolling toward ball C from different angles, we all know from common experience that the first ball to hit C will cause it to move. But if ball A actually hits C first, and then ball B hits C right around the same time, a network that averages over time, and so loses the temporal resolution and the order of events, may be more prone to ascribe the initial movement of ball C to the actions of ball B.
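A toy version of the billiard example shows how averaging over time throws away the ordering. The event times and the 50 ms bin width are hypothetical values picked for illustration.

```python
# Hypothetical event times in milliseconds: A hits C, C starts moving, B hits C.
events = {"A": 10.0, "C": 11.0, "B": 12.0}

# Coarse, time-averaged view: each event is reduced to the bin it falls in.
bin_width = 50.0
bins = {name: int(t // bin_width) for name, t in events.items()}

# With order erased, both balls look equally "active together" with C.
co_active_with_c = [name for name in ("A", "B") if bins[name] == bins["C"]]
print(co_active_with_c)  # → ['A', 'B']

# Timing-aware view: only an event that precedes C's movement can have caused it.
candidate_causes = [name for name in ("A", "B") if events[name] < events["C"]]
print(candidate_causes)  # → ['A']
```

In the coarse view there is nothing to separate A from B, so credit for C's movement could go to either; keeping the timestamps is what rules B out.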

So one of the things I might do very early on in the modeling process is look at differences in learning sequences between spiking and non-spiking models. My intuition is that spiking models using spike-timing-dependent plasticity should be able to learn causal chains faster and better than non-spiking models.

And right now I'm not sure if there is existing literature on such a comparison. Either way, if I look into this, I'll let you know.




2007-11-26 9:54 AM Correlation, Causality, and Neural Nets