[Insert adorable elephant photo.]
This post is about a linear regression lesson. Nothing earth-shattering, but something to add to the list of data sets worth analyzing. In short: here is some data that is roughly linear, and the relationship is one that students can grasp. But because the data is only roughly linear, it forces us to have a discussion about outliers, error bars and the correlation coefficient. So, for linear regression, it's better to use data that is only sort of linear (r = +/- 0.6 sounds right).
First, some facts:
* An opossum pregnancy lasts about 15 days, and opossums live for about a year.
* Dogs spend about 60 days gestating, and they can expect to live for 10 years.
* Elephants spend, on average, 624 days in the womb.
When I consider these three facts, I find myself really bugged by one question – how long does the elephant live?
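Just to see what a first guess looks like, here is a minimal sketch that draws a straight line through the opossum and the dog and stretches it out to the elephant. The helper name `line_through` is made up for this illustration, and assuming the relationship is exactly a straight line is precisely that: an assumption.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the straight line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept

# (gestation in days, lifespan in years), taken from the facts above
opossum = (15, 1)
dog = (60, 10)

slope, intercept = line_through(opossum, dog)

# Extend the two-point line out to the elephant's 624 days of gestation
elephant_guess = slope * 624 + intercept
print(f"Two-point guess for the elephant: {elephant_guess:.1f} years")
```

That works out to roughly 123 years, a number that should make us suspicious. Two animals are not enough to pin down a pattern.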
The first thing we need is more info about the relationship between gestation and lifespan.
After that, we need to figure out a way of representing this info in a clean, easy-to-read way. Then we want to see if we can find any pattern that would help us predict how long an animal will live, based on its gestation period.
Here we hit a problem: There does seem to be a general tendency for animals to live longer when they spend more time gestating. But there’s no perfect pattern here. Some animals seem to fit the pattern, while others don’t.
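One way to put a number on "a general tendency, but no perfect pattern" is the correlation coefficient r. Here is a small sketch of the usual Pearson formula in plain Python. The function name `pearson_r` is my own, and no data is included, so you would pass in the columns from whatever table you collect.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and the two sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)
```

A result close to +1 or -1 means the points sit nearly on a line, and a result close to 0 means there is little straight-line relationship at all. Half-patterns land somewhere in between.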
And this is what many of my students find confusing about statistics. They’re used to dealing with perfect patterns and absolute relationships, but what do we do with half-patterns and tendencies?
We make an educated guess, and make sure that we are clear about how much we’re guessing. We make a guess, in this case, by drawing a line that more-or-less passes through the data points.
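If you would rather have software draw the line, ordinary least squares is one common way to do it. I am assuming that method here for the sake of a sketch, and eyeballing the line with a ruler is just as legitimate for a first lesson. The names `fit_line` and `predict` are made up for this example.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the line y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("cannot fit a line when every x value is the same")
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Read the fitted line's value at x."""
    return slope * x + intercept
```

With your table loaded into two lists (say, gestation in days and lifespan in years), `fit_line` gives the line and `predict(slope, intercept, 624)` reads off the guess for the elephant.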
So what about elephants? If this line correctly describes the pattern, then we would expect elephants to live for about 40 years, on average. As it turns out, the average elephant lifespan is 40 years. In this case, at least, the guess is right on.