Population Std. Dev. vs. Sample Std. Dev. and Variance

markgoldman

New member
Joined
Aug 10, 2010
Messages
6
I understand that for the population std. dev. the denominator is N, but why is the denominator in the std. dev. of a sample N-1? Perhaps I don't understand the underlying derivation of the formula for the sample--but why N-1 vs., say, N-2, etc.? In a small sample the difference between N-1 and N-minus-some-other-number would make a significant difference; and of course in a large sample the difference between N-1 and N in the denominator is very small--so I'm really not grasping the reason for, or source of, N-1 in the formula.

Second, if I am permitted two questions--I am not experienced in using this board--if the std. dev. is simply the square root of the variance, then what additional information is conveyed by reporting the variance of a sample or population when the std. dev. is already known, or vice versa? Why doesn't one of the two measurements fall into disuse?

Please forgive the very basic questions, and if they are inappropriate to this forum I apologize. I am a very long time out of stat classes, and now, trying to use some statistics for pleasure, I find questions occurring to me that I wish I had asked in college.
 
Not a basic question at all. Actually, the denominator in the division related to variance and standard deviation is motivated by a more advanced concept called "bias". Dividing by n produces a biased estimator of the population variance, and dividing by n-1 produces an unbiased one. Feel free to look up these terms. You may not be ready for them, but it may also be a worthwhile discovery.
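To put a formula behind that word "bias" (this is the standard textbook result, stated here only for illustration): if $X_1, \dots, X_n$ are independent draws from a population with variance $\sigma^2$ and $\bar{X}$ is the sample mean, then

$$E\left[\frac{1}{n}\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^2\right]=\frac{n-1}{n}\,\sigma^2, \qquad E\left[\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^2\right]=\sigma^2.$$

Dividing by n falls short of the true variance by exactly the factor (n - 1)/n, and n - 1 is the one divisor that cancels that factor -- which is also why n - 2 (or anything else) would not work.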

Variance is sufficient for many things. In particular, variances add easily when combining independent random variables in linear combinations. However, probabilities around a mean are often stated in terms of the standard deviation. You need both, depending on what you are doing.
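As a concrete instance of that additivity (again a standard fact, given here just for illustration): for independent random variables $X$ and $Y$,

$$\operatorname{Var}(X+Y)=\operatorname{Var}(X)+\operatorname{Var}(Y), \qquad \sigma_{X+Y}=\sqrt{\sigma_X^2+\sigma_Y^2},$$

so variances add directly, while standard deviations combine only through the square root.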
 

Second question first. The variance is easier to use in certain computations, but it is not in the same units as the mean (remember, it is squared). So if the mean is 2.4 children and the variance is 0.81 children squared, the variance is not easily comprehended in practical terms (although when children are being noisy, it sometimes SEEMS that they have been squared). But a standard deviation of 0.9 children can be compared directly to the mean of 2.4 children. Does that make sense?
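Here is a quick sketch of that units point in Python, reusing the illustrative numbers above (the values are made up; only the arithmetic matters):

```python
import math

mean = 2.4       # children
variance = 0.81  # children SQUARED -- not in the same units as the mean

std_dev = math.sqrt(variance)  # square root brings us back to plain children

print(f"mean     = {mean} children")
print(f"variance = {variance} children^2")
print(f"std dev  = {std_dev:.1f} children")  # 0.9 children, directly comparable to 2.4
```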

First question second. I suspect that the only way to understand fully why it is (n - 1) rather than n or (n - 2) is to actually do the math behind the relationship between the standard deviation estimated from a sample and the standard deviation measured from the entire population. (If I ever knew that math, it long ago slipped out of my head.)

But intuitively it is not hard to grasp that the sample's standard deviation is calculated from the sample's mean, which may differ from the population mean. That discrepancy will carry through to the variance non-symmetrically, creating a bias, because all the terms in the variance are non-negative. The bias is not removed by taking the square root. Statisticians have proved that dividing by (n - 1) does remove the bias in the variance.

As the sample size increases, the adjustment to remove the bias effectively goes down, which also makes intuitive sense because the estimate of the population mean is getting better. After all, if I sampled 900 at random out of a population of 1,000, I'd be amazed if the sample mean were very different from the population's mean. If I sampled 2 at random out of a population of 1,000, I would not be surprised at all if the sample mean differed considerably from the population mean. The difference between dividing by 2 and by 1 is huge; the difference between dividing by 900 and 899 is minuscule. The adjustment declines as the expected accuracy of the mean improves.

The mathematicians may cringe at my explanation, which is entirely intuitive and non-mathematical, but it may ease your doubts.
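For what it's worth, you can also watch the bias appear and disappear without doing any of the math. The sketch below (my own illustration, not anything from the posts above) draws many small samples from a population whose variance is known to be 4, then averages the divide-by-n and divide-by-(n-1) estimates:

```python
import random

random.seed(1)

TRUE_VAR = 4.0   # population is Normal(mean 0, sd 2), so sigma^2 = 4
N = 5            # a deliberately small sample, where the bias is largest
TRIALS = 200_000

total_div_n = 0.0   # running sum of divide-by-n estimates
total_div_n1 = 0.0  # running sum of divide-by-(n-1) estimates

for _ in range(TRIALS):
    sample = [random.gauss(0.0, 2.0) for _ in range(N)]
    m = sum(sample) / N
    ss = sum((x - m) ** 2 for x in sample)  # sum of squared deviations
    total_div_n += ss / N
    total_div_n1 += ss / (N - 1)

print(f"true variance:               {TRUE_VAR}")
print(f"average of /n estimates:     {total_div_n / TRIALS:.3f}")   # ~3.2, i.e. (4/5) * 4
print(f"average of /(n-1) estimates: {total_div_n1 / TRIALS:.3f}")  # ~4.0
```

The divide-by-n average settles near (n - 1)/n times the true variance -- about 3.2 here -- while the divide-by-(n - 1) average settles at the true 4.0, matching the intuition above.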
 