The correct interpretation is:
1. 95% of the time I apply method A, the population mean is contained in the computed interval.
In other words, if you apply method A 20 times, you would expect that 19 of the 20 CIs generated will include the population mean.
I believe that Pedja's explanation really agrees with statement 1 NOT statement 3.
Both:
2. There's a 95% chance that the population mean m is in the interval (a, b)
3. There's a 95% chance that the interval (a, b) captures the population mean m
are incorrect interpretations.
Unfortunately some textbooks incorrectly interpret it this way.
Here's my explanation (using 90% CI as an example):
Probably the best way to explain it is via an example. Take the example of tossing a die.
Let’s say you toss a die 10 000 times and record the number of 5s.
You would expect to get 1/6 x 10 000, i.e. about 1667 fives (because you KNOW that p = 0.1667, but let’s pretend you don’t).
Each time you do the experiment, you will probably get different values.
Let’s say you get 1200 fives out of 10 000 tosses. This gives you p-hat=1200/10000=0.12.
Your 90% confidence interval would then be:
[0.12-1.645xsqrt(0.12x0.88/10000) , 0.12+1.645xsqrt(0.12x0.88/10000)]
=[0.1147 , 0.1253]
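If you want to check the arithmetic, here's a minimal sketch of that CI calculation in plain Python (the numbers are the ones from the example above):

```python
import math

p_hat = 1200 / 10000          # sample proportion of fives
z = 1.645                     # z-value for a 90% confidence level
n = 10000                     # number of tosses
se = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error of p-hat
lower, upper = p_hat - z * se, p_hat + z * se
print(round(lower, 4), round(upper, 4))   # 0.1147 0.1253
```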
So you can say you are 90% confident that the real value of p is in this interval. (That's why it is called a Confidence Interval.)
BUT it isn’t: remember p = 0.1667, and 0.1667 does NOT lie between 0.1147 and 0.1253.
This just happens to be bad luck. It’s not very likely that you would get only 1200 fives, but it is possible.
If you do this numerous times, 90% of your CIs WILL contain 0.1667, but 10% won’t.
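You can see this long-run behaviour directly with a small simulation (a sketch using only Python's standard `random` module; the trial count is an arbitrary choice): repeat the 10 000-toss experiment many times and count how often the 90% CI captures the true p = 1/6.

```python
import math
import random

random.seed(1)
z, n, true_p = 1.645, 10000, 1 / 6
trials, hits = 1000, 0

for _ in range(trials):
    # one experiment: 10 000 tosses, count the fives
    fives = sum(1 for _ in range(n) if random.random() < true_p)
    p_hat = fives / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    # did this experiment's 90% CI capture the true p?
    if p_hat - z * se <= true_p <= p_hat + z * se:
        hits += 1

print(hits / trials)   # close to 0.90, as claimed
```

Each individual interval either contains 1/6 or it doesn't; only the long-run proportion of intervals is (about) 90%.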
In an example like this, we don’t really need to find a confidence interval for p, because we know, theoretically what it is. We usually find CIs when we don’t know what the population proportion or mean is.
But this example shows that you can’t say “The probability that the real proportion lies in the interval [0.1147, 0.1253] is 0.90”, because it doesn’t lie there. The probability that the real p (i.e. 0.1667) lies in [0.1147, 0.1253] is actually 0.
If we happen to choose a sample where 0.1667 does lie in the interval we find (and that will happen 90% of the time), e.g. [0.15, 0.17], then the probability that the real p lies in that interval is 1, because it certainly does.
So, even in cases where we don’t know the population proportion, or can’t work it out theoretically, we still can’t equate the confidence level with the probability that the real value lies in the interval we find. That real p is out there somewhere; we just don’t know what it is. So the probability that the real p lies in the CI is either 0 or 1.
I'm interested to see others' comments.