95% Confidence Intervals

Agent Smith · Sep 8, 2024

Applying statistical method A, I arrive at a 95% confidence interval (a, b), i.e., if m is the mean of the population, a < m < b.
What does this mean?

1. 95% of the time I apply method A, the population mean is contained in the computed interval
2. There's a 95% chance that the population mean m is in the interval (a, b)
3. There's a 95% chance that the interval (a, b) captures the population mean m

Pedja · Sep 8, 2024

The correct interpretation is:

3. There's a 95% chance that the interval (a, b) captures the population mean m.

Explanation
Confidence intervals are a way of expressing uncertainty about an estimate based on sample data. When we say a 95% confidence interval, we're saying that if we repeated the process of sampling from the population and applying method A many times, then 95% of those confidence intervals would contain the true population mean.

Dr.Peterson · Sep 8, 2024

Agent Smith said:
Applying statistical method A, I arrive at a 95% confidence interval (a, b), i.e., if m is the mean of the population, a < m < b.
What does this mean?

1. 95% of the time I apply method A, the population mean is contained in the computed interval
2. There's a 95% chance that the population mean m is in the interval (a, b)
3. There's a 95% chance that the interval (a, b) captures the population mean m

What's the difference between (1) and (3)?

Did the wording here come from a textbook or similar source, or is it your own?

Agent Smith · Sep 8, 2024

@Dr. Peterson, I don't know. They seem different because in 3 there's a specific interval (a, b).

Dr.Peterson · Sep 8, 2024

Agent Smith said:
@Dr. Peterson, I don't know. They seem different because in 3 there's a specific interval (a, b).

Again, did you write this, or is it a problem given to you? Is there an answer in the back of the book (or equivalent)? This will be very helpful.

The wording can be very subtle, and, not being a statistician, my impression is that there are different opinions; some people prefer different wordings. I've seen questions like this given to students I've tutored, and I would expect (1) to be considered better, if any, because the focus is on the method, not the particular interval; but I'm not sure. ChatGPT told me (3) was best, but didn't convince me. I'd like to see what others say.

Agent Smith · Sep 9, 2024

@Dr.Peterson si, I wrote this. It seems a reasonable formulation for possible interpretations of a 95% confidence interval. There's no "back of the book answer" unfortunately. I cobbled together a question from what I've read/heard on the topic. I'm taking a basic course in statistics right now. The lessons on confidence intervals are a bit sketchy (possibly this is down to my own shortcomings rather than any fault of the content creators).

As of this moment I believe @Pedja nailed it.

I hope you'll forgive my very informal way of doing math, a subject known for its intense formalism.

mario99 · Sep 9, 2024

Agent Smith said:
@Dr.Peterson si, I wrote this. It seems a reasonable formulation for possible interpretations of a 95% confidence interval. There's no "back of the book answer" unfortunately. I cobbled together a question from what I've read/heard on the topic. I'm taking a basic course in statistics right now. The lessons on confidence intervals are a bit sketchy (possibly this is down to my own shortcomings rather than any fault of the content creators).

As of this moment I believe @Pedja nailed it.

I hope you'll forgive my very informal way of doing math, a subject known for its intense formalism.

Now, I have just discovered why I cannot (and could not) understand your questions; it was because the author was always Agent Smith. Thanks a lot for letting me know that I was not stupid!

Agent Smith, you are really from another planet. Jupiter, maybe?

Agent Smith · Sep 9, 2024

mario99 said:
Now, I have just discovered why I cannot (and could not) understand your questions; it was because the author was always Agent Smith. Thanks a lot for letting me know that I was not stupid!

Agent Smith, you are really from another planet. Jupiter, maybe?

Unwittingly I may have been putting my own spin on these concepts/questions.

mario99 · Sep 9, 2024

Agent Smith said:
Unwittingly I may have been putting my own spin on these concepts/questions.

Don't worry about it. In my eyes, you are still a Genius. We have been waiting for so long to discuss another \(\infty\) problem. I hope that it will be released this week!

Harry_the_cat · Sep 9, 2024

The correct interpretation is:

1. 95% of the time I apply method A, the population mean is contained in the computed interval.

In other words, if you apply method A 20 times, you would expect that 19 of the 20 CIs generated will include the population mean.
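As a rough illustration of the "19 out of 20" statement, here is a minimal simulation sketch. Everything in it is an assumption made for the sake of the example, not part of the original post: "method A" is taken to be a z-interval for the mean with known standard deviation, and the population (normal, mean 50, sd 10), the sample size of 30, and the function names are all hypothetical. It builds intervals in batches of 20 and counts how many in each batch capture the true mean.

```python
import random
import statistics

# Multiplier for a two-sided 95% z-interval (about 1.96).
Z_95 = statistics.NormalDist().inv_cdf(0.975)


def interval_captures_mean(rng, mu, sigma, n):
    """Draw one sample, build a 95% z-interval, report whether it contains mu."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    centre = statistics.fmean(sample)
    half_width = Z_95 * sigma / n ** 0.5  # sigma treated as known (assumption)
    return centre - half_width < mu < centre + half_width


def captures_per_batch(batches=1000, batch_size=20, mu=50.0, sigma=10.0, n=30, seed=1):
    """For each batch of `batch_size` intervals, count how many capture mu."""
    rng = random.Random(seed)
    return [
        sum(interval_captures_mean(rng, mu, sigma, n) for _ in range(batch_size))
        for _ in range(batches)
    ]


counts = captures_per_batch()
print("average captures per 20 intervals:", statistics.fmean(counts))
print("fewest captures in any single batch:", min(counts))
```

The average should come out close to 19, while individual batches can land on 20, 18, or fewer: the "19 of 20" is an expectation about the method, not a guarantee for any one batch.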

I believe that Pedja's explanation really agrees with statement 1 NOT statement 3.

Both:

2. There's a 95% chance that the population mean m is in the interval (a, b)
3. There's a 95% chance that the interval (a, b) captures the population mean m

are incorrect interpretations.

Unfortunately some textbooks incorrectly interpret it this way.

Here's my explanation (using 90% CI as an example):

Probably the best way to explain it is via an example. Take the example of tossing a die.
Let’s say you toss a die 10 000 times and record the number of 5s.

You would expect to get 1/6 × 10 000, i.e. about 1667 fives (because you KNOW that p = 1/6 ≈ 0.1667, but let’s pretend you don’t).

Each time you do the experiment, you will probably get different values.

Let’s say you get 1200 fives out of 10 000 tosses. This gives you p-hat=1200/10000=0.12.

Your 90% confidence interval would then be:

[0.12 − 1.645 × sqrt(0.12 × 0.88/10000), 0.12 + 1.645 × sqrt(0.12 × 0.88/10000)]

= [0.1147, 0.1253]
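The arithmetic above can be checked with a short sketch. The function name `wald_interval` is hypothetical; the formula is the same normal-approximation one used in the post, with the multiplier taken from the standard normal distribution rather than typed in as 1.645.

```python
import math
from statistics import NormalDist


def wald_interval(successes, trials, confidence):
    """Normal-approximation interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / trials
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # 0.90 -> about 1.645
    margin = z * math.sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat - margin, p_hat + margin


low, high = wald_interval(1200, 10_000, 0.90)
print(f"[{low:.4f}, {high:.4f}]")  # prints [0.1147, 0.1253]
```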

So you can say you are 90% confident that the real value of p is in this interval. (That's why it is called a Confidence Interval.)

BUT it isn’t (remember p=0.1667) and 0.1667 does NOT lie between 0.1147 and 0.1253

This just happens to be (very) bad luck. Getting only 1200 fives is extremely unlikely, roughly 12 standard deviations below the expected 1667, but it is possible.

If you do this numerous times, about 90% of your CIs WILL contain 0.1667, but about 10% won’t.
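A quick way to see this long-run behaviour is to simulate it. The sketch below is only illustrative and makes its own assumptions: it uses 1000 tosses per experiment instead of 10 000 to keep the run short, repeats the experiment 2000 times, and the function names are hypothetical. Each experiment builds the same kind of 90% interval as above and checks whether it contains 1/6.

```python
import math
import random
from statistics import NormalDist

Z_90 = NormalDist().inv_cdf(0.95)  # about 1.645
TRUE_P = 1 / 6  # known here only because the die is assumed fair


def ci_contains_true_p(rng, tosses):
    """Run one experiment and report whether its 90% interval contains TRUE_P."""
    # Each toss shows a five with probability 1/6.
    fives = sum(rng.random() < TRUE_P for _ in range(tosses))
    p_hat = fives / tosses
    margin = Z_90 * math.sqrt(p_hat * (1 - p_hat) / tosses)
    return p_hat - margin < TRUE_P < p_hat + margin


def coverage(experiments=2000, tosses=1000, seed=0):
    """Fraction of simulated experiments whose interval contains TRUE_P."""
    rng = random.Random(seed)
    hits = sum(ci_contains_true_p(rng, tosses) for _ in range(experiments))
    return hits / experiments


print(f"share of intervals containing 1/6: {coverage():.3f}")
```

The printed share should be near 0.90 rather than exactly 0.90, both because of simulation noise and because the interval is a normal approximation.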

In an example like this, we don’t really need to find a confidence interval for p, because we know, theoretically, what it is. We usually find CIs when we don’t know what the population proportion or mean is.

But this example shows that you can’t say “The probability that the real proportion lies in the interval [0.1147, 0.1253] is 0.90” because it doesn’t.

The probability that the real p (i.e. 0.1667) lies in [0.1147, 0.1253] is actually 0, because it doesn’t.

If we happen to choose a sample where 0.1667 does lie in the interval we find (and that will happen about 90% of the time), e.g. [0.15, 0.17], then the probability that the real p lies in that interval is 1, because it certainly does.

So, even in cases where we don’t know the population proportion, or can’t work it out theoretically, we still can’t equate the confidence level with the probability that it lies in the interval we find. That real p is out there somewhere, we just don’t know what it is. So, the probability that the real p lies in the CI is either 0 or 1.
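One way to write the distinction in symbols (the notation \(L(X)\), \(U(X)\) for the interval endpoints as functions of the random sample \(X\) is an assumption introduced here, not taken from the post): before the data are collected, the endpoints are random and the probability statement is about them,

\[ P\bigl(L(X) < p < U(X)\bigr) \approx 0.90 . \]

Once a sample is observed, the endpoints are fixed numbers, and since \(p\) is also fixed, only an indicator remains:

\[ \mathbf{1}\{0.1147 < 0.1667 < 0.1253\} = 0, \qquad \mathbf{1}\{0.15 < 0.1667 < 0.17\} = 1 . \]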

I'm interested to see others' comments.

Harry_the_cat · Sep 9, 2024

Agent Smith said:
@Dr.Peterson... There's no "back of the book answer" unfortunately. ... The lessons on confidence intervals are a bit sketchy (possibly this is down to my own shortcomings rather than any fault of the content creators).

Unfortunately, I have found that a lot of maths educators, especially at high school level, have a gap in their knowledge when it comes to statistics. In Australia at least, confidence intervals have recently been introduced into the syllabus for some senior mathematics (i.e. Years 11 and 12) courses.
So, many maths teachers (and good ones at that) are now teaching something they are learning themselves from textbooks, several of which have been written with incorrect interpretations. I say this because I don't think you are finding the lessons "sketchy" because of your "own shortcomings".

Dr.Peterson · Sep 9, 2024

Agent Smith said:
@Dr.Peterson si, I wrote this. It seems a reasonable formulation for possible interpretations of a 95% confidence interval. There's no "back of the book answer" unfortunately. I cobbled together a question from what I've read/heard on the topic. I'm taking a basic course in statistics right now. The lessons on confidence intervals are a bit sketchy (possibly this is down to my own shortcomings rather than any fault of the content creators).

As of this moment I believe @Pedja nailed it.

I hope you'll forgive my very informal way of doing math, a subject known for its intense formalism.

The reason I asked whether you wrote the problem is that a good textbook will explain as @Harry_the_cat did, and then give a question much like yours, with one technically correct answer and two that look superficially equivalent but are not. But if you wrote all three options without input from anywhere, then the distinction between them might not be so clear. (I think it is, though! ChatGPT and @Pedja just made me doubt myself.)

Harry_the_cat said:
The correct interpretation is:

1. 95% of the time I apply method A, the population mean is contained in the computed interval.

In other words, if you apply method A 20 times, you would expect that 19 of the 20 CIs generated will include the population mean.
...

Thanks. That's exactly the explanation I've seen; I don't know why I didn't find one like that when I searched for a good source.

Now, on looking again, I find this, which says something similar but then adds:

Recall from the introductory section in the chapter on probability that, for some purposes, probability is best thought of as subjective. It is reasonable, although not required by the laws of probability, that one adopt a subjective probability of 0.95 that a 95% confidence interval, as typically computed, contains the parameter in question.

I also found this textbook with a good explanation:

7.5: Interpreting a Confidence Interval (stats.libretexts.org)

Agent Smith · Sep 9, 2024

@Harry_the_cat nice! Gracias for the detailed explanation.

It's true that a confidence interval statement claims: the researcher is 95% confident that the true parameter lies in the interval computed from sample statistics. However (and I can't say whether it's just me), the meaning of "95% confident" is unclear. The Wikipage does go the extra mile to say that it does not mean there's a 95% probability that the true mean is contained in the given interval, because the true mean is not a random variable. However, from what you posted, the interval is a random variable. Wouldn't that mean a probabilistic interpretation would hinge on the interval (my option 3, which @Pedja says is correct)?

That out of the way, I can't but agree that interpretation 1 is the standard/official interpretation despite, to my reckoning, not saying anything about a given research result. There doesn't seem to be a connection between how a confidence interval is to be understood and any particular computed confidence interval. I mean, what I have here is a sample mean and a sample standard deviation (say \(\displaystyle \overline x = 3\) and \(\displaystyle s_x = 0.8\)), from which I compute a confidence interval, and I'm interested in making inferences about the population. I would feel better with interpretations 2 and 3 than with 1 because they actually use the information I have.

Agent Smith · Sep 9, 2024

@Dr.Peterson

Agent Smith · Sep 9, 2024

Agent Smith · Sep 10, 2024

@Harry_the_cat

X: Shouldn't it be \(\displaystyle 2\) instead of \(\displaystyle 1.96\)???

Harry_the_cat · Sep 10, 2024

Agent Smith said:
@Harry_the_cat

View attachment 38589
X: Shouldn't it be \(\displaystyle 2\) instead of \(\displaystyle 1.96\)???

Why?

Agent Smith · Sep 10, 2024

Apologies, I skipped a step.
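The attachment is not reproduced here, so the following is only a sketch under the assumption that the 1.96 in question is the usual standard-normal multiplier for a 95% interval. It compares that multiplier with the rounded value 2 from the "two standard deviations" rule of thumb.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Multiplier that leaves 2.5% in each tail, i.e. 95% in the middle.
z_exact = std_normal.inv_cdf(0.975)

# Central probability actually covered by +/- 2 standard deviations.
coverage_with_2 = std_normal.cdf(2) - std_normal.cdf(-2)

print(f"z for 95%: {z_exact:.4f}")                     # prints z for 95%: 1.9600
print(f"coverage of +/- 2 sd: {coverage_with_2:.4f}")  # prints coverage of +/- 2 sd: 0.9545
```

So 2 is a convenient rounding that gives slightly more than 95% coverage, while 1.96 is the value that gives 95% under the normal model.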

Agent Smith · Sep 12, 2024

@Harry_the_cat

my post

Agent Smith · Sep 15, 2024

I just went through my notes on statistics and I have a question.

So a 95% confidence interval, say (45, 60), is correctly understood as: repeat the procedure (say) 100 times (which I suppose means working with 100 samples), and of the 100 intervals computed thence, about 95 (95%) will contain the true parameter. This seems to concern itself with the "method" or procedure employed rather than any particular interval, which (45, 60) is.

How does the above relate to the interval I computed viz. (45, 60)? I know I can't say that there's a 95% chance that the population parameter is in the interval (45, 60).

Another statement that isn't wrong is: We're 95% confident that the population parameter is captured by the interval (45, 60). I feel better with this because it utilizes the info that I (painstakingly) computed viz. the interval (45, 60), but ... I don't quite get what "95% confident" means?

Clarification requested.
