Question about the medical test paradox and Bayes' rule

eric beans

So I was watching a video on YouTube on Bayes' rule as it applies to the medical test paradox.

The problem presented is this:
"A 50 year old woman, no symptoms, participates in a routine mammography screening. She tests positive, is alarmed and wants to know from you whether she has breast cancer for certain or what her chances are. Apart from the screening result you know nothing else about this woman."

Doctors were then told that the prevalence of breast cancer for women of this age is about 1%, and then to suppose that the test sensitivity is 90% and that its specificity was 91%. They were then asked: "How many women who test positive actually have breast cancer?"

I still don't understand why it would not be 9 out of 10.

If out of a sample of 100, 10 test positive, that means 9 were true positives and 1 was a false positive. So why would the answer not be 9 out of 10? What am I not getting?

The answer they gave was actually 1 out of 10. I understand how to multiply the prior by the Bayes factor to get the answer, but I still don't get it. Maybe I'm stuck on the language of the question.
 
Of the people who DO have the disease, 9 out of 10 test positive. But you need to look at the overall number of people who test positive, both those who have the disease and those who don't. Then you can determine what percent of those who test positive actually have it. It is a small percentage because a large number of people who don't have it will get a false positive. Try to fill out the following table, which assumes a sample of 1000 women of the same age. Notice that only 10 of the women (1%) actually have breast cancer.


                   | Have Breast Cancer | Don't Have Breast Cancer | totals
Tests positive (+) |                    |                          |
Tests negative (-) |                    |                          |
totals             | 10                 | 990                      | 1000
 
                   | Have Breast Cancer  | Don't Have Breast Cancer | totals
Tests positive (+) | 9 (90% sensitivity) | 1                        | 10
Tests negative (-) | 89                  | 901 (91% specificity)    | 990
totals             | 98                  | 902                      | 1000
One thing I'm confused about as I'm filling this chart out is that the number of people with the disease comes out at 98/1000, which is closer to 10%. But the prior said only 1% of the population had the disease, so I'm not sure how to make sense of the numbers.

the original wording of the question was "How many women who test positive actually have breast cancer?"
I think that was a poorly worded question. The question sounded to me like it was asking: "How many women who test positive (i.e. that would be 10) actually have breast cancer? (i.e. that would be 9)"

I think what the person meant to ask was "What are the odds of women with the disease being detected?" that would be 9 out of 98 or close to 1 out of 10.

that's my thinking.... what do you think?

By the way, in the original post i realized i made a typo. it should have been out of 1000 not 100 people.
 
It is fundamentally a weighting problem.

We test 10,000 women, 100 have breast cancer and 9900 do not.

How many will test positive? 90% of the 100 women who do have breast cancer will test positive; that's 90. And 9% of the women who do not have breast cancer will test positive; that's 891. So the total number who test positive is 981. And what percentage actually have cancer?

90/981 or < 10%.
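For anyone who wants to check the counting themselves, here is a minimal sketch of the same arithmetic. The variable names are my own, and the cohort of 10,000 is just the hypothetical sample used above.

```python
# Natural-frequency count for a hypothetical cohort of 10,000 women.
# Integer arithmetic keeps every count exact.
cohort = 10_000
with_cancer = cohort * 1 // 100               # 1% prevalence -> 100
without_cancer = cohort - with_cancer         # 9900

true_positives = with_cancer * 90 // 100      # 90% sensitivity -> 90
false_positives = without_cancer * 9 // 100   # 9% false-positive rate -> 891

all_positives = true_positives + false_positives   # 981
share_with_cancer = true_positives / all_positives

print(f"{true_positives} of {all_positives} positives have cancer "
      f"({share_with_cancer:.1%})")
```

The denominator is everyone who tests positive, not everyone who has cancer, and that is where the 9-out-of-10 intuition goes wrong.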

Remember: we do not know whether the woman has breast cancer or not. We just know the results of the test. The number of false positives is almost 10 times the number of true positives.

It is not that the test is bad. It means that a positive result calls for further, more expensive or more debilitating tests. It is a screening test.

Turn it around. Suppose a woman gets a negative result. The number of women who test negative will be

(100 - 90) + (9900 - 891) = 10 + 9009 = 9019. So the percentage of women who test negative but have breast cancer, and so are falsely reassured, is

10/9019 ≈ 0.11%, which is < 0.2%. That is, only 10 of the 10000 women who take the test will have a breast cancer that is missed.

The test is very unlikely to miss someone with breast cancer, and that makes it a great test. But it will generate a lot of false positives.
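The negative side can be checked the same way. This is only a sketch with names I made up, using the same hypothetical cohort of 10,000 women (100 with cancer, 9900 without, 891 false positives).

```python
# Negative results in a hypothetical cohort of 10,000 women.
cohort = 10_000
with_cancer = 100
without_cancer = cohort - with_cancer        # 9900

false_negatives = with_cancer - 90           # 10 cancers the test misses
true_negatives = without_cancer - 891        # 9009 correctly cleared

all_negatives = false_negatives + true_negatives   # 9019
missed_share = false_negatives / all_negatives

print(f"{false_negatives} of {all_negatives} negatives have cancer "
      f"({missed_share:.2%})")
```

So a negative result is wrong in roughly 1 case out of 900, while a positive result is wrong in roughly 9 cases out of 10.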

[MATH]\text {P(positive result given cancer present)} = 0.9.[/MATH]
[MATH]\text {P(negative result given cancer present)} = 0.1.[/MATH]
[MATH]\text {P(positive result given cancer absent)} = 0.09.[/MATH]
[MATH]\text {P(negative result given cancer absent)} = 0.91.[/MATH]
Does that make sense? Remember that these are CONDITIONAL probabilities.

What are the underlying probabilities of cancer?

[MATH]\text {P(Cancer absent)} = 0.99.[/MATH]
[MATH]\text {P(Cancer present)} = 0.01.[/MATH]
It is the weights of 99 to 1 that undermine our intuition.

[MATH]\text {P(positive result and cancer present)} = 0.01 * 0.9 = 0.0090.[/MATH]
[MATH]\text {P(negative result and cancer present)} = 0.01 * 0.1 = 0.0010.[/MATH]
[MATH]\text {P(positive result and cancer absent)} = 0.99 * 0.09 = 0.0891.[/MATH]
[MATH]\text {P(negative result and cancer absent)} = 0.99 * 0.91 = 0.9009[/MATH]
[MATH]0.0090 + 0.0010 + 0.0891 + 0.9009 = 1.0000 \ \checkmark[/MATH]
So the probability of a positive result is [MATH]0.0090 + 0.0891 = 0.0981.[/MATH]
And the probability of cancer given a positive result is

[MATH]\dfrac{0.009}{0.0981} \approx 9.2\%.[/MATH]
I think the question was worded correctly: "Given a positive result, what is the probability that a specific woman actually has breast cancer?" What we know with respect to this woman is her test results and that false positives are almost 10 times as common as true positives.
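Since the original question mentions multiplying the prior by a Bayes factor, here is a hedged sketch of that odds form. I am assuming "Bayes factor" means the likelihood ratio, sensitivity divided by the false-positive rate; the variable names are mine.

```python
from fractions import Fraction

# Exact fractions avoid floating-point noise in the intermediate steps.
prior = Fraction(1, 100)                  # P(cancer present)
sensitivity = Fraction(9, 10)             # P(positive | cancer present)
false_positive_rate = Fraction(9, 100)    # P(positive | cancer absent) = 1 - specificity

prior_odds = prior / (1 - prior)                     # 1 to 99
bayes_factor = sensitivity / false_positive_rate     # likelihood ratio = 10
posterior_odds = prior_odds * bayes_factor           # 10 to 99
posterior = posterior_odds / (1 + posterior_odds)    # convert odds back to a probability

print(posterior, f"{float(posterior):.1%}")
```

The positive test makes cancer 10 times more likely in odds terms, but 10 times a prior of 1 to 99 is still only 10 to 99, which is the same 9.2% as 0.009/0.0981.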
 
Alright well I was hoping you could find the answer on your own but there it is above. If you still don't understand I REALLY suggest filling out the table with some actual numbers of women so that it makes sense. It may help to know the difference between test sensitivity and specificity.

Test sensitivity is the probability someone with the disease gets a positive result.
Test specificity is the probability someone without the disease gets a negative result.

Use the 90% test sensitivity for the left column (the people who actually have cancer) and the 91% test specificity for the right column (the people who do not).
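To make the different denominators concrete, here is a small sketch using counts from a hypothetical cohort of 10,000 women. The variable names are my own; `ppv` stands for positive predictive value, the usual name for the share of positive results that are true positives.

```python
# Counts from a hypothetical cohort of 10,000 women (1% prevalence).
true_pos, false_neg = 90, 10       # the 100 women who have cancer
false_pos, true_neg = 891, 9009    # the 9,900 women who do not

# Sensitivity: out of the women WITH cancer, how many test positive?
sensitivity = true_pos / (true_pos + false_neg)

# Specificity: out of the women WITHOUT cancer, how many test negative?
specificity = true_neg / (true_neg + false_pos)

# Positive predictive value: out of ALL positive tests, how many have cancer?
ppv = true_pos / (true_pos + false_pos)

print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} ppv={ppv:.3f}")
```

Sensitivity and specificity each describe one column of the table; the question about a woman who tested positive is about a row.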
 
None of your numbers work.

1% of 1000 is 10, not 98. So your 89 makes no sense. 99% of 1000 is 990, not 902. So your 1 makes no sense.

[MATH]\begin{array}{rrr} 9 & 89.1 & 98.1 \\ 1 & 900.9 & 901.9 \\ 10 & 990.0 & 1000.0 \end{array}[/MATH]
I used 10000 to avoid fractional people. That gives the more intuitive table:

[MATH]\begin{array}{rrr} 90 & 891 & 981 \\ 10 & 9009 & 9019 \\ 100 & 9900 & 10000 \end{array}[/MATH]
So the probability of actually having breast cancer if you test positive is 90/981 < 10%.

Add across to get test results. Add down to get incidence of disease.
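If it helps, the whole table can be generated for any sample size. This is a rough sketch, not anything official; `screening_table` is a hypothetical helper name, and the layout assumed is rows = test result, columns = has cancer, no cancer, row total.

```python
from fractions import Fraction

def screening_table(n, prevalence, sensitivity, specificity):
    """Return the 3x3 table (with totals) for n people, using exact fractions."""
    sick = n * prevalence
    healthy = n - sick
    true_pos = sick * sensitivity
    false_neg = sick - true_pos
    true_neg = healthy * specificity
    false_pos = healthy - true_neg
    return [
        [true_pos, false_pos, true_pos + false_pos],    # tests positive
        [false_neg, true_neg, false_neg + true_neg],    # tests negative
        [sick, healthy, n],                             # column totals
    ]

for n in (1000, 10000):
    table = screening_table(n, Fraction(1, 100), Fraction(9, 10), Fraction(91, 100))
    for row in table:
        print([float(x) for x in row])
    print()
```

With 1000 people the right-hand column comes out in tenths of a person; with 10000 every cell is a whole number.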
 


[Attached image: 1613596592219.png]

Oh I think I got it. The confusion was definitional. I thought "sensitivity" meant out of all the positive results, but it really means out of only the ones who have the disease. The same goes for the term "specificity", which you need in order to figure out the total number of positives in the test. I never saw this stuff in high school algebra so this whole Bayes thing is completely new to me. I'm trying to understand Bayesian analysis in an intuitive and practical way. I'm a visual learner so I can never keep track of formulas. It's just easier to picture what's going on with a Venn diagram.


By the way, is there a special math term for the total positives (true positives and false positives) and total negatives (false negatives and true negatives)?

Is sensitivity another word for accuracy? Is specificity another word for precision?
 