Exploring correlation coefficients

I have plotted a graph of the points (0,0), (1,2), (3,6), (4,8). The graph is a rising straight line, y = 2x, with R^2 = 1.

Basically my understanding is that the more closely the data points cluster around the straight line, the closer the coefficient gets to 1, and it equals 1 when every point lies on the line.

I was then asked to plot another graph (0,0), (1,-2), (3,-6), (4,-8). This is the same graph but the linear line is negative. Looking at the correlation coefficient R^2 = 1.

Now I'd have thought the coefficient R^ = 1 should have been R^ = - 1.

If the data point(s) are not on the regression line, the value of the correlation coefficient is not (- 1), but some value between (- 1) and (1).

When I add the correlation coefficient data to the graphs the value recorded shows as positive for both graphs. Should I take that to mean that the computer software program is not able to show a negative coefficient?

I'm using Excel 2007.
 
When you write R^, do you mean R^2, or something else?

You are aware that the square of any real number is positive, right?

In your second example, R = -1, so R^2 = 1. Check what you were taught, reading carefully to distinguish what is said about R vs. R^2.
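
To make the distinction concrete, here is a minimal Python sketch that computes both quantities for the two data sets in the first post. The helper name `pearson_r` is an assumption made for illustration; it is not something Excel or the book provides, and it simply applies the usual textbook formula for Pearson's r.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired samples (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [0, 1, 3, 4]
ys_up = [0, 2, 6, 8]        # points on y = 2x
ys_down = [0, -2, -6, -8]   # points on y = -2x

r_up = pearson_r(xs, ys_up)
r_down = pearson_r(xs, ys_down)

# r keeps the direction of the line; squaring it discards the sign.
print("upward line:   r =", r_up, " r^2 =", r_up ** 2)
print("downward line: r =", r_down, " r^2 =", r_down ** 2)
```

Both data sets give the same r^2, while r itself differs in sign, which is the R versus R^2 distinction being pointed out here.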
 
Thanks Dr. Peterson, the R^ is a typo sorry, it should read R^2.

Yes, I'm aware that the squares of real numbers are positive.

In my second example graph above the computer shows R^2 = 1, yet the solution to the problem shows it as a correlation coefficient of R^2 = - 1, hence my question about whether the software is able to show a negative result.
 
Well once again you do not cite a text word for word. But it appears that either you or what you are reading has confused two different terms, namely coefficient of determination and coefficient of correlation. It is common to use r for a coefficient of correlation, and it has a range from minus one through plus one inclusive. It is common to use r^2 for a coefficient of determination. If the text you are looking at calls r^2 a coefficient of correlation, it is using non-standard terminology. If you are calling r^2 a coefficient of correlation, you are using non-standard terminology. In any case, confusion will abound.
 
PS The square of a real number is always non-negative rather than positive, although every square except that of zero is positive.
 
JeffM, I'm following the instructions given. The book is saying correlation coefficient. What I've said in post 1 is worded like this sentence, so either the author is saying something different to you, or as you say using a different terminology, however I'm not to know because this is again another new learning curve with information that I've not seen before.

So getting back to the question I was asking, when the graph is negative should the R^2 = - 1.

According to the solution given it should, but excel does not show it as R^2 = - 1
 
The point is that the [MATH]R^2[/MATH] you are seeing on Excel is not the correlation coefficient. Excel doesn't even show the correlation coefficient!

That is, the statement in the book is not about [MATH]r^2[/MATH], but about [MATH]r[/MATH].

Are you saying that the book said, word for word, "the correlation coefficient [MATH]R^2 = 1[/MATH]"? Please quote exactly what it says, in an image if possible.
 
You say:

".....when the graph is negative ........."

Do you mean the "slope" of the graph is negative?

If that's what is meant, then you need to be careful about your "vocabulary".
 
No, the book talks about the correlation coefficient being either -1 or 1, or close to it in the solutions provided. I think where all the confusion is occurring is from me using the computer software program to produce the graphs and then letting the computer add the equation, which it does correctly but then the computer also includes R^2 = 1.

I've interpreted that as the correlation coefficient 1 or - 1, however looking more into the subject it may not be so. Correct me if I'm wrong, but on the scatter plot I have put a trend line through the data points, and it seems this R^2 = 1 value has something to do with how far from or close to the trend line the points are? The book does not get that involved, saying it will be introduced if statistics are carried out in further studies.
 
The coefficient of correlation may not even be provided in a statistics package. The more statistically meaningful coefficient of determination will be provided. The two coefficients are related. The coefficient of determination is the SQUARE of the coefficient of correlation. The two are frequently abbreviated as

[MATH]r \text { and } r^2.[/MATH]
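
As a sketch of the standard textbook definitions behind those two symbols (the notation [MATH]\bar{x}[/MATH] and [MATH]\bar{y}[/MATH] for the sample means is introduced here for illustration and does not come from the posts above):

[MATH]r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}, \qquad -1 \le r \le 1, \qquad 0 \le r^2 \le 1.[/MATH]

For the points (0,0), (1,-2), (3,-6), (4,-8) from the first post, [MATH]\bar{x} = 2[/MATH] and [MATH]\bar{y} = -4[/MATH], the numerator is [MATH]-20[/MATH] and the denominator is [MATH]\sqrt{10 \cdot 40} = 20[/MATH], so [MATH]r = -1[/MATH] and [MATH]r^2 = 1[/MATH]. This assumes a straight-line fit, where the reported [MATH]R^2[/MATH] is the square of [MATH]r[/MATH].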
I do not use statistical packages often enough to remember all the vocabulary reliably. So I must review my texts before using the packages.

I know you think that we are often short with you. It would happen far less often were you to quote things exactly rather than giving your interpretation. It induces great frustration when it turns out you have used technical vocabulary in the wrong way. You send all of us, including yourself, down wrong paths. I am not asking that you know the technical terms at first. I am asking that you quote things completely and exactly. Centuries of effort have gone into creating a tightly defined mathematical vocabulary. When you substitute what you think words or symbols mean for the words and symbols themselves, everyone may get lost.

Here, you asked a sensible question about the coefficient of correlation but mixed it up with a question about the value given to you by a statistical package about the coefficient of determination. Had you told us, in the words of the authorities that you were relying on, why you were confused, you would have received quicker and less snippy answers. And you would have got an indication that the two coefficients are different but algebraically related and tell you different things. There are people here who have spent their careers on probability and statistics. If you see two authorities who seem to disagree, tell us exactly what each said. Someone here can dispose of the question in no time.
 
OK so there is a lot of confusion being introduced which is clouding the issue I had. Reading the book there is no mention of coefficient of determination, it might mean the same thing but I'm reading this subject for the first time, and the book says correlation coefficients. The information advises in the book that the regression line fits the data points, and depending upon how close those data points are to the regression line will then depend on the 'r' value. The excel package calculates the 'r' value from the equation of the regression line. As I'm understanding this 'r' value at the moment, no matter whether the slope of the regression line is positive or negative, the 'r' value is shown as 1 if the regression line is a good fit to the data points on the excel package.

If the data points are scattered from the regression line then the correlation coefficient will be closer to zero. That means 'r' = 0

Now from the book I'm to understand that a perfect positive correlation means that 'r' = 1, while

A perfect negative correlation means that 'r' = - 1.

As the plotted points on my negative sloping graph all lie on the regression line, then the data points are all very close to the regression line means to me that the 'r' value should read - 1 and not 1. In other words the data points are not scattered from the regression line on the negative sloping graph.

This correlation coefficient 'r' value on the computer is as I understand it saying that the value the computer calculates from the equation shows how close the data points are to the regression line, i.e. 0.99705, and on a negative sloping graph 'r' = - 0.99705

It's a pity we can't ask Francis Galton what it all means properly, he'd be the man for the job.
 
The problem seems to be that you are not paying attention to the fact that R^2 simply is not the correlation coefficient.

Yes, the correlation coefficient, R (not R^2) is +1 for an exact line sloping upward, and -1 for an exact line sloping downward. The book is right.
... The excel package calculates the 'r' value from the equation of the regression line. As I'm understanding this 'r' value at the moment, no matter whether the slope of the regression line is positive or negative, the 'r' value is shown as 1 if the regression line is a good fit to the data points on the excel package.
...
As the plotted points on my negative sloping graph all lie on the regression line, then the data points are all very close to the regression line means to me that the 'r' value should read - 1 and not 1. In other words the data points are not scattered from the regression line on the negative sloping graph.

No, what Excel gives you is NOT the correlation coefficient. It does not show [MATH]r[/MATH] as 1; it shows the square of the correlation coefficient, [MATH]r^2[/MATH], which is +1 in both cases. Excel is right, too, but it is giving you the square of the number your book is talking about.

There is absolutely no conflict.
 
You pay NO attention to what you are told here. You cannot have a dialogue with a book or a software package, but you can with tutors here. Instead, you want to insist that your mistakes are justified by what you say someone else said except that you know that a true quotation will prove that it was never said. I am about to put you in the harpazoa box and not even register your questions.
OK so there is a lot of confusion being introduced which is clouding the issue I had. Reading the book there is no mention of coefficient of determination, it might mean the same thing
How in the world will we know what the book really says if you do not quote it? And no, you have been told at least twice that the coefficient of determination is not the same thing as the coefficient of correlation and that they are represented by different symbols. In your very first post, you introduced the symbol for the coefficient of determination. No one introduced any confusion except you by using a word and a symbol for a different word.

but I'm reading this subject for the first time, and the book says correlation coefficients. The information advises in the book that the regression line fits the data points,
Everyone here recognizes that you are introducing yourself to the material. So, as a complete newbie, why do you think your paraphrase of what the book says will be more informative than what the book in fact says? Quote the book. I'll bet you a lot of money that the book NEVER says r^2 for correlation coefficient. Show us a picture where it does. That is something you simply made up to try to justify your confusion.

and depending upon how close those data points are to the regression line will then depend on the 'r' value.
Logically, this reverses cause and effect. The r value determines nothing. It is determined by the closeness of fit and the slope of the regression line. This is like saying that the meter stick determines how tall a child will grow.

The excel package calculates the 'r' value from the equation of the regression line.
Give us the screen shot of where Excel did this. Or perhaps it showed a line that said r^2, and you simply paid no attention to the fact that your book said r and Excel said r^2. I strongly suspect that is exactly what happened, but instead of admitting to a perfectly understandable oversight by someone just learning, you want us to agree that your minor but very confusing oversight is not your fault at all. It is the behavior of a six-year-old.

As I'm understanding this 'r' value at the moment, no matter whether the slope of the regression line is positive or negative, the 'r' value is shown as 1 if the regression line is a good fit to the data points on the excel package.
You are understanding it wrong.

r and r^2 are different. Your book and Excel are talking about different things.

You read carelessly. You pay no attention to what people tell you, and, MOST IMPORTANTLY, you refuse, despite many, many requests, to actually quote word for word, symbol for symbol, what is confusing you.

This correlation coefficient 'r' value on the computer is as I understand it

Except, as you have been told now several times, you DON'T understand it because the computer is not talking about the coefficient of correlation. And you will not understand if you keep trying to defend your misunderstandings.
 
There are 2 different concepts. They are NOT the same thing. Excel gives R^2, NOT R.

Correlation Coefficient
- properly called Pearson's r correlation coefficient
- symbolised by r or R
- lies between -1 and 1
- if r = 1, the dots will lie on a perfect upward sloping line (positive slope)
- if r = -1, the dots will lie on a perfect downward sloping line (negative slope)

Coefficient of Determination
- symbolised by r^2 or R^2
- NEVER negative
- lies between 0 and 1
- equals (correlation coefficient)^2, hence r^2
- does not tell you anything about the slope of the dots
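
The comparison above can be checked numerically. The following is a minimal Python sketch, assuming a least-squares straight-line fit; the function name `linear_fit_summary` and the slightly scattered data are hypothetical and chosen only for illustration. It shows that r^2 on its own does not reveal the direction of the trend, while the sign of the fitted slope does.

```python
from math import copysign, sqrt

def linear_fit_summary(xs, ys):
    """Return (slope, r, r_squared) for a least-squares straight-line fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx                # slope of the regression line
    r = sxy / sqrt(sxx * syy)        # Pearson correlation coefficient
    return slope, r, r * r

# Hypothetical, slightly scattered data with a downward trend (not from the thread).
xs = [0, 1, 3, 4]
ys = [0.2, -2.1, -5.8, -8.3]

slope, r, r_squared = linear_fit_summary(xs, ys)

# r_squared alone is never negative; the sign of r is the sign of the slope.
recovered_r = copysign(sqrt(r_squared), slope)

print(f"slope = {slope:.3f}")
print(f"r = {r:.5f}, r^2 = {r_squared:.5f}")
print(f"r recovered from r^2 and the slope's sign = {recovered_r:.5f}")
```

If only R^2 is available from a chart trendline, the sign of the slope in the displayed equation is what tells you whether r is positive or negative.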
 
JeffM, I don't even want to go there. You're now blaming me here for what you have introduced. I'll highlight it for you and explain it. I never introduced "Coefficient of determination", YOU DID. I advised that I did not know whether coefficient of determination and coefficient of correlation were the same thing; I'd no idea. YOU introduced the word "determination", not me. Yes, R^2 on the computer is different from r, and I did not recognise the difference because I was concentrating on the difference between 1 and - 1; that was MY fault.



Now JeffM, in one of your replies to my thread you admitted that you'd need to do some research on this topic to become fully familiar with it, and while I'm learning from first principles from a book, it is not easy to get the correct understanding all the time without guidance. That said, you're not very patient with me, you're becoming short tempered and if this is how you behave with your students, then it might be time to take a break.
 
Thank you, common sense to explain the understanding, clear as daylight.
 
You say:

".... if this is how you behave with your students, then it might be time to take a break. ...."

You simply disregard your "inattentiveness" by saying "story of my life" or "my bad" and continue on doing the same thing again.

Jeff is one of the tutors here who provides "extremely" detailed explanations of mathematical steps - and you simply ignore those (such as the difference between R and R^2). I have NOT yet come across a version of Excel where the value of "R" is provided (without square).

Have you gone through a detailed calculation of R and R^2 - from a data-set where the equation of the best-fit-curve is a polynomial of degree 2 or higher?

Do that - and come back to discuss!
 
From a beginner's point of view Jeff can become too involved and go over people's heads too easily. I make many mistakes and am very open to admitting them all. I'd no idea about the difference being shown regarding the 'r' and R^2. I'd not recognised it, taking the computer program as being correct and just not understanding why it did not show a minus 1.

No, I have not carried out any detailed studies of 'r' and R^2. That is not part of the written material in the book; in fact R^2 is not mentioned in the book at all, and I didn't even pick up on that point between the book and the computer. I just asked why the computer did not show -1.

Post 15 explained all the understanding required in plain text. Sometimes there is a lot to be said for KISS when dealing with learners. Remember Basics, basics and more basics. I'm no experienced expert by any stretch of the imagination.
 