What does "outlier" mean for the purposes of a Pearson Correlation Test?

Thadriel

New member
Joined
Feb 25, 2022
Messages
20
I saw on a YouTube :eek: video that to perform a Pearson Correlation Test, your data cannot include outliers. That makes sense to me, because removing an outlier can drastically change the best-fit line through the data points.

However, it didn't define what constitutes an outlier. I imagine there is some standard definition somewhere, perhaps related to standard deviations from the mean. Then again, maybe it depends on what you're trying to do with the data. In my case, all I'm trying to do is put a number on how players' win percentages correlate with their various statistical outputs. If I were looking at American football, for example, and wanted to see how wins and interceptions correlate, Aaron Rodgers and his ridiculously low 4 interceptions would obviously be an outlier, since almost no one who played at least 16 games comes close.

So, how would "outlier" be defined for the purpose of finding how wins and particular stat outputs correlate?
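For what it's worth, there is no single agreed definition. Two common rules of thumb are "more than about 3 standard deviations from the mean" and "more than 1.5 times the interquartile range beyond the quartiles". Here is a minimal sketch of both, assuming made-up interception totals; the function names and thresholds are illustrative conventions, not part of the Pearson test itself.

```python
from statistics import mean, stdev, quantiles

def zscore_flags(values, threshold=3.0):
    """Flag values more than `threshold` sample standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [abs(v - m) / s > threshold for v in values]

def iqr_flags(values, k=1.5):
    """Flag values more than k * IQR below the first or above the third quartile."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v < low or v > high for v in values]

# Made-up season interception totals; the last value is the unusually low one.
interceptions = [10, 11, 12, 12, 13, 13, 14, 14, 15, 16, 4]

print(zscore_flags(interceptions))       # 3 standard deviation rule
print(zscore_flags(interceptions, 2.0))  # stricter 2 standard deviation rule
print(iqr_flags(interceptions))          # 1.5 * IQR rule
```

With these numbers the 4 is flagged by the IQR rule and by the 2 standard deviation rule, but not by the 3 standard deviation rule, so the answer really does depend on which convention you pick. Both rules also look at one variable at a time, whereas a point can be unusual only in combination with the other variable.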
 
Pearson’s correlation is indeed sensitive to outliers. A simple way to check visually is to plot the two variables against each other in a scatter plot. If you cannot justify removing the outliers, consider rank-based (non-parametric) alternatives such as Spearman’s or Kendall’s tau correlation.
 
Thanks.

My issue is, and maybe it's a lack of understanding, but from what I gather, those other correlations are "ranked" correlations. And if I understand that, it means they disentangle your ordered pairs, and instead match them according to their respective rank among their sets.

If that is an inaccurate understanding, then I'll adjust my understanding of it with this forum's help. But if it's accurate, I'm wary about using them because I want to keep my ordered pairs together. Of course, I understand that for the purposes of finding correlation it may not matter if the ordered pairs remain tied together. But then again it might.

Sorry, but statistics is not something I have ever really studied (despite taking the intro class). In school, the teacher I had basically gave us one homework assignment with ten problems, we turned it in, it got graded and returned, and then there was a corresponding test that was exactly the same except different numbers. I confess that I showed up on homework day and test day, got a perfect score in the class, and learned basically nothing at all about statistics. I am shame.
 
Pro tip: Always keep a copy of the original data, and save a new version every time you try something new, so you can always come back to your previous attempt.
 