I saw in a YouTube video that to perform a Pearson correlation test, your data cannot include outliers. That makes sense to me, since removing or adding an outlier can drastically change the best-fit line through the data points.
However, the video didn't define what constitutes an outlier. I imagine there is some standard definition somewhere, perhaps based on standard deviations from the mean. Then again, maybe it depends on what you're trying to do with the data. In my case, all I'm doing is putting a number on how strongly a player's win percentage correlates with their various statistical outputs. If I were looking at American football, for example, and wanted to see how wins and interceptions correlate, Aaron Rodgers and his ridiculously low 4 interceptions would obviously be an outlier, since almost no one else who played at least 16 games comes close.
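Just to make concrete the kind of rule I have in mind, here is a rough sketch in Python of two common conventions I've seen (the 2-standard-deviation cutoff and the 1.5×IQR "Tukey fence" are arbitrary illustrative choices, not anything the video specified, and the interception numbers other than the 4 are made up):

```python
import numpy as np

def flag_outliers_zscore(values, threshold=2.0):
    """Flag points more than `threshold` standard deviations from the mean.

    The threshold of 2.0 is an illustrative choice; conventions I've seen
    range from about 2 to 3.
    """
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    return np.abs(z) > threshold

def flag_outliers_iqr(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fence rule)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Hypothetical season interception totals for a handful of quarterbacks;
# the 4 stands in for the Rodgers season mentioned above.
interceptions = np.array([12, 15, 10, 13, 17, 11, 14, 4])
print(flag_outliers_zscore(interceptions))  # with these numbers, the 4-INT season is flagged
print(flag_outliers_iqr(interceptions))     # flagged here too
```

But whether either of those rules is the "right" definition for a correlation test is exactly what I'm unsure about.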
So, how would "outlier" be defined for the purpose of finding how wins and particular stat outputs correlate?