One of the misuses of statistical terminology that annoys me most is the use of the word “correlation” to describe any relationship in which one variable increases as another variable increases. This kind of monotonic trend seems worth looking for, but it plainly is not what most people find when they use standard correlation coefficients. This is because the Pearson product-moment correlation coefficient, which is usually the only correlation coefficient students learn to calculate, is strongly biased towards linear trends: those in which a variable y is a noisy linear function of a variable x. It is the Spearman correlation coefficient, which is usually not taught to students, that actually detects a general monotonic trend. You can easily see this for yourself by computing the correlation coefficient between **x** and progressively higher-degree polynomials in **x**.

If the Pearson correlation coefficient actually detected monotonic trends, it wouldn’t plunge to zero as the degree of the polynomial in x increases. Detecting monotonic trends is precisely what the Spearman correlation coefficient does.
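The experiment described above can be sketched in a few lines of plain Python. This is only an illustration under assumptions of my own choosing, not the code behind the original post: the grid of 100 positive x values, the degrees tried, and the helper names `pearson`, `ranks` and `spearman` are all hypothetical. The ranking helper also assumes there are no tied values.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def ranks(values):
    """Ranks 1..n of the values (assumption: no ties in the data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Assumed test grid: 100 positive values, so every power of x is increasing.
x = [i / 10 for i in range(1, 101)]

results = {}
for degree in (1, 2, 4, 8, 16):
    y = [v ** degree for v in x]
    results[degree] = (pearson(x, y), spearman(x, y))
    print(f"degree {degree:2d}: Pearson = {results[degree][0]:.3f}, "
          f"Spearman = {results[degree][1]:.3f}")
```

On this grid every polynomial is strictly increasing, so the ranks of x and y coincide and the Spearman value stays at 1, while the Pearson value falls further from 1 each time the degree goes up. How quickly it falls depends on the range of x chosen.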

I hope that we can reconcile our intuitive thinking and our statistical practice by ending the self-contradiction in which the word “correlation” is used in discourse to describe the behavior of an ideal Spearman correlation coefficient, while in practice correlations are computed using Pearson’s formula.

When is it more appropriate to use Pearson instead of Spearman, or vice versa?

I am trying to figure out an online stats assignment and this is not clear to me at all.

thanks

If you think the relationship is linear, Pearson is better. If you don’t, Spearman is better.

I heard that, “If the data are not normally distributed, one can use the Spearman correlation coefficient.” Do you agree with this statement?

Well, it’s definitely true that the Pearson correlation coefficient has problems with certain types of non-normally distributed data. Which you should use really depends on the question you’re asking, though: if linearity is what you’re looking for, the Pearson correlation coefficient is worth using.

What does the number 6 that appears in the formula for Spearman’s rho mean?

thank you very much.

The formula for Spearman is the same as for Pearson, just applied to the ranks of the observations instead of their values.

The formula with the 6 in it is an easier way to get the same result when there are no tied ranks. It is just a different form of the same prescription, so no worries about the six.
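For reference, here is the shortcut form usually given in textbooks, written out as a sketch. The symbols are my own labels rather than anything from the thread: n is the number of paired observations and d_i is the difference between the two ranks of observation i. It assumes no tied ranks.

```latex
\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\,(n^{2} - 1)},
\qquad d_i = \operatorname{rank}(x_i) - \operatorname{rank}(y_i)
```

The 6 comes from the variance of the ranks. The integers 1 to n have variance \sigma^{2} = (n^{2} - 1)/12, and applying Pearson’s formula to two sets of ranks gives \rho = 1 - \sum d_i^{2} / (2 n \sigma^{2}). Substituting the variance turns the denominator into n(n^{2} - 1)/6, which is where the 6 appears.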

I just want to know: when the same scores are converted into ranks, can they produce a different correlation? In one calculation, the Pearson correlation was +.004, and when the scores were converted into ranks, the Spearman correlation was -.10. Is this possible, and why?

thanx

Hi John, I found your explanation very helpful. I wonder if you know where I can find this justification for Spearman in academic journals?

I need to quote such references for my thesis.

My study is on the relationship between stigma and communication around mental illness, so I’m sure it’s not a simple linear relationship, and I think Spearman’s the one for me.

I will look around too, but just thought to ask if you knew any offhand. Thanks!

– jen

Very good forum, so my question now is: is it correct to say “there is a correlation between variable X and variable Y” if I have a significant Spearman’s rho? Or what would be the best way to present my results at a conference?

Thank you very much !

Hi John, thanks for the explanation above. I hope you can help me with the following:

What would be the best way to identify correlation between sales of brands available in supermarkets?

Note: I don’t have transaction data, just the total weekly sales for two years. I am planning a time series analysis to find correlations between brands hitting peaks at the same time and brands being affected by it. I plan to use Pearson first and then Spearman or Kendall later, once the ranking is defined. Any thoughts on this would be much appreciated.

Jen,

Did you have any luck? I’m trying to find this too.

Suz

I really benefited from this article. Much obliged.

My concern is just to know which is superior, the Spearman correlation coefficient or the Pearson, and why one is superior to the other. John, please help.

I am using this test for my thesis, titled “Correlation between mib I & IVD with histology in meningioma”. The correlation for each variable came out between 0.90 and 0.99. Should I accept this result, or might my calculation be wrong?

Is it also true that when N<30, then instead of using Pearson, we should use Spearman?

Pearson should be used for interval or ratio data. However, when the data are ordinal, use Spearman’s correlation.

Spearman should also be used when visual examination of the scatterplot suggests that a monotonic curve fits better than a straight line, and when Pearson’s assumptions have been violated.

I want to examine the correlation between two quantitative variables (average cover of a plant species and disturbance level of the vegetation, expressed in terms of distance). I used the Pearson coefficient. Is that correct, or is the Spearman coefficient better for my analysis?

Please reply with a reference article.

My understanding is that if your data are normally distributed, use Pearson. If the data are not normally distributed, either normalize them through a logarithmic transformation or use Spearman.

Thank you for your contribution.

My data were not normally distributed, so I normalized the proportion data with an arcsine transformation. When I used the Pearson correlation on the normalized data, the results did not show a significant correlation between the average cover of the species and the disturbance level, and the Pearson coefficient was low (0.3 < r < 0.5). However, when I used the Spearman test on the raw data (not normalized), the results showed a significant correlation. In addition, there is a strong correlation (0.5 < rs < 0.8) between the two factors.

What is the relative strength of the two correlation tests (Spearman and Pearson)? Is Spearman stronger than Pearson? In my situation, do you suggest I use Spearman or Pearson?

Thank you for your great contributions

Hi,

Thank you for an informative discussion. I am trying to establish whether there is a relationship between HIV disclosure and HIV-related stigma. Which of the two should I use?

Thank you

Hi!

As far as I know, you should use Spearman when you have ordinal data and Pearson when you have scale data.

What if the two variables are of different kinds?

nominal-ordinal ?

nominal-scale ?

ordinal-scale ?