**Introduction**

The coefficient of correlation is denoted by *r*, and its values may range from -1 to +1. A correlation coefficient between 0 and 1 indicates positive correlation, i.e. proportional growth of values in both data sets. An example of positive correlation is the duration of diabetes mellitus and the degree of damage to eye capillaries: the longer the duration of the disease, the greater the damage to eye capillaries. A correlation coefficient between 0 and -1 indicates negative correlation, i.e. a rise in the value of one variable that is proportional to a decline in the value of the other; e.g. oxygen concentration in the air drops with the rise in altitude above sea level. Perfect correlations, i.e. values of the coefficient of correlation r = ±1, are not characteristic of biological systems and most frequently refer to theoretical models. A zero value of the coefficient of correlation indicates absence of linear correlation, i.e. knowing the values of one variable tells us nothing about the values of the other. Thus, for instance, if we observe the correlation between the size of the pupil of the eye and calcium ion concentration in the blood, we can conclude that there is no correlation, i.e. each size of the pupil could be associated with any calcium ion concentration (understandably, within physiological limits) (2).

The Spearman’s coefficient of correlation ($r_s$), or rank correlation, is calculated when one of the data sets is on an ordinal scale, or when the data distribution deviates significantly from the normal distribution and the data include values that diverge considerably from most of those measured (outliers) (3). The linear correlation implied by the Pearson’s coefficient of correlation is not required for the Spearman’s correlation coefficient, which can also be calculated for small samples (N < 35). If $r_s = 0$, it may be concluded that there is no actual correlation between the variables (1).
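The difference between the two coefficients can be illustrated with a minimal sketch in plain Python. The function names (`pearson_r`, `average_ranks`, `spearman_rs`) and the data are made up for illustration and do not come from the cited sources; the Spearman's coefficient is computed here as the Pearson's coefficient of the ranks, with tied values given the mean of their ranks.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Made-up data: y rises with x throughout, but the last value is an outlier.
x = [1, 2, 3, 4, 5, 6]
y = [2, 3, 5, 6, 8, 40]

print("Pearson r   =", round(pearson_r(x, y), 2))
print("Spearman rs =", round(spearman_rs(x, y), 2))
```

In this made-up data set the outlier pulls the Pearson's coefficient below 1, while the rank-based coefficient is unaffected because the ordering of the values is perfectly preserved.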

**Conditions for calculating correlation**

**Question:** Is it correct to calculate the Pearson’s correlation coefficient for the degree of burns on the body and the duration of hospitalization expressed as the number of days?

**Answer**: It is not correct.

**Explanation:** The initial step in calculating correlation is to check whether the measured data meet the conditions for calculating the Pearson’s correlation. The degree of burns on the body can be ranked on a scale from 1 to 4; such data are categorical (classifying subjects into predefined “classes”) and follow an ordinal scale. The duration of hospital therapy expressed as the number of days is on a ratio scale and is suitable for calculating the Pearson’s correlation coefficient only if the other variable is on an interval or ratio scale. The Pearson’s coefficient of correlation can be calculated only if the following conditions are met: the data for both examined variables are on an interval or ratio scale, the data for at least one variable have a normal, i.e. symmetrical, distribution, the examined sample is large (N > 35), and the condition of linear correlation is met, which may be read from a scatterplot (1).
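The conditions listed above can be written down as a small checklist. The following is only a sketch: the function `choose_coefficient` and its parameters are hypothetical, the normality and linearity checks are assumed to have been done beforehand (e.g. from a scatterplot), and falling back to the Spearman's coefficient whenever a condition fails is a simplifying assumption rather than a rule stated in the cited sources.

```python
NUMERIC_SCALES = {"interval", "ratio"}
SUPPORTED_SCALES = NUMERIC_SCALES | {"ordinal"}


def choose_coefficient(scale_x, scale_y, n, one_is_normal, looks_linear):
    """Return "pearson" only if all four conditions from the text hold.

    scale_x, scale_y : "ordinal", "interval" or "ratio"
    n                : sample size
    one_is_normal    : True if at least one variable is normally distributed
    looks_linear     : True if the scatterplot suggests a linear relationship
    """
    for scale in (scale_x, scale_y):
        if scale not in SUPPORTED_SCALES:
            raise ValueError(f"scale {scale!r} is not covered by this sketch")
    pearson_ok = (
        scale_x in NUMERIC_SCALES
        and scale_y in NUMERIC_SCALES
        and one_is_normal
        and n > 35
        and looks_linear
    )
    # Assumption: use the rank correlation when any condition fails.
    return "pearson" if pearson_ok else "spearman"


# Degree of burns (ordinal, 1 to 4) vs. days of hospitalization (ratio);
# the sample size and the two flags are made up for the example.
print(choose_coefficient("ordinal", "ratio", 100, True, True))
```

For the burns example the ordinal scale alone rules out the Pearson's coefficient, regardless of sample size or distribution.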

**Interpretation and significance of the coefficient of correlation**

**Question:** In a study of the correlation between mood and the amount of liquid consumed daily, r = 0.12 and *P* = 0.003 were obtained. Is it correct to conclude that there is a significant correlation between mood and the amount of liquid consumed?

**Answer:** It is not correct.

**Explanation:** After calculating the coefficient of correlation, it is important to know how to interpret the result, that is, the real meaning of the correlation coefficient. In presenting the results of correlation, the coefficient of correlation “r” should be expressed as a number with two decimal places, and the significance of the coefficient of correlation “P” as a number with three decimal places (4). If the *P* value is below the set limit of significance (commonly *P* < 0.05), we may conclude that the coefficient of correlation is significant and may be interpreted. If *P* > 0.05, the coefficient of correlation is not significant and may not be interpreted, regardless of its value. The same rules of interpretation are valid for both the Pearson’s and the Spearman’s coefficient: r values from 0 to 0.25 or from 0 to -0.25 are commonly regarded as indicating absence of correlation, whereas r values from 0.25 to 0.50 or from -0.25 to -0.50 point to poor correlation between the variables. r values from 0.50 to 0.75 or from -0.50 to -0.75 indicate moderate to good correlation, and r values from 0.75 to 1 or from -0.75 to -1 point to very good to excellent correlation between the variables (1). In the example above, the coefficient is statistically significant (*P* = 0.003), but r = 0.12 falls within the range that indicates absence of correlation, so a correlation between mood and the amount of liquid consumed cannot be claimed.
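The two-step reading described above (significance first, then the size of r) can be sketched as a short function. The name `interpret_correlation` is hypothetical. Because the quoted ranges share their end points (0.25, 0.50, 0.75), this sketch assigns each boundary value to the higher category and treats *P* equal to the limit as not significant; both choices are assumptions, not rules from the cited source.

```python
def interpret_correlation(r, p, alpha=0.05):
    """Verbal interpretation of r, following the ranges quoted in the text."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    # Step 1: a non-significant coefficient is not interpreted at all.
    if p >= alpha:  # assumption: P equal to alpha counts as not significant
        return "not significant: do not interpret"
    # Step 2: the sign does not matter for the strength category.
    size = abs(r)
    if size < 0.25:
        return "absence of correlation"
    if size < 0.50:
        return "poor correlation"
    if size < 0.75:
        return "moderate to good correlation"
    return "very good to excellent correlation"


# Mood vs. amount of liquid consumed: r = 0.12, P = 0.003
print(interpret_correlation(0.12, 0.003))
```

With the values from the question, the function reports absence of correlation even though the *P* value is well below 0.05.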

**High value of the correlation coefficient**

**Question:** The correlation obtained in a study of the correlation between body height and biological age was r = 0.97. Can we conclude beyond doubt that height and age are excellently correlated?

**Answer**: No, at least not beyond doubt.

**Interpretation**: If the correlation coefficient calculated for biological variables is r > 0.95, an error in measurement or sampling, or possible alteration of the measured results, should be suspected. Owing to the natural variability of biological systems, it is virtually impossible to obtain such a high correlation coefficient if measurements have been done correctly (representative sample, sufficiently sensitive instrument, etc.) (1). The type of data collected by measurement and processed statistically should always be taken into account. For example, if glucose values measured in a series of blood samples by two different instruments, i.e. biochemical analyzers, are compared, the coefficient of correlation may be expected to be very high (even up to r = 0.99), which in this case indicates good agreement between the two instruments.

**Correlation and causal relationship**

**Question:** r = 0.78 and *P* = 0.002 were determined in a study of the correlation between blood alcohol level and traffic accidents. Are we allowed to conclude that alcohol consumption is the cause of traffic accidents, i.e. that the observed traffic accidents are the consequence of alcohol consumption?

**Answer:** No, we are not.

**Explanation:** Correlation provides information on association rather than on a cause-and-effect relationship between variables. Thus, even if there is a high correlation between alcohol consumption and traffic accidents, we may not conclude that one variable affects the other, i.e. that alcohol consumption causes traffic accidents. It is possible that an increased amount of alcohol causes an increased number of accidents, yet other uninvestigated factors or rare events may also have a considerable effect (7,8). In the example described above, these factors or events could be road conditions, proper operation of the vehicle, an illness of the driver unrelated to alcohol, the action of other pharmacologically active substances, and the like.

**The strength of correlation**

**Question:** By comparing the catalytic concentrations of two enzymes in the blood, r = 0.52 and *P* = 0.002 were obtained. Can we conclude that the enzymes share 52% of their catalytic concentration values?

**Answer:** No, we cannot.

**Explanation:** The coefficient of correlation is not a measure of the strength of correlation. A correlation coefficient of r = 0.52 cannot be interpreted as 52% correlation, i.e. as 52% of joint values for the two catalytic enzyme concentrations. The proportion of shared values, i.e. the strength of linear correlation, is expressed by the coefficient of determination. The coefficient of determination is calculated simply by squaring the correlation coefficient, and is denoted by $r^2$. It can be calculated only for the Pearson’s correlation (3). Therefore, the strength of correlation (coefficient of determination) in this example is $r^2 = 0.52 \times 0.52 = 0.27$, i.e. the catalytic concentrations of the two enzymes share 27% of common values. A correlation twice as high does not imply twice the strength of correlation; e.g., if the correlation were $r_1 = 0.26$, the strength of correlation would be $r_1^2 = 0.07$ (7%), whereas for the twofold higher correlation, $r_2 = 0.52$, it is $r_2^2 = 0.27$ (27%).
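The arithmetic in this explanation can be checked with a few lines of Python. The helper name `coefficient_of_determination` is introduced only for this illustration.

```python
def coefficient_of_determination(r):
    """Square of the (Pearson's) correlation coefficient."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    return r * r


for r in (0.26, 0.52):
    r2 = coefficient_of_determination(r)
    # Report r squared rounded to two decimals and as a percentage.
    print(f"r = {r:.2f}  r^2 = {r2:.2f}  ({r2:.0%} shared values)")
```

Doubling r from 0.26 to 0.52 multiplies the coefficient of determination by four, not by two, which is why r itself should not be read as a percentage.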

**Comparison of two correlation coefficients with the same properties on two subject samples**

**Question:** The correlation between the time spent at computer work and the speed of typing a text into a computer was examined for women ($N_1 = 60$) and men ($N_2 = 40$). The coefficient of correlation is $r_1 = 0.70$ for women and $r_2 = 0.50$ for men; both are statistically significant. Can we conclude that $r_1 > r_2$, i.e. that the correlation between the time spent at computer work and computer typing speed is higher in women?

**Answer:** No, we cannot.

**Explanation:** Two coefficients of correlation should never be compared directly; instead, the significance of the difference between the correlations for the two data sets should be tested. The procedure for establishing the significance of the difference between two coefficients of correlation takes into account the values of the correlation coefficients and the sizes of both samples (8).
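One widely used procedure of this kind is Fisher's r-to-z transformation for two independent samples; whether reference (8) describes exactly this test is an assumption here. The sketch below uses only the Python standard library, and the function name `compare_correlations` is hypothetical. It assumes Pearson's coefficients obtained from two independent samples and returns a two-sided *P* value.

```python
import math


def compare_correlations(r1, n1, r2, n2):
    """Fisher z-test for the difference between two independent correlations.

    Returns the z statistic and the two-sided P value.
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each sample must contain more than 3 subjects")
    # Fisher's transformation: z' = atanh(r), approximately normal
    # with variance 1 / (n - 3).
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    # Two-sided P value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Women: r1 = 0.70, N1 = 60; men: r2 = 0.50, N2 = 40
z, p = compare_correlations(0.70, 60, 0.50, 40)
print(f"z = {z:.2f}, P = {p:.3f}")
```

Under these assumptions, the difference between 0.70 and 0.50 with these sample sizes does not reach significance at the 0.05 level, which agrees with the answer given above.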