Individual Activity - Linear Correlation Coefficient

GOALS:

a. To gain a qualitative understanding of the correlation coefficient

b. To be able to calculate the correlation coefficient

c. To understand the limits of the correlation coefficient

DISCUSSION:

The linear correlation coefficient (r) is a number between -1 and 1 that measures how closely a set of points falls along a straight line. The closer r is to zero, the less the points resemble a straight line (hence the term "linear" correlation coefficient).

[Figure: scatter plot of points for which r = 0]

The closer the correlation coefficient is to one, the more the points will fall along a line stretching from the lower left to the upper right. The closer the correlation coefficient is to negative one, the more the points will fall along a line stretching from the upper left to the lower right.
[Figures: scatter plot with r close to 1; scatter plot with r close to -1]

CALCULATING THE CORRELATION COEFFICIENT:

Consider a data set of N pairs of numbers:

n     x      y
1     x_1    y_1
2     x_2    y_2
3     x_3    y_3
...   ...    ...
N     x_N    y_N

(i) The first step is to calculate the averages (<x> and <y>) of the x and y values:

$$\langle x\rangle = \frac{1}{N}\left(x_1 + x_2 + x_3 + \cdots + x_N\right)$$

$$\langle y\rangle = \frac{1}{N}\left(y_1 + y_2 + y_3 + \cdots + y_N\right)$$
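Step (i) translates directly into code. Below is a minimal Python sketch; `average` is a hypothetical helper name chosen for this illustration, and the lists hold the Example 1 data used later in this activity.

```python
def average(values):
    """Hypothetical helper: <v> = (1/N)(v_1 + v_2 + ... + v_N)."""
    if not values:
        raise ValueError("need at least one value to average")
    return sum(values) / len(values)


# Example 1 data from this activity
x = [1, 2, 3, 4, 5]
y = [4, 4, 3, 2, 1]

print(average(x), average(y))  # 3.0 2.8
```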

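(ii) Calculate the standard deviations (s_x and s_y) of the x and y data sets:

$$s_x = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \langle x\rangle\right)^2}, \qquad s_y = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \langle y\rangle\right)^2}$$

(Some texts divide by N - 1 instead of N, both here and in step (iii). As long as the same choice is made in both steps, the correlation coefficient comes out the same.)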
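Step (ii) as a minimal Python sketch. It assumes the 1/N convention, matching the 1/N used for the averages in step (i); `std_dev` is a hypothetical helper name. Under this convention the result should agree with the standard library's `statistics.pstdev`.

```python
import math


def std_dev(values):
    """Hypothetical helper: s = sqrt((1/N) * sum of (v_i - <v>)^2).

    Assumes the 1/N convention; dividing by (N - 1) gives the other common form.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n  # step (i): the average
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


# Example 1 data from this activity
x = [1, 2, 3, 4, 5]
y = [4, 4, 3, 2, 1]
print(round(std_dev(x), 3), round(std_dev(y), 3))  # 1.414 1.166
```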

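(iii) Calculate the covariance (s_xy) between the two data sets:

$$s_{xy} = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \langle x\rangle\right)\left(y_i - \langle y\rangle\right)$$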
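Step (iii) as a minimal Python sketch, again assuming the 1/N convention; `covariance` is a hypothetical helper name.

```python
def covariance(xs, ys):
    """Hypothetical helper: s_xy = (1/N) * sum of (x_i - <x>)(y_i - <y>).

    Assumes the 1/N convention used for the averages.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty lists of the same length")
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Multiply each pair of deviations, add them up, then divide by N.
    return sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / n


# Example 1 data from this activity
print(round(covariance([1, 2, 3, 4, 5], [4, 4, 3, 2, 1]), 3))  # -1.6
```

Python's `statistics.covariance` (Python 3.10 or later) divides by N - 1 rather than N, so for the same data it returns a value larger in magnitude by a factor of N/(N - 1).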

(iv) The correlation coefficient is then defined as:

$$r = \frac{s_{xy}}{s_x\, s_y}$$
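Putting steps (i) to (iv) together gives a minimal Python sketch of the whole calculation. The function name `correlation_coefficient` is hypothetical, and the 1/N convention is assumed; because the same factor appears in s_xy and in s_x s_y, it cancels, so dividing by N - 1 throughout would give the same r.

```python
import math


def correlation_coefficient(xs, ys):
    """Hypothetical helper: r = s_xy / (s_x * s_y), following steps (i)-(iv)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length with at least two pairs")
    # (i) averages
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # (ii) standard deviations (1/N convention)
    s_x = math.sqrt(sum((x - x_mean) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - y_mean) ** 2 for y in ys) / n)
    if s_x == 0 or s_y == 0:
        # All x values (or all y values) equal: r would require dividing by zero.
        raise ValueError("r is undefined when all x or all y values are equal")
    # (iii) covariance (1/N convention)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / n
    # (iv) correlation coefficient
    return s_xy / (s_x * s_y)


# Example 1 data from this activity
print(round(correlation_coefficient([1, 2, 3, 4, 5], [4, 4, 3, 2, 1]), 3))  # -0.97
```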

EXAMPLE 1:

Calculate the correlation coefficient for the following x and y data sets. (You may find it helpful to plot the points on graph paper first.)

n x y
1 1 4
2 2 4
3 3 3
4 4 2
5 5 1

In this case, the x variable is a steadily increasing parameter, such as time.

(i) Calculate <x> and <y>, average x and average y:

   <x> = 1/5 (1 + 2 + 3 + 4 + 5) = 3.0

   <y> = 1/5 (4 + 4 + 3 + 2 + 1) = 2.8

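(ii) Calculate s_x and s_y, the standard deviations of x and y:

$$s_x = \sqrt{\tfrac{1}{5}\left[(1-3)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2 + (5-3)^2\right]} = \sqrt{\tfrac{10}{5}} = \sqrt{2.0} \approx 1.414$$

$$s_y = \sqrt{\tfrac{1}{5}\left[(4-2.8)^2 + (4-2.8)^2 + (3-2.8)^2 + (2-2.8)^2 + (1-2.8)^2\right]} = \sqrt{\tfrac{6.8}{5}} = \sqrt{1.36} \approx 1.166$$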

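(iii) Calculate s_xy, the covariance:

$$s_{xy} = \tfrac{1}{5}\left[(1-3)(4-2.8) + (2-3)(4-2.8) + (3-3)(3-2.8) + (4-3)(2-2.8) + (5-3)(1-2.8)\right] = \tfrac{1}{5}\left(-2.4 - 1.2 + 0 - 0.8 - 3.6\right) = \tfrac{-8.0}{5} = -1.6$$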

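(iv) Calculate r, the correlation coefficient:

$$r = \frac{s_{xy}}{s_x\, s_y} = \frac{-1.6}{(1.414)(1.166)} \approx -0.97$$

Because r is close to -1, the points lie close to a line running from the upper left to the lower right.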

EXAMPLE 2:

Calculate the correlation coefficient for the following x and y data sets. (You may find it helpful to plot the points on graph paper first.)

n x y
1 2 2
2 3 3
3 3 2
4 3 1
5 4 2

(i) The first step is to calculate the averages (<x> and <y>) of the x and y values:

   <x> = 1/5 (2 + 3 + 3 + 3 + 4) = 3.0

   <y> = 1/5 (2 + 3 + 2 + 1 + 2) = 2.0

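(ii) Calculate the standard deviations (s_x and s_y) of the x and y data sets:

$$s_x = \sqrt{\tfrac{1}{5}\left[(2-3)^2 + (3-3)^2 + (3-3)^2 + (3-3)^2 + (4-3)^2\right]} = \sqrt{\tfrac{2}{5}} = \sqrt{0.4} \approx 0.632$$

$$s_y = \sqrt{\tfrac{1}{5}\left[(2-2)^2 + (3-2)^2 + (2-2)^2 + (1-2)^2 + (2-2)^2\right]} = \sqrt{\tfrac{2}{5}} = \sqrt{0.4} \approx 0.632$$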

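(iii) Calculate s_xy, the covariance:

$$s_{xy} = \tfrac{1}{5}\left[(2-3)(2-2) + (3-3)(3-2) + (3-3)(2-2) + (3-3)(1-2) + (4-3)(2-2)\right] = \tfrac{1}{5}\left(0 + 0 + 0 + 0 + 0\right) = 0$$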

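(iv) Calculate r, the correlation coefficient:

$$r = \frac{s_{xy}}{s_x\, s_y} = \frac{0}{(0.632)(0.632)} = 0$$

Although the points are close together, they form a symmetric cross centered on (<x>, <y>) = (3, 2) with no tendency to rise or fall from left to right, so r = 0.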

Last Modified: June 29, 2001