

Wind/Chill:
A Data Analysis Exercise and Assessment Vehicle

H. W. Straley

Introduction

The following exercise can be used as a group activity, a test, a pair quiz, or some other assessment vehicle. The purpose of this paper is to present the problem, a model/solution, and a grading standard. Hints that may be given to student(s) are also provided. Finally, the paper discusses some problems that arise when one depends solely upon the Pearson Product-Moment Correlation Coefficient, commonly referred to as the correlation coefficient r, to determine the quality of a curve fit.

Problem

We often hear the weather man tell us that the wind/chill factor brings the comfort temperature to -14 degrees Fahrenheit. Such values are read from a wind/chill table like Table I below. Is there a formula that will generate this table? If so, what is this formula? What is the comfort temperature if the actual temperature is 12F° and the wind is 22 mph?

Table I is a wind/chill table (Source: World Almanac). To read the table note the wind speed on the horizontal axis and the actual temperature on the left vertical axis. The comfort temperature is the entry in the actual temperature row and the wind speed column. For example, if the actual temperature is 5F° and the wind speed is 20 mph then the comfort temperature is -31F°.

                          Wind speed (mph)
Temp F°     5    10    15    20    25    30    35    40    45
     35    33    22    16    12     8     6     4     3     2
     30    27    16     9     4     1    -2    -4    -5    -6
     25    21    10     2    -3    -7   -10   -12   -13   -14
     20    16     3    -5   -10   -15   -18   -20   -21   -22
     15    12    -3   -11   -17   -22   -25   -27   -29   -30
     10     7    -9   -18   -24   -29   -33   -35   -37   -38
      5     0   -15   -25   -31   -36   -41   -43   -45   -46
      0    -5   -22   -31   -39   -44   -49   -52   -53   -54
     -5   -10   -27   -38   -46   -51   -56   -58   -60   -62
    -10   -15   -34   -45   -53   -59   -64   -67   -69   -70
    -15   -21   -40   -51   -60   -66   -71   -74   -76   -78
    -20   -26   -46   -58   -67   -74   -79   -82   -84   -85
    -25   -31   -52   -65   -74   -81   -86   -89   -92   -93
    -30   -36   -58   -72   -81   -88   -93   -97  -100  -102
    -35   -42   -64   -78   -88   -96  -101  -105  -107  -109
    -40   -47   -71   -85   -95  -103  -109  -113  -115  -117
    -45   -52   -77   -92  -103  -110  -116  -120  -123  -125

________________________________________________________________

Wind Chill Table

Comfort Temperature (F°) versus Actual Temperature (F°)

Table I

As you consider the above questions you should examine each of the regressions and the associated residuals. Pay close attention to the graphs of the actual data and the graphs of the residuals. Some interesting phenomena will occur. Describe these phenomena and tell what you think causes them.

When you have completed your analysis, write a formal paper discussing your research. You must also prepare a 10 minute oral presentation of your paper. The first draft of your paper is due December 15 and will be evaluated on content only. The final draft of your paper is due February 1 and will be evaluated equally for content and for grammar/spelling/style. If you are not satisfied with your final draft grade you will have one week to improve that grade by rewriting the paper. The oral presentation will occur during the week of March 1 and will be graded on both content and presentation.

Please note that your paper and your presentation are far more important than answers to specific numerical questions. Please defend all of your conclusions with either a plausibility argument or a proof.

Hints for Teacher to Give Student(s)

The teacher will likely want to give students hints when they begin to have trouble with this activity. In this way the student(s) do not give up when they are stumped nor are they penalized unduly for early errors. The teacher may also want to lower the grade a little, depending upon the hint that is provided. An early hint that helps student(s) get started is to tell them to hold the wind speed constant and find a regression equation that relates the actual temperature (x) and the comfort temperature (y).

A second hint suggests examining each of the above regression equations and their associated graphs for interesting phenomena. The teacher may want to caution the student(s) to be sure they consider their residual graphs before finalizing any judgments.

Many groups will not know what to do after having determined the linear regression equations for each wind speed. In other words, they will have a formula y = mx + b for each wind speed, but will be confused as to their next step. They will not realize that they now need to determine m as a function of wind speed and to determine b as a function of wind speed. Regression techniques are necessary for these relations and these groups will need guidance in this direction. As the reader will note, these relationships are not linear.

Content of the Correct Analysis

Holding wind speed constant one finds that each set of points (A, C) where A is the actual temperature and C is the comfort temperature can be reasonably approximated by a linear function. These equations and selected other data are shown in Table II.

The student(s) should analyze the residuals for each of the above equations and produce regression graphs and residual graphs. The regression and residual graphs for the wind speed of 45 mph are shown in Figures 1 and 2.

Analysis of Figure 1 indicates that a linear relationship exists between the independent (actual temperature) and dependent (comfort temperature) variables. This figure is typical of the results for each of the wind speeds. The linearity conclusion is further supported by the respective correlation coefficients. The residuals in Figure 2 are typical of the residuals for the other wind speeds and indicate that the error is due to round-off. Hence, one can reasonably accept the linear conjecture.
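The 45 mph case can be checked with a short least-squares computation. The following is a minimal sketch in plain Python, not the author's procedure; the helper name fit_line and the residual convention (table value minus fitted value) are assumptions made for illustration. The data are the 45 mph column of Table I.

```python
# Comfort temperatures from the 45 mph column of Table I,
# listed for actual temperatures 35, 30, ..., -45 F°.
actual = list(range(35, -50, -5))
comfort_45 = [2, -6, -14, -22, -30, -38, -46, -54, -62,
              -70, -78, -85, -93, -102, -109, -117, -125]


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = m*x + b.

    Hypothetical helper written for this sketch.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, y_bar - m * x_bar


slope, intercept = fit_line(actual, comfort_45)

# Residual convention assumed here: table value minus fitted value.
residuals = [y - (slope * x + intercept) for x, y in zip(actual, comfort_45)]
residual_range = max(residuals) - min(residuals)

print(f"C = {slope:.2f} A + ({intercept:.2f})")
print(f"residual range = {residual_range:.2f}")
```

Every residual is smaller than one degree in absolute value, which is consistent with the round-off explanation, since the table entries are whole degrees.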

regression equation       r     error range   error range/median   wind speed (mph)
y = 1.06x - 4.65         1.00      1.88              0.19                  5
y = 1.23x - 21.3         1.00      1.2               0.05                 10
y = 1.34x - 31.35        1.00      1.00              0.03                 15
y = 1.42x - 38.46        1.00      1.00              0.02                 20
y = 1.48x - 43.9         1.00      1.21              0.02                 25
y = 1.52x - 48.1         1.00      1.63              0.03                 30
y = 1.55x - 50.72        1.00      1.79              0.03                 35
y = 1.57x - 52.49        1.00      1.31              0.02                 40
y = 1.59x - 53.77        1.00      1.14              0.02                 45

_________________________________________________________

Linear Regression Equations for Each Wind Speed

Table II

It should be noted that the correlation coefficients are constant while the residuals vary quite a bit. In fact the range of residuals and the range of residuals divided by the median of the dependent variable do not seem to be correlated with the correlation coefficient. This surprising and interesting result will be discussed later in this paper. The teacher should caution students regarding the correlation coefficient and suggest they check their regression equations by other means as well.

Regression Graph of Actual Temperature (F°) versus Comfort Temperature (F°) for wind speed of 45 mph.

Figure 1

Residual Graph of Actual Temperature (F°) versus Comfort Temperature (F°) for wind speed of 45 mph.

Figure 2

The student should now analyze the data in Table II to determine the slope and y-intercept as functions of wind speed. Analysis of the slopes leads to the logarithmic regression equation M = .25lnW + .67, where W is the wind speed and M is the slope of the linear function C = MA - B, where A is the actual temperature and C is the comfort temperature. The correlation coefficient for this regression is 1.00. The graph of the original points and M = .25lnW + .67 is shown in Figure 3. The residual graph is shown in Figure 4.

Regression Equation, M = .25lnW + .67, For Slope As Function Of Wind Speed

Figure 3

Analysis of Figures 3 and 4 leads one to conclude that the logarithmic regression equation M = .25lnW + .67 is an exceptionally good predictor. The correlation coefficient is 1 and the fits are almost perfect. The range of the residuals is [-.02, .02] and the range of the residuals divided by the median M value (slope) is [-.01, .01]. The pattern of the residuals as shown in Figure 4 indicates round off error accounts for the differences.

Residuals Accompanying Figure 3

Figure 4
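The slope regression can be reproduced by fitting a straight line to the points (lnW, M) taken from Table II. This is a sketch under that assumption (a logarithmic regression is linear in lnW); the helper fit_line is hypothetical and written only for this illustration.

```python
import math

# Wind speeds and fitted slopes, copied from Table II.
wind_speeds = [5, 10, 15, 20, 25, 30, 35, 40, 45]
slopes = [1.06, 1.23, 1.34, 1.42, 1.48, 1.52, 1.55, 1.57, 1.59]


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = a*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, y_bar - a * x_bar


# Regress M on ln(W): M = a*ln(W) + b.
log_w = [math.log(w) for w in wind_speeds]
a, b = fit_line(log_w, slopes)
slope_residuals = [m - (a * lw + b) for lw, m in zip(log_w, slopes)]

print(f"M = {a:.2f} lnW + {b:.2f}")
print("largest |residual| =", round(max(abs(r) for r in slope_residuals), 3))
```

Rounded to two decimal places the coefficients agree with M = .25lnW + .67, and the residuals stay within the [-.02, .02] range quoted above.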

Analysis of the intercepts leads to the logarithmic regression equation B = 22.95lnW - 31.28, where W is the wind speed and B is the negative of the y-intercept in the linear equation C = MA - B. The correlation coefficient for this regression is 1.00. The graph of the original points and B = 22.95lnW - 31.28 is shown in Figure 5. The residual graph is shown in Figure 6.

Regression Equation for B in C = MA - B

Regression Equation is B= 22.95lnW - 31.28, where W = wind speed

Figure 5

Analysis of Figures 5 and 6 leads one to conclude that the logarithmic regression equation B = 22.95lnW - 31.28 is also an excellent predictor. The correlation coefficient is 1.00 and the fits are almost perfect. The range of the residuals is [-2.32, 1.32] and the range of the residuals divided by the median y value (negative of y-intercept) is [-.05, .03]. The sawtooth pattern of the residuals shown in Figure 6 indicates that round-off very likely accounts for the error. However, there is a curvature in the residual graph that is disconcerting. Other regressions were attempted, including lnW vs. lnB (power) and ln(ln(W)) vs. lnB (exponential and logarithmic combination). Neither of these regressions led to acceptable results. The nature of the data also did not suggest a slide of the regression function. Hence, the above equation was accepted.

Residuals Accompanying Figure 5

Figure 6
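The intercept regression and the curvature in its residuals can be examined the same way. The sketch below assumes B is the negative of each y-intercept in Table II and that a residual is the Table II value minus the fitted value; fit_line is again a hypothetical helper.

```python
import math

# Wind speeds and B values (negatives of the y-intercepts) from Table II.
wind_speeds = [5, 10, 15, 20, 25, 30, 35, 40, 45]
b_values = [4.65, 21.3, 31.35, 38.46, 43.9, 48.1, 50.72, 52.49, 53.77]


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = a*x + c."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, y_bar - a * x_bar


# Regress B on ln(W): B = a*ln(W) + c.
log_w = [math.log(w) for w in wind_speeds]
a, c = fit_line(log_w, b_values)
b_residuals = [b - (a * lw + c) for lw, b in zip(log_w, b_values)]

print(f"B = {a:.2f} lnW + ({c:.2f})")
for w, r in zip(wind_speeds, b_residuals):
    print(f"W = {w:2d} mph  residual = {r:+.2f}")
```

Printing the residuals in wind-speed order makes the curvature visible: they are negative at both ends of the wind-speed range and positive in the middle.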

We know C = MA - B, where C is the comfort temperature in F° and A is the actual temperature in F°. We also know M = .25lnW + .67, where W is the wind speed in mph. In addition we have B = 22.95lnW - 31.28. Therefore,

C = (.25lnW + .67)A - 22.95lnW + 31.28.

Table III shows absolute differences between the given table values and the calculated values for each actual temperature and each wind speed value. The average absolute error is 1.03 degrees and the standard deviation of the absolute errors is .91. The middle temperature is -51 (i.e. the comfort temperature associated with the median actual temperature and the median wind speed). Since |1.03/-51| = .02 we conclude that the middle absolute error is about 2% which is quite acceptable.

One of the questions asked at the beginning of this paper was to find the comfort temperature if the actual temperature was 12 F° and the wind speed was 22 mph. Substituting A = 12 and W = 22 into our formula gives a comfort temperature of approximately -22 F°.
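The substitution is easy to carry out in code. This is a small sketch of the combined formula; the function name comfort_temperature is an assumption introduced for illustration, and the coefficients are the rounded values from the regressions above.

```python
import math


def comfort_temperature(actual_f, wind_mph):
    """Evaluate C = (.25 lnW + .67) A - 22.95 lnW + 31.28.

    actual_f is the actual temperature A in F°, wind_mph the wind speed W.
    """
    log_w = math.log(wind_mph)
    slope = 0.25 * log_w + 0.67      # M as a function of wind speed
    offset = 22.95 * log_w - 31.28   # B as a function of wind speed
    return slope * actual_f - offset


# The question posed in the Problem section: 12 F° with a 22 mph wind.
print(round(comfort_temperature(12, 22), 1))
```

As a sanity check, evaluating the function at 5 F° and 20 mph gives a value within one degree of the Table I entry of -31 F°.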

                          Wind speed (mph)
Temp F°     5    10    15    20    25    30    35    40    45
     35     1     0     0     0     1     0     0     1     1
     30     0     0     1     1     1     1     0     1     1
     25     0     0     1     1     1     1     1     1     2
     20     0     0     1     1     2     2     1     1     2
     15     2     0     0     1     2     1     0     0     2
     10     2     0     1     1     1     1     0     0     2
      5     0     0     1     1     1     2     0     0     2
      0     1     0     0     2     1     2     2     0     2
     -5     1     1     0     1     1     2     0     1     2
    -10     1     0     1     1     2     2     1     0     2
    -15     1     0     0     1     1     1     0     1     2
    -20     1     0     0     1     2     2     1     1     4
    -25     1     1     0     1     2     1     0     1     4
    -30     2     1     1     1     1     1     0     1     3
    -35     1     1     0     1     2     1     0     2     4
    -40     2     0     0     1     1     1     0     2     4
    -45     2     1     1     2     1     1     0     2     4

__________________________________________________________________

Absolute Error Between Given Table Values (Table I) and Calculated Values

Table III
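The error table can be regenerated by evaluating the formula at every cell of Table I. The sketch below is one way to do this and is not the author's computation: it keeps unrounded absolute errors, so its summary statistics may differ slightly from those quoted for the whole-degree entries of Table III. All names are hypothetical.

```python
import math
from statistics import mean

WIND = [5, 10, 15, 20, 25, 30, 35, 40, 45]   # mph, columns of Table I
TEMPS = list(range(35, -50, -5))             # F°, rows of Table I

# Comfort temperatures copied from Table I, one row per actual temperature.
TABLE_I = [
    [33, 22, 16, 12, 8, 6, 4, 3, 2],
    [27, 16, 9, 4, 1, -2, -4, -5, -6],
    [21, 10, 2, -3, -7, -10, -12, -13, -14],
    [16, 3, -5, -10, -15, -18, -20, -21, -22],
    [12, -3, -11, -17, -22, -25, -27, -29, -30],
    [7, -9, -18, -24, -29, -33, -35, -37, -38],
    [0, -15, -25, -31, -36, -41, -43, -45, -46],
    [-5, -22, -31, -39, -44, -49, -52, -53, -54],
    [-10, -27, -38, -46, -51, -56, -58, -60, -62],
    [-15, -34, -45, -53, -59, -64, -67, -69, -70],
    [-21, -40, -51, -60, -66, -71, -74, -76, -78],
    [-26, -46, -58, -67, -74, -79, -82, -84, -85],
    [-31, -52, -65, -74, -81, -86, -89, -92, -93],
    [-36, -58, -72, -81, -88, -93, -97, -100, -102],
    [-42, -64, -78, -88, -96, -101, -105, -107, -109],
    [-47, -71, -85, -95, -103, -109, -113, -115, -117],
    [-52, -77, -92, -103, -110, -116, -120, -123, -125],
]


def comfort_temperature(actual_f, wind_mph):
    """C = (.25 lnW + .67) A - 22.95 lnW + 31.28."""
    log_w = math.log(wind_mph)
    return (0.25 * log_w + 0.67) * actual_f - 22.95 * log_w + 31.28


# Unrounded absolute error for each of the 17 x 9 table cells.
abs_errors = [
    abs(TABLE_I[i][j] - comfort_temperature(a, w))
    for i, a in enumerate(TEMPS)
    for j, w in enumerate(WIND)
]

print("mean absolute error:", round(mean(abs_errors), 2))
print("largest absolute error:", round(max(abs_errors), 2))
```

The largest errors occur in the 45 mph column at the coldest temperatures, in agreement with the entries of 4 in Table III.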

How Reliable Is the Correlation Coefficient?

The correlation coefficients, error ranges, and error ranges divided by the median dependent variable are given for each wind speed in Table IV. Examination of this data indicates the relationship between the correlation coefficient, r, and the error range deserves some attention. This same data was also included in Table II. Because all the values of r are 1.00 (to 2 decimal places), the Pearson Product-Moment Correlation was not calculated. However, it is clear the relationship between the associated value of r (1.00 in each case) and both the error range and the error range divided by the median dependent variable is open to some question. This relationship can be estimated by Spearman's Rank Order Correlation, R. R for r vs. error range is .5 and R for r vs. error range divided by median dependent variable is also .5. One can certainly conclude that it is not wise to rely solely on the correlation coefficient when evaluating the quality of a regression curve fit.

  r     error range   error range/median   wind speed (mph)
 1.00      1.88              0.19                  5
 1.00      1.2               0.05                 10
 1.00      1.00              0.03                 15
 1.00      1.00              0.02                 20
 1.00      1.21              0.02                 25
 1.00      1.63              0.03                 30
 1.00      1.79              0.03                 35
 1.00      1.31              0.02                 40
 1.00      1.14              0.02                 45

____________________________________

Correlation Coefficients and Error Ranges for Each Wind Speed

Table IV

It is common practice to compute a regression line or curve and then to rely on the correlation coefficient, r, as a dependable measure of the goodness of the curve fit. The closer the absolute value of r is to 1, the better the fit is assumed to be; in other words, the smaller the error between the actual data and the regression equation. In truth this is not always the case. Consider the data shown in Table V.

      data using     reg.    residuals          data using      reg.    residuals
 x    y = x+5±1      line      r(x)        x    y = 10x+5±1     line      r(x)
 0        5          5.27      0.27        0         5          5.27      0.27
 1        7          6.2      -0.8         1        16         15.2      -0.8
 2        6          7.13      1.13        2        24         25.13      1.13
 3        9          8.07     -0.93        3        36         35.07     -0.93
 4        8          9         1           4        44         45         1
 5       11          9.93     -1.07        5        56         54.93     -1.07
 6       10         10.87      0.87        6        64         64.87      0.87
 7       13         11.8      -1.2         7        76         74.8      -1.2
 8       12         12.73      0.73        8        84         84.73      0.73

      y = .93x + 5.27, r = .93333               y = 9.93x + 5.27, r = .99935

__________________________________________________________________

Regression Lines From Two Sets of Data Yielding Equal Residuals But Different r

Table V

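The first and fifth columns display identical x values. The second column is a function of the first column, x, calculated using the function heading that column in Table V:

f(x) = x + 5 ± 1

The sixth column is a function of the fifth column, x, calculated using the function heading that column in Table V:

g(x) = 10x + 5 ± 1

In both cases the tabulated values add 1 for odd x, subtract 1 for even x greater than 0, and add nothing at x = 0.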

The residuals are identical, yet the r values are different. Figure 7 shows the graph of y = f(x) and its regression line. Figure 8 shows the graph of y = g(x) and its regression line. Figure 9 is the residual graph for both of these functions. It is obvious that the correlation coefficient is not always a reliable measure of goodness of fit. One must concentrate on the residuals as the residuals are certainly the best measure of the error between the actual data and the regression equation.
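A short worked calculation shows why equal residuals can still give different r values. For a least-squares line with an intercept, the square of r compares the sum of squared residuals with the total variation of y about its mean. The symbols below (y-hat for the fitted value, y-bar for the mean of y) are notation introduced here, and the numbers are computed from the Table V data.

```latex
r^2 \;=\; 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}

% Both data sets share the same sum of squared residuals:
\sum_i \left(y_i - \hat{y}_i\right)^2 \approx 7.73

% y = f(x): total variation about the mean is 60
r^2 \approx 1 - \frac{7.73}{60} \approx 0.8711, \qquad r \approx 0.9333

% y = g(x): total variation about the mean is 5928
r^2 \approx 1 - \frac{7.73}{5928} \approx 0.9987, \qquad r \approx 0.99935
```

The numerator is the same in both cases; only the denominator, which grows with the steepness of the data, changes.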

If we define y = F(x) and y = G(x) as below we have an even more extreme case.

In this case we have r for F(x) = -.1826 and r for G(x) = .9998, yet the residuals are identical.

It would seem reasonable, given these two cases, to conclude that the differences in slopes account for these results, and they do. If one plots the points for these last two functions, y = F(x) and y = G(x), one obtains the results shown in Figure 10. The regression line for the solid points, y = G(x), will have a much higher r than the regression line for the light points, y = F(x), even though these sets of points have identical residuals.
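The effect of slope on r can be checked numerically with the Table V data. The sketch below is an illustration, not the author's calculation; the helper fit_and_r is hypothetical, and residuals follow the Table V convention (regression line minus data). The third data set, flat_vals, is an assumption added here: it subtracts x from the f(x) column to give nearly flat data, and it is not taken from the source's definition of F(x).

```python
import math

x = list(range(9))
f_vals = [5, 7, 6, 9, 8, 11, 10, 13, 12]        # y = x + 5 ± 1 (Table V)
g_vals = [5, 16, 24, 36, 44, 56, 64, 76, 84]    # y = 10x + 5 ± 1 (Table V)
# Assumed illustrative data: same scatter as f_vals with the trend removed.
flat_vals = [y - xi for xi, y in zip(x, f_vals)]


def fit_and_r(xs, ys):
    """Least-squares line, Pearson r, and residuals (line minus data)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    syy = sum((yi - y_bar) ** 2 for yi in ys)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)
    residuals = [slope * xi + intercept - yi for xi, yi in zip(xs, ys)]
    return slope, intercept, r, residuals


slope_f, icpt_f, r_f, res_f = fit_and_r(x, f_vals)
slope_g, icpt_g, r_g, res_g = fit_and_r(x, g_vals)
slope_flat, icpt_flat, r_flat, res_flat = fit_and_r(x, flat_vals)

print(f"f:    y = {slope_f:.2f}x + {icpt_f:.2f}, r = {r_f:.5f}")
print(f"g:    y = {slope_g:.2f}x + {icpt_g:.2f}, r = {r_g:.5f}")
print(f"flat: y = {slope_flat:.2f}x + {icpt_flat:.2f}, r = {r_flat:.4f}")
```

All three fits leave the same residuals, because adding a multiple of x to the data changes only the fitted slope. The correlation coefficient, however, moves from strongly positive to slightly negative as the trend is removed.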

y = f(x) And Its Regression Line

Figure 7

y = g(x) And Its Regression Line

Figure 8

Residuals For Both y = f(x) and y = g(x)

Figure 9

Demonstration Of Why Slope Affects Reliability of r

Figure 10



Woodrow Wilson Leadership Program in Mathematics * lpt@www.woodrow.org
The Woodrow Wilson National Fellowship Foundation * webmaster@woodrow.org
CN 5281, Princeton NJ 08543-5281 * Tel:(609)452-7007 * Fax:(609)452-0066