# Data Analysis in Algebra II

## Abstract

Data Analysis can and should be included at every level of the high school mathematics curriculum. As most traditional textbooks offer minor, if any, treatment of this topic, we present several lessons which are appropriate for inclusion in an Algebra II course. These lessons use sets of data which lead to linear, quadratic, inverse variation, and exponential functions, and can be used throughout the course as students acquire knowledge of the particular type of function. A sample lesson is given for each, with suggestions for further study, dependent upon the abilities of the students. We also include written sources for acquiring data, and data collection projects which can be done by students.

Any technological tool which is capable of receiving data sets, plotting them in a specified manner, and determining regression equations may be used. We use the TI-82 graphics calculator in these lessons, as it is readily available to students and teachers and is also very user-friendly. Several of the lessons also use spreadsheets, which can very clearly demonstrate, both in tabular and in graphical form, the manner in which variables change. Included are brief descriptions of how to use the statistics capabilities of the TI-82, and how to utilize a spreadsheet to do data analysis.

## Data Analysis on the TI-82

To input data sets into memory, press STAT and select EDIT 1:Edit.... The values of the independent variable should be placed in the column headed L1 and the values of the dependent variable in column L2. If you wish to sort the data with respect to the independent variable, press STAT EDIT 2:SortA( and enter SortA(L1, L2). This sorts the data in L1 into ascending order, with the entries of L2 rearranged so that they stay paired with L1. SortD( places the data in descending order.

To plot the data points, press 2nd STAT PLOT and select Plot1.... Turn on Plot 1, choose scatter plot as the type, L1 for Xlist, L2 for Ylist, and any of the available symbols for Mark. Press ZOOM 9 (to select ZoomStat), and the scatter plot will be graphed. Examine the scatter plot to determine the most appropriate form of the regression equation. The options which the TI-82 offers are: median-median, linear (ax+b), quadratic, cubic, quartic, linear (a+bx), logarithmic, exponential, and power. To obtain the regression equation, select STAT CALC SetUp.... Xlist for 2-Var Stats should be L1, Ylist should be L2, and Freq should be 1. Again press STAT CALC, and select a regression equation. The values for the constants in the equation will be displayed. To graph this equation, press Y=, then VARS 5:Statistics... EQ 7:RegEQ, and then GRAPH. Both the scatter plot and the regression equation will be displayed.

## Data Analysis on a Spreadsheet

Enter the values of the independent and dependent variables into two columns on the spreadsheet. Look at a scatter plot of the data to determine the type of function behavior this data represents. It may be necessary to re-express the values of the independent variable, in order to get an accurate and reasonable mathematical model of the data. If so, place these re-expressed values into another column, plot the independent variable values against the re-expressed values, and run a linear regression. Check that the residuals are small and random. If so, then the re-expression is a good one. Applying the inverse of the re-expression to the regression line gives the mathematical model. For an example of a template which can be used for re-expression and regression, see the example lesson on exponential growth.

## Median-Median Line

The process of finding the median-median line can be summarized as follows:

1) Divide the data into three groups, left to right, as evenly as possible.

2) Find the median "x" value and the median "y" value of the data in each of the three intervals, and call these points the summary points. It is not necessary that they be actual data points; they simply represent the median data values in each of the three data intervals.

3) Using the summary points in the first and third intervals, find the equation of the line passing through them using standard two-point techniques.

4) Using the equation of the line found in part three, find the predicted "y" value for the "x" of the middle summary point. The difference between the calculated or predicted "y" and the middle summary "y" is very important.

5) To complete the process simply slide the line from part 3 one-third of the difference we found in part 4. That is, change the y-intercept enough to spread the difference evenly over the three intervals.

In addition to being a manageable technique for high school students, this median-median line has the virtue of being resistant to outliers, data points whose extreme values are frequently, but not exclusively, caused by error.
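The five steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the calculator's own routine: the function name `median_median_line` is hypothetical, and the rule used for splitting a data set whose size is not a multiple of three (keep the two outer groups equal in size) is an assumption, since the text only asks for groups "as evenly as possible".

```python
from statistics import median

def median_median_line(xs, ys):
    """Return (slope, intercept) of the median-median line."""
    pts = sorted(zip(xs, ys))              # order the points left to right
    n = len(pts)
    # Assumption: when n is not a multiple of 3, the two outer groups
    # are kept the same size and the middle group absorbs the difference.
    base, extra = divmod(n, 3)
    outer = base + (1 if extra == 2 else 0)
    groups = [pts[:outer], pts[outer:n - outer], pts[n - outer:]]

    # Step 2: one summary point (median x, median y) per group
    summary = [(median(p[0] for p in g), median(p[1] for p in g))
               for g in groups]
    (x1, y1), (x2, y2), (x3, y3) = summary

    # Step 3: line through the first and third summary points
    slope = (y3 - y1) / (x3 - x1)
    intercept = y1 - slope * x1

    # Steps 4 and 5: slide the line one-third of the way toward
    # the middle summary point
    gap = y2 - (slope * x2 + intercept)
    return slope, intercept + gap / 3

# Hypothetical data: y = 2x + 1 with one wild value at x = 5
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [3, 5, 7, 9, 100, 13, 15, 17, 19]
slope, intercept = median_median_line(xs, ys)
print(slope, intercept)
```

In this made-up data set the outlier at x = 5 leaves the slope untouched and shifts the intercept only slightly, which is the resistance property described above.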

## Example of Linear Data

This lesson is intended for use following a unit on linear functions. Students are expected to know how to graph linear functions by hand and to have a rudimentary knowledge of the TI-82.

We have the following data on family income versus percent of the group registered to vote in 1988. (Source of data: "World Eagle, The Monthly Social Studies Resource", February 1990, p.17.)

| Income (\$) | % registered |
|---|---|
| 0-4999 | 47.6 |
| 5000-9999 | 52.8 |
| 10000-14999 | 57.4 |
| 15000-19999 | 63.2 |
| 20000-24999 | 67.4 |
| 25000-34999 | 71.9 |
| 35000-49999 | 77.9 |
| 50000 and over | 81.8 |

Using all but the last piece of data (7 points) and using the lower endpoint in each income category, enter the data in L1 and L2 and plot. Then under Calc find the linear regression and plot it on the graph with the data.

Depending on your group, you may discuss the least squares method for deriving this line, or just open the discussion of what "line of best fit" might mean and how you might find it. Then, add in the last piece of data (making 8 points) and see how the line changes. Encourage the students to experiment with changing pieces of data to see what the effect on the line will be. As an extension, you might also introduce residuals at this point.

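Figure 1 and Figure 2 show the scatter plots for the data (7 points and then 8 points). With income $x$ measured in units of \$10,000 (so that the lower endpoints are 0, 0.5, 1, ..., 5), the respective equations for the linear regressions are $y = 8.85x + 48.67$ and $y = 7.09x + 50.81$.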
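For readers without a TI-82, the two regressions can be checked with a short least-squares sketch. The helper `linear_regression` is a hypothetical name, and expressing income in units of \$10,000 is an assumption; it is the scaling under which the coefficients come out close to the ones quoted above.

```python
def linear_regression(xs, ys):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar

# Lower endpoint of each income bracket, in units of $10,000 (assumed scaling)
income = [0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.5, 5.0]
registered = [47.6, 52.8, 57.4, 63.2, 67.4, 71.9, 77.9, 81.8]

m7, b7 = linear_regression(income[:7], registered[:7])   # first 7 points
m8, b8 = linear_regression(income, registered)           # all 8 points
print(f"7 points: y = {m7:.2f}x + {b7:.2f}")
print(f"8 points: y = {m8:.2f}x + {b8:.2f}")
```

Changing one entry of `registered` and rerunning is a quick way to let students see how a single data point moves the line.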

## Example of Quadratic Data

This lesson is intended for use following a unit on quadratic functions. Students should be familiar with the various forms of quadratic functions, how these are derived (e.g. completing the square) and how the forms indicate shifts (e.g. $y = 3(x + 2)^2 - 5$ is shifted 2 units to the left and 5 units down).

We begin with a very simple set of data. As usual, enter the data in L1 and L2 and plot. You should get something similar to Figure 1.

| x | y |
|---|---|
| 0 | 0 |
| 1 | 1 |
| 2 | 4.2 |
| 3 | 8.8 |
| 4 | 16.3 |
| 5 | 25.2 |
| 6 | 36.02 |
| 7 | 48.9 |
| 8 | 64 |
| 9 | 81.3 |
| 10 | 100 |

Now, use Calc to find the quadratic regression. Again, you may wish to discuss what kind of method(s) might produce such an equation. This equation should be approximately $y = .99x^2 + .02x + .01$, and you can graph it by pressing Y=, then VARS 5:Statistics..., followed by two right arrows to enter EQ, then 7:RegEQ.

Here is the interesting part. Go to L3 at the very top of the list. We can enter an expression for the entries instead of filling them in individually. Enter L3 = √L2 and allow the entries to be calculated. Next graph the points from L1 and L3. Remember, you will need to change which plots are on and which lists are graphed under STAT PLOT. What you will see is the data forming a linear plot. Why should this be so? Think of L1 as $x$ and L2 as $y$. Then L3 is $\sqrt{y}$. We have:

$y = .99x^2 + .02x + .01$, which is essentially just $y = x^2$,

so $\sqrt{y} = \sqrt{x^2} = x$ (for $x \ge 0$), which is a line.

Notice that if we find the linear regression for the $(x, \sqrt{y})$ data, we find $\sqrt{y} = z = .99x + .01$, which is essentially $z = x$. Encourage students to graph the lines to see how close they are. This process of "straightening" the data is called linearization. The main idea is that if you do not have a quadratic regression key (as on the TI-81), you can guess at the basic form of the equation for the data, find an inverse (or something close to it), and find the linear regression for the x values plotted against the inverse. If you get a line (or close to a line), then you guessed the form of the original equation correctly, and you can find this equation by taking the inverse of the inverse, that is:

$\sqrt{y} = .99x + .01$, so then $y = (.99x + .01)^2 = .98x^2 + .02x + .0001$

This is very close to the quadratic regression equation we found with the quadratic regression key.

Next, let us consider a slightly more complex situation with real data. Start with the following data on fish caught in "joint venture catches" of the U.S. and other countries. (Data adapted from: "World Eagle, The Monthly Social Studies Resource", February 1990, p.18).

| Years (since 1979) | Fish (in 10 M lbs.) |
|---|---|
| 0 | 2.5 |
| 1 | 15 |
| 2 | 32 |
| 3 | 59.8 |
| 4 | 99.6 |
| 5 | 145 |
| 6 | 202 |
| 7 | 290 |
| 8 | 330 |

Entering this data and graphing we have something like Figure 2.

Now, without using the quadratic regression key, work through the example with the linearization technique. This data looks quadratic; could it be $y = x^2$? A quick check on the calculator will show that this is not correct. Open the discussion to what other forms a quadratic might take. Responses should include $y = a(x + b)^2 + c$ (horizontal and vertical shift) and $y = a(x + b)^2$ (only horizontal shift). This particular data does not look as if it has much vertical shift, so lead the group to hypothesize that $y = a(x + b)^2$. Then:

$\sqrt{y} = \sqrt{a(x + b)^2} = \sqrt{a}\,x + \sqrt{a}\,b$, which is a line.

To find the line, and thus $a$ and $b$, we again graph $x$ versus $\sqrt{y}$, yielding a graph like Figure 3.

The linear regression for this data gives the equation $\sqrt{y} = 2.12x + 1.54$. Thus $y = (2.12x + 1.54)^2 = 4.49x^2 + 6.53x + 2.37$ is our estimate of a quadratic function that approximates the original data. (Note: we could also calculate the approximate values of $a$ and $b$ from the equations $\sqrt{a} = 2.12$ and $\sqrt{a}\,b = 1.54$, which gives a similar quadratic.) Finally it is time to use the quadratic regression key and see that it gives a similar approximating function: $y = 4.43x^2 + 7.17x + 1.38$. These two functions may seem a bit different to the students, but graphing them along with the original data should convince them that the method is fairly good (and great, when you do not have a quadratic regression key!). More advanced groups may also want to do a further check. Complete the square of the function from the quadratic regression, yielding $y = 4.43(x + .80)^2 - 1.47$. We see that this does have a small vertical shift, but small enough to ignore. So we have:

$y = 4.43(x + .80)^2$, which implies $\sqrt{y} = \sqrt{4.43}\,(x + .80) = 2.10x + 1.68$

This is relatively close to our linear regression equation $\sqrt{y} = 2.12x + 1.54$.
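The linearization of the fish data can be reproduced off the calculator with a short sketch. The helper name `linear_regression` is hypothetical, and the fit is an ordinary least-squares line on the points $(x, \sqrt{y})$, which is assumed to be what the calculator's linear regression key computes; the printed coefficients should land close to, but not necessarily exactly on, the rounded values quoted above.

```python
from math import sqrt

def linear_regression(xs, ys):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar

years = list(range(9))                                  # years since 1979
fish = [2.5, 15, 32, 59.8, 99.6, 145, 202, 290, 330]    # in 10 M lbs.

# Re-express (x, y) as (x, sqrt(y)) and fit a line to the new points
root_fish = [sqrt(y) for y in fish]
m, b = linear_regression(years, root_fish)

# Undo the re-expression: y = (m*x + b)^2 = m^2 x^2 + 2mb x + b^2
a2, a1, a0 = m * m, 2 * m * b, b * b
print(f"sqrt(y) = {m:.2f}x + {b:.2f}")
print(f"y = {a2:.2f}x^2 + {a1:.2f}x + {a0:.2f}")
```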

The issue of horizontal and vertical shift together provides an entirely separate lesson, because $y = a(x + b)^2 + c$ involves three parameters, of which the linear regression will only give two. The best place to begin, then, is estimating $c$ by examining the original data, using $\sqrt{y - c} = \sqrt{a}\,(x + b)$ to find $a$ and $b$, and finally reversing the transformation. However, this can involve a lot of guessing and checking, unless you want to bring in some calculus methods. So be careful what you get into!

## An Extension

This lesson introduces the concept of the rate of change of a non-linear function. The technology employed is the TI-82. We use the function $y = x^2$, with graphing window: Xmin = 0, Xmax = 2, Ymin = 0, Ymax = 4. Trace along the graph to any point and Zoom in. Go to STAT EDIT and enter the x-value of the point into L1 by placing the cursor in the L1 column and entering "x". Return to the graph and trace to a point which is to the left of the chosen point. On the Home Screen, store the current X-value into A and the Y-value into B. Return to the graph and trace to a point to the right of the originally chosen point. Store this x-value into C and this y-value into D. On the Home Screen, assign the expression (B-D)/(A-C) to M. Return to the lists and enter M into L2. Go back to the original graph by pressing ZOOM MEMORY ZPrevious, and repeat the process described above for a few more points on the graph. It might be good to have each person in the class do this process for a single point or two, and then list the resulting values for L1 (the x values) and L2 (the M values) on the board. Each student can then enter all of the data into his or her calculator. Performing a linear regression on the data gives an equation which comes remarkably close to being $y = 2x$.
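The same experiment can be simulated in Python. This is a sketch under stated assumptions: the sample points and the step `h` are hypothetical stand-ins for the points a student would reach by tracing, and the left and right neighbors are taken symmetrically about each point. For $y = x^2$ the slope between the neighbors at $a$ and $c$ is $(a^2 - c^2)/(a - c) = a + c$, so symmetric neighbors give exactly $2x$ up to rounding.

```python
def f(x):
    return x ** 2

def linear_regression(xs, ys):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar

points = [0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75]   # hypothetical traced points (L1)
h = 0.01                                            # hypothetical trace step

slopes = []                                         # the M values (L2)
for x in points:
    a, b = x - h, f(x - h)      # point to the left: A and B
    c, d = x + h, f(x + h)      # point to the right: C and D
    slopes.append((b - d) / (a - c))

m, k = linear_regression(points, slopes)
print(f"M = {m:.4f}x + {k:.4f}")
```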

## Inverse Variation ($y = k/x$) or Inverse Square ($y = k/x^2$)

The following lesson is intended for use after students are familiar with inverse variation and inverse square variation. They should be able to recognize the graph and know how to find the value for k. The instructions are given for a TI-82, but other technology can be used.

### A) Inverse Variation

1) Below is a table of data relating the pressure of a gas to its volume. (Note: this is not experimental data.)

| x = volume | y = pressure |
|---|---|
| 1 | 2 |
| 1.5 | 1.3 |
| 2 | 1.0 |
| 3 | .67 |
| 4 | .5 |

2) The first step in analyzing the data is to graph it on your TI-82. Put the data for volume in L1 and the data for pressure in L2. Your graph should look like the one below.

3) There are two ways you can determine if this is an inverse variation graph:

a) Numerically: Using corresponding values from the table, plug into the equation $y = k/x$ and solve for $k$ (that is, compute $k = xy$). Check the other values and see if that particular $k$ works for all other points. For example, the point (3, .67) gives $k = 3(.67) = 2.01$.

Continue checking the other values and see if $k$ is close to 2.01.

b) Using linearization: On your calculator make L3 the inverse function of $y = \frac{1}{x}$. In this case it is $\frac{1}{y}$. To do this on your calculator, go to STAT then Edit. Move the cursor on top of L3 and type L3 = 1/L2. Under STAT PLOT, plot L1 and L3 on the same graph. If it is close to linear, you know that your original guess that the data is inversely related was correct. Your graph should look like the one below.

c) The next step is to get the regression equation for the line above. On your calculator press STAT then CALC, LinReg (L1, L3). Your calculator will give you values for the slope and the y-intercept. The equation of the regression line is $y = .496x + .012$. Remember, to get this line you replaced $y$ with $1/y$. To get the equation for your data, reverse the process and solve for $y$. So the equation which models your original data is:

$\frac{1}{y} = .496x + .012$, so $y = \frac{1}{.496x + .012}$.
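Both checks for inverse variation, the numerical one and the linearization, can be sketched in Python. The helper `linear_regression` and the function `model` are hypothetical names; the fit is an ordinary least-squares line on $(x, 1/y)$, so the intercept may differ in the third decimal place from the calculator value quoted above.

```python
def linear_regression(xs, ys):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar

volume = [1, 1.5, 2, 3, 4]            # x (L1)
pressure = [2, 1.3, 1.0, 0.67, 0.5]   # y (L2)

# Numerical check: if y = k/x, then k = x*y should be nearly constant
ks = [x * y for x, y in zip(volume, pressure)]
print("k values:", [round(k, 2) for k in ks])

# Linearization: replace y with 1/y (L3) and fit a line to (x, 1/y)
reciprocal = [1 / y for y in pressure]
m, b = linear_regression(volume, reciprocal)

def model(x):
    """Reverse the re-expression: 1/y = m*x + b, so y = 1/(m*x + b)."""
    return 1 / (m * x + b)

print(f"1/y = {m:.3f}x + {b:.3f}")
print("model at x = 2:", round(model(2), 3))
```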

### B) Inverse Square

1) Below is a table which relates the height of a cylinder to its radius for a constant volume.

| x = radius | y = height |
|---|---|
| 1 | 63.66 |
| 2 | 15.92 |
| 3 | 7.07 |
| 4 | 3.98 |
| 5 | 2.55 |
| 6 | 1.77 |
| 7 | 1.30 |
| 8 | .99 |

2) Once again, the first step is to graph the data on your calculator. Clear the lists by going to STAT, ClrList (L1, L2, L3). Put the data for the radius in L1 and the height into L2. Your graph should look like the one below:

3) You cannot tell at first if this is inverse or inverse squared. You can check both. Further, you can check it numerically or using linearization.

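i) Numerically: Pick values from the table and plug into both formulas and solve for $k$. Try another point using that $k$. For example, the point (1, 63.66) gives $k = 63.66$ in either formula. At $x = 2$, the formula $y = \frac{k}{x}$ predicts $\frac{63.66}{2} = 31.83$, which does not match the table,

but $y = \frac{k}{x^2}$ predicts $\frac{63.66}{4} \approx 15.92$, which does, so it is inverse square.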

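ii) Linearization: To check if it is inverse: make L3 = 1/L2 and graph L1 and L3. Your graph should look like the one below. Notice that it is not linear.

You can assume that it is not $y = \frac{k}{x}$. Guess that it is $y = \frac{k}{x^2}$. Linearize this. Remember, to linearize you want to replace $y$ with the inverse function of your guess. In this situation it will be $\frac{1}{\sqrt{y}}$. Make L4 = 1/√L2 and graph L1 and L4. Your graph should look like the one below.

Notice that this is linear. Again, find the linear regression equation of this data. The equation you get should have a slope of about .1255 and an intercept that is essentially zero, so $\frac{1}{\sqrt{y}} \approx .1255x$. Remember that this is the equation for the re-expressed data, not for your original data. Reversing the process gives the equation for your data: $y \approx \frac{1}{(.1255x)^2} \approx \frac{63.5}{x^2}$.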
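The comparison of the two re-expressions can also be made numerically. The sketch below fits a least-squares line to $(x, 1/y)$ and to $(x, 1/\sqrt{y})$ and compares the sums of squared residuals; the helper names are hypothetical, and comparing raw residual sums across two different re-expressions is only a rough check, reasonable here because both re-expressed lists span a similar range.

```python
from math import sqrt

def linear_regression(xs, ys):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar

def sum_squared_residuals(xs, ys):
    """How far the points fall from their own least-squares line."""
    m, b = linear_regression(xs, ys)
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

radius = [1, 2, 3, 4, 5, 6, 7, 8]                              # L1
height = [63.66, 15.92, 7.07, 3.98, 2.55, 1.77, 1.30, 0.99]    # L2

recip = [1 / y for y in height]              # L3: test of y = k/x
recip_root = [1 / sqrt(y) for y in height]   # L4: test of y = k/x^2

sse_inverse = sum_squared_residuals(radius, recip)
sse_inverse_square = sum_squared_residuals(radius, recip_root)
print("inverse:       ", sse_inverse)
print("inverse square:", sse_inverse_square)

# Reverse the better re-expression: 1/sqrt(y) = m*x, so y = (1/m^2) / x^2
m, b = linear_regression(radius, recip_root)
k = 1 / m ** 2
print(f"1/sqrt(y) = {m:.4f}x + {b:.4f}, so k is about {k:.1f}")
```

The much smaller residual sum for the second re-expression supports the inverse square guess.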

## Data Collecting Projects

There are several data collecting experiments that could be assigned which would give data that is not perfectly inverse or inverse squared (but which would employ the strategies outlined above). The following is a list of suggestions:

1) Pour a half cup of sugar from varying heights. Measure the height of the resulting pile and compare it to the height from which it is poured.
2) Use a piece of pasta (spaghetti works best) and clip it to a Dixie cup. Hang the spaghetti over the edge of a table and slowly drop pennies into the cup. Measure the length of the pasta from the edge of the table to the Dixie cup and compare it to the number of pennies you can drop in the cup before the pasta breaks.
3) Use a light meter to measure the intensity of the light versus the distance from the source.
4) Make a teeter-totter out of a ruler or a piece of wood. Use several weights of different sizes. Pick one to remain fixed. Measure the distance from the pivot versus the weight in order to balance the fixed weight.

## Introduction to Exponential Modeling

If we suspect that the data varies exponentially, the following linearization method may be applied: each original ordered pair $(x, y)$ should be re-expressed as $(x, \ln y)$.

Input the values of x into L1.

Input the values of y into L2.

View the scatter plot of L2 vs. L1.

If the data appears to follow an exponential curve, re-express each (x, y) in the form (x, ln y)

Enter the values of ln y into L3.

Find the equation of the median-median line for the re-expressed data and plot it with the re-expressed data to check its likely fit.

To write the actual model for the original data, use exponentiation on the ln equation.

e.g. $\ln y = .037x + .102$

$y = e^{.037x + .102}$

$= e^{.102}\,e^{.037x}$

$y = 1.107\,e^{.037x}$

Plot your model with the original data points to check its fit.

Use your graphing calculator and the method of linearization to find an exponential model for the following data set.

### Toxic Fumes Problem

You accidentally inhale some poisonous fumes. Six hours later, you see a doctor. From a blood sample, she determines that the poison concentration is 0.00267 milligrams per cubic centimeter (mg/cc), and admits you to the hospital for observation. Blood tests over the next 36 hours reveal the following:

| Time (hours) | Concentration (mg/cc) |
|---|---|
| 6 | .00267 |
| 10 | .00205 |
| 14 | .00157 |
| 18 | .00121 |
| 22 | .00093 |
| 26 | .00071 |
| 30 | .00054 |
| 34 | .00042 |
| 38 | .00032 |
| 42 | .00025 |

Find an appropriate function equation which represents the relationship between elapsed time and concentration of the poison in the bloodstream.

1. Write the equation for this function. Let C represent concentration and t represent elapsed time since exposure to the fumes.
2. Plot the graph of poison concentration versus time t.
3. The doctor says that you may have had serious tissue damage if the concentration was ever as high as 0.015 mg/cc. Based on your best-fit model, was the concentration ever that high?
4. You may resume normal activities when the poison concentration has dropped to below 0.00010 mg/cc. How long after you have inhaled the fumes will you be able to resume normal activities?
5. The biological half-life of the poison is the time it takes to drop to half of its present value. Find the biological half-life of this poison.
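One way a teacher might check student answers to this problem is sketched below. It is not the only valid approach: the fit here is an ordinary least-squares line on $(t, \ln C)$ rather than the median-median line suggested earlier, so student answers obtained with the median-median line may differ slightly. The names `linear_regression`, `concentration`, `half_life`, and `t_safe` are hypothetical.

```python
from math import exp, log

def linear_regression(xs, ys):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar

hours = [6, 10, 14, 18, 22, 26, 30, 34, 38, 42]
conc = [0.00267, 0.00205, 0.00157, 0.00121, 0.00093,
        0.00071, 0.00054, 0.00042, 0.00032, 0.00025]

# Re-express (t, C) as (t, ln C) and fit a line: ln C = k*t + ln C0
k, ln_c0 = linear_regression(hours, [log(c) for c in conc])
c0 = exp(ln_c0)                  # modelled concentration at t = 0

def concentration(t):
    """Exponential model C = C0 * e^(k*t)."""
    return c0 * exp(k * t)

half_life = log(0.5) / k         # hours for the concentration to halve
t_safe = log(0.00010 / c0) / k   # hours until C falls to 0.00010 mg/cc

print(f"C = {c0:.5f} * e^({k:.4f} t)")
print(f"half-life: {half_life:.1f} hours")
print(f"below 0.00010 mg/cc after about {t_safe:.1f} hours")
```

Since the model is decreasing, its largest value is the one at $t = 0$, which can be compared directly with the 0.015 mg/cc threshold in question 3.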

## Regression

The purpose of this section is to explore two sets of real data that seem to demonstrate an exponential growth form. The first example will explore the rate of acute poliomyelitis in the U.S. between 1912 and 1954 and the second example will explore the national debt between 1939 and 1992. Both examples lend themselves to excellent discussions, projects and interdisciplinary work.

## Model:

Since the two models appear to be of the form $y = ae^{bx}$, taking the natural log of each side of this equation will yield a new equation: $\ln y = \ln a + bx$. Now $\ln y$ is expressed as a linear function of $x$ with a slope of $b$ and a y-intercept of $\ln a$. At this point, the student can run a linear regression and consider the fit of the regression line to the re-expressed ($\ln y$) data by considering those two graphs and by looking at the graph of the residuals to check that they are random. Once the student is confident of the fit, the regression slope can be put back in place of $b$, $e$ raised to the regression intercept can be put back in place of $a$, and the model can be graphed against the original data.

The formulas for the linear regression are:

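$m = \dfrac{\sum xy - n\,\bar{x}\,\bar{y}}{\sum x^2 - n\,\bar{x}^2}$

(where $\bar{x}$ and $\bar{y}$ are the means of the re-expressed data, $n$ is the number of data points, and $m$ is the slope of the regression line)

$b = \bar{y} - m\,\bar{x}$

(where $b$ is the y-intercept of the regression line)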

The student can use the template below or can use the linear regression provided on his or her spreadsheet.

| | A | B | C | D | E |
|---|---|---|---|---|---|
| 1 | | X(old) | Y(old) | X(new) | Y(new) |
| 2 | sum: | =SUM(B5:B404) | =SUM(C5:C404) | =SUM(D5:D404) | =SUM(E5:E404) |
| 3 | mean: | =AVERAGE(B5:B404) | =AVERAGE(C5:C404) | =AVERAGE(D5:D404) | =AVERAGE(E5:E404) |

| | F | G |
|---|---|---|
| 1 | xy | x^2 |
| 2 | =SUM(F5:F404) | =SUM(G5:G404) |
| 3 | =AVERAGE(F5:F404) | =AVERAGE(G5:G404) |

| | K | L |
|---|---|---|
| 1 | m | b |
| 2 | =(F2-COUNT(G5:G404)*D3*E3)/(G2-COUNT(G5:G404)*D3*D3) | =E3-K2*D3 |
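For readers who prefer a script to a spreadsheet, the template can be mirrored in Python. The sketch below applies it to the 1933 to 1954 polio rates tabulated later in this section; the comments name the spreadsheet cell each quantity corresponds to. Measuring time in years since 1933 is an assumption made here to keep the numbers small (it changes the intercept but not the slope), and the variable names are hypothetical.

```python
from math import exp, log

years = list(range(1933, 1955))
rates = [4, 5.9, 8.5, 3.5, 7.4, 1.3, 5.6, 7.4, 6.8, 3.1, 9.3,
         14.3, 10.3, 18.3, 7.5, 19, 28.3, 22.1, 18.5, 37.2, 22.5, 23.9]

x_new = [yr - 1933 for yr in years]   # column D; assumed re-expression of the year
y_new = [log(r) for r in rates]       # column E: ln y
n = len(x_new)                        # COUNT(...)

sum_xy = sum(x * y for x, y in zip(x_new, y_new))   # F2
sum_x2 = sum(x * x for x in x_new)                  # G2
mean_x = sum(x_new) / n                             # D3
mean_y = sum(y_new) / n                             # E3

m = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x * mean_x)   # K2
b = mean_y - m * mean_x                                               # L2

# Residuals of the re-expressed data, for the randomness check
residuals = [y - (m * x + b) for x, y in zip(x_new, y_new)]

# Back to the original scale: y = a * e^(m x) with a = e^b
a = exp(b)
print(f"ln y = {m:.4f}x + {b:.4f}")
print(f"y = {a:.3f} * e^({m:.4f} x), x = years since 1933")
```

Plotting `residuals` against `x_new` gives the residual graph that students are asked to inspect.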

The data sets which follow represent the cases of polio in the U.S. between 1912 and 1954. The graph below represents the data between 1933 and 1954 re-expressed by taking the natural logarithm. The linear regression line has the equation:

Students could discuss how well this line fits the re-expressed data by looking at the graph of the residuals.

[Figure: ln y and the linear regression line]

[Figure: residuals]

Students should now write the exponential model for the original data and compare their graph with the original data.

The equation is:

[Figure: original data and exponential model]

Activities could include looking at the entire data set, doing research on the development of the Salk vaccine and its implications for the mathematical model, and considering the outlier in 1916 and how it affects the curve.

Rate of cases of acute poliomyelitis in the US per 100,000:

| Year | Rate | Year | Rate |
|---|---|---|---|
| 1912 | 5.5 | 1933 | 4 |
| 1913 | 4 | 1934 | 5.9 |
| 1914 | 2.4 | 1935 | 8.5 |
| 1915 | 3.1 | 1936 | 3.5 |
| 1916 | 41.1 | 1937 | 7.4 |
| 1917 | 4.9 | 1938 | 1.3 |
| 1918 | 2.8 | 1939 | 5.6 |
| 1919 | 2.3 | 1940 | 7.4 |
| 1920 | 2.2 | 1941 | 6.8 |
| 1921 | 5.8 | 1942 | 3.1 |
| 1922 | 2 | 1943 | 9.3 |
| 1923 | 3.1 | 1944 | 14.3 |
| 1924 | 4.6 | 1945 | 10.3 |
| 1925 | 5.3 | 1946 | 18.3 |
| 1926 | 2.3 | 1947 | 7.5 |
| 1927 | 8.8 | 1948 | 19 |
| 1928 | 4.3 | 1949 | 28.3 |
| 1929 | 2.4 | 1950 | 22.1 |
| 1930 | 7.5 | 1951 | 18.5 |
| 1931 | 12.8 | 1952 | 37.2 |
| 1932 | 3.1 | 1953 | 22.5 |
| 1933 | 4 | 1954 | 23.9 |

The second activity uses the following data on the Gross U.S. Debt to find an exponential model. Students will find that this is a piece-wise defined function. It will be important to investigate various subsets of the entire domain to see if they can be defined separately. This would be an appropriate time to work with history teachers on the implications of the trends within each period.

Gross US Debt in billions of dollars:

| Fiscal Year | Debt | Fiscal Year | Debt |
|---|---|---|---|
| 1939 | 252 | 1966 | 328.5 |
| 1940 | 252.6 | 1967 | 340.4 |
| 1941 | 256.9 | 1968 | 368.7 |
| 1942 | 255.3 | 1969 | 365.8 |
| 1943 | 259.1 | 1970 | 380.9 |
| 1944 | 266 | 1971 | 408.2 |
| 1945 | 270.8 | 1972 | 435.9 |
| 1946 | 271 | 1973 | 466.3 |
| 1947 | 257.1 | 1974 | 483.9 |
| 1948 | 252 | 1975 | 541.9 |
| 1949 | 252.6 | 1976 | 643.6 |
| 1950 | 256.9 | 1977 | 706.4 |
| 1951 | 255.3 | 1978 | 776.6 |
| 1952 | 259.1 | 1979 | 828.9 |
| 1953 | 266 | 1980 | 908.5 |
| 1954 | 270.8 | 1981 | 994.3 |
| 1955 | 274.4 | 1982 | 1136.8 |
| 1956 | 272.7 | 1983 | 1371.2 |
| 1957 | 272.3 | 1984 | 1564.1 |
| 1958 | 279.7 | 1985 | 2120.1 |
| 1959 | 287.5 | 1986 | 2345.6 |
| 1960 | 290.5 | 1987 | 2600.8 |
| 1961 | 292.6 | 1988 | 2867.5 |
| 1962 | 302.9 | 1989 | 3206.3 |
| 1963 | 310.3 | 1990 | 3599 |
| 1964 | 316.1 | 1991 | 4002.7 |
| 1965 | 322.3 | 1992 | 4410.5 |

A simple classroom project to collect a manageable set of data involves the use of an overhead projector. After placing a small object on the overhead (a six-inch plastic ruler works well), students can form small groups to measure both the length and width of the image as the projector is gradually moved away from the screen and refocused. Each group of three or four can be responsible for one position of the projector, which is one data point, and nine data points yield a data set which is particularly nice for median-median data reduction. Someone needs to be responsible for recording the data clearly on the board so that each student can have a copy to work with.

Plotting the length of the image against the distance from the screen yields very linear data, and this is probably the biggest weakness of this model: the data is just too good. As an extension, since both length and width of the image vary directly with the distance of the projector from the screen, their product, the area of the shadow, should vary as the square of that distance. What we get is a nice quadratic relationship. Finally, if we consider the volume of the pyramid whose base is the shadow and whose height is the distance, we have a volume which varies as the cube of the distance and will yield cubic data.
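The chain of relationships in this paragraph can be written out as a short derivation. The symbols are assumptions introduced for illustration: $d$ is the distance from projector to screen, $L$ and $W$ are the length and width of the image, and $k_1$, $k_2$ are the two constants of direct variation.

```latex
\begin{align*}
L &= k_1 d, \qquad W = k_2 d
  && \text{(length and width vary directly with distance)} \\
A &= L \cdot W = k_1 k_2\, d^2
  && \text{(area varies as the square of the distance)} \\
V &= \tfrac{1}{3} A\, d = \tfrac{1}{3} k_1 k_2\, d^3
  && \text{(pyramid volume varies as the cube of the distance)}
\end{align*}
```

Each line suggests its own re-expression for linearization: $\sqrt{A}$ against $d$ for the area data and $\sqrt[3]{V}$ against $d$ for the volume data.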

This one activity quickly produces data which fits three different models, involves the entire class, and can be taught at a number of levels depending on the skills of the students. In a general math class students can simply use a ruler to draw a line of best fit to the linear data, and use the graph to make interpolative predictions. Then it's off to the projector and screen to test their predictions. On a more advanced level we can talk about regressions, re-expressions, and scaling factors in the real world.

## Data Sources

Economic Report of the President 1993, US Government Printing Office

The World Fact Book 1992, US Government Printing Office

Economic Indicator Handbook, Arsen J. Dornay, Editor. Gale Research, Inc., Detroit, 1992

World Eagle, Monthly Social Studies Resource, 64 Washburn Ave., Wellesley, MA 02181

Statistical Abstract of the US

Almanacs

US Census Bureau

Woodrow Wilson Leadership Program in Mathematics lpt@www.woodrow.org
The Woodrow Wilson National Fellowship Foundation webmaster@woodrow.org
CN 5281, Princeton NJ 08543-5281 Tel:(609)452-7007 Fax:(609)452-0066