

Data Analysis in Algebra II

Karen Bryant, Betsy Kumm, Karen Levin,

Mary Viruleg, Betsey Wood, Tom Walters

Abstract

Data Analysis can and should be included at every level of the high school mathematics curriculum. As most traditional textbooks offer minor, if any, treatment of this topic, we present several lessons which are appropriate for inclusion in an Algebra II course. These lessons use sets of data which lead to linear, quadratic, inverse variation, and exponential functions, and can be used throughout the course as students acquire knowledge of the particular type of function. A sample lesson is given for each, with suggestions for further study, dependent upon the abilities of the students. We also include written sources for acquiring data, and data collection projects which can be done by students.

Any technological tool which is capable of receiving data sets, plotting them in a specified manner, and determining regression equations may be used. We use the TI-82 graphics calculator in these lessons, as it is readily available to students and teachers and is also very user-friendly. Several of the lessons also use spreadsheets, which can very clearly demonstrate, both in tabular and in graphical form, the manner in which variables change. Included are brief descriptions of how to use the statistics capabilities of the TI-82, and how to utilize a spreadsheet to do data analysis.

Data Analysis on the TI-82

To input data sets into memory, press STAT and select EDIT 1:Edit.... The values of the independent variable should be placed in the column headed L1 and the values of the dependent variable in the column headed L2. If you wish to sort the data with respect to the independent variable, press STAT, select EDIT 2:SortA(, and complete the command so that it reads SortA(L1, L2). This sorts the data in L1 into ascending order, with L2 dependent. SortD( places the data in descending order.

To plot the data points, press 2nd STAT PLOT and select Plot 1.... Turn on Plot 1, choose scatter plot as Type, L1 for Xlist, L2 for Ylist, and a symbol of your choice for Mark. Press ZOOM 9 (to select ZoomStat), and the scatter plot will be graphed. Examine the scatter plot to determine the most appropriate form of the regression equation. The options which the TI-82 offers are: median-median, linear (ax+b), quadratic, cubic, quartic, linear (a+bx), logarithmic, exponential, and power. To obtain the regression equation, press STAT and select CALC SetUp.... Xlist for 2-Var Stats should be L1, Ylist should be L2, and Freq should be 1. Press STAT CALC again, and select a regression equation. The values for the constants in the equation will be displayed. To graph this equation, press Y=, then VARS, select Statistics... EQ RegEQ, and then press GRAPH. Both the scatter plot and the regression equation will be displayed.

Data Analysis on a Spreadsheet

Enter the values of the independent and dependent variables into two columns on the spreadsheet. Look at a scatter plot of the data to determine the type of function behavior this data represents. It may be necessary to re-express the values of one of the variables (in the lessons here, the dependent variable) in order to get an accurate and reasonable mathematical model of the data. If so, place these re-expressed values into another column, plot the re-expressed values against the values of the independent variable, and run a linear regression. Check that the residuals are small and random. If so, then the re-expression is a good one. Applying the inverse of the re-expression to the regression line gives the mathematical model. For an example of a template which can be used for re-expression and regression, see the example lesson on exponential growth.

Median-Median Line

The process of finding the median-median line can be summarized as follows:

1) Divide the data into three groups, left to right, as evenly as possible.

2) Find the median "x" value and the median "y" value of the data in each of the three intervals, and call these points the summary points. It is not necessary that they be actual data points; they simply represent the median data values in each of the three data intervals.

3) Using the summary points in the first and third intervals, find the equation of the line passing through them using standard two-point techniques.

4) Using the equation of the line found in part three, find the predicted "y" value for the "x" of the middle summary point. The difference between the calculated or predicted "y" and the middle summary "y" is very important.

5) To complete the process simply slide the line from part 3 one-third of the difference we found in part 4. That is, change the y-intercept enough to spread the difference evenly over the three intervals.

In addition to being a manageable technique for high school students, this median-median line has the virtue of being resistant to outliers, data points whose extreme values are frequently, but not exclusively, caused by error.
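The five steps above can be sketched in Python. This is only an illustrative sketch: the function name `median_median_line` is hypothetical, and the rule used to split the data "as evenly as possible" (one leftover point goes to the middle group, two leftover points go to the outer groups) is a common convention assumed here rather than something stated in the text. The sample data are the voter registration figures from the linear lesson, with income entered as the lower endpoint of each bracket in units of $10,000.

```python
from statistics import median


def median_median_line(xs, ys):
    """Return (slope, intercept) of the median-median line."""
    points = sorted(zip(xs, ys))  # order the data left to right
    n = len(points)
    if n < 3:
        raise ValueError("need at least three data points")

    # Step 1: three groups, as even as possible (assumed convention:
    # one extra point -> middle group, two extra points -> outer groups).
    base, extra = divmod(n, 3)
    n_left = base + (1 if extra == 2 else 0)
    n_mid = base + (1 if extra == 1 else 0)
    groups = [
        points[:n_left],
        points[n_left:n_left + n_mid],
        points[n_left + n_mid:],
    ]

    # Step 2: summary point = (median x, median y) of each group.
    (x1, y1), (x2, y2), (x3, y3) = [
        (median(p[0] for p in g), median(p[1] for p in g)) for g in groups
    ]

    # Step 3: line through the first and third summary points.
    slope = (y3 - y1) / (x3 - x1)
    outer_intercept = y1 - slope * x1

    # Step 4: how far the middle summary point sits from that line.
    gap = y2 - (slope * x2 + outer_intercept)

    # Step 5: slide the line one-third of the way toward the middle point.
    return slope, outer_intercept + gap / 3


# Voter registration data (income lower endpoints in units of $10,000).
income = [0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.5, 5.0]
registered = [47.6, 52.8, 57.4, 63.2, 67.4, 71.9, 77.9, 81.8]

mm_slope, mm_intercept = median_median_line(income, registered)
print(f"median-median line: y = {mm_slope:.2f}x + {mm_intercept:.2f}")
```

Because only medians enter the calculation, changing a single extreme value in any group usually leaves the summary points, and therefore the line, unchanged.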

Example of Linear Data

This lesson is intended for use following a unit on linear functions. Students are expected to know how to graph linear functions by hand and to have a rudimentary knowledge of the TI-82.

We have the following data on family income versus percent of the group registered to vote in 1988. (Source of data: "World Eagle, The Monthly Social Studies Resource", February 1990, p.17.)

income ($)       % registered
0-4999           47.6
5000-9999        52.8
10000-14999      57.4
15000-19999      63.2
20000-24999      67.4
25000-34999      71.9
35000-49999      77.9
50000-           81.8

Using all but the last piece of data (7 points) and using the lower endpoint in each income category, enter the data in L1 and L2 and plot. Then under Calc find the linear regression and plot it on the graph with the data.

Depending on your group, you may discuss the least squares method for deriving this line, or just open the discussion of what "line of best fit" might mean and how you might find it. Then, add in the last piece of data (making 8 points) and see how the line changes. Encourage the students to experiment with changing pieces of data to see what the effect on the line will be. As an extension, you might also introduce residuals at this point.

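Figure 1 and Figure 2 show the scatter plots for the data (7 points and then 8 points). With income x measured in units of $10,000 (so the lower endpoints are entered as 0, .5, 1, 1.5, and so on), the respective equations for the linear regressions are y = 8.85x + 48.67 and y = 7.09x + 50.81.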
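For readers without a TI-82 at hand, the two regressions can be checked with a short least-squares sketch in Python. The helper name `linear_regression` is hypothetical, and entering income in units of $10,000 is an assumption; it is the scaling that reproduces the coefficients quoted above.

```python
def linear_regression(xs, ys):
    """Least-squares line: return (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Lower endpoint of each income bracket, in units of $10,000 (assumed scaling).
income = [0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.5, 5.0]
registered = [47.6, 52.8, 57.4, 63.2, 67.4, 71.9, 77.9, 81.8]

# First seven points only, then all eight.
slope7, intercept7 = linear_regression(income[:7], registered[:7])
slope8, intercept8 = linear_regression(income, registered)

print(f"7 points: y = {slope7:.2f}x + {intercept7:.2f}")
print(f"8 points: y = {slope8:.2f}x + {intercept8:.2f}")
```

Adding the eighth point lowers the slope noticeably, which is a convenient opening for the discussion of how a single data point can influence a least-squares line.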

Examples of Quadratic Data

This lesson is intended for use following a unit on quadratic functions. Students should be familiar with the various forms of quadratic functions, how these are derived (e.g. completing the square) and how the forms indicate shifts (e.g. $y = 3(x + 2)^2 - 5$ is shifted 2 units to the left and 5 units down).

We begin with a very simple set of data. As usual, enter the data in L1 and L2 and plot. You should get something similar to Figure 1.

x      y
0      0
1      1
2      4.2
3      8.8
4      16.3
5      25.2
6      36.02
7      48.9
8      64
9      81.3
10     100

Now, use Calc to find the quadratic regression. Again, you may wish to discuss what kind of method(s) might produce such an equation. This equation should be approximately $y = .99x^2 + .02x + .01$ and you can graph this by pressing Y=, then VARS and 5:Statistics, followed by two right arrows to enter EQ, then 7:RegEQ.

Here is the interesting part. Go to L3 at the very top of the list. We can enter an expression for the entries instead of filling them in individually. Enter L3 = √L2 and allow the entries to be calculated. Next graph the points from L1 and L3. Remember, you will need to change which plots are on and which lists are graphed under STAT PLOT. What you will see is the data forming a linear plot. Why should this be so? Think of L1 as x and L2 as y. Then L3 is $\sqrt{y}$. We have:

$y = .99x^2 + .02x + .01$, which is essentially just $y = x^2$

so $\sqrt{y} = \sqrt{x^2} = x$, which is a line.

Notice that if we find the linear regression for the $(x, \sqrt{y})$ data, we find $\sqrt{y} = z = .99x + .01$, which is essentially z = x. Encourage students to graph the lines to see how close they are. This process of "straightening" the data is called linearization. The main idea is that if you do not have a quadratic regression key (as on the TI-81) then you can guess at the basic form of the equation for the data, find an inverse (or close to this), and find the linear regression for the x values plotted against the inverse. If you get a line (or close to a line) then you guessed the form of the original equation correctly and you can find this equation by taking the inverse of the inverse, that is:

$\sqrt{y} = .99x + .01$, so then $y = (.99x + .01)^2 = .98x^2 + .02x + .0001$

This is very close to the quadratic regression equation we found with the quadratic regression key.
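The same "straightening" can be tried away from the calculator. The following is a minimal Python sketch, assuming a hypothetical helper `linear_regression` that computes the ordinary least-squares line; it re-expresses y as $\sqrt{y}$, fits a line, and squares the line to get back a quadratic.

```python
from math import sqrt


def linear_regression(xs, ys):
    """Least-squares line: return (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


x = list(range(11))                                   # L1
y = [0, 1, 4.2, 8.8, 16.3, 25.2, 36.02, 48.9, 64, 81.3, 100]  # L2
z = [sqrt(value) for value in y]                      # L3 = sqrt(L2)

# Fit sqrt(y) = m*x + k, then undo the re-expression by squaring:
# y = (m*x + k)^2 = m^2 x^2 + 2mk x + k^2
m, k = linear_regression(x, z)
a, b, c = m * m, 2 * m * k, k * k

print(f"sqrt(y) = {m:.3f}x + {k:.3f}")
print(f"y = {a:.3f}x^2 + {b:.3f}x + {c:.4f}")
```

The printed coefficients carry more digits than the truncated values quoted in the lesson, so small differences in the last decimal place are to be expected.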

Next, let us consider a slightly more complex situation with real data. Start with the following data on fish caught in "joint venture catches" of the U.S. and other countries. (Data adapted from: "World Eagle, The Monthly Social Studies Resource", February 1990, p.18).

years           fish
(since 1979)    (in 10 M lbs.)
0               2.5
1               15
2               32
3               59.8
4               99.6
5               145
6               202
7               290
8               330

Entering this data and graphing we have something like Figure 2.

Now, without using the quadratic regression key, work through the example with the linearization technique. This data looks quadratic; could it be $y = x^2$? A quick check on the calculator will show that this is not correct. Open the discussion to what other forms a quadratic might take. Responses should include $y = a(x + b)^2 + c$ (horizontal and vertical shift) and $y = a(x + b)^2$ (only horizontal shift). This particular data does not look as if it has much vertical shift, so lead the group to hypothesize that $y = a(x + b)^2$. Then:

$\sqrt{y} = \sqrt{a(x + b)^2} = \sqrt{a}\,(x + b)$, which is a line.

To find the line, and thus a and b, we again graph x versus $\sqrt{y}$, yielding a graph like Figure 3.

The linear regression for this data gives the equation $\sqrt{y} = 2.12x + 1.54$. Thus $y = (2.12x + 1.54)^2 = 4.49x^2 + 6.53x + 2.37$ is our estimate of a quadratic function that approximates the original data. (Note: we could also calculate the approximate values of a and b from the equations $\sqrt{a} = 2.12$ and $\sqrt{a}\,b = 1.54$, which gives a similar quadratic.) Finally it is time to use the quadratic regression key and see that it gives a similar approximating function: $y = 4.43x^2 + 7.17x + 1.38$. These two functions may seem a bit different to the students, but graphing them along with the original data should convince them that the method is fairly good (and great, when you do not have a quadratic regression key!). More advanced groups may also want to do a further check. Complete the square of the function from the quadratic regression, yielding $y = 4.43(x + .80)^2 - 1.47$. We see that this does have a small vertical shift, but small enough to ignore. So we have:

$y = 4.43(x + .80)^2$, which implies $\sqrt{y} = \sqrt{4.43}\,(x + .80) = 2.10x + 1.68$

This is relatively close to our linear regression equation $\sqrt{y} = 2.12x + 1.54$.
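As a check on the fish example, the following Python sketch fits $\sqrt{y}$ against x and then recovers a and b for the hypothesized form $y = a(x + b)^2$. The helper `linear_regression` is a hypothetical name for an ordinary least-squares fit, and the residual listing at the end is an added illustration rather than part of the lesson.

```python
from math import sqrt


def linear_regression(xs, ys):
    """Least-squares line: return (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


years = [0, 1, 2, 3, 4, 5, 6, 7, 8]                      # years since 1979
fish = [2.5, 15, 32, 59.8, 99.6, 145, 202, 290, 330]     # in 10 M lbs.

# Fit sqrt(y) = m*x + k.
m, k = linear_regression(years, [sqrt(v) for v in fish])

# Since sqrt(y) = sqrt(a)*(x + b): sqrt(a) = m and sqrt(a)*b = k.
a = m * m
b = k / m

print(f"sqrt(y) = {m:.2f}x + {k:.2f}")
print(f"y = {a:.2f}(x + {b:.2f})^2")

# Residuals of the recovered model against the original data.
for x, y in zip(years, fish):
    print(x, y, round(y - a * (x + b) ** 2, 1))
```

The residuals are largest for the last two years, which is one way to start a conversation about whether a pure horizontal shift is a good enough hypothesis.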

The issue of horizontal and vertical shift together provides an entirely separate lesson, because $y = a(x + b)^2 + c$ involves three parameters, for which the linear regression will only give two. The best place to begin then is estimating c by examining the original data, using $\sqrt{y - c} = \sqrt{a}\,(x + b)$ to find a and b and finally reversing the transformation. However, this can involve a lot of guessing and checking, unless you want to bring in some calculus methods. So be careful what you get into!

An Extension

This lesson introduces the concept of the rate of change of a non-linear function. The technology employed is the TI-82. We use the function $y = x^2$, with graphing window: Xmin = 0, Xmax = 2, Ymin = 0, Ymax = 4. Trace along the graph to any point and Zoom in. Go to STAT EDIT and enter the x-value of the point into L1 by placing the cursor in the L1 column and entering "X". Return to the graph and trace to a point which is to the left of the chosen point. On the Home Screen, store the current X-value into A and the Y-value into B. Return to the graph and trace to a point to the right of the originally chosen point. Store this x-value into C and this y-value into D. On the Home Screen, assign the expression (B-D)/(A-C) to M. Return to the lists and enter M into L2. Go back to the original graph by pressing ZOOM MEMORY ZPrevious, and repeat the process described above for a few more points on the graph. It might be good to have each person in the class do this process for a single point or two, and then list the resulting values for L1 (the x values) and L2 (the M values) on the board. Each student can then enter all of the data into his or her calculator. Performing a linear regression on the data gives an equation which comes remarkably close to being y = 2x.
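The procedure can be imitated in Python to see why the regression lands so close to y = 2x. In this sketch the trace positions in `samples` are made-up values standing in for whatever points a student happens to trace to, and `linear_regression` is a hypothetical least-squares helper.

```python
def linear_regression(xs, ys):
    """Least-squares line: return (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def f(x):
    return x * x


# (chosen x, traced point to the left, traced point to the right);
# hypothetical trace positions, deliberately not symmetric.
samples = [
    (0.25, 0.23, 0.28),
    (0.50, 0.47, 0.52),
    (1.00, 0.98, 1.03),
    (1.50, 1.46, 1.53),
    (1.75, 1.73, 1.78),
]

l1 = [x for x, _, _ in samples]                           # chosen x-values
# M = (B - D) / (A - C), with A, C the traced x-values and B, D their y-values.
l2 = [(f(a) - f(c)) / (a - c) for _, a, c in samples]

slope, intercept = linear_regression(l1, l2)
print(f"M = {slope:.3f}x + {intercept:.3f}")
```

For $y = x^2$ the secant slope between A and C is exactly A + C, so whenever the two traced points straddle the chosen x closely, M is close to 2x.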

Inverse Variation ($y = k/x$) or Inverse Square ($y = k/x^2$)

The following lesson is intended for use after students are familiar with inverse variation and inverse square variation. They should be able to recognize the graph and know how to find the value for k. The instructions are given for a TI-82, but other technology can be used.

A) Inverse Variation

1) Below is a table of data relating the pressure of a gas to its volume. (Note: this is not experimental data)

x=vol.    y=pres.
1         2
1.5       1.3
2         1.0
3         .67
4         .5

2) The first step in analyzing the data is to graph it on your TI-82. Put the data for volume (x) in L1 and the data for pressure (y) in L2. Your graph should look like the one below.

3) There are two ways you can determine if this is an inverse variation graph:

a) Numerically: Using corresponding values from the table, plug into the equation $y = k/x$ and solve for k, that is, $k = xy$. Check the other values and see if that particular k works for all other points. For example, the point (3, .67) gives $k = (3)(.67) = 2.01$.

Continue checking the other values and see if k is close to 2.01

b) Using linearization: On your calculator make L3 the inverse function of $y = k/x$. In this case it is $1/y$. To do this on your calculator, press STAT then Edit. Move the cursor on top of L3 and type 1/L2. Under STAT PLOT, plot L1 and L3 on the same graph. If it is close to linear, you know that your original guess that the data is inversely related was correct. Your graph should look like the one below.

c) The next step is to get the regression equation for the line above. On your calculator press STAT then CALC, LinReg (L1, L3). Your calculator will give you values for the slope and the y-intercept. The equation of the regression line is y = .496x + .012. Remember, to get this line you replaced y with 1/y. To get the equation for your data, reverse the process and solve for y. So the equation which models your original data is:

$y = \dfrac{1}{.496x + .012}$.
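Both checks in part A can be sketched in Python. This is an illustration under stated assumptions: `linear_regression` is a hypothetical least-squares helper, and because the reciprocals are not rounded the way a hand-entered list might be, the intercept can differ slightly from the value quoted above.

```python
def linear_regression(xs, ys):
    """Least-squares line: return (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


volume = [1, 1.5, 2, 3, 4]          # L1
pressure = [2, 1.3, 1.0, 0.67, 0.5]  # L2

# a) Numerical check: for y = k/x the product k = x*y should be nearly constant.
k_values = [x * y for x, y in zip(volume, pressure)]
print("k values:", [round(k, 2) for k in k_values])

# b), c) Linearization: fit 1/y = m*x + b, then solve for y.
reciprocal = [1 / y for y in pressure]   # L3 = 1/L2
m, b = linear_regression(volume, reciprocal)
print(f"1/y = {m:.3f}x + {b:.3f}")


def model(x):
    """Model for the original data: y = 1 / (m*x + b)."""
    return 1 / (m * x + b)


for x, y in zip(volume, pressure):
    print(x, y, round(model(x), 3))
```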

B) Inverse Square

1) Below is a table which relates the height of a cylinder to its radius for a constant volume.

x=radius    y=height
1           63.66
2           15.92
3           7.07
4           3.98
5           2.55
6           1.77
7           1.30
8           .99

2) Once again, the first step is to graph the data on your calculator. Clear the lists by going to STAT, ClrList (L1, L2, L3). Put the data for the radius in L1 and the height into L2. Your graph should look like the one below:

3) You cannot tell at first if this is inverse or inverse squared. You can check both. Further, you can check it numerically or using linearization.

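i) Numerically: Pick values from the table and plug into both formulas, $y = k/x$ and $y = k/x^2$, and solve for k. Try another point using that k. For example, the point (1, 63.66) gives k = 63.66 in either formula. At x = 2, $y = k/x$ predicts $63.66/2 = 31.83$, but $y = k/x^2$ predicts $63.66/4 = 15.92$, which matches the table, so it is $y = k/x^2$.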

ii) Linearization: To check if it is inverse: make L3 = 1/L2 and graph L1 and L3. Your graph should look like the one below. Notice that it is not linear.

You can assume that it is not $y = k/x$. Guess that it is $y = k/x^2$. Linearize this. Remember, to linearize you want to replace y with the inverse function of your guess. In this situation it will be $1/\sqrt{y}$. Make L4 = 1/√L2 and graph L1 and L4. Your graph should look like the one below.

Notice that this is linear. Again, find the linear regression equation of this data. The equation you get has the form $1/\sqrt{y} = mx + b$, with b very close to 0. Remember that this equation is for the re-expressed data, not for your original data. To get the equation for your data, reverse the process and solve for y. This equation is: $y = \dfrac{1}{(mx + b)^2}$
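The inverse-square checks can be worked through in Python as well. This is a sketch only: `linear_regression` is a hypothetical least-squares helper, and `k_est` is an illustrative name for the constant recovered from the fitted slope.

```python
from math import sqrt


def linear_regression(xs, ys):
    """Least-squares line: return (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


radius = [1, 2, 3, 4, 5, 6, 7, 8]                             # L1
height = [63.66, 15.92, 7.07, 3.98, 2.55, 1.77, 1.30, 0.99]   # L2

# i) Numerical check: which product stays (nearly) constant?
xy = [x * y for x, y in zip(radius, height)]        # constant if y = k/x
x2y = [x * x * y for x, y in zip(radius, height)]   # constant if y = k/x^2
print("x*y:  ", [round(v, 2) for v in xy])
print("x^2*y:", [round(v, 2) for v in x2y])

# ii) Linearization for the guess y = k/x^2: fit 1/sqrt(y) = m*x + b.
l4 = [1 / sqrt(y) for y in height]                  # L4 = 1/sqrt(L2)
m, b = linear_regression(radius, l4)
print(f"1/sqrt(y) = {m:.4f}x + {b:.4f}")

# Reverse the re-expression: y = 1/(m*x + b)^2, roughly k/x^2 with k = 1/m^2.
k_est = 1 / (m * m)
print(f"k is roughly {k_est:.1f}")
```

The products x*y fall steadily while x^2*y stays near one value, which is the numerical version of what the two scatter plots show.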

Data Collecting Projects

There are several data collecting experiments that could be assigned which would give data that is not perfectly inverse or inverse squared (but which would employ the strategies outlined above). The following is a list of suggestions:

1) Pour a half cup of sugar from varying heights. Measure the height of the resulting pile and compare it to the height from which it is poured.
2) Use a piece of pasta (spaghetti works best) and clip it to a Dixie cup. Hang the spaghetti over the edge of a table and slowly drop pennies into the cup. Measure the length of the pasta from the edge of the table to the Dixie cup and compare it to the number of pennies you can drop in the cup before the pasta breaks.
3) Use a light meter to measure the intensity of the light versus the distance from the source.
4) Make a teeter-totter out of a ruler or a piece of wood. Use several weights of different sizes. Pick one to remain fixed. Measure the distance from the pivot versus the weight in order to balance the fixed weight.

Introduction to Exponential Modeling

If we suspect that the data varies exponentially, the following linearization method may be applied: each original ordered pair (x, y) should be re-expressed as (x, ln y).

On your graphing calculator: (TI-82)

Input the values of x into L1.

Input the values of y into L2.

View the scatter plot of L2 vs. L1.

If the data appears to follow an exponential curve, re-express each (x, y) in the form (x, ln y)

Enter the values of ln y into L3.

Find the equation of the median-median line for the re-expressed data and plot it with the re-expressed data to check its likely fit.

To write the actual model for the original data, use exponentiation on the ln equation.

e.g. $\ln y = .037x + .102$

$y = e^{.037x + .102}$

$= e^{.102}\, e^{.037x}$

$y = 1.107\, e^{.037x}$

Plot your model with the original data points to check its fit.

Use your graphing calculator and the method of linearization to find an exponential model for the following data set.

Toxic Fumes Problem

You accidentally inhale some poisonous fumes. Six hours later, you see a doctor. From a blood sample, she determines that the poison concentration is 0.00267 milligrams per cubic centimeter (mg/cc), and admits you to the hospital for observation. Blood tests over the next 36 hours reveal the following:

time concentration

6 .00267

10 .00205

14 .00157

18 .00121

22 .00093

26 .00071

30 .00054

34 .00042

38 .00032

42 .00025

Find an appropriate function equation which represents the relationship between elapsed time and concentration of the poison in the bloodstream.

1. Write the equation for this function. Let C represent concentration and t represent elapsed time since exposure to the fumes.
2. Plot the graph of poison concentration versus time t.
3. The doctor says that you may have had serious tissue damage if the concentration was ever as high as 0.015 mg/cc. Based on your best-fit model, was the concentration ever that high?
4. You may resume normal activities when the poison concentration has dropped to below 0.00010 mg/cc. How long after you have inhaled the fumes will you be able to resume normal activities?
5. The biological half-life of the poison is the time it takes to drop to half of its present value. Find the biological half-life of this poison.
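One way a teacher might check student answers to the Toxic Fumes Problem is sketched below in Python. Note the swapped-in technique: the lesson calls for the median-median line on the re-expressed data, whereas this sketch uses an ordinary least-squares line (the hypothetical helper `linear_regression`), so the fitted constants may differ slightly from a calculator's Med-Med result. The names `c0`, `rate`, `half_life`, and `t_safe` are illustrative.

```python
from math import exp, log


def linear_regression(xs, ys):
    """Least-squares line: return (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


time = [6, 10, 14, 18, 22, 26, 30, 34, 38, 42]          # hours since exposure
conc = [0.00267, 0.00205, 0.00157, 0.00121, 0.00093,
        0.00071, 0.00054, 0.00042, 0.00032, 0.00025]    # mg/cc

# Re-express (t, C) as (t, ln C) and fit ln C = rate*t + ln(c0).
rate, ln_c0 = linear_regression(time, [log(c) for c in conc])
c0 = exp(ln_c0)                      # modeled concentration at t = 0

# Model: C(t) = c0 * e^(rate * t), with rate negative for decay.
half_life = log(2) / -rate           # time for C to drop by half
t_safe = log(c0 / 0.00010) / -rate   # time at which C reaches 0.00010 mg/cc

print(f"C(t) = {c0:.5f} * e^({rate:.4f} t)")
print(f"modeled concentration at t = 0: {c0:.5f} mg/cc")
print(f"half-life: {half_life:.1f} hours")
print(f"C falls to 0.00010 mg/cc after about {t_safe:.1f} hours")
```

Since the modeled concentration is highest at t = 0 for a decaying exponential, comparing `c0` with 0.015 mg/cc answers question 3 under this model.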

Regression

On a Spreadsheet:

The purpose of this section is to explore two sets of real data that seem to demonstrate an exponential growth form. The first example will explore the rate of acute poliomyelitis in the U.S. between 1912 and 1954 and the second example will explore the national debt between 1939 and 1992. Both examples lend themselves to excellent discussions, projects and interdisciplinary work.

Model:

Since the two models appear to be of the form $y = ae^{bx}$, taking the natural log of each side of this equation will yield a new equation: $\ln y = \ln a + bx$. Now ln y is expressed as a linear function of x with a slope of b and a y-intercept of ln a. At this point, the student can run a linear regression and consider the fit of the regression line to the ln y data by considering those two graphs and by looking at the graph of the residuals to check that they are random. Once the student is confident of the fit, the regression slope can be put back in place of b, e raised to the power of the regression intercept can be put back in place of a (since the intercept is ln a), and the model can be graphed against the original data.

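The formulas for the linear regression are:

$m = \dfrac{\sum xy - n\,\bar{x}\,\bar{y}}{\sum x^2 - n\,\bar{x}^2}$

(where $\bar{x}$ and $\bar{y}$ are the means of the re-expressed data, n is the number of data points, and m is the slope of the regression line)

$b = \bar{y} - m\,\bar{x}$

(where b is the y-intercept of the regression line)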

The student can use the template below or can use the linear regression provided on his or her spreadsheet.


row   A        B                    C                    D                    E
1              X(old)               Y(old)               X(new)               y(new)
2     sum:     =SUM(B5:B404)        =SUM(C5:C404)        =SUM(D5:D404)        =SUM(E5:E404)
3     mean:    =AVERAGE(B5:B404)    =AVERAGE(C5:C404)    =AVERAGE(D5:D404)    =AVERAGE(E5:E404)

row   F                    G
1     xy                   x^2
2     =SUM(F5:F404)        =SUM(G5:G404)
3     =AVERAGE(F5:F404)    =AVERAGE(G5:G404)

row   K                                                       L
1     m                                                       b
2     =(F2-COUNT(G5:G404)*D3*E3)/(G2-COUNT(G5:G404)*D3*D3)    =E3-K2*D3
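For readers who prefer to check the template outside a spreadsheet, here is a Python sketch that mirrors its cells, using the 1933 to 1954 polio rates from the table in this section. Two choices here are assumptions, not taken from the template: X(new) is set to years since 1933, and y(new) is the natural log of the rate. The variable names are illustrative.

```python
from math import exp, log

# Polio rates per 100,000 for 1933 through 1954 (from the table in this section).
years = list(range(1933, 1955))                       # X(old), column B
rates = [4, 5.9, 8.5, 3.5, 7.4, 1.3, 5.6, 7.4, 6.8, 3.1, 9.3,
         14.3, 10.3, 18.3, 7.5, 19, 28.3, 22.1, 18.5, 37.2, 22.5, 23.9]  # Y(old), column C

x_new = [year - 1933 for year in years]               # column D (assumed re-expression)
y_new = [log(rate) for rate in rates]                 # column E: ln of the rate

n = len(x_new)                                        # COUNT(G5:G404)
sum_xy = sum(x * y for x, y in zip(x_new, y_new))     # cell F2
sum_xx = sum(x * x for x in x_new)                    # cell G2
mean_x = sum(x_new) / n                               # cell D3
mean_y = sum(y_new) / n                               # cell E3

# Cell K2: slope; cell L2: y-intercept of the line ln y = m*x + b.
m = (sum_xy - n * mean_x * mean_y) / (sum_xx - n * mean_x * mean_x)
b = mean_y - m * mean_x

# Residuals of the re-expressed data, for the randomness check.
residuals = [y - (m * x + b) for x, y in zip(x_new, y_new)]

# Undo the re-expression: y = a * e^(m x), where a = e^b.
a = exp(b)
print(f"ln y = {m:.4f}x + {b:.4f}")
print(f"y = {a:.3f} * e^({m:.4f} x), x = years since 1933")
```

With a different choice for X(new), such as the calendar year itself, the slope is unchanged but the intercept, and therefore a, changes.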

The data sets which follow represent the cases of polio in the U.S. between 1912 and 1954. The graph below represents the data between 1933 and 1954 re-expressed by taking the natural logarithm. The linear regression line has the equation:

Students could discuss how well this line fits the re-expressed data by looking at the graph of the residuals.

[Figure: ln y and the linear regression line]

[Figure: Residuals]

Students should now write the exponential model for the original data and compare their graph with the original data.

The equation is:

[Figure: Original data and exponential model]

Activities could include looking at the entire data set, doing research on the development of the Salk vaccine and its implications for the mathematical model, and considering the outlier in 1916 and how it affects the curve.

Rate = rate of cases of acute poliomyelitis in the US per 100,000

Year    Rate      Year    Rate
1912    5.5       1933    4
1913    4         1934    5.9
1914    2.4       1935    8.5
1915    3.1       1936    3.5
1916    41.1      1937    7.4
1917    4.9       1938    1.3
1918    2.8       1939    5.6
1919    2.3       1940    7.4
1920    2.2       1941    6.8
1921    5.8       1942    3.1
1922    2         1943    9.3
1923    3.1       1944    14.3
1924    4.6       1945    10.3
1925    5.3       1946    18.3
1926    2.3       1947    7.5
1927    8.8       1948    19
1928    4.3       1949    28.3
1929    2.4       1950    22.1
1930    7.5       1951    18.5
1931    12.8      1952    37.2
1932    3.1       1953    22.5
1933    4         1954    23.9


The second activity uses the following data on the Gross U.S. Debt to find an exponential model. Students will find that this is a piece-wise defined function. It will be important to investigate various subsets of the entire domain to see if they can be defined separately. This would be an appropriate time to work with history teachers on the implications of the trends within each period.

Debt = Gross US Debt in Billions of Dollars

Fiscal Year    Debt       Fiscal Year    Debt
1939           252        1966           328.5
1940           252.6      1967           340.4
1941           256.9      1968           368.7
1942           255.3      1969           365.8
1943           259.1      1970           380.9
1944           266        1971           408.2
1945           270.8      1972           435.9
1946           271        1973           466.3
1947           257.1      1974           483.9
1948           252        1975           541.9
1949           252.6      1976           643.6
1950           256.9      1977           706.4
1951           255.3      1978           776.6
1952           259.1      1979           828.9
1953           266        1980           908.5
1954           270.8      1981           994.3
1955           274.4      1982           1136.8
1956           272.7      1983           1371.2
1957           272.3      1984           1564.1
1958           279.7      1985           2120.1
1959           287.5      1986           2345.6
1960           290.5      1987           2600.8
1961           292.6      1988           2867.5
1962           302.9      1989           3206.3
1963           310.3      1990           3599
1964           316.1      1991           4002.7
1965           322.3      1992           4410.5

Overhead Projector Data

A simple classroom project to collect a manageable set of data involves the use of an overhead projector. After placing a small object on the overhead (a six-inch plastic ruler works well), students can form small groups to measure both the length and width of the image as the projector is gradually moved away from the screen and refocused. Each group of three or four can be responsible for one position of the projector, which is one data point, and nine data points yield a data set which is particularly nice for median-median data reduction. Someone needs to be responsible for recording the data clearly on the board so that each student can have a copy to work with.

Plotting the data of length of image vs. distance from screen yields very linear data, and this is probably the biggest weakness of this model: the data is just too good. As an extension, since both length and width of the image vary directly with the distance of the projector from the screen, their product, the area of the shadow, should vary as the square of that distance. What we get is a nice quadratic relationship. Finally, if we consider the volume of the pyramid whose base is the shadow and whose height is the distance, we have a volume which varies as the cube of the distance and will yield cubic data.

This one activity quickly produces data which fits three different models, involves the entire class, and can be taught at a number of levels depending on the skills of the students. In a general math class students can simply use a ruler to draw a line of best fit to the linear data, and use the graph to make interpolative predictions. Then it's off to the projector and screen to test their predictions. On a more advanced level we can talk about regressions, re-expressions, and scaling factors in the real world.

Data Sources

Economic Report of the President 1993, US Government Printing Office

The World Fact Book 1992, US Government Printing Office

Economic Indicator Handbook, Arsen J. Dornay, Editor. Gale Research, Inc., Detroit, 1992

World Eagle, Monthly Social Studies Resource, 64 Washburn Ave., Wellesley, MA 02181

Statistical Abstract of the US

Almanacs

US Census Bureau



Woodrow Wilson Leadership Program in Mathematics * lpt@www.woodrow.org
The Woodrow Wilson National Fellowship Foundation * webmaster@woodrow.org
CN 5281, Princeton NJ 08543-5281 * Tel:(609)452-7007 * Fax:(609)452-0066