Home Work Assignment 4

for MSIS 385 section 02

Last modified on 4/9/99 11:26 PM

Due April 13 1999 in class

Note:

Make sure to put your name, student ID number and the date on the Assignment. The results of questions 1, 2 and 10 are Excel worksheets which should be turned in on a diskette. Make sure to put your name ID and Homework number 4 on the diskette's label.

 

Question 1

Simple nonlinear regression with Excel

This question is the continuation of Question 9d. Use the data in the same world bank Excel spreadsheet data sheet.

Use the adult male illiteracy rate as the dependent variable and per capita GDP as the independent variable. Create a scatter diagram as before. But this time add several trend lines (i.e. regression curves) that fit the data best. Specifically experiment with the following chart types: linear, polynomial of degree 6, and logarithmic. Which one gets the best fit? What is your criterion for "best fit"? Use the following guidelines to draw the trend lines:

  • On the same scatter diagram draw all four of trend lines.
  • Use different colors for each curve.
  • Make sure for each curve you print its equation and R2; use the same color for each equation as its corresponding curve. (Thus if the logarithmic curve is drawn in red, then the equation and R2 of the logarithmic curve should also be in red) To change the color of an object, be it the curve or the box containing its equation, you may right click on the object and choose format and within the format menu choose the color option to set the color.
  • Save the charts in separate worksheet named Question 1.

Question 2

Multiple regression with Excel

Recordings of the levels of pollutants and various meteorological conditions are made hourly at several stations by the Los Angles Pollution Control District. This agency attempts agency attempts to construct mathematical/statistical models to predict pollution levels and to gain a better understanding of the complexities of air pollution. Obviously very large quantities of data are collected and analyzed, but only a small set will be considered for this problem. The table in the file "polute.xls" gives the maximum level of an oxidant (a photochemical pollutant) and the morning averages of four meteorological variables: wind speed, temperature, humidity, and insolation (a measure of the amount of sunlight.) The data cover 30 days during one summer.

 

Examine the relationship of oxidant level to each of the four meteorological variables. To do so you need to click on tools menu and choose data analysis and then within that choose regression. Then choose the column that contains the oxidant as the dependent variable and the block of all data that contain meteorological data as the independent variable. Within the regression tool menu enter the appropriate cell addresses in "input X range" and "input Y range". Also set the confidence level to 99%; Excel will report both the default 95% and 99% significance levels. Check the labels button. In then output range area check the "new worksheet ply" button so that your results are produced in a new worksheet. In the residuals area check "residuals", "residuals plot" and "line fit plots". Leave all other buttons unchecked. Check OK to get your new worksheet with your charts and statistical analysis. Rename the worksheet to "Question 2a".

Answer the following questions by referring to the data generated by Excel:

  1. For each of the four independent variables let the Null hypothesis indicate that the slope equals zero, that is the there is no relationship between that variable and level of the oxidant. (Thus you will have four different null hypotheses for four different tests, one for each independent variable.) For each one let the alternative hypothesis be that the slope is different from zero implying that there is a relationship between that variable and the level of pollutant in the air. Based on confidence levels of 95% and 99% decide for each variable if the null hypothesis is to be rejected or not.
  2. Look at the confidence intervals (both 95% and 99% significance levels) and report how by just looking at these values you can decide if the null hypothesis is rejected or not for each of the independent variables.
  3. By looking at the data generated in part a) which one of the four meteorological factors seem most important? List these factors in order of decreasing importance. Next repeat part a) but only running the multiple regression on the two most important factors. Compare the R2 and the t-significance levels and P-values of the two most important parameters to their corresponding values in part a). Can you explain the difference?
   

Question 3

Do problem 11.6 on page 390.

Question 4

Do problem 11.10 on page 392.

Question 5

Do problem 11.13 on page 392.

Question 6

Do Problem 11.21 on page 396.

Question 7

Do problem 11.30 on page 399.

Question 8

Do problem 11.37 on page 401.

Question 9

Do problems 11.45, 11.46, 11.47 and 11.48

Question 10

Use Excel to do problem 12.2 on page 429.

Question 11

Do problem 12.5 on page 430