Predicting birthweight based on maternal risk factors (34 points). In this problem, we will predict the birthweight (birthwt) of newborns in grams, using the age of the mother at the time of delivery (age), the weight of the mother at the last menstrual period (lwt) in kilograms, and the mother’s race (race). Maternal race has three levels: “white,” “black,” and “other.” This type of analysis is of clinical interest to obstetricians and obstetrics researchers who study the risk factors for low birth weight (birth weight less than 2.5 kg), a common adverse pregnancy outcome. The dataset birthweight.csv contains these data for n = 189 randomly selected deliveries.
a. (2 points) Download the dataset birthweight.csv and read it into R. In order to register the race column as a factor (categorical variable), please use the following R code:
birth.dat <- read.csv("birthweight.csv", header = TRUE, stringsAsFactors = TRUE)
Next, make white the baseline (or reference) group for race with the following line of code:
birth.dat$race <- relevel(birth.dat$race, ref = "white")
Finally, fit a multiple linear regression model to these data, with birthwt as the response variable and age, lwt, and race as the predictors. What is the fitted equation for the model, and what is its adjusted R²?
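For part (a), the steps can be sketched in R roughly as follows. This is a minimal sketch, assuming birthweight.csv is in the working directory and has columns named birthwt, age, lwt, and race as described above. The helper name fit_birth_model is hypothetical and used only for illustration.

```r
# Hypothetical helper (not part of the assignment): read the data,
# set "white" as the reference level for race, and fit the model.
fit_birth_model <- function(path = "birthweight.csv") {
  birth.dat <- read.csv(path, header = TRUE, stringsAsFactors = TRUE)
  birth.dat$race <- relevel(birth.dat$race, ref = "white")
  lm(birthwt ~ age + lwt + race, data = birth.dat)
}

# Run only if the file is present (assumed location: working directory).
if (file.exists("birthweight.csv")) {
  fit <- fit_birth_model()
  print(coef(fit))                   # estimates that make up the fitted equation
  print(summary(fit)$adj.r.squared)  # adjusted R-squared
}
```

Because race is a three-level factor with white as the reference, the fitted equation should contain two indicator terms, which R labels raceblack and raceother.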
b. (2 points) We know that the patients in this study are independent, so the independent errors assumption is reasonable. Use diagnostic plots to check the other model assumptions for linear regression and check for outliers. Include these plots in your homework submission. Are the assumptions met, and are there any clear influential points?
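For part (b), one possible way to produce the diagnostic plots is sketched below. It assumes the same file and column names as in part (a). The helper name diagnose_fit is hypothetical, and looking at the largest Cook's distances is only one of several ways to screen for influential points.

```r
# Hypothetical helper: draw the four default lm diagnostic plots and
# return Cook's distance for every observation.
diagnose_fit <- function(fit) {
  op <- par(mfrow = c(2, 2))  # 2 x 2 grid of plots
  on.exit(par(op))            # restore graphics settings afterwards
  plot(fit)  # residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
  invisible(cooks.distance(fit))
}

# Run only if the file is present (assumed location: working directory).
if (file.exists("birthweight.csv")) {
  birth.dat <- read.csv("birthweight.csv", header = TRUE, stringsAsFactors = TRUE)
  birth.dat$race <- relevel(birth.dat$race, ref = "white")
  fit <- lm(birthwt ~ age + lwt + race, data = birth.dat)
  cd <- diagnose_fit(fit)
  print(head(sort(cd, decreasing = TRUE)))  # observations with the largest Cook's distance
}
```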
c. Now we will conduct inference for our model.
i. (1 point) What are the F-statistic, p-value, and conclusion of the F-test for our model?
ii. (3 points) What are the interpretations for the estimates of the non-intercept regression coefficients in the context of this problem?
iii. (2 points) What are the t-statistics, p-values, and conclusions of the t-tests for the non-intercept regression coefficients?
iv. (3 points) What are the confidence intervals for the non-intercept regression coefficients? Interpret each interval in the context of this problem.
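The quantities asked for in part (c) can all be pulled from a fitted lm object. The sketch below is one way to collect them, assuming the same file and column names as in part (a). The helper name summarize_inference is hypothetical, and the 95% confidence level is an assumption, since the problem does not state a level.

```r
# Hypothetical helper: gather the overall F-test, the coefficient table,
# and confidence intervals, dropping the intercept row from the last two.
summarize_inference <- function(fit, level = 0.95) {  # level is an assumed default
  s <- summary(fit)
  f <- s$fstatistic  # named vector: value, numdf, dendf
  list(
    f_statistic  = unname(f["value"]),
    f_p_value    = unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)),
    coefficients = s$coefficients[-1, , drop = FALSE],  # estimate, SE, t value, p-value
    conf_int     = confint(fit, level = level)[-1, , drop = FALSE]
  )
}

# Run only if the file is present (assumed location: working directory).
if (file.exists("birthweight.csv")) {
  birth.dat <- read.csv("birthweight.csv", header = TRUE, stringsAsFactors = TRUE)
  birth.dat$race <- relevel(birth.dat$race, ref = "white")
  fit <- lm(birthwt ~ age + lwt + race, data = birth.dat)
  print(summarize_inference(fit))
}
```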
d. (2 points) Conduct a marginal analysis of the association between age and birthweight. Is age marginally associated with birthweight? Explain.
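For part (d), a marginal (unadjusted) analysis is commonly carried out by regressing birthwt on age alone. The sketch below takes that approach, assuming the same file and column names as in part (a). The helper name fit_marginal_age is hypothetical.

```r
# Hypothetical helper: simple linear regression of birthweight on age only,
# i.e. without adjusting for lwt or race.
fit_marginal_age <- function(dat) {
  lm(birthwt ~ age, data = dat)
}

# Run only if the file is present (assumed location: working directory).
if (file.exists("birthweight.csv")) {
  birth.dat <- read.csv("birthweight.csv", header = TRUE, stringsAsFactors = TRUE)
  fit.age <- fit_marginal_age(birth.dat)
  print(summary(fit.age)$coefficients["age", ])  # estimate, SE, t value, p-value for age
}
```

Comparing the age coefficient from this model with the one from the full model shows how the estimate changes once lwt and race are adjusted for.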