Predicting home sales prices from square footage (36 points). In this problem, we will consider predicting house sale prices (SalePrice) from square footage (SqrFeet) using a dataset of 𝑛 = 506 houses.

a. (2 points) Download the dataset homes.csv and load it into R. Fit a simple linear regression model to this data with SalePrice as the response variable and SqrFeet as the predictor. What is the fitted equation for our model, and what is the coefficient of determination for this model?
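
A minimal sketch of the workflow in part (a), assuming homes.csv sits in the working directory and has columns named SalePrice and SqrFeet as in the problem statement. If the file is missing, the code falls back to simulated data so that it still runs; those simulated values are purely illustrative and are not the real dataset. The object names homes and fit are arbitrary choices.

```r
# Load the data; fall back to SIMULATED data (an assumption, for illustration
# only) when homes.csv is not in the working directory.
if (file.exists("homes.csv")) {
  homes <- read.csv("homes.csv")
} else {
  set.seed(1)
  sqft  <- runif(506, 800, 4000)
  homes <- data.frame(
    SqrFeet   = sqft,
    SalePrice = exp(11 + 0.0004 * sqft + rnorm(506, sd = 0.2))
  )
}

# Simple linear regression: SalePrice = b0 + b1 * SqrFeet + error
fit <- lm(SalePrice ~ SqrFeet, data = homes)

coef(fit)               # estimated intercept (b0) and slope (b1)
summary(fit)$r.squared  # coefficient of determination R^2
```

The fitted equation is then read off as SalePrice = b0 + b1 × SqrFeet, using the two values returned by coef(fit).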

b. (3 points) Plot a scatterplot of the data (SqrFeet, SalePrice), along with the best fit line from part (a) on top of this scatterplot. Make sure to add an appropriate title and labels to your plot and ensure that the best fit line is distinguishable from the scatterplot points.
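
One way the plot in part (b) might be drawn with base R graphics is sketched below. It assumes the same column names as the problem statement and, when homes.csv is unavailable, uses simulated stand-in data (illustrative only). The title, colours, and legend text are arbitrary choices.

```r
# Load the data; SIMULATED fallback is an assumption for illustration only.
if (file.exists("homes.csv")) {
  homes <- read.csv("homes.csv")
} else {
  set.seed(1)
  sqft  <- runif(506, 800, 4000)
  homes <- data.frame(
    SqrFeet   = sqft,
    SalePrice = exp(11 + 0.0004 * sqft + rnorm(506, sd = 0.2))
  )
}
fit <- lm(SalePrice ~ SqrFeet, data = homes)

# Scatterplot with title and axis labels; grey points keep the line visible
plot(homes$SqrFeet, homes$SalePrice,
     main = "Sale price vs. square footage",
     xlab = "Square footage (SqrFeet)",
     ylab = "Sale price (SalePrice)",
     pch = 16, col = "grey50")

# Overlay the least squares line in a contrasting colour and thicker width
abline(fit, col = "red", lwd = 2)
legend("topleft", legend = "Least squares line",
       col = "red", lwd = 2, bty = "n")
```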

c. (2 points) In our dataset, the houses were randomly selected, so we can assume that the independence assumption is met. Use diagnostic plots to check that the other linear model assumptions are met for the model in part (a), and check for possible outliers. Report the diagnostic plots in your homework submission. Do any of the assumptions appear to be violated? If so, which ones? Are there any potential outliers?
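
The diagnostic plots in part (c) can be produced with R's built-in plot method for lm objects; the sketch below is one possible approach, not the only one. It assumes the column names from the problem statement and falls back to simulated data (illustrative only) if homes.csv is absent. The cutoff of 3 for standardized residuals is a common rule of thumb and is not taken from the problem.

```r
# Load the data; SIMULATED fallback is an assumption for illustration only.
if (file.exists("homes.csv")) {
  homes <- read.csv("homes.csv")
} else {
  set.seed(1)
  sqft  <- runif(506, 800, 4000)
  homes <- data.frame(
    SqrFeet   = sqft,
    SalePrice = exp(11 + 0.0004 * sqft + rnorm(506, sd = 0.2))
  )
}
fit <- lm(SalePrice ~ SqrFeet, data = homes)

# Four default diagnostic plots in a 2 x 2 grid:
#   residuals vs fitted   -> linearity and constant variance
#   normal Q-Q            -> normality of the errors
#   scale-location        -> constant variance
#   residuals vs leverage -> influential points / outliers
par(mfrow = c(2, 2))
plot(fit)
par(mfrow = c(1, 1))

# Row indices with large standardized residuals (rule-of-thumb cutoff of 3)
flagged <- which(abs(rstandard(fit)) > 3)
flagged
```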

d. Sometimes an effective way to fix violations of the model assumptions and outliers in linear regression is to log transform the response variable and/or the predictor variable. We will consider a log transformation of just the response variable.

  • i. (1 point) In your dataframe, create a new column called logSalePrice by log-transforming SalePrice. Report the first 10 observations in the updated dataframe using the head() function.
  • ii. (1 point) Fit a new simple linear regression model with logSalePrice as the response and SqrFeet as the predictor. What are the least squares estimators for 𝛽0 and 𝛽1?
  • iii. (1 point) What is the coefficient of determination for this model? How does it compare to your answer in part (a)? Explain. (Note: For SLR, we can compare the 𝑅² between different models since we only have one predictor.)
  • iv. (3 points) Plot a scatterplot of (SqrFeet, logSalePrice), along with the best fit line from part (ii) on top of this scatterplot. Make sure to add an appropriate title and labels to your plot and ensure that the best fit line is distinguishable from the scatterplot points. What do you observe about this new log-transformed model?
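
The four steps of part (d) can be sketched in R as follows. The code assumes the natural logarithm is intended (R's log() default) and that the columns are named as in the problem statement; when homes.csv is unavailable it uses simulated stand-in data, which is illustrative only. The name fit_log is an arbitrary choice.

```r
# Load the data; SIMULATED fallback is an assumption for illustration only.
if (file.exists("homes.csv")) {
  homes <- read.csv("homes.csv")
} else {
  set.seed(1)
  sqft  <- runif(506, 800, 4000)
  homes <- data.frame(
    SqrFeet   = sqft,
    SalePrice = exp(11 + 0.0004 * sqft + rnorm(506, sd = 0.2))
  )
}

# (i) New column with the natural log of SalePrice, then first 10 rows
homes$logSalePrice <- log(homes$SalePrice)
head(homes, 10)

# (ii) Regression of logSalePrice on SqrFeet; least squares estimates
fit_log <- lm(logSalePrice ~ SqrFeet, data = homes)
coef(fit_log)

# (iii) Coefficient of determination for the log-response model
summary(fit_log)$r.squared

# (iv) Scatterplot on the log scale with the fitted line overlaid
plot(homes$SqrFeet, homes$logSalePrice,
     main = "Log sale price vs. square footage",
     xlab = "Square footage (SqrFeet)",
     ylab = "log(SalePrice)",
     pch = 16, col = "grey50")
abline(fit_log, col = "blue", lwd = 2)
```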

e. (2 points) Use diagnostic plots to check that the linear regression assumptions (besides the independent errors assumption) are met for the model fit in part (d), and check for possible outliers. Report the diagnostic plots in your homework submission. What do you observe? How do these plots compare to the plots in part (c)?
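
For the comparison asked for in part (e), it can help to place the diagnostics for both models on one page. The sketch below is one possible layout: it draws standardized residuals against fitted values and a normal Q-Q plot for each model, with the original model in the top row and the log-response model in the bottom row. Column names follow the problem statement; the simulated fallback data is illustrative only, and the full set of default diagnostics from plot() on an lm object could be used instead.

```r
# Load the data; SIMULATED fallback is an assumption for illustration only.
if (file.exists("homes.csv")) {
  homes <- read.csv("homes.csv")
} else {
  set.seed(1)
  sqft  <- runif(506, 800, 4000)
  homes <- data.frame(
    SqrFeet   = sqft,
    SalePrice = exp(11 + 0.0004 * sqft + rnorm(506, sd = 0.2))
  )
}

# Both models: original response and log-transformed response
models <- list(
  "SalePrice"      = lm(SalePrice ~ SqrFeet, data = homes),
  "log(SalePrice)" = lm(log(SalePrice) ~ SqrFeet, data = homes)
)

# One row of plots per model: residuals vs fitted, then normal Q-Q
par(mfrow = c(2, 2))
for (nm in names(models)) {
  m <- models[[nm]]
  plot(fitted(m), rstandard(m),
       main = paste(nm, "residuals vs. fitted"),
       xlab = "Fitted values", ylab = "Standardized residuals")
  abline(h = 0, lty = 2)
  qqnorm(rstandard(m), main = paste(nm, "normal Q-Q"))
  qqline(rstandard(m))
}
par(mfrow = c(1, 1))
```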
