Statistics

Assignment 3: Running a regression

Instructions

We will again use the height and weight data set (height-weight.csv). If it is not already stored in the dat folder on your machine, download it from my website.

As a reminder, the data set has 25,000 observations of two variables:

  • height: Observed height (cm)
  • weight: Observed weight (kg)
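
Before starting, it can help to confirm that the file loads and has the shape described above. The following is a minimal sketch, not a required solution. It assumes the file sits at dat/height-weight.csv relative to your working directory (adjust the path if yours differs) and stores the data in a data frame named d, the name that appears in the regression output later in this assignment.

```r
# Assumed location: dat/height-weight.csv, relative to the working directory.
path <- file.path("dat", "height-weight.csv")

if (file.exists(path)) {
  d <- read.csv(path)
  str(d)      # should report 25,000 observations of two variables: height, weight
  summary(d)  # quick check of the ranges (height in cm, weight in kg)
} else {
  message("File not found at ", path, "; adjust the path or download the data first.")
}
```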

For this assignment, we will test the relationship between height and weight using a scatter plot and a regression.

Task 1: Create a scatter plot

Generate the plot you see below. Use whatever colors you want!

Take a moment to think about what you might expect to see in the regression. Is there a clearly defined relationship between height and weight?
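
One possible way to draw such a plot in base R is sketched below. So that the example runs on its own, it uses a small simulated stand-in data frame called d_demo (a hypothetical name, with arbitrary made-up values, not the course data); with the real data you would use your own data frame instead. The colors, point style, and labels are only suggestions.

```r
# Simulated stand-in data (arbitrary values, NOT the course data set).
set.seed(1)
d_demo <- data.frame(height = rnorm(500, mean = 170, sd = 5))
d_demo$weight <- -40 + 0.55 * d_demo$height + rnorm(500, sd = 4.5)

# Scatter plot with height on the x-axis and weight on the y-axis.
plot(
  d_demo$height, d_demo$weight,
  pch  = 16,                             # solid points
  col  = rgb(0, 0.4, 0.7, alpha = 0.3),  # semi-transparent to reduce overplotting
  xlab = "Height (cm)",
  ylab = "Weight (kg)",
  main = "Weight vs. height"
)
```

With 25,000 observations, semi-transparent points make dense regions easier to see than fully opaque ones.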

Task 2: Run the regression

Run a regression that aligns with the following equation:

\[ \text{weight} = a + (b \times \text{height}) \]

The summary of your regression should look like this:

summary(reg1)
## 
## Call:
## lm(formula = weight ~ height, data = d)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18.2808  -3.0438  -0.0236   3.0907  17.7321 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -37.455727   1.034289  -36.21   <2e-16 ***
## height        0.550646   0.005987   91.98   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.572 on 24998 degrees of freedom
## Multiple R-squared:  0.2529, Adjusted R-squared:  0.2528 
## F-statistic:  8461 on 1 and 24998 DF,  p-value: < 2.2e-16
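
The Call line in the output above shows the form of the model: lm(formula = weight ~ height, data = d). The sketch below fits the same kind of model on a simulated stand-in data frame (d_demo and reg_demo are hypothetical names, and the values are made up), so its numbers will not match the output above. With the course data, the equivalent call would be reg1 <- lm(weight ~ height, data = d).

```r
# Simulated stand-in data (arbitrary values, NOT the course data set).
set.seed(1)
d_demo <- data.frame(height = rnorm(500, mean = 170, sd = 5))
d_demo$weight <- -40 + 0.55 * d_demo$height + rnorm(500, sd = 4.5)

# weight = a + (b * height): the intercept a is included automatically.
reg_demo <- lm(weight ~ height, data = d_demo)

summary(reg_demo)  # full regression table
coef(reg_demo)     # just the estimates of a (Intercept) and b (height)
```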

Task 3: Interpret the results

This is the most difficult part. Answer the following questions:

  • How do you interpret the coefficient for height?
  • Is this a statistically significant relationship?
  • How much of the variation in weight is explained by height?
  • Is the intercept meaningful in this example?
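
The quantities these questions refer to can be read off the printed summary, but they can also be pulled out of the fitted model directly. The sketch below shows one way to do so, again on a simulated stand-in (d_demo and reg_demo are hypothetical names with made-up values); apply the same calls to your own model to get the numbers for your answers.

```r
# Simulated stand-in data (arbitrary values, NOT the course data set).
set.seed(1)
d_demo <- data.frame(height = rnorm(500, mean = 170, sd = 5))
d_demo$weight <- -40 + 0.55 * d_demo$height + rnorm(500, sd = 4.5)

reg_demo <- lm(weight ~ height, data = d_demo)
s <- summary(reg_demo)

s$coefficients                        # estimates, standard errors, t values, p-values
s$coefficients["height", "Pr(>|t|)"]  # p-value for the height coefficient
s$r.squared                           # share of the variation in weight explained by height
confint(reg_demo, level = 0.95)       # 95% confidence intervals for both coefficients
range(d_demo$height)                  # observed heights; the intercept refers to height = 0
```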