Statistics
Assignment 3: Running a regression
Instructions
We will again use the height and weight data (file name: height-weight.csv). If the data set is not already stored in the dat folder on your machine, download it from my website.
As a reminder, the data set has 25,000 observations of two variables:
- height: Observed height (cm)
- weight: Observed weight (kg)
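A minimal R sketch for loading the file and confirming it matches this description. It assumes the CSV sits at `dat/height-weight.csv` relative to your working directory and that its columns are named `height` and `weight`; the helper `check_hw_data()` is a hypothetical name chosen for illustration. The data frame is called `d` to match the regression call shown in Task 2.

```r
# Assumed location: a "dat" folder inside the current working directory
data_path <- file.path("dat", "height-weight.csv")

# Confirm a data frame has the two numeric columns the assignment describes
check_hw_data <- function(d) {
  stopifnot(all(c("height", "weight") %in% names(d)))
  stopifnot(is.numeric(d$height), is.numeric(d$weight))
  invisible(TRUE)
}

# Only read the file if it is actually there
if (file.exists(data_path)) {
  d <- read.csv(data_path)
  check_hw_data(d)
  print(nrow(d))      # the assignment describes 25,000 observations
  print(summary(d))   # ranges of height (cm) and weight (kg)
}
```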
For this assignment, we will examine the relationship between height and weight using a scatter plot and a linear regression.
Task 1: Create a scatter plot
Generate the plot you see below. Use whatever colors you want!
Take a moment to think about what you might expect to see in the regression. Is there a clearly defined relationship between height and weight?
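One way to draw the scatter plot in base R, sketched under a few assumptions: height goes on the x-axis because it is the predictor in Task 2, and the function name `plot_hw()`, the title, and the point style are illustrative choices rather than a reproduction of the figure. Semi-transparent points keep the densest region readable when 25,000 observations overlap.

```r
# Scatter plot of weight (y) against height (x) using base R graphics
plot_hw <- function(d, col = "steelblue") {
  plot(d$height, d$weight,
       xlab = "Height (cm)", ylab = "Weight (kg)",
       main = "Height and weight",
       pch = 16, cex = 0.4,
       # semi-transparent points make dense regions visible among many dots
       col = adjustcolor(col, alpha.f = 0.25))
  invisible(NULL)
}
```

After loading the data into `d`, `plot_hw(d)` draws the plot; pass another color, for example `plot_hw(d, col = "darkorange")`, to change it.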
Task 2: Run the regression
Run a linear regression of weight on height that corresponds to the following equation, where \(a\) is the intercept and \(b\) is the coefficient on height:
\[ weight = a + (b \times height) \]
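A sketch of fitting this equation with R's `lm()`, plus the closed-form least-squares formulas for a single predictor as a cross-check. The function names `fit_hw()` and `ols_by_hand()` are hypothetical; the formulas are the standard ordinary least squares estimates \(b = \operatorname{cov}(height, weight) / \operatorname{var}(height)\) and \(a = \overline{weight} - b \cdot \overline{height}\).

```r
# Fit weight = a + b * height by ordinary least squares with lm()
fit_hw <- function(d) {
  lm(weight ~ height, data = d)
}

# The same two estimates from the closed-form OLS formulas,
# useful as a cross-check on lm():
#   b = cov(height, weight) / var(height)
#   a = mean(weight) - b * mean(height)
ols_by_hand <- function(d) {
  b <- cov(d$height, d$weight) / var(d$height)
  a <- mean(d$weight) - b * mean(d$height)
  c(a = a, b = b)
}
```

With the data in `d`, `summary(fit_hw(d))` prints the regression summary, and `ols_by_hand(d)` should agree with `coef(fit_hw(d))` up to rounding.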
The summary of your regression should look like this:
##
## Call:
## lm(formula = weight ~ height, data = d)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -18.2808  -3.0438  -0.0236   3.0907  17.7321
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -37.455727   1.034289  -36.21   <2e-16 ***
## height        0.550646   0.005987   91.98   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.572 on 24998 degrees of freedom
## Multiple R-squared: 0.2529, Adjusted R-squared: 0.2528
## F-statistic: 8461 on 1 and 24998 DF, p-value: < 2.2e-16
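Plugging the estimates from this summary into the equation from Task 2 gives the fitted line. The height of 170 cm in the second line is an arbitrary illustrative value, not an observation from the data set.

\[ \widehat{weight} = -37.455727 + (0.550646 \times height) \]

\[ \widehat{weight}_{170} = -37.455727 + (0.550646 \times 170) = -37.455727 + 93.60982 = 56.154093 \approx 56.15 \text{ kg} \]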
Task 3: Interpret the results
This is the most difficult part. Answer the following questions:
- How do you interpret the coefficient for height?
- Is this a statistically significant relationship?
- How much of the variation in weight is explained by height?
- Is the intercept meaningful in this example?
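A small R helper, sketched here with the hypothetical name `regression_numbers()`, that gathers the values these questions refer to from a fitted `lm` model: the coefficient table with standard errors and p-values, the multiple R-squared, and 95% confidence intervals from `confint()`. It extracts the numbers; the interpretation is up to you.

```r
# Collect the pieces of summary() output that the Task 3 questions point to
regression_numbers <- function(m) {
  s <- summary(m)
  list(
    coefficients = coef(s),                  # Estimate, Std. Error, t value, Pr(>|t|)
    r_squared    = s$r.squared,              # "Multiple R-squared"
    conf_int     = confint(m, level = 0.95)  # 95% confidence intervals for a and b
  )
}
```

For example, `regression_numbers(lm(weight ~ height, data = d))` returns all three for the model in Task 2.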