Bivariate Regression
Instructions
- The homework assignment is here, with all questions commented out.
- Complete all of the coding tasks in the .qmd file.
- Upload your individual exercise to your GitHub repo by Wed 11:59pm.
- Remember to clean up your GitHub repo and sort homework submissions by week. Each week should have one folder.
Homework
Examine the relationship between an interval-level variable, Female Representation (women09), and another interval-level variable, Economic Development (gdp_10_thou), by running a regression model. Both variables are in the world data set. In doing so, follow the steps below. You can download the data here.
- Before doing any analyses, run some univariate analyses (graphical and numerical) on each variable to get a sense of how these variables are distributed.
- Omit observations where either of the two variables is missing. Do so by creating a smaller data set. How many observations do we have left?
- Before running a regression analysis, calculate the correlation coefficient and report the results. Is there a statistically significant relationship? Is the relationship positive or negative?
- Treat the Female Representation variable as the dependent variable and the Economic Development variable as the independent variable, and form the null and alternative hypotheses (write your answers as a comment).
- Estimate a simple linear regression model of Female Representation on Economic Development and report the findings verbally (as a comment). Make sure that you use the smaller data set you created above. (a) What does the estimated regression equation look like? (b) What is the sign of the coefficient for X? (c) What is the size of the coefficient for X? That is, by how many percentage points will Female Representation increase or decrease if we increase the value of gdp_10_thou by 10,000 dollars? What if we increase it by 1,000 dollars instead? (d) Is the estimated coefficient statistically significant? At what confidence level?
- Illustrate the marginal effect of X on Y by creating a plot of the regression line with 95% confidence intervals around the line.
- How good is the model fit? Answer this question by answering the following two sets of sub-questions: (a) How much of the variation in Female Representation is explained by Economic Development? Would you say this is big enough? (b) How far off are our predictions, on average? Would you say this is too big?
- Obtain predicted values of Female Representation and store them in a new column in the data set. Find the predicted value of Female Representation for Rwanda and compare it with the actual value. How far off is the prediction from the actual value?
- Let’s try manually controlling for Electoral System (i.e., whether or not a country adopts a PR system) in our analysis. That is, run a regression of Female Representation (Y) on Economic Development (X) for PR system countries, and run the same regression for non-PR countries.
- What is the estimated effect of Economic Development on Female Representation for PR countries? Answer this with a graph and with numbers.
- What is the estimated effect of Economic Development on Female Representation for non-PR countries? Answer this with a graph and with numbers.
- Obtain predicted values of Female Representation for PR countries and store them in a new column in the data set. Find the predicted value of Female Representation for Rwanda and compare it with the actual value. How far off is the prediction from the actual value?
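
As a rough illustration of the missing-data and correlation steps in the list above, the following Python sketch uses a handful of made-up rows rather than the world data set. The course .qmd may well use a different language; the row layout, the country labels, and the helper name `pearson_r` are assumptions for illustration only. It computes the t statistic for the correlation but not the p-value, which would need a t distribution from a statistics library.

```python
import math

# Hypothetical toy rows, NOT the world data set: (country, women09, gdp_10_thou).
# None marks a missing value.
rows = [
    ("A", 10.0, 0.5),
    ("B", None, 1.2),
    ("C", 20.0, 2.0),
    ("D", 15.0, None),
    ("E", 30.0, 3.5),
    ("F", 25.0, 3.0),
]

# Listwise deletion: keep a row only if both variables are observed.
complete = [r for r in rows if r[1] is not None and r[2] is not None]
n = len(complete)

y = [r[1] for r in complete]  # Female Representation
x = [r[2] for r in complete]  # Economic Development


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(x, y)

# t statistic for H0: rho = 0, to be compared against a t distribution
# with n - 2 degrees of freedom.
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))

print("observations left:", n)
print("correlation:", round(r, 3), "t statistic:", round(t_stat, 2))
```

The sign of `r` answers the positive-or-negative question; significance depends on comparing `t_stat` with the critical value for `n - 2` degrees of freedom.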
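
The regression, model-fit, and prediction steps in the list above can be sketched in a similar way. This is a minimal Python example on invented numbers, not the assignment's solution: the country labels, the values, and the helper name `fit_ols` are all hypothetical. It shows how the slope translates into a 10,000-dollar versus a 1,000-dollar change when X is measured in units of 10,000 dollars, and how R-squared and the residual standard error are obtained from the residuals.

```python
import math

# Hypothetical toy data, NOT the world data set.
countries = ["A", "B", "C", "D", "E"]
gdp = [0.5, 1.0, 2.0, 3.0, 3.5]          # X, in units of 10,000 dollars
women = [12.0, 18.0, 15.0, 25.0, 30.0]   # Y, percent of seats held by women


def fit_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


intercept, slope = fit_ols(gdp, women)

# One unit of X is 10,000 dollars, so 1,000 dollars is 0.1 units of X.
effect_10k = slope
effect_1k = slope * 0.1

# Predicted values and residuals (actual minus predicted).
predicted = [intercept + slope * a for a in gdp]
residuals = [b - p for b, p in zip(women, predicted)]

n = len(gdp)
mean_y = sum(women) / n
ss_res = sum(e ** 2 for e in residuals)
ss_tot = sum((b - mean_y) ** 2 for b in women)
r_squared = 1 - ss_res / ss_tot          # share of variation in Y explained by X
rse = math.sqrt(ss_res / (n - 2))        # residual standard error, n - 2 df

# Compare predicted and actual values for a single (hypothetical) country.
i = countries.index("C")
print("Y-hat = %.2f + %.2f * X" % (intercept, slope))
print("effect of +10,000 dollars:", round(effect_10k, 2))
print("effect of +1,000 dollars:", round(effect_1k, 2))
print("R-squared:", round(r_squared, 3), "RSE:", round(rse, 2))
print("country C: actual", women[i], "predicted", round(predicted[i], 2))
```

Storing `predicted` alongside the original columns mirrors the "new column" step; the residual for one country is its actual value minus its predicted value.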
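
Manually controlling for Electoral System, as described in the last steps above, amounts to splitting the data by group and fitting the same model to each subset. The Python sketch below assumes a boolean PR flag per country and uses invented values; the column layout and the helper name `fit_ols` are hypothetical and do not come from the world data set.

```python
# Hypothetical toy rows, NOT the world data set:
# (country, is_pr_system, gdp in 10,000 dollars, percent women in parliament)
rows = [
    ("A", True, 0.5, 15.0),
    ("B", True, 1.5, 22.0),
    ("C", True, 3.0, 28.0),
    ("D", True, 4.0, 35.0),
    ("E", False, 0.4, 8.0),
    ("F", False, 1.0, 10.0),
    ("G", False, 2.5, 14.0),
    ("H", False, 3.8, 15.0),
]


def fit_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Fit the same bivariate model separately within each electoral-system group.
fits = {}
for label, flag in (("PR", True), ("non-PR", False)):
    subset = [r for r in rows if r[1] is flag]
    x = [r[2] for r in subset]
    y = [r[3] for r in subset]
    fits[label] = fit_ols(x, y)

for label, (intercept, slope) in fits.items():
    print("%s: Y-hat = %.2f + %.2f * X" % (label, intercept, slope))
```

Comparing the two slopes shows whether the estimated effect of Economic Development differs between the groups; predicted values for a PR country should come from the PR-only fit.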