Point Estimates and CIs (answer)

Instructions

  1. Homework assignment is here with all questions commented out.
  2. Complete all of the coding tasks in the .qmd file
  3. Upload your individual exercise to your GitHub repo by Mon 11:59pm.
  4. Remember to clean your GitHub repo and sort hw submissions by week. Each week should have one folder.
  5. For the extra exercises, please refer to the following R code first here

Create a vector called ID that contains the numbers 1 through 6

ID <- seq(from = 1, to = 6)

Create a vector called Race that contains the following: “black”, “white”, “black”, “hispanic”, “white”, “white”

Race <- c("black", "white", "black", "hispanic", "white", "white")

What is the type of the Race vector? Is it a factor? Is it a character? Is it a logical vector? Find out by using some commands.

is.factor(Race)
## [1] FALSE
mode(Race)
## [1] "character"
class(Race)
## [1] "character"
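The same question can also be answered with the `is.*` family of predicates, one per candidate type. A minimal sketch; the vector is re-created here only so the snippet runs on its own:

```r
Race <- c("black", "white", "black", "hispanic", "white", "white")

# One predicate per candidate type; each returns a single TRUE or FALSE
is.character(Race)  # TRUE
is.factor(Race)     # FALSE
is.logical(Race)    # FALSE

# typeof() reports the internal storage type
typeof(Race)        # "character"
```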

Create a vector called Voted_For_Obama that contains the following: TRUE, TRUE, TRUE, FALSE, FALSE, TRUE

Voted_For_Obama <- c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE)

Find out the data type of the Voted_For_Obama vector

is.logical(Voted_For_Obama)
## [1] TRUE
mode(Voted_For_Obama)
## [1] "logical"
class(Voted_For_Obama)
## [1] "logical"

Create a vector called Party_ID that contains the following: “Dem”, “Dem”, “Dem”, “Rep”, “Ind”, “Rep”

Party_ID <- c("Dem", "Dem", "Dem", "Rep", "Ind", "Rep")

Create a vector called Income_Level that contains the following: “High”, “Low”, “Low”, “High”, “High”, “Low”

Income_Level <- c("High", "Low", "Low", "High", "High", "Low")

Create a vector called Approval that contains the following: 70, 80, 68, 20, 10, 60

Approval <- c(70, 80, 68, 20, 10, 60)

Create a data frame called vote.data by combining the six vectors you have created above.

vote.data <- data.frame(ID, Race, Voted_For_Obama, Party_ID, Income_Level, Approval)

We have seen above that the Race vector was a character vector. We have learned in the joint exercise that we should treat a vector like this one as a factor, not as a character.

In versions of R before 4.0.0, data.frame() converted character vectors like Race into factors automatically. Since R 4.0.0 the default is stringsAsFactors = FALSE, so the column stays a character vector. To check which behavior you have, apply the is.factor function to the Race column of vote.data.

is.factor(vote.data $ Race)
## [1] FALSE

Here both is.factor(Race) and the command above return FALSE. To get a factor, pass stringsAsFactors = TRUE to data.frame() or convert the column with factor().
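If `is.factor()` returns FALSE for the column, as in the output above, the conversion can be done explicitly. A minimal sketch, assuming R 4.0.0 or later, where `data.frame()` keeps character vectors as characters by default; `df1` and `df2` are illustrative names, not part of the assignment:

```r
Race <- c("black", "white", "black", "hispanic", "white", "white")

# Option 1: ask data.frame() to convert character columns to factors
df1 <- data.frame(Race, stringsAsFactors = TRUE)
is.factor(df1$Race)

# Option 2: convert a single column after the data frame is built
df2 <- data.frame(Race)
df2$Race <- factor(df2$Race)
is.factor(df2$Race)

# Factor levels are sorted alphabetically by default
levels(df2$Race)
```

Option 2 is handy when only some of the character columns should become factors.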

How many people in this data set voted for Obama?

That is, for how many observations does the variable Voted_For_Obama take the value TRUE? Write an R command that gives you the answer. Note that I know the answer is 4. What you need to give me is the command that returns 4.

# Using length

length(vote.data $ Voted_For_Obama[vote.data $ Voted_For_Obama == TRUE])
## [1] 4
# Using nrow
nrow(vote.data[vote.data $ Voted_For_Obama == TRUE, ])
## [1] 4
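A shorter route to the same count relies on the fact that R treats TRUE as 1 and FALSE as 0 in arithmetic. A minimal sketch; the data frame is rebuilt with only the one column it needs, and `n_obama` is an illustrative name:

```r
Voted_For_Obama <- c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE)
vote.data <- data.frame(Voted_For_Obama)

# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE values
n_obama <- sum(vote.data$Voted_For_Obama)
n_obama

# table() shows the counts of both values at once
table(vote.data$Voted_For_Obama)
```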

There is one person in this data set who identifies himself as “Ind” (Independent). Did he vote for Obama?

Again, I know he didn’t. Give me the command that gives us the answer FALSE

vote.data $ Voted_For_Obama[vote.data $ Party_ID == "Ind"] 
## [1] FALSE

Create a subset of the data set that contains only “white” people. Store this smaller data set into an object called vote.white

vote.white <- vote.data[vote.data $ Race=="white", ]
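The same subset can be written with the base R `subset()` function, which lets the condition refer to column names directly. A minimal sketch; `vote.white2` is an illustrative name, and the data frame is rebuilt with only two columns so the snippet runs on its own:

```r
ID <- seq(from = 1, to = 6)
Race <- c("black", "white", "black", "hispanic", "white", "white")
vote.data <- data.frame(ID, Race)

# Inside subset(), Race refers to the column of vote.data
vote.white2 <- subset(vote.data, Race == "white")

# Rows 2, 5, and 6 are the white respondents
vote.white2$ID
```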

Show the third column of the newly created data set vote.white

vote.white[, 3]
## [1]  TRUE FALSE  TRUE

How many white voters in this mini data set voted for Obama?

Again, I know the answer is 2. Write a command that gives the answer 2.

length(vote.white $ Voted_For_Obama[vote.white $ Voted_For_Obama == TRUE])
## [1] 2
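The count can also be obtained from the full data set without creating the subset first, by combining the two conditions with `&`. A minimal sketch; `n_white_obama` is an illustrative name:

```r
Race <- c("black", "white", "black", "hispanic", "white", "white")
Voted_For_Obama <- c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE)
vote.data <- data.frame(Race, Voted_For_Obama)

# & is TRUE only where both conditions hold; sum() counts those rows
n_white_obama <- sum(vote.data$Race == "white" & vote.data$Voted_For_Obama)
n_white_obama
```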

Modifying a data frame object

Let’s create a fake data frame called my.data, and see how we can add variables to an existing data frame.

Country_ID <- seq(from = 1, to = 7)

Country_Name <- c("United States", "United Kingdom", "Japan", "China", "Brazil", "Germany", "Egypt")

# The line above is a bit too long. We can split the statement across two lines
# instead and get the same result (you have to execute the two lines at once).

Country_Name <- c("United States", "United Kingdom", "Japan", 
                  "China", "Brazil", "Germany", "Egypt")

# We can insert a line break at other places as well. 
# For example, the following will also work

Country_Name <- c(
  "United States", "United Kingdom", "Japan", 
  "China", "Brazil", "Germany", "Egypt")

# However, you MUST NOT insert a line break within quotation marks, 
# like the following 

Bad_Vector <- c("United States", "United 
                Kingdom", "Japan", "China", "Brazil", "Germany", "Egypt")

# This is not only difficult to read, but also incorrect. R will interpret it
# to mean that the second element is "United + (line break) + Kingdom". So, if 
# we take a look at it

Bad_Vector
## [1] "United States"                    "United \n                Kingdom"
## [3] "Japan"                            "China"                           
## [5] "Brazil"                           "Germany"                         
## [7] "Egypt"
# The symbol that shows up in the second element, "\n", is the newline (line break) character in R.

Regime_Type <- c("Democracy", "Democracy", "Democracy", "Dictatorship",
                 "Democracy", "Democracy", "Dictatorship")

GDP_PC <- c(51163, 39367, 46838, 6070, 11347, 41376, 3115)

EU_Member <- c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)

# Notice that we don't put quotation marks around TRUE or FALSE because 
# these are special words that R recognizes as values. 
# Notice also that TRUE and FALSE are shown in a different color (probably 
# light blue / pale purple), because, again, R recognizes that they are special.


# Now that we have all five columns, let's bind them together and create a 
# data set. We use the data.frame() function. 

data.frame(Country_ID, Country_Name, Regime_Type, GDP_PC, EU_Member)
##   Country_ID   Country_Name  Regime_Type GDP_PC EU_Member
## 1          1  United States    Democracy  51163     FALSE
## 2          2 United Kingdom    Democracy  39367      TRUE
## 3          3          Japan    Democracy  46838     FALSE
## 4          4          China Dictatorship   6070     FALSE
## 5          5         Brazil    Democracy  11347     FALSE
## 6          6        Germany    Democracy  41376      TRUE
## 7          7          Egypt Dictatorship   3115     FALSE
# Of course, we could (and should) store this as an object. Let's call it
# my.data.

my.data <- data.frame(Country_ID, Country_Name, Regime_Type, GDP_PC, EU_Member)

To add a new variable, you also use the $ symbol. Specifically, we write DATAFRAME $ NEW_VARIABLE_NAME <- VALUES

For example, to add a new variable called Population that contains the values 318946000, 64105654, 127090000, 1367420000, 203322000, 80781000, 87354300, we write:

my.data$Population <- c(318946000, 64105654, 127090000, 1367420000, 
                          203322000, 80781000, 87354300)

my.data
##   Country_ID   Country_Name  Regime_Type GDP_PC EU_Member Population
## 1          1  United States    Democracy  51163     FALSE  318946000
## 2          2 United Kingdom    Democracy  39367      TRUE   64105654
## 3          3          Japan    Democracy  46838     FALSE  127090000
## 4          4          China Dictatorship   6070     FALSE 1367420000
## 5          5         Brazil    Democracy  11347     FALSE  203322000
## 6          6        Germany    Democracy  41376      TRUE   80781000
## 7          7          Egypt Dictatorship   3115     FALSE   87354300

If the Console window is wide enough, the output should look like this:

#   > my.data
# Country_ID   Country_Name  Regime_Type GDP_PC EU_Member Population
# 1          1  United States    Democracy  51163     FALSE  318946000
# 2          2 United Kingdom    Democracy  39367      TRUE   64105654
# 3          3          Japan    Democracy  46838     FALSE  127090000
# 4          4          China Dictatorship   6070     FALSE 1367420000
# 5          5         Brazil    Democracy  11347     FALSE  203322000
# 6          6        Germany    Democracy  41376      TRUE   80781000
# 7          7          Egypt Dictatorship   3115     FALSE   87354300

We can see that a new column is now added at the end (far right). If the Console window is not wide enough, it may show up like this:

#   > my.data
# Country_ID   Country_Name  Regime_Type GDP_PC EU_Member
# 1          1  United States    Democracy  51163     FALSE
# 2          2 United Kingdom    Democracy  39367      TRUE
# 3          3          Japan    Democracy  46838     FALSE
# 4          4          China Dictatorship   6070     FALSE
# 5          5         Brazil    Democracy  11347     FALSE
# 6          6        Germany    Democracy  41376      TRUE
# 7          7          Egypt Dictatorship   3115     FALSE
# Population
# 1  318946000
# 2   64105654
# 3  127090000
# 4 1367420000
# 5  203322000
# 6   80781000
# 7   87354300

You may want to adjust the width and ask R to show it again.
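Besides resizing the Console pane, the line width that R uses for printing can be set with the `width` option. A minimal sketch, assuming a width of 120 characters is enough for the table; `old_opts`, `new_width`, and `demo` are illustrative names:

```r
# options() returns the previous settings, so they can be restored later
old_opts <- options(width = 120)
new_width <- getOption("width")
new_width

# Any data frame printed now wraps at 120 characters instead of the default
demo <- data.frame(Country_ID = 1:2,
                   Country_Name = c("United States", "United Kingdom"),
                   Population = c(318946000, 64105654))
print(demo)

# Restore the original width
options(old_opts)
```

In RStudio the option is normally updated when the Console pane is resized, so setting it by hand mostly matters in scripts or rendered documents.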

If you want to browse a data frame object, use the View function

# View(my.data)

We can also create a new variable as the result of an operation on existing variables.

For example, GDP_PC measures per capita GDP (in 2013 US dollars). This was calculated as a country’s GDP divided by its population:

#   GDP_PC = GDP / Population
# Therefore, if we multiply GDP_PC and Population, we can obtain its
# GDP: 
#     GDP = GDP_PC * Population

Create a variable within the my.data object called GDP which is equal to the product of GDP_PC and Population (GDP_PC times Population).

my.data$GDP <- my.data$GDP_PC * my.data $ Population

my.data
##   Country_ID   Country_Name  Regime_Type GDP_PC EU_Member Population
## 1          1  United States    Democracy  51163     FALSE  318946000
## 2          2 United Kingdom    Democracy  39367      TRUE   64105654
## 3          3          Japan    Democracy  46838     FALSE  127090000
## 4          4          China Dictatorship   6070     FALSE 1367420000
## 5          5         Brazil    Democracy  11347     FALSE  203322000
## 6          6        Germany    Democracy  41376      TRUE   80781000
## 7          7          Egypt Dictatorship   3115     FALSE   87354300
##            GDP
## 1 1.631823e+13
## 2 2.523647e+12
## 3 5.952641e+12
## 4 8.300239e+12
## 5 2.307095e+12
## 6 3.342395e+12
## 7 2.721086e+11

It is a little bit difficult to read numbers like 1.631823e+13, which means 1.631823 * 10^13. Let’s create another variable that shows re-scaled GDPs by dividing the raw GDP by 1000000 (one million).

my.data $ GDP_mil <- my.data $ GDP / 1000000

my.data
##   Country_ID   Country_Name  Regime_Type GDP_PC EU_Member Population
## 1          1  United States    Democracy  51163     FALSE  318946000
## 2          2 United Kingdom    Democracy  39367      TRUE   64105654
## 3          3          Japan    Democracy  46838     FALSE  127090000
## 4          4          China Dictatorship   6070     FALSE 1367420000
## 5          5         Brazil    Democracy  11347     FALSE  203322000
## 6          6        Germany    Democracy  41376      TRUE   80781000
## 7          7          Egypt Dictatorship   3115     FALSE   87354300
##            GDP    GDP_mil
## 1 1.631823e+13 16318234.2
## 2 2.523647e+12  2523647.3
## 3 5.952641e+12  5952641.4
## 4 8.300239e+12  8300239.4
## 5 2.307095e+12  2307094.7
## 6 3.342395e+12  3342394.7
## 7 2.721086e+11   272108.6

So the new variable GDP_mil is measured in millions of dollars. For example, the value of GDP_mil for the United States is 16318234.2, which means that the GDP of the US is 16318234.2 million dollars (about 16.3 trillion dollars).
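Large values like this are easier to read with thousands separators or in a larger unit. A minimal sketch using the US figure from the table; `gdp_mil_us`, `gdp_label`, and `gdp_tril_us` are illustrative names:

```r
# GDP of the United States in millions of dollars, copied from the table
gdp_mil_us <- 16318234.2

# Insert thousands separators after rounding to whole millions
gdp_label <- format(round(gdp_mil_us), big.mark = ",")
gdp_label

# Re-express the same figure in trillions of dollars
gdp_tril_us <- gdp_mil_us / 1e6
round(gdp_tril_us, 1)
```

Note that `format()` returns a character string, so the result is for display only and should not be used in further arithmetic.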

Create a new variable within my.data called Is_Democracy that is a logical vector that tells us whether or not a country is democratic.

Hint: utilize the Regime_Type variable included in the my.data object.

my.data $ Is_Democracy <- my.data $ Regime_Type == "Democracy"
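To check the result, the new column can be printed and its TRUE values counted. A minimal sketch; the data frame is rebuilt with only the Regime_Type column so the snippet runs on its own, and `n_democracies` is an illustrative name:

```r
Regime_Type <- c("Democracy", "Democracy", "Democracy", "Dictatorship",
                 "Democracy", "Democracy", "Dictatorship")
my.data <- data.frame(Regime_Type)

# The comparison is evaluated element by element and yields a logical vector
my.data$Is_Democracy <- my.data$Regime_Type == "Democracy"
my.data$Is_Democracy

# Confirm the type and count the democracies
is.logical(my.data$Is_Democracy)
n_democracies <- sum(my.data$Is_Democracy)
n_democracies
```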