Probability: The Basics

Birthday Problem (Modified)

There are 100 people in a room. Assume each person’s birthday is equally likely to be any of the 365 days of the year (no leap babies) and birthdays are independent. What is the probability that at least three of them share the same birthday?

Use simulation to answer this question. You need to explain your code so we can assess whether you have got the idea of using random samples and loops to simulate the estimated probability.

set.seed(1)
n <- 100
match <- 0

# Simulate a random sample of 10000 rooms and check for matches in each room
sam = 10000
# i = 1

for(i in 1:sam){
  birthdays <- sample(365, n, replace = TRUE)
  if( any(as.numeric(table(birthdays) ) >= 3) ){
    match <- match + 1
  } 
}


# Calculate the estimated probability of a match and print it
p_match <- match / sam # how many matches you found in your simulation of n
print(p_match)
## [1] 0.6495
rm(match)

The Monty Hall Game

Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a sports car, which you want; behind the others, goats, which you do not want. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 2, which has a goat. He then says to you, “Do you want to switch to opening the door No. 3?” Is it to your advantage to switch your choice?

Watch the show here

Use simulation to answer this question. You need to explain your code (use # comments) so we can assess whether you have got the idea of using random samples and loops to simulate the estimated probability.

1. Winning probability with “No switching/sticking”

Let’s write some code to simulate the Monty Hall game show. In this exercise, we will have the contestant choose Door #1, “stick” every time, and find the relative frequency of winning after 10000 iterations of the game.

First, we will code just one iteration. Then, we will incorporate this into a for loop to run the process repeatedly in order to obtain a simulated estimate of the true win probability, using the idea of a counter we discussed previously.

  1. Use the sample function to select one of the doors to have the prize.
set.seed(1)
doors <- c(1,2,3)

# Randomly select one of the doors to have the prize
prize <- sample(x = doors, size = 1)
initial_choice <- 1

# Check if the initial choice equals the prize
if(initial_choice == prize){
  print("The initial choice was correct!")
 }
## [1] "The initial choice was correct!"

Initialize win_count to an appropriate value before any iterations have been run.

# Define counter
win_count <- 0

# Increment win_count if the choice is correct.

# Run 10000 iterations of the game
for(i in 1:10000){
  prize <- sample(x = doors, size = 1)
  initial_choice <- 1
  if(initial_choice == prize){
    win_count <- win_count + 1
  }
}

# Print the estimated probability of winning based on the values from the code.
print(win_count / 10000)
## [1] 0.3423

Great! Now you see although there are only two doors remaining after the reveal, we see that the winning probability stays at 1/3.

2. Winning probability with “switch”

Before we simulate the situation of “switch,” let us first write a function to do the reveal in this context. We will then use this function in our next exercise to simulate the win probability when switching.

Recall that if the initially chosen door is correct, then the host will choose one of the other two doors at random to reveal.

If the initially chosen door is incorrect, then the host will simply reveal the other incorrect door. Note that there is nothing random about the reveal in this case.

reveal_door <- function(doors, prize, initial_choice){
  if(prize == initial_choice){
    # Sample at random from the two remaining doors
    reveal <- sample(x = doors[-prize], size = 1)
  } else {
    
    # When the prize and initial choice are different, reveal the only remaining door 
    reveal <- doors[-c(prize,initial_choice)]
  }  
}

Now, we can now use this function in the following exercise.

prize <- sample(doors,1)
initial_choice <- 1

# Use the reveal_door function to do the reveal
reveal <- reveal_door(doors, prize, initial_choice)

# Switch to the remaining door
final_choice <- doors[-c(initial_choice, reveal)]
print(final_choice)
## [1] 3
# Check whether the final choice equals the prize
if(final_choice == prize){
  print("The final choice is correct!")
}
## [1] "The final choice is correct!"
# Initialize the win counter
win_count <- 0

for(i in 1:10000){
  prize <- sample(doors,1)
  initial_choice <- 1
  reveal <- reveal_door(doors, prize, initial_choice)
  final_choice <- doors[-c(initial_choice, reveal)]
  if(final_choice == prize){
    
    # Increment the win counter
    win_count <- win_count + 1
  }
}

# Print the estimated probability of winning
print(win_count / 10000)
## [1] 0.6674

Great! This demonstrates the counterintuitive result that the probability jumps up to 2/3 by switching.