P3-M 4/25 Simulations
Creating simulations using pandas and python libraries
- Objectives
- What are simulations by College Board definition?
- Analyzing an Example: Air-Traffic Simulator
- Functions we often need (python)
- Functions we often need (js)
- College Board Question 1
- Examples
- Adding images (in Python)
- Population Growth and Plots
- Example on how simplification can cause bias
- JS examples
What are simulations by College Board definition?
- Simulations are abstractions that mimic more complex objects or phenomena from the real world
- Purposes include drawing inferences without the constraints of the real world
- Simulations use varying sets of values to reflect the changing state of a real phenomenon
- Often, when developing a simulation, it is necessary to remove specific details or simplify aspects
- Simulations can often contain bias based on which details or real-world elements were included/excluded
- Simulations allow the formulation of hypotheses under consideration
- Variability and randomness of the world is considered using random number generators
- Examples: rolling dice, spinners, molecular models, analyzing chemicals/reactions...
Analyzing an Example: Air-Traffic Simulator
- Say we want to find the optimal number of aircraft that can be in the air in one area.
- A simulation allows us to explore this question without the real-world constraints of money, time, and safety
- Unfortunately we can't just fly 67 planes all at once and see what happens
- Since the simulation can't take every variable into account, it may be biased towards one answer
- A simulation that uses randomness will not always produce the same result
import random  # a module that defines functions for generating random numbers and making random selections
mylist = [1, 2, 3]  # example list; any non-empty sequence works
random.choice(mylist)  # returns a randomly selected element from the given sequence
random.randint(0, 10)  # returns a random integer from 0 to 10, with both endpoints included
random.random()  # returns a random float in the range [0.0, 1.0)
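As a quick illustration of the functions above, the sketch below rolls a six-sided die three different ways. The helper names (`roll_with_randint`, `roll_with_choice`, `roll_with_random`) are made up for this example and are not part of the `random` module.

```python
import random

def roll_with_randint():
    """Roll a six-sided die; randint includes both endpoints."""
    return random.randint(1, 6)

def roll_with_choice():
    """Roll a die by picking one face from a list."""
    faces = [1, 2, 3, 4, 5, 6]
    return random.choice(faces)

def roll_with_random():
    """Roll a die by scaling a float in [0.0, 1.0) to an integer from 1 to 6."""
    return int(random.random() * 6) + 1

# One roll from each approach; the values change from run to run
rolls = [roll_with_randint(), roll_with_choice(), roll_with_random()]
print(rolls)
```

All three approaches give each face the same chance, so which one to use is mostly a matter of readability.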
Math.random(); // returns a random float from 0 (inclusive) up to, but not including, 1
Math.floor(Math.random() * 10); // returns a random integer from 0 to 9
Question: The following code simulates the feeding of 4 fish in an aquarium while the owner is on a 5-day trip:
numFish ← 4
foodPerDay ← 20
foodLeft ← 160
daysStarving ← 0
REPEAT 5 TIMES {
    foodConsumed ← numFish * foodPerDay
    foodLeft ← foodLeft - foodConsumed
    IF (foodLeft < 0) {
        daysStarving ← daysStarving + 1
    }
}
- This simulation simplifies a real-world scenario into something that can be modeled in code and executed on a computer.
Summarize how the code works:
- The code defines the variables numFish, foodPerDay, foodLeft, and daysStarving for the simulation.
- The code then iterates 5 times (representing 5 days). Each day, total food consumption is calculated as the product of the number of fish and the food each fish consumes per day, and this amount is subtracted from foodLeft.
- If the remaining food is negative after a day's feeding, daysStarving is incremented, so at the end it holds the total number of days on which the fish starved.
- With the values given, the fish eat 80 units per day, so the 160 units run out after day 2 and daysStarving ends at 3.
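The pseudocode above can be translated into Python roughly as follows. This is a minimal sketch: the function name `simulate_fish_feeding` and its parameter names are chosen for this example, with defaults taken from the question.

```python
def simulate_fish_feeding(num_fish=4, food_per_day=20, food_left=160, days=5):
    """Return (days_starving, food_left) after the trip."""
    days_starving = 0
    for _ in range(days):
        # Total food eaten by all fish in one day
        food_consumed = num_fish * food_per_day
        food_left -= food_consumed
        # A negative balance means there was not enough food that day
        if food_left < 0:
            days_starving += 1
    return days_starving, food_left

days_starving, food_left = simulate_fish_feeding()
print("Days starving:", days_starving)  # prints 3
print("Food left:", food_left)          # prints -240
```

Wrapping the loop in a function makes it easy to try other inputs, for example `simulate_fish_feeding(food_left=400)`, to see how much food is needed for the fish never to starve.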
import random
cards = ["Ace", "2", "3", "4", "5", "6", "7", "8", "9", "10", "Jack", "Queen", "King"]
suits = ["Diamonds", "Hearts", "Spades", "Clubs"]
print(random.choice(cards) + " of " + random.choice(suits))
import random
def coinflip():  # define the coin-flip function
    randomflip = random.randint(0, 99)  # picks a random integer from 0 to 99 (100 possible values)
    if randomflip >= 25:  # values 25 to 99 count as heads, so heads has a 75% chance
        print("Heads")
    else:
        # values 0 to 24 count as tails, so tails has a 25% chance
        print("Tails")
# Tossing the coin 5 times:
for _ in range(5):
    coinflip()
Your turn: Change the code to make it simulate the flipping of a weighted coin.
We can change the range from 0 to 99 to simulate percentages and various results
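One possible way to approach the weighted coin is to make the weight a parameter and tally the results. The sketch below is only one option; `weighted_flip`, `count_flips`, and `heads_percent` are names invented for this example.

```python
import random

def weighted_flip(heads_percent):
    """Return "Heads" with roughly heads_percent percent probability."""
    roll = random.randint(0, 99)  # 100 equally likely values
    return "Heads" if roll < heads_percent else "Tails"

def count_flips(num_flips, heads_percent):
    """Flip the weighted coin num_flips times and tally the outcomes."""
    counts = {"Heads": 0, "Tails": 0}
    for _ in range(num_flips):
        counts[weighted_flip(heads_percent)] += 1
    return counts

# A coin weighted 75% towards heads, flipped 1000 times
results = count_flips(1000, 75)
print(results)
```

Returning the outcome instead of printing it lets the same function feed a tally, a plot, or any other data collection.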
- Add heads and tails images to your images directory with the correct names and run the code below
import random
# importing the Image class from the PIL package
from PIL import Image
# display() is available by default in Jupyter notebooks; this import makes it explicit
from IPython.display import display
# opening both image files
im = Image.open(r"../images/head.png")
image = Image.open(r"../images/tails.png")
i = random.randint(0, 1)
if i == 1:
    print("heads")
    display(im)
else:
    print("tails")
    display(image)
In order to display an image in python, we can use the PIL package we previously learned about.
import random
print("Spin the wheel!")
print("----------------------------------")
n = 300
blue = 0
red = 0
res = ""
count = 0
for i in range(n):
    spin = random.randint(1, 2)
    if spin == 1:  # blue
        blue = blue + 1
        res = res + "🟦"
    else:  # red
        res = res + "🟥"
        red = red + 1
    count += 1
    if count % 10 == 0:  # start a new row after every 10 spins
        res = res + "\n"
print('Number of blue:', blue)
print('Number of red:', red)
print("Frequency:")
print(res)
Your turn: Add a visual to the simulation!
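One simple visual that needs no extra libraries is a text bar chart of the spin counts. The sketch below assumes a two-color wheel like the one above; `spin_wheel` and `text_bar_chart` are hypothetical helper names for this example.

```python
import random
from collections import Counter

def spin_wheel(num_spins, colors=("blue", "red")):
    """Spin the wheel num_spins times and count how often each color comes up."""
    return Counter(random.choice(colors) for _ in range(num_spins))

def text_bar_chart(counts, spins_per_block=10):
    """Build one row per color, drawing one '#' for every spins_per_block spins."""
    lines = []
    for color, count in sorted(counts.items()):
        bar = "#" * (count // spins_per_block)
        lines.append(f"{color:>5} | {bar} {count}")
    return "\n".join(lines)

counts = spin_wheel(300)
print(text_bar_chart(counts))
```

The same counts could instead be passed to a Matplotlib bar chart if a graphical plot is preferred.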
totalPopulation = 50
growthFactor = 1.00005
dayCount = 0  # days since the population was last reported
while totalPopulation < 1000000:
    totalPopulation *= growthFactor  # each iteration represents one day of growth
    dayCount += 1
    # Every 3650 days (about a decade), the population is reported
    if dayCount == 3650:
        dayCount = 0
        print(totalPopulation)
Here we initialize the total population to 50 and set the growth factor to 1.00005 (a 0.005 percent increase per day). Each iteration multiplies the current population by the growth factor and increments the day count. When the day count reaches 3650 (about ten years), the code prints the current population and resets the day count to 0. This repeats until the population reaches one million.
Note! This simulation assumes that the growth factor remains constant as time progresses, which may not be a realistic assumption in real-world scenarios.
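Because the growth factor is constant, the number of days the loop needs can also be predicted with logarithms, which gives a way to sanity-check the simulation. The sketch below compares the two; the function names `days_until_target` and `days_closed_form` are made up for this example.

```python
import math

def days_until_target(initial, growth_factor, target):
    """Count days by simulating one multiplication per day."""
    population = initial
    days = 0
    while population < target:
        population *= growth_factor
        days += 1
    return days

def days_closed_form(initial, growth_factor, target):
    """Solve initial * growth_factor**days >= target for the smallest whole number of days."""
    return math.ceil(math.log(target / initial) / math.log(growth_factor))

simulated = days_until_target(50, 1.00005, 1_000_000)
predicted = days_closed_form(50, 1.00005, 1_000_000)
print("Simulated days:", simulated)
print("Predicted days:", predicted)
```

The shortcut only works while the growth factor stays fixed. Once the factor changes over time, stepping through the days as the simulation does is the more practical approach.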
import matplotlib.pyplot as plt
# Define the initial population and growth rate
population = 100
growth_rate = 0.05
# Define the number of years to simulate
num_years = 50
# Create lists to store the population and year values
populations = [population]
years = [0]
# Simulate population growth for the specified number of years
for year in range(1, num_years+1):
    # Calculate the new population size
    new_population = population + (growth_rate * population)
    # Update the population and year lists
    populations.append(new_population)
    years.append(year)
    # Set the new population as the current population for the next iteration
    population = new_population
# Plot the population growth over time
plt.plot(years, populations)
plt.xlabel('Year')
plt.ylabel('Population')
plt.title('Population Growth Simulation')
plt.show()
If we create quantitative data, we can plot it using the Matplotlib library.
import random
beak = ["small-beak", "long-beak", "medium-beak"]
wing = ["small-wings", "large-wings", "medium-wings"]
height = ["short", "tall","medium"]
naturaldisaster = ["flood", "drought", "fire", "hurricane", "dustbowl"]
print("When a" , random.choice(naturaldisaster) , "hit", random.choice(height), "birds died")
How does this simulation have bias?
This simulation only selects from the heights of the birds and does not take the birds' other traits into account. Also, the outcome is completely random, so it does not depend on which disaster occurred.
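One way to reduce the selection bias is to draw from every trait category instead of height alone. The sketch below is an assumption about how that might look; the `traits` dictionary and `describe_event` function are names introduced for this example. The outcome is still purely random, so it only addresses the first source of bias.

```python
import random

# All trait categories, so no single category is favored
traits = {
    "beak": ["small-beak", "long-beak", "medium-beak"],
    "wing": ["small-wings", "large-wings", "medium-wings"],
    "height": ["short", "tall", "medium"],
}
natural_disasters = ["flood", "drought", "fire", "hurricane", "dustbowl"]

def describe_event():
    """Pick a disaster, then a trait category, then a trait within that category."""
    disaster = random.choice(natural_disasters)
    category = random.choice(list(traits))
    trait = random.choice(traits[category])
    return f"When a {disaster} hit, {trait} birds died"

print(describe_event())
```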
- Answer all questions and prompts in the notes (0.2)
- Create a simulation
- Create a simulation that uses iteration and some form of data collection (list, dictionary...) (0.4)
- Try creating quantitative data and using the Matplotlib library to display said data
- Comment on and describe the function of each part
- How does your simulation help solve/mimic a real world problem?
- Is there any bias in your simulation? Meaning, are there any discrepancies between your program and the real event?
- Answer these simulation questions (0.3)
- Bonus: take a real world event and make a pseudocode representation or pseudocode on a flowchart of how you would make a simulation for it (up to +0.1 bonus)
from random import randint
import matplotlib.pyplot as plt
from scipy import stats
import numpy as np
def snakeEyesSimulation(num_of_iterations):
    total_dice_rolls, snake_eyes = [0], [0]  # Running totals, kept inside the function so each call starts fresh
    for i in range(num_of_iterations):  # Here we generate the data
        if randint(1,6) == randint(1,6):  # If the two dice match, append the incremented count to our list
            snake_eyes.append(snake_eyes[i]+1)
        else:
            snake_eyes.append(snake_eyes[i])  # Otherwise we keep the current count
        total_dice_rolls.append(total_dice_rolls[i]+1)
    totalnp = np.array(total_dice_rolls)  # Convert our lists into np arrays to perform statistical analysis
    snakenp = np.array(snake_eyes)
    slope, intercept, rvalue, pvalue, stderr = stats.linregress(totalnp,snakenp)  # Perform linear regression, gives us our data values
    line = slope*totalnp+intercept  # Get our best fit line
    frequency = str(snake_eyes[-1]/num_of_iterations * 100)[:5] + "%"  # Get our frequency
    plt.style.use('fivethirtyeight')  # Spice up our style
    plt.plot(totalnp, line, 'r', label='y={:.2f}x+{:.2f}'.format(slope,intercept))
    plt.plot(total_dice_rolls, snake_eyes)
    plt.plot([], [], ' ', label=f"Frequency={frequency}")  # Some labels to make it fancy
    plt.plot([], [], ' ', label=f"r={rvalue}")
    plt.plot([], [], ' ', label=f"p={pvalue}")
    plt.plot([], [], ' ', label=f"standard error={stderr}")  # linregress reports the standard error of the slope
    plt.xlabel('# of dice rolls')  # Axis labels
    plt.ylabel('# of Snake eyes')
    plt.title('Frequency of snake eyes versus total number of d6 dice rolls', fontdict={'fontsize': 15})
    plt.ylim(-1000,num_of_iterations)
    plt.xlim(-1000,num_of_iterations)
    plt.scatter(totalnp,snakenp)  # Graph the data points in a scatter plot
    plt.legend(fontsize=12)
    plt.show()  # Display the graph
snakeEyesSimulation(10000)
The program that I created above could provide insight into a popular probability game called snake eyes. In the version modeled here, the player wins if they roll two dice that show the same number. My simulation uses the Python random module to create randomized dice rolls, then counts the total number of dice rolls and the total number of snake eyes. This could help with the real-world problem of gambling, as an efficient computer simulation shows how small the chances of winning the game are.
Despite being simple, this algorithm does contain a few inherent biases. The random number generators in Python are actually pseudorandom, meaning that they use a seed value to generate a deterministic sequence of "random" values. In reality, these values aren't random, and with the same seed, the program will generate the same result. Thus, this algorithm isn't completely random, but it still provides a good representation. Over the many times I've run this program, I have consistently gotten a frequency of 16-17% with different seeds, which matches the theoretical 1-in-6 chance of two dice matching and shows how small the chances are of winning.
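The point about seeds can be checked directly. The sketch below is a stripped-down version of the simulation without the plotting; `doubles_frequency` is a name invented for this example, and it uses a separate `random.Random` generator so the seed does not affect other code.

```python
import random

def doubles_frequency(num_rolls, seed=None):
    """Return the fraction of rolls in which two dice show the same number."""
    rng = random.Random(seed)  # independent generator with its own seed
    doubles = 0
    for _ in range(num_rolls):
        if rng.randint(1, 6) == rng.randint(1, 6):
            doubles += 1
    return doubles / num_rolls

first = doubles_frequency(10_000, seed=42)
second = doubles_frequency(10_000, seed=42)
print(first == second)  # prints True: the same seed reproduces the same rolls
print(first, 1 / 6)     # simulated frequency next to the theoretical value
```

Leaving `seed` as `None` lets Python pick its own seed, which is why ordinary runs differ from one another.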
Simulation questions:
A theme park wants to create a simulation to determine how long it should expect the wait time at its most popular ride. Which of the following characteristics for the virtual patrons would be most useful? Select two answers
- A. Ride preference—denotes whether a patron prefers roller coasters, other thrill rides, gentle rides, or no rides.
- B. Walking preference—denotes how far a patron is willing to walk in between rides.
- C. Food preference—denotes the type of food that a patron prefers to eat (e.g., chicken, burgers, salads).
- D. Ticket type—denotes whether the patron has a single-day pass, a multi-day pass, or an annual pass.
The correct answer would be A and B. Walking preference shows how often people will travel to a ride, which tells us how many people arrive at a particular ride, and ride preference tells us which rides are the most popular, allowing us to predict the waiting time.
A programmer has created a program that models the growth of foxes and rabbits. Which of the following potential aspects of the simulation does NOT need to be implemented?
- A. A representation of grass that rabbits must eat frequently to survive.
- B. Each rabbit may only have a certain amount of children per litter.
- C. Each fox must eat a rabbit frequently to survive.
- D. Each rabbit can only live to a certain age, assuming that they are not eaten.
The correct answer is A. A representation of the grass is not as important as options B-D, which all directly affect the rabbit and fox populations because they model the survival and reproduction of foxes and rabbits.
The heavy use of chemicals called chlorofluorocarbons (CFCs) has caused damage to the Earth’s ozone layer, creating a noticeable hole over Antarctica. A scientist created a simulation of the hole in the layer using a computer, which models the growth of the hole over many years. Which of the following could be useful information that the simulation could produce?
- A. The approximate length of time until the hole would be refilled (due to various atmospheric processes)
- B. The exact size of the hole at any given point in time
- C. The exact length of time until the hole would be refilled (due to various atmospheric processes)
- D. The exact depth of the hole at any point in time
The correct answer is A, as we can never be sure of anything "exact" from a computer simulation, which rules out options B through D.
Suppose that an environmentalist wanted to understand the spread of invasive species. What would be a benefit of doing this with a simulation, rather than in real life?
- A. The species used in the simulation could be designed to mimic many different species at once.
- B. The species created could be quickly tested in multiple environments to better understand how its spread is affected by environmental factors.
- C. The simulation could be run much more quickly than in real life.
- D. All of the above
The correct answer is D. This is because a simulation allows us direct control over our environment, and since it is simulated on the machine, we can reset this environment however many times we wish. Thus, this grants us efficiency, repeatability, and flexibility.
A program is being created to simulate the growth of a brain based on randomly determined environmental factors. The developer plans to add a feature that lets the user quickly run several hundred simulations with any number of factors kept constant. Why would this be useful? Select two answers.
- A. It would allow the user to gather data without taxing the computer’s hardware.
- B. It would allow the user to see the effect of specific variables by ensuring that the others do not change.
- C. It would quickly provide the user with a large amount of data.
- D. It would make simulations more detailed.
The correct answer would be B and C. Since we can freely keep certain factors constant, we can use this to analyze individual factors and variables to ensure that we can observe trends for all effects. Additionally, option C is also right as we can run the simulation many times over in a matter of hours on a computer, instead of over many years with actual human participants or subjects.
Which of the following statements describes a limitation of using a computer simulation to model a real-world object or system?
- A. Computer simulations can only be built after the real-world object or system has been created.
- B. Computer simulations only run on very powerful computers that are not available to the general public.
- C. Computer simulations usually make some simplifying assumptions about the real-world object or system being modeled.
- D. It is difficult to change input parameters or conditions when using computer simulations.
The correct answer would be C. Many phenomena in real life and nature have many minuscule and lurking factors that all contribute to the final outcome. The sheer amount of detail in nature is too complex for modern computers to fully simulate. Thus, even the most detailed and powerful simulations today still make many assumptions about smaller factors in order to have better efficiency and usability. Additionally, many factors are unknown, making them hard to implement.
Pseudo-code for real-life event
For this pseudocode, I will be simulating the traffic at four stop-light intersections
Intersections ← 4
hoursToRun ← INPUT()
Jams ← 0
REPEAT hoursToRun TIMES {
    Traffic1 ← RANDOM(100,200)
    Traffic2 ← RANDOM(250,300)
    Traffic3 ← RANDOM(50,75)
    Traffic4 ← RANDOM(400,600)
    AverageTraffic ← (Traffic1+Traffic2+Traffic3+Traffic4)//Intersections
    IF (AverageTraffic > 350) {
        Jams ← Jams + 1
    }
}
DISPLAY("FREQUENCY OF TRAFFIC JAMS: ")
DISPLAY(Jams/hoursToRun)
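A rough Python translation of the pseudocode above is sketched here. The function name `simulate_traffic` and the `jam_threshold` and `seed` parameters are additions for this example; the traffic ranges come from the pseudocode. One thing the translation makes easy to see: with these ranges the average always falls between 200 and 293, so a threshold of 350 is never exceeded and the reported frequency is always 0. Exposing the threshold as a parameter lets a lower value, such as 250, be tried instead.

```python
import random

def simulate_traffic(hours_to_run, jam_threshold=350, seed=None):
    """Return the fraction of simulated hours counted as traffic jams."""
    rng = random.Random(seed)
    # (low, high) car counts per hour for each of the four intersections
    ranges = [(100, 200), (250, 300), (50, 75), (400, 600)]
    jams = 0
    for _ in range(hours_to_run):
        # randint includes both endpoints, like RANDOM(a, b) in the pseudocode
        traffic = [rng.randint(low, high) for low, high in ranges]
        average_traffic = sum(traffic) // len(ranges)
        if average_traffic > jam_threshold:
            jams += 1
    return jams / hours_to_run

print("FREQUENCY OF TRAFFIC JAMS:", simulate_traffic(24))
print("With a threshold of 250:", simulate_traffic(24, jam_threshold=250))
```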