Section 2.1 Randomness and Simulation
We begin our study of randomness and probability by exploring the concept of randomness. What does it mean for a process to be random? Why does it matter? We'll attempt to answer these questions in the first few parts of this section.
Once we've identified sources of randomness, we will focus on how we can utilize them to simulate a process. We'll discuss the key steps in conducting a simulation and look at several examples. We will finish the section by reviewing several limitations of simulations.
Objectives
After finishing this section you should be able to
-
describe the following terms:
component
deterministic process
outcome
pseudo-random
random number table
random process
response variable
simulation
trial
-
accomplish the following tasks:
Identify random, pseudo-random, and non-random processes
Describe a simulation using appropriate vocabulary
Set up a simulation using appropriate assumptions
Conduct a simulation using random number tables
Identify the limitations of a simulation
Subsection 2.1.1 Random vs. Pseudo-random Numbers
We have already seen the term “random sample” in this course, and we will see it a lot more as we continue. It is important, therefore, that we understand what randomness means.
Example 2.1.1. Identifying Random Processes.
Which of the following processes is random?
a person thinks of a number between one and ten
a student randomly fills in bubbles on a standardized test sheet
a computer program randomly assigns the winning lottery numbers
Surprisingly, the answer is none of these are truly random processes.
To be truly random, a process must have no predictability—show no preference towards one or more outcomes. A person choosing a number is likely to have a “favorite” number or to be influenced by something he just saw or heard. A student filling in bubbles is likely to make a design, or even to “try to be random” and evenly spread out the bubbles, which is not in fact random. Even a computer program comes up with “random numbers” using a predictable algorithm. The computer program is an example of the following.
Definition 2.1.2.
A pseudo-random process is one that appears to be random, but which, when repeated with the same initial inputs, will always produce the same results.
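The definition can be illustrated with a short sketch in Python. It assumes the standard library's `random` module as the pseudo-random generator; the helper name `first_digits` and the seed values are hypothetical choices made only for this illustration.

```python
import random

def first_digits(seed, count=10):
    """Return `count` pseudo-random digits (0-9) from a generator started at `seed`."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    return [rng.randrange(10) for _ in range(count)]

run_a = first_digits(seed=2024)
run_b = first_digits(seed=2024)  # same initial input as run_a
run_c = first_digits(seed=2025)  # different initial input

print(run_a == run_b)  # the same seed reproduces the same digits
print(run_a)
print(run_c)
```

The digits look unpredictable, yet repeating the run with the same seed gives exactly the same list, which is what makes the process pseudo-random rather than random.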
Where then can we get a reliable source of random information? This can actually be a philosophical question. Is anything in the universe truly random, or is everything deterministic, meaning that if we know the initial conditions, we can predict exactly what will happen? In this class, we assume that physical phenomena that we observe are, if not random, so complex that they might as well be random. We can gather random numbers then from sources such as:
the time between decays of a radioactive material, or
the time between observations of cosmic rays, or
wind gust speeds and directions.
None of these are terribly practical for us, so instead we use either a pseudo-random number generator on a computer, or a random number table which records digits based on processes similar to those mentioned above.
Definition 2.1.3.
A random number table is a list of digits recorded based on some random process. For example,
2217726304387410092537086270581997622725849795907032825001108963
3217535822643800292254644943760642389043766557204107354186024508
8906427308645681412198226653885873285801699027843110380420067664
8740522639824530519902027044464984322000946238678577902639002954
8887003319933147508331265192321413908608671496383528968974910533
4943760642389043766557204107354186024508432200094623867858226440
To use a random number table to help us generate a string of random numbers, we first “randomly” select a starting point in the table, and then use the digits that follow.
Example 2.1.4. Using a Random Number Table.
You wish to randomly pick a sample of 6 people from a group of 100 people. Use the random number table provided above to do this.
We will assign each person in our group of 100 a two-digit number from 00 to 99. This means we will take groups of two digits from the table above, skipping over any repeating numbers since we don't want to pick the same “person” twice.
In order to select our six pairs of digits, we must first pick a starting point. We'll do this by rolling a six-sided die (since there are six rows in the table). Let's say this comes up with the number 3. Then we will start at the beginning of the third row in the table. The first six two-digit numbers from that row are \(89, 06, 42, 73, 08,\) and \(64\text{.}\)
Each of these numbers represents one of our people, and there are no repeats. So the six people we will use in our sample are those assigned numbers 89, 6, 42, 73, 8, and 64.
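The selection in this example can be sketched in Python as follows. The function name `pick_sample` and its parameters are hypothetical; the digit string is the third row of the table above, and the rule (read two digits at a time, skip repeats) is the one described in the example.

```python
# Third row of the random number table above.
ROW_3 = "8906427308645681412198226653885873285801699027843110380420067664"

def pick_sample(digits, sample_size, population_size=100):
    """Read `digits` two at a time and return the first `sample_size`
    distinct numbers that are smaller than `population_size`."""
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        number = int(digits[i:i + 2])
        if number >= population_size or number in chosen:
            continue  # skip unassigned numbers and repeated "people"
        chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen

print(pick_sample(ROW_3, 6))  # [89, 6, 42, 73, 8, 64]
```

The `population_size` check does nothing when 100 people use the numbers 00 to 99, but it lets the same sketch handle smaller groups where some two-digit numbers are unassigned.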
Checkpoint 2.1.7.
You wish to collect a sample of 10 individuals from a population of 100. To do this, you assign numbers 0-99 to these individuals, and then use the random number table below, starting with the first entry and taking two digits at a time to select your sample.
52557 13440 30790 31858 28653 38267 09427 95946 09832 68174
93146 91673 22649 29722 35062 19040 67106 96350 82060 51489
16645 21177 60697 15577 24381 51084 70974 11304 37199 12631
Question: What are the numbers of the individuals who will be included in your sample?
Person number 52, 55, 71, 34, 40, 30, 79, 3, 18, and 58.
Checkpoint 2.1.8.
You wish to collect a sample of 10 individuals from a population of 100. To do this, you assign numbers 0-99 to these individuals, and then use the random number table below, starting with the first entry in the second row, and taking two digits at a time to select your sample.
52557 13440 30790 31858 28653 38267 09427 95946 09832 68174
93146 91673 22649 29722 35062 19040 67106 96350 82060 51489
16645 21177 60697 15577 24381 51084 70974 11304 37199 12631
Question: Which of the following individuals will not be included in your sample?
Person number 22
Person number 35
Person number 46
Person number 69
Person number 46
Checkpoint 2.1.9.
You wish to collect a sample of 10 individuals from a population of 100. To do this, you assign numbers 0-99 to these individuals, and then use the random number table below, starting with the first entry in the third row and taking two digits at a time to select your sample.
52557 13440 30790 31858 28653 38267 09427 95946 09832 68174
93146 91673 22649 29722 35062 19040 67106 96350 82060 51489
16645 21177 60697 15577 24381 51084 70974 11304 37199 12631
Question: Which individuals are included in your sample?
Person number 16, 64, 52, 11, 77, 60, 69, 71, 55, and 24.
Subsection 2.1.2 Random Processes
In statistics, we use random processes in many different ways. For example, we may wish to use randomness to:
eliminate human biases -- such as in selecting individuals for a sample,
model the real world -- such as flipping a coin, rolling a die, or other more complicated random processes, or
predict how likely a given event is to happen based on the models above.
One of our goals in this lesson is to better understand how we can model the real world using random processes so that we can make predictions and better understand our samples. To model the real world, we must first understand the process we are trying to model.
Definition 2.1.10.
A random process is one in which, even if the initial conditions are known, the final result cannot be predicted.
If a process is not random, then the outcome can be predicted or determined based on the initial conditions, or starting points of the process. Such a process is called deterministic.
Definition 2.1.11.
In a deterministic process the outcome of the process is completely determined by the initial conditions. That is, the final result will always be the same if the same starting point is used.
To see the difference between these two types of processes, consider the following examples.
Example 2.1.12. Identifying Processes I.
An otherwise empty lake is stocked with 500 catfish, 750 bass, and 1000 trout. A fisherman decides to catch fish and throw them back until he has caught one of each type of fish. Is this a random or deterministic process?
This is a random process because, even though we know exactly how the lake was stocked, we cannot predict which fish will be caught next or how many catches it will take to get one of each type. In order to model this, we would need to find such a lake and fish (probably impractical), come up with a way to count all of the possibilities (see Section 2.2), or simulate the process using random numbers.
Example 2.1.13. Identifying Processes II.
A lamp is connected to five different light switches, each in a different room. The lamp turns on if an even number of the light switches are in the up position and off if an odd number are in the up position. We observe the state of the light (off or on). Is this a random process?
This is not a random process; it is deterministic. Based on the state of the light switches, we can say for certain whether the lamp will be off or on.
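A minimal Python sketch of the lamp makes the point concrete. The function name `lamp_is_on` and the particular switch positions are hypothetical; the rule itself (on when an even number of switches are up) comes from the example.

```python
def lamp_is_on(switches_up):
    """switches_up: five booleans, True meaning that switch is in the up position.
    The lamp is on exactly when an even number of switches are up."""
    return sum(switches_up) % 2 == 0

state = (True, False, True, True, False)  # hypothetical switch positions

print(lamp_is_on(state))  # three switches up (an odd number), so the lamp is off

# Repeating the observation with the same initial conditions never changes the result.
print(all(lamp_is_on(state) == lamp_is_on(state) for _ in range(1000)))
```

No source of randomness appears anywhere in the function, which is why the same starting point always gives the same final result.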
Checkpoint 2.1.16.
The following describes three different processes.
Shuffling a deck of 52 playing cards four times, and then drawing the top card.
Pushing four specific buttons on your calculator, in a specified order, and then pressing the “=” key.
Spinning the “wheel-of-fortune” from the TV game show.
Question: Identify each of these processes as random or deterministic.
Drawing a card and spinning the wheel are random; pushing the calculator keys is deterministic.
Checkpoint 2.1.17.
The following describes three different processes.
Typing a specified list of commands into a computer program.
Drawing 4 names from a hat containing 100 names and noting whose name is drawn.
Drawing 4 names from a hat containing 4 names and noting whose name is drawn.
Question: Identify each of these processes as random or deterministic.
Drawing names from the 100-name hat is random; the others are deterministic.
Checkpoint 2.1.18.
The following describes three different processes.
Typing your PIN number into the ATM after inserting your bank card.
Rolling a pair of dice weighted so that the "1" always comes up and noting the sum of the numbers that appear.
Rolling a pair of fair dice and noting the sum of the numbers that appear.
Question: Identify each of these processes as random or deterministic.
Using your ATM card and rolling the weighted dice are deterministic; rolling the fair dice is random.
Subsection 2.1.3 Simulation
Our fishing example (Example 2.1.12) illustrates an important point. Many times it is not practical to replicate a particular random process. We don't have a lake in our back yard that we can stock with these fish to perform this exact experiment. Instead, we need to construct a model for the process which we can then use as a simulation.
Definition 2.1.19.
A simulation is a sequence of random outcomes that models a random process.
As we construct simulations for this and other examples, keep the following cautions in mind.
A simulation is a model of the real process and therefore is not perfect.
A simulation is meant to be easier to perform than the actual random process.
A simulation needs to correctly model the process, so we should be careful of any underlying assumptions.
In order to describe the simulations that we will create and carry out, it is important to have a well-understood vocabulary. Some of these terms have already been used, but we clarify their exact definitions below.
Definition 2.1.20.
The following terms are often used in describing simulations.
-
Component.
A component is the most basic action in the process being simulated.
-
Outcome.
The outcomes of a process are the possible results of a single component.
-
Trial.
A trial is one series of components which completes the process.
-
Response Variable.
The response variable is the final result for which we are looking.
To better understand these terms, let's apply them to the fishing example from earlier.
Example 2.1.21. Identifying Parts of a Random Process.
Identify the components, outcomes, trials, and response variables in the fishing experiment seen in Example 2.1.12.
These are as follows.
A component is catching a single fish. This is the most basic action in our process.
The outcomes are the various fish that could be caught - catfish, bass, and trout.
A trial is catching a series of fish until we get one of each type.
The response variable is the number of fish we had to catch in our trial.
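One way to see how the four terms fit together is to write the fishing process as a small program. The sketch below is only an illustration: it uses Python's pseudo-random generator instead of a random number table, assumes every fish in the lake is equally likely to be caught, and the names `STOCK`, `catch_fish`, and `run_trial` are hypothetical.

```python
import random

# Outcomes: the possible results of one component, with the stocked counts.
STOCK = {"catfish": 500, "bass": 750, "trout": 1000}

def catch_fish(rng):
    """Component: catch a single fish, each fish in the lake being equally likely."""
    species = list(STOCK)
    return rng.choices(species, weights=[STOCK[s] for s in species])[0]

def run_trial(rng):
    """Trial: keep catching (and releasing) fish until every type has appeared."""
    seen = set()
    catches = 0
    while len(seen) < len(STOCK):
        seen.add(catch_fish(rng))
        catches += 1
    return catches  # response variable: number of fish caught in this trial

rng = random.Random(1)  # arbitrary seed so the run can be repeated
print(run_trial(rng))
```

Each call to `catch_fish` is one component, its return value is one outcome, each call to `run_trial` is one trial, and the number it returns is the response variable.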
Checkpoint 2.1.24.
Seventy-four employees at a firm with limited free parking participate in a “parking lottery” to receive one of 23 free parking spaces. Seventeen of those whose names were in the lottery were upper management. When all 17 individuals from upper management were “randomly” selected to receive free parking, the other employees complained that the lottery had been rigged. Use a simulation to determine if their claim has merit.
Question: What are the components of such a simulation?
The selection of one employee to receive a parking space.
Checkpoint 2.1.25.
Seventy-four employees at a firm with limited free parking participate in a “parking lottery” to receive one of 23 free parking spaces. Seventeen of those whose names were in the lottery were upper management. When all 17 individuals from upper management were “randomly” selected to receive free parking, the other employees complained that the lottery had been rigged. Use a simulation to determine if their claim has merit.
Question: What are the trials of such a simulation?
The selection of 23 employees to receive parking spaces.
Checkpoint 2.1.26.
Seventy-four employees at a firm with limited free parking participate in a “parking lottery” to receive one of 23 free parking spaces. Seventeen of those whose names were in the lottery were upper management. When all 17 individuals from upper management were “randomly” selected to receive free parking, the other employees complained that the lottery had been rigged. Use a simulation to determine if their claim has merit.
Question: What are the outcomes of such a simulation?
Picking any one of the 74 employees who participated in the lottery.
Subsection 2.1.4 Conducting Simulations
In order to actually carry out a simulation, we will combine the analysis of the random process from the previous subsection with the sources of randomness that we saw earlier in this lesson. The first thing we must do is identify any assumptions we make about the random process.
Example 2.1.27. Identifying Assumptions.
List any assumptions that must be made to simulate the fishing process seen in Example 2.1.12.
There are many assumptions that are made in simulating this example. Here are a few of the more important ones.
Each individual fish has the same chance of being caught--that is, no one fish is more likely to take a hook than another.
The number of fish does not change from those that we are given--no fish have died, been added, etc.
The fisherman is fishing in a spot that is visited equally by all three types of fish--he hasn't found a place where trout will not go, for example.
And others...
All of these assumptions are important as they will dictate how we design our model. Remember that the goal in simulating is to model the process in a simpler and more easily studied form. The steps we must follow to set up such a simulation are listed below.
Algorithm 2.1.28.
To conduct a simulation, follow these steps.
Identify the components.
Explain how the outcomes of these components will be modeled.
Explain how we will simulate a trial.
Identify the response variable.
Run several trials using a source of randomness, such as a random number table.
Analyze the response variable.
We have already seen how to identify the components, outcomes, and trials. Let's finish our fishing example by actually conducting the simulation.
Example 2.1.29. Conducting a Simulation.
Simulate 10 trials of the fishing process seen in Example 2.1.12 using the following random number table.
2217726304387410092537086270581997622725849795907032825001108963
3217535822643800292254644943760642389043766557204107354186024508
8906427308645681412198226653885873285801699027843110380420067664
8740522639824530519902027044464984322000946238678577902639002954
8887003319933147508331265192321413908608671496383528968974910533
4943760642389043766557204107354186024508432200094623867858226440
We will follow the steps outlined above.
We have already noted that the component in this process is catching a single fish.
-
Our simulation must model the three types of fish that make up the outcomes. We need to do this in such a way that the fish types and the modeled outcomes are in the same proportion. Recall that there were 500 catfish, 750 bass, and 1000 trout, for a total of 2250 fish. This gives:
\(\frac{500}{2250} = \frac{2}{9}\) of the fish are catfish, so digits 0 and 1 represent catfish
\(\frac{750}{2250} = \frac{3}{9}\) of the fish are bass, so digits 2, 3, and 4 represent bass
\(\frac{1000}{2250} = \frac{4}{9}\) of the fish are trout, so digits 5, 6, 7, and 8 represent trout
Note that the digit 9 will not be used; this leaves nine equally likely digits (0 through 8), matching the ninths above.
To simulate a trial, we will select a starting point and record digits (representing fish) until we have recorded one of each “type of fish.”
The response variable will be the number of “fish” recorded (so number of digits).
-
The results of 10 trials starting as indicated are shown below. The * symbol in the outcomes represents the unassigned digit “9.”
Starting Point | Outcomes | Response Var.
row 1, column 1 | 2217726304 \(\Rightarrow\) bbct | 4 fish caught
row 2, column 11 | 6438002922 \(\Rightarrow\) tbbtc | 5 fish caught
row 3, column 21 | 982266538858732 \(\Rightarrow\) *tbbtttbtttttbc | 14 fish caught
row 4, column 31 | 4984322000 \(\Rightarrow\) b*tbbbbc | 7 fish caught
row 5, column 41 | 6714963835 \(\Rightarrow\) ttcb | 4 fish caught
row 6, column 51 | 23867858226440 \(\Rightarrow\) bbttttttbbtbbc | 14 fish caught
row 1, column 21 | 3708627058 \(\Rightarrow\) btc | 3 fish caught
row 2, column 31 | 0642389043 \(\Rightarrow\) ctb | 3 fish caught
row 3, column 41 | 6990278431 \(\Rightarrow\) t**ccttb | 6 fish caught
row 4, column 51 | 7790263900 \(\Rightarrow\) tt*cb | 4 fish caught
Table 2.1.30. Ten Simulation Trials for Fishing Example
-
The response variable had values 4, 5, 14, 7, 4, 14, 3, 3, 6, 4 giving a mean value of:
\begin{equation*} \overline{x} = \frac{4+5+14+7+4+14+3+3+6+4}{10} = 6.4 \text{ fish}. \end{equation*}
It appears that the number of fish we would need to catch before having one of each sort is around 6. Note, however, that the 14's may be outliers.
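The digit assignment and the rule for ending a trial can be sketched in Python as follows. The function names `fish_for` and `run_trial` are hypothetical, and the two digit strings are the first two starting points used in this example (row 1, column 1 and row 2, column 11).

```python
def fish_for(digit):
    """Map one table digit to a fish letter; 9 is unassigned and shown as '*'."""
    if digit in "01":
        return "c"  # catfish, 2 of the 9 usable digits
    if digit in "234":
        return "b"  # bass, 3 of the 9 usable digits
    if digit in "5678":
        return "t"  # trout, 4 of the 9 usable digits
    return "*"

def run_trial(digits):
    """Read digits until catfish, bass, and trout have all appeared.
    Returns (outcome letters, number of fish), or None if the digits run out first."""
    letters = ""
    for d in digits:
        letters += fish_for(d)
        if {"c", "b", "t"} <= set(letters):
            return letters, len(letters.replace("*", ""))  # '*' is not a fish
    return None

print(run_trial("2217726304"))  # ('bbct', 4)
print(run_trial("6438002922"))  # ('tbbtc', 5)
```

Skipped 9s appear in the outcome string but are not counted in the response variable, which is the convention used in the table above.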
Checkpoint 2.1.33.
Seventy-four employees at a firm with limited free parking participate in a “parking lottery” to receive one of 23 free parking spaces. Seventeen of those whose names were in the lottery were upper management. When all 17 of the upper management team were "randomly" selected to receive free parking, the other employees complained that the lottery had been rigged. Use a simulation to determine if their claim has merit.
You decide to conduct the simulation using the following random number table, assigning numbers 00-16 to the upper management individuals and 17-73 to the remaining employees. Numbers 74-99 are skipped, as are repeated numbers.
52557 13440 30790 31858 28653 38267 09427 95946 09832 68174
93146 91673 22649 29722 35062 19040 67106 96350 82060 51489
16645 21177 60697 15577 24381 51084 70974 11304 37199 12631
Question: How many upper-management employees received a parking spot on your first trial if you start with the first entry of row one?
Checkpoint 2.1.34.
Seventy-four employees at a firm with limited free parking participate in a “parking lottery” to receive one of 23 free parking spaces. Seventeen of those whose names were in the lottery were upper management. When all 17 of the upper management team were "randomly" selected to receive free parking, the other employees complained that the lottery had been rigged. Use a simulation to determine if their claim has merit.
You decide to conduct the simulation using the following random number table, assigning numbers 00-16 to the upper management individuals and 17-73 to the remaining employees. Numbers 74-99 are skipped, as are repeated numbers.
52557 13440 30790 31858 28653 38267 09427 95946 09832 68174
93146 91673 22649 29722 35062 19040 67106 96350 82060 51489
16645 21177 60697 15577 24381 51084 70974 11304 37199 12631
Question: How many upper-management employees received a parking spot on your first trial if you start with the first entry of row two?
Checkpoint 2.1.35.
Seventy-four employees at a firm with limited free parking participate in a “parking lottery” to receive one of 23 free parking spaces. Seventeen of those whose names were in the lottery were upper management. When all 17 of the upper management team were "randomly" selected to receive free parking, the other employees complained that the lottery had been rigged. Use a simulation to determine if their claim has merit.
You decide to conduct the simulation using the following random number table, assigning numbers 00-16 to the upper management individuals and 17-73 to the remaining employees. Numbers 74-99 are skipped, as are repeated numbers.
52557 13440 30790 31858 28653 38267 09427 95946 09832 68174
93146 91673 22649 29722 35062 19040 67106 96350 82060 51489
16645 21177 60697 15577 24381 51084 70974 11304 37199 12631
Question: How many upper-management employees received a parking spot on your first trial if you start with the “38” that begins the sixth block in row 1?
Subsection 2.1.5 Cautions Regarding Simulation
In the simulation we conducted above, there seems to be a lot of variation. Consider a further analysis of this data in the following example.
Example 2.1.36. Analyzing Simulation Results.
Recall that the numbers of fish we had to catch in the ten simulated trials before having one of each kind were \(4, 5, 14, 7, 4, 14, 3, 3, 6, \text{ and } 4\text{.}\) Use the mean and standard deviation to further analyze this data.
We saw in Example 2.1.29 that the mean was \(\overline{x} = 6.4\) fish. Using a calculator or spreadsheet program, we find that the standard deviation in this sample is \(s = 4.2\) fish.
The standard deviation is almost as large as the mean, so there is a lot of variation in the number of fish needed to complete a trial. The 14's are not outliers, but they do have a z-score of \(z = \frac{14-6.4}{4.2} \approx 1.81\) which is relatively large.
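For readers who prefer a program to a calculator or spreadsheet, the same summary statistics can be computed with Python's standard `statistics` module. This is a sketch only; the variable names are hypothetical, and `statistics.stdev` is used because it is the sample standard deviation (dividing by \(n-1\)).

```python
import statistics

# Response variable values from the ten simulated trials.
catches = [4, 5, 14, 7, 4, 14, 3, 3, 6, 4]

mean = statistics.mean(catches)
s = statistics.stdev(catches)   # sample standard deviation (divides by n - 1)
z = (14 - mean) / s             # z-score of the largest observed value

print(round(mean, 1), round(s, 1), round(z, 2))  # 6.4 4.2 1.81
```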
If we decide that our standard deviation of 4.2 is too large, and we want a more “precise” mean, meaning we want to see better results from the simulation, what can we do? There are several ways to improve a simulation.
Ensure that the underlying assumptions are accurate.
Improve the randomness of the simulation.
Raise the number of trials—usually at least 20 should be conducted.
Our underlying assumptions seem to be reasonable, and we did use a random number table to generate randomness. However, we only did 10 trials. For a better simulation, we should do at least 20 trials. Running trials by hand can be very time-consuming. Using a spreadsheet program, we could get our results much more quickly.
Be very cautious when conducting and reporting on simulations. The following guidelines should be followed.
Principle 2.1.37. Cautions Regarding Simulations.
-
Don't Overstate Your Case.
A simulation is in some sense always wrong. We didn't actually fish in a lake stocked as the example stated. Add to that the fact that a random process will always turn out differently each time we do it, and you need to be cautious about claiming that “we must catch 6.4 fish” in order to complete the process.
-
Model Outcomes Correctly.
We must keep outcomes in the correct proportion to our random digits. Drop digits as necessary to ensure that these proportions are correct. If you have more than 10 outcomes to model, you can use two digits at a time.
-
Run Enough Trials.
The purpose of a simulation is to make a long and possibly expensive process short and cheap. So use a large number of trials!
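The effect of the number of trials can be explored with a short Python sketch. It swaps the random number table for Python's pseudo-random generator, assumes each fish in the lake is equally likely to be caught, and uses hypothetical names (`run_trial`, `mean_catches`) and arbitrary seeds chosen only for illustration.

```python
import random
import statistics

SPECIES = ["catfish", "bass", "trout"]
WEIGHTS = [500, 750, 1000]  # stocked counts from the fishing example

def run_trial(rng):
    """Catch fish until all three types have appeared; return the number caught."""
    seen, catches = set(), 0
    while len(seen) < len(SPECIES):
        seen.add(rng.choices(SPECIES, weights=WEIGHTS)[0])
        catches += 1
    return catches

def mean_catches(num_trials, seed):
    """Mean of the response variable over `num_trials` simulated trials."""
    rng = random.Random(seed)
    return statistics.mean(run_trial(rng) for _ in range(num_trials))

# Repeat the whole simulation five times for each trial count and compare the means.
for n in (10, 100, 10_000):
    print(n, [round(mean_catches(n, seed), 2) for seed in range(5)])
```

With only 10 trials, the five means can differ noticeably from one another; with many more trials, they should cluster much more tightly, which is the reason for running a large number of trials.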
Checkpoint 2.1.40.
Seventy-four employees at a firm with limited free parking participate in a “parking lottery” to receive one of 23 free parking spaces. Seventeen of those whose names were in the lottery were upper management. When all 17 upper management employees were “randomly” selected to receive free parking, the other employees complained that the lottery had been rigged. You decide to use a random number table to simulate the situation. You assign the management employees numbers 00-16, the other employees numbers 17-73, and skip numbers 74-99. You then simulate 10 trials using a random number table and find that on average 15.2 of the 17 upper management employees were selected. You therefore conclude that the lottery must be fair.
Question: What, if any, mistake did you make?
Overstated your Case
Checkpoint 2.1.41.
Seventy-four employees at a firm with limited free parking participate in a “parking lottery” to receive one of 23 free parking spaces. Seventeen of those whose names were in the lottery were upper management. When all 17 upper management employees were “randomly” selected to receive free parking, the other employees complained that the lottery had been rigged. You decide to use a random number table to simulate the situation. You assign the management employees numbers 00-16, the other employees numbers 17-73, and skip numbers 74-99. You then simulate 100 trials using a random number table and find that an average of 5.4 upper management employees were selected. You therefore conclude that it is unlikely this was a fair lottery.
Question: What, if any, mistake did you make?
Made No Mistakes--Your Conclusion is Sound
Checkpoint 2.1.42.
Seventy-four employees at a firm with limited free parking participate in a “parking lottery” to receive one of 23 free parking spaces. Seventeen of those whose names were in the lottery were upper management. When all 17 upper management employees were “randomly” selected to receive free parking, the other employees complained that the lottery had been rigged. You decide to use a random number table to simulate the situation. You assign the management the even digits (0,2,4,6,8) and the other employees the odd digits (1,3,5,7,9). You then simulate 100 trials using a random number table and find that an average of 8.2 of the 17 upper management employees were selected. You therefore conclude that all 17 being selected is not very likely if the lottery is indeed fair.
Question: What, if any, mistake did you make?
Didn't Model Outcomes Correctly
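For comparison, the parking lottery from these checkpoints can be modeled in Python in a way that keeps the outcomes in the correct proportion. This is a sketch under stated assumptions: a fair lottery gives each of the 74 employees the same chance, the pseudo-random generator stands in for a random number table, and the names `EMPLOYEES`, `MANAGERS`, `SPACES`, and `run_trial` are hypothetical.

```python
import random

EMPLOYEES = 74  # numbered 0-73, as in the checkpoints
MANAGERS = 17   # upper management are numbers 0-16
SPACES = 23

def run_trial(rng):
    """One trial: draw 23 distinct employees and count the managers among them."""
    winners = rng.sample(range(EMPLOYEES), SPACES)  # sampling without repeats
    return sum(1 for w in winners if w < MANAGERS)

rng = random.Random(42)  # arbitrary seed so the run can be repeated
results = [run_trial(rng) for _ in range(10_000)]

print(sum(results) / len(results))            # average number of managers selected
print(max(results))                           # most managers selected in any trial
print(sum(r == MANAGERS for r in results))    # trials in which all 17 were selected
```

If trials with all 17 managers selected are extremely rare under this fair model, the employees' complaint deserves a closer look, although, as the cautions above note, a simulation alone cannot prove that the lottery was rigged.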