
Section 2.1 Randomness and Simulation

We begin our study of randomness and probability by exploring the concept of randomness. What does it mean for a process to be random? Why does it matter? We'll attempt to answer these questions in the first few parts of this section.

Once we've identified sources of randomness, we will focus on how we can utilize them to simulate a process. We'll discuss the key steps in conducting a simulation and look at several examples. We will finish the section by reviewing several limitations of simulations.

Subsection 2.1.1 Random vs. Pseudo-random Numbers

We have already seen the term “random sample” in this course, and we will see it much more as we continue. It is important, therefore, that we understand what randomness means.

Which of the following processes is random?

  1. a person thinks of a number between one and ten

  2. a student randomly fills in bubbles on a standardized test sheet

  3. a computer program randomly assigns the winning lottery numbers

Solution

Surprisingly, none of these is a truly random process.

To be truly random, a process must be unpredictable: it must show no preference toward one or more outcomes. A person choosing a number is likely to have a “favorite” number or to be influenced by something they just saw or heard. A student filling in bubbles is likely to make a design, or even to “try to be random” by spreading the bubbles out evenly, which is not in fact random. Even a computer program generates its “random numbers” using a predictable algorithm. The computer program is an example of the following.

Definition 2.1.2.

A pseudo-random process is one that appears to be random, but which, when repeated with the same initial inputs, will always produce the same results.
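One way to see this definition in action is to start a computer's generator from the same initial input (its seed) twice. The minimal sketch below uses Python's standard `random` module, which is a Mersenne Twister generator; the function name `pseudo_random_digits` and the seeds 42 and 7 are arbitrary, illustrative choices.

```python
import random

# Illustrative sketch; the function name is hypothetical.


def pseudo_random_digits(seed: int, count: int) -> list[int]:
    """Return `count` digits from 0 to 9, starting the generator at `seed`."""
    rng = random.Random(seed)  # a separate generator with a fixed initial input
    return [rng.randrange(10) for _ in range(count)]


first = pseudo_random_digits(seed=42, count=10)
second = pseudo_random_digits(seed=42, count=10)
different_seed = pseudo_random_digits(seed=7, count=10)

print(first)                    # looks random...
print(first == second)          # True: same initial input, same "random" digits
print(first == different_seed)  # a different seed will almost certainly differ
```

The digits look unpredictable, yet rerunning with seed 42 reproduces them exactly, which is what makes the process pseudo-random rather than random.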

Where, then, can we get a reliable source of random information? This is actually a philosophical question. Is anything in the universe truly random, or is everything deterministic, meaning that if we knew the initial conditions, we could predict exactly what would happen? In this class, we assume that the physical phenomena we observe are, if not random, so complex that they might as well be random. We can then gather random numbers from sources such as:

  • the time between decays of a radioactive material, or

  • the time between observations of cosmic rays, or

  • wind gust speeds and direction.

None of these is terribly practical for us, so instead we use either a pseudo-random number generator on a computer or a random number table, which records digits produced by processes similar to those mentioned above.

Definition 2.1.3.

A random number table is a list of digits recorded based on some random process. For example,

2217726304387410092537086270581997622725849795907032825001108963
3217535822643800292254644943760642389043766557204107354186024508
8906427308645681412198226653885873285801699027843110380420067664
8740522639824530519902027044464984322000946238678577902639002954
8887003319933147508331265192321413908608671496383528968974910533
4943760642389043766557204107354186024508432200094623867858226440

To use a random number table to help us generate a string of random numbers, we first “randomly” select a starting point in the table, and then use the digits that follow.

You wish to randomly pick a sample of 6 people from a group of 100 people. Use the random number table provided above to do this.

Solution

We will assign each person in our group of 100 a two-digit number from 00 to 99. This means we will take groups of two digits from the table above, skipping over any repeated numbers, since we don't want to pick the same “person” twice.

In order to select our six pairs of digits, we must first pick a starting point. We'll do this by rolling a six-sided die (since there are six rows in the table). Suppose the die shows a 3. Then we will start at the beginning of the third row in the table. The first six pairs of two-digit numbers from that row are \(89, 06, 42, 73, 08,\) and \(64\text{.}\)

Each of these numbers represents one of our people, and there are no repeats. So the six people we will use in our sample are those assigned numbers 89, 6, 42, 73, 8, and 64.
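The table-reading procedure is mechanical enough to write down as code. The following is a minimal Python sketch, with `select_from_table` and its parameters as hypothetical names: it reads the digits in fixed-width groups, skips any label that has already been chosen or is larger than the largest label in use, and stops once the sample is complete.

```python
# Illustrative sketch; the function and parameter names are hypothetical.


def select_from_table(digits: str, sample_size: int, width: int = 2,
                      max_label: int = 99) -> list[int]:
    """Select `sample_size` distinct labels from a string of table digits."""
    digits = digits.replace(" ", "")  # tables are often printed in blocks of 5
    chosen: list[int] = []
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        if label > max_label or label in chosen:
            continue  # no such person, or already selected: skip this group
        chosen.append(label)
        if len(chosen) == sample_size:
            return chosen
    raise ValueError("ran out of digits before the sample was complete")


row_3 = "8906427308645681412198226653885873285801699027843110380420067664"
print(select_from_table(row_3, 6))  # [89, 6, 42, 73, 8, 64]
```

Passing a row from the exercise tables that follow, such as `"52557 13440 30790 31858"` with `sample_size=10`, reproduces the hand answers, and a smaller `max_label` (for example 73) handles tables where some labels must be skipped.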

Figure 2.1.5. Using Random Number Tables I
Figure 2.1.6. Using Random Number Tables II

You wish to collect a sample of 10 individuals from a population of 100. To do this, you assign numbers 0-99 to these individuals, and then use the random number table below, starting with the first entry and taking two digits at a time to select your sample.

52557 13440 30790 31858 28653 38267 09427 95946 09832 68174
93146 91673 22649 29722 35062 19040 67106 96350 82060 51489
16645 21177 60697 15577 24381 51084 70974 11304 37199 12631

Question: What are the numbers of the individuals who will be included in your sample?

Answer

Person number 52, 55, 71, 34, 40, 30, 79, 3, 18, and 58.

You wish to collect a sample of 10 individuals from a population of 100. To do this, you assign numbers 0-99 to these individuals, and then use the random number table below, starting with the first entry in the second row, taking two digits at a time, and skipping any repeated numbers to select your sample.

52557 13440 30790 31858 28653 38267 09427 95946 09832 68174
93146 91673 22649 29722 35062 19040 67106 96350 82060 51489
16645 21177 60697 15577 24381 51084 70974 11304 37199 12631

Question: Which of the following individuals will not be included in your sample?

  1. Person number 22

  2. Person number 35

  3. Person number 46

  4. Person number 69

Answer

Person number 46

You wish to collect a sample of 10 individuals from a population of 100. To do this, you assign numbers 0-99 to these individuals, and then use the random number table below, starting with the first entry in the third row, taking two digits at a time, and skipping any repeated numbers to select your sample.

52557 13440 30790 31858 28653 38267 09427 95946 09832 68174
93146 91673 22649 29722 35062 19040 67106 96350 82060 51489
16645 21177 60697 15577 24381 51084 70974 11304 37199 12631

Question: Which individuals are included in your sample?

Answer

Person number 16, 64, 52, 11, 77, 60, 69, 71, 55, and 24.

Subsection 2.1.2 Random Processes

In statistics, we use random processes in many different ways. For example, we may wish to use randomness to:

  • eliminate human biases -- such as in selecting individuals for a sample,

  • model the real world -- such as flipping a coin, rolling a die, or other more complicated random processes, or

  • predict how likely a given event is to happen based on the models above.

One of our goals in this lesson is to better understand how we can model the real world using random processes so that we can make predictions and better understand our samples. To model the real world, we must first understand the process we are trying to model.

Definition 2.1.10.

A random process is one in which, even if the initial conditions are known, the final result cannot be predicted with certainty.

If a process is not random, then the outcome can be predicted or determined based on the initial conditions, or starting points of the process. Such a process is called deterministic.

Definition 2.1.11.

In a deterministic process the outcome of the process is completely determined by the initial conditions. That is, the final result will always be the same if the same starting point is used.

To see the difference between these two types of processes, consider the following examples.

An otherwise empty lake is stocked with 500 catfish, 750 bass, and 1000 trout. A fisherman decides to catch fish and throw them back until he has caught one of each type of fish. Is this a random or deterministic process?

Solution

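This is a random process: even though we know exactly how many fish of each type are in the lake, we cannot predict which fish will be caught next, so the number of fish the fisherman must catch will vary from one attempt to the next. In order to model this, we would need to either find such a lake and fish it (probably impractical), come up with a way to count all of the possibilities (see Section 2.2), or simulate the process using random numbers.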

A lamp is connected to five different light switches, each in a different room. The lamp turns on if an even number of the light switches are in the up position and off if an odd number are in the up position. We observe the state of the light (off or on). Is this a random process?

Solution

This is not a random process; it is deterministic. Based on the positions of the light switches, we can say for certain whether the lamp will be off or on.
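Because the lamp's state is a fixed rule applied to the switch positions, it can be written as an ordinary function with no randomness in it. In this sketch, `lamp_is_on` is a hypothetical name, and each switch is represented as `True` when it is up.

```python
# Illustrative sketch; the function name is hypothetical.


def lamp_is_on(switches_up: list[bool]) -> bool:
    """The lamp is on exactly when an even number of switches are up."""
    number_up = sum(switches_up)  # True counts as 1, False as 0
    return number_up % 2 == 0


switches = [True, False, True, True, False]  # three of the five switches are up
print(lamp_is_on(switches))  # False: three is odd, so the lamp is off
print(lamp_is_on(switches))  # False again: same starting point, same result
```

Calling the function again with the same switch positions can never give a different answer, which is exactly what makes the process deterministic.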

Figure 2.1.14. Deterministic and Random Processes I
Figure 2.1.15. Deterministic and Random Processes II

The following describes three different processes.

  1. Shuffling a deck of 52 playing cards four times, and then drawing the top card.

  2. Pushing four specific buttons on your calculator, in a specified order, and then pressing the “=” key.

  3. Spinning the “wheel-of-fortune” from the TV game show.

Question: Identify each of these processes as random or deterministic.

Answer

Drawing a card and spinning the wheel are random; pushing the calculator keys is deterministic.

The following describes three different processes.

  1. Typing a specified list of commands into a computer program.

  2. Drawing 4 names from a hat containing 100 names and noting whose name is drawn.

  3. Drawing 4 names from a hat containing 4 names and noting whose name is drawn.

Question: Identify each of these processes as random or deterministic.

Answer

Drawing names from the 100-name hat is random; the other two are deterministic, since drawing 4 names from a hat that holds exactly 4 names always produces the same 4 names.

The following describes three different processes.

  1. Typing your PIN into the ATM after inserting your bank card.

  2. Rolling a pair of dice weighted so that the “1” always comes up and noting the sum of the numbers that appear.

  3. Rolling a pair of fair dice and noting the sum of the numbers that appear.

Question: Identify each of these processes as random or deterministic.

Answer

Using your ATM card and rolling the weighted dice are deterministic; rolling the fair dice is random.
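The contrast between the two dice processes shows up clearly if both are simulated. In the sketch below, the function names are hypothetical: the weighted dice are modeled exactly as described, always showing a 1, while the fair dice use Python's `random.randint`, which picks each whole number from 1 to 6 with equal chance.

```python
import random

# Illustrative sketch; the function names are hypothetical.


def weighted_dice_sum() -> int:
    """Both dice are weighted so that "1" always comes up."""
    return 1 + 1


def fair_dice_sum(rng: random.Random) -> int:
    """Each fair die shows 1 through 6 with equal chance."""
    return rng.randint(1, 6) + rng.randint(1, 6)


rng = random.Random()  # no fixed seed, so each run gives new rolls
print([weighted_dice_sum() for _ in range(5)])  # [2, 2, 2, 2, 2]
print([fair_dice_sum(rng) for _ in range(5)])   # changes from run to run
```

The weighted dice always report a sum of 2, so that outcome is determined in advance; the fair dice do not.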

Subsection 2.1.3 Simulation

Our fishing example (Example 2.1.12) illustrates an important point: many times it is not practical to replicate a particular random process. We don't have a lake in our backyard that we can stock with these fish to perform this exact experiment. Instead, we need to construct a model of the process, which we can then use as a simulation.

Definition 2.1.19.

A simulation is a sequence of random outcomes that models a random process.

As we construct simulations for this and other examples, keep the following cautions in mind.

  • A simulation is a model of the real process and therefore is not perfect.

  • A simulation is meant to be easier to perform than the actual random process.

  • A simulation needs to correctly model the process, so we should be careful of any underlying assumptions.

In order to describe the simulations that we will create and carry out, it is important to have a well-understood vocabulary. Some of these terms have already been used, but we clarify their exact definitions below.

Definition 2.1.20.

The following terms are often used in describing simulations.

  • Component.

    A component is the most basic action in the process being simulated.

  • Outcome.

    The outcomes of a process are the possible results of a single component.

  • Trial.

    A trial is one series of components which completes the process.

  • Response Variable.

    The response variable is the final result for which we are looking.

To better understand these terms, let's apply them to the fishing example from earlier.

Identify the components, outcomes, trials, and response variables in the fishing experiment seen in Example 2.1.12.

Solution

These are as follows.

  • A component is catching a single fish. This is the most basic action in our process.

  • The outcomes are the various fish that could be caught: catfish, bass, and trout.

  • A trial is catching a series of fish until we get one of each type.

  • The response variable is the number of fish we had to catch in our trial.
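These four terms map naturally onto pieces of a program. The Python sketch below is one way to write the fishing process in those terms; the names are hypothetical, and it assumes that every fish in the lake is equally likely to be caught and that each fish is thrown back, so every catch is drawn with weights 500 : 750 : 1000.

```python
import random

# Illustrative sketch; names are hypothetical. Assumes every fish is equally
# likely to bite and is thrown back, so the counts never change.
FISH_TYPES = ["catfish", "bass", "trout"]
FISH_COUNTS = [500, 750, 1000]  # stocking numbers from the example


def catch_one_fish(rng: random.Random) -> str:
    """Component: catch (and release) one fish. The result is an outcome."""
    return rng.choices(FISH_TYPES, weights=FISH_COUNTS, k=1)[0]


def run_trial(rng: random.Random) -> int:
    """Trial: fish until all three types have been caught.

    Returns the response variable, the number of fish caught.
    """
    types_seen: set[str] = set()
    fish_caught = 0
    while len(types_seen) < len(FISH_TYPES):
        types_seen.add(catch_one_fish(rng))
        fish_caught += 1
    return fish_caught


rng = random.Random(1)  # fixed seed so the run can be repeated
print(run_trial(rng))
```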

Figure 2.1.22. Identifying Parts of a Random Process I
Figure 2.1.23. Identifying Parts of a Random Process II

Seventy-four employees at a firm with limited free parking participate in a “parking lottery” to receive one of 23 free parking spaces. Seventeen of those whose names were in the lottery were upper management. When all 17 individuals from upper management were “randomly” selected to receive free parking, the other employees complained that the lottery had been rigged. Use a simulation to determine if their claim has merit.

Question: What are the components of such a simulation?

Answer

The selection of one employee to receive a parking space.

Seventy-four employees at a firm with limited free parking participate in a “parking lottery” to receive one of 23 free parking spaces. Seventeen of those whose names were in the lottery were upper management. When all 17 individuals from upper management were “randomly” selected to receive free parking, the other employees complained that the lottery had been rigged. Use a simulation to determine if their claim has merit.

Question: What are the trials of such a simulation?

Answer

The selection of 23 employees to receive parking spaces.

Seventy-four employees at a firm with limited free parking participate in a “parking lottery” to receive one of 23 free parking spaces. Seventeen of those whose names were in the lottery were upper management. When all 17 individuals from upper management were “randomly” selected to receive free parking, the other employees complained that the lottery had been rigged. Use a simulation to determine if their claim has merit.

Question: What are the outcomes of such a simulation?

Answer

Picking any one of the 74 employees who participated in the lottery.

Subsection 2.1.4 Conducting Simulations

In order to actually carry out a simulation, we will combine our analysis of the random process (its components, outcomes, trials, and response variable) with a source of randomness, such as the random number tables we saw earlier in this section. The first thing we must do is identify any assumptions we make about the random process.

List any assumptions that must be made to simulate the fishing process seen in Example 2.1.12.

Solution

There are many assumptions that are made in simulating this example. Here are a few of the more important ones.

  • Each individual fish has the same chance of being caught; that is, no one fish is more likely to take a hook than another. (So each species is caught in proportion to its numbers, not with equal probability.)

  • The number of fish does not change from those that we are given--no fish have died, been added, etc.

  • The fisherman is fishing in a spot that is visited equally by all three types of fish--he hasn't found a place where trout will not go, for example.

  • And others...

All of these assumptions are important because they dictate how we design our model. Remember that the goal in simulating is to model the process in a simpler and more easily studied form. The steps we follow to set up such a simulation are numbered in the solution to the example below.

We have already seen how to identify the components, outcomes, and trials. Let's finish our fishing example by actually conducting the simulation.

Simulate 10 trials of the fishing process seen in Example 2.1.12 using the following random number table.

2217726304387410092537086270581997622725849795907032825001108963
3217535822643800292254644943760642389043766557204107354186024508
8906427308645681412198226653885873285801699027843110380420067664
8740522639824530519902027044464984322000946238678577902639002954
8887003319933147508331265192321413908608671496383528968974910533
4943760642389043766557204107354186024508432200094623867858226440
Solution

We will work through the steps one at a time.

  1. We have already noted that the component in this process is catching a single fish.

  2. Our simulation must model the three types of fish that make up the outcomes, in such a way that the modeled outcomes occur in the same proportions as the fish in the lake. Recall that there were 500 catfish, 750 bass, and 1000 trout, for a total of 2250 fish. This gives:

    • \(\frac{500}{2250} = \frac{2}{9}\): we use the digits 0 and 1 to represent catfish,

    • \(\frac{750}{2250} = \frac{3}{9}\): we use the digits 2, 3, and 4 to represent bass, and

    • \(\frac{1000}{2250} = \frac{4}{9}\): we use the digits 5, 6, 7, and 8 to represent trout.

    The digit 9 is not assigned to any fish, so we skip it whenever it appears. Skipping it leaves nine equally likely digits, so each type of fish comes up with the correct probability.

  3. To simulate a trial, we will select a starting point and record digits (representing fish) until we have recorded one of each “type of fish.”

  4. The response variable will be the number of “fish” recorded, that is, the number of digits recorded, not counting any 9s.

  5. The results of 10 trials, starting as indicated, are shown below. Each row lists the digits used in that trial followed by the fish they represent (c = catfish, b = bass, t = trout). The * symbol represents the unassigned digit “9,” which is skipped.

    Starting Point     Outcomes                                                 Response Var.
    row 1, column 1    2217 \(\Rightarrow\) bbct                                4 fish caught
    row 2, column 11   64380 \(\Rightarrow\) tbbtc                              5 fish caught
    row 3, column 21   9822665388587328580 \(\Rightarrow\) *tbbtttbtttttbbtttc  18 fish caught
    row 4, column 31   49843220 \(\Rightarrow\) b*tbbbbc                        7 fish caught
    row 5, column 41   6714 \(\Rightarrow\) ttcb                                4 fish caught
    row 6, column 51   23867858226440 \(\Rightarrow\) bbttttttbbtbbc            14 fish caught
    row 1, column 21   370 \(\Rightarrow\) btc                                  3 fish caught
    row 2, column 31   064 \(\Rightarrow\) ctb                                  3 fish caught
    row 3, column 41   69902 \(\Rightarrow\) t**cb                              3 fish caught
    row 4, column 51   77902 \(\Rightarrow\) tt*cb                              4 fish caught
    Table 2.1.30. Ten Simulation Trials for Fishing Example
  6. The response variable had values 4, 5, 18, 7, 4, 14, 3, 3, 3, and 4, giving a mean value of:

    \begin{equation*} \overline{x} = \frac{4+5+18+7+4+14+3+3+3+4}{10} = 6.5 \text{ fish}. \end{equation*}

    It appears that the number of fish we would need to catch before having one of each type is around 6 or 7. Note, however, that the 14 and the 18 are much larger than the other values.

Figure 2.1.31. Conducting Simulations I
Figure 2.1.32. Conducting Simulations II

Seventy-four employees at a firm with limited free parking participate in a “parking lottery” to receive one of 23 free parking spaces. Seventeen of those whose names were in the lottery were upper management. When all 17 of the upper management team were “randomly” selected to receive free parking, the other employees complained that the lottery had been rigged. Use a simulation to determine if their claim has merit.

You decide to conduct the simulation using the following random number table, assigning numbers 00-16 to the upper management individuals and 17-73 to the remaining employees. Numbers 74-99 are skipped, as are repeated numbers.

52557 13440 30790 31858 28653 38267 09427 95946 09832 68174
93146 91673 22649 29722 35062 19040 67106 96350 82060 51489
16645 21177 60697 15577 24381 51084 70974 11304 37199 12631

Question: How many upper-management employees received a parking spot on your first trial if you start with the first entry of row one?

Seventy-four employees at a firm with limited free parking participate in a “parking lottery” to receive one of 23 free parking spaces. Seventeen of those whose names were in the lottery were upper management. When all 17 of the upper management team were “randomly” selected to receive free parking, the other employees complained that the lottery had been rigged. Use a simulation to determine if their claim has merit.

You decide to conduct the simulation using the following random number table, assigning numbers 00-16 to the upper management individuals and 17-73 to the remaining employees. Numbers 74-99 are skipped, as are repeated numbers.

52557 13440 30790 31858 28653 38267 09427 95946 09832 68174
93146 91673 22649 29722 35062 19040 67106 96350 82060 51489
16645 21177 60697 15577 24381 51084 70974 11304 37199 12631

Question: How many upper-management employees received a parking spot on your first trial if you start with the first entry of row two?

Seventy-four employees at a firm with limited free parking participate in a “parking lottery” to receive one of 23 free parking spaces. Seventeen of those whose names were in the lottery were upper management. When all 17 of the upper management team were “randomly” selected to receive free parking, the other employees complained that the lottery had been rigged. Use a simulation to determine if their claim has merit.

You decide to conduct the simulation using the following random number table, assigning numbers 00-16 to the upper management individuals and 17-73 to the remaining employees. Numbers 74-99 are skipped, as are repeated numbers.

52557 13440 30790 31858 28653 38267 09427 95946 09832 68174
93146 91673 22649 29722 35062 19040 67106 96350 82060 51489
16645 21177 60697 15577 24381 51084 70974 11304 37199 12631

Question: How many upper-management employees received a parking spot on your first trial if you start with the “38” that begins the sixth block in row 1?

Subsection 2.1.5 Cautions Regarding Simulation

The results of the simulation we conducted above vary a lot from trial to trial. Consider a further analysis of this data in the following example.

Recall that the numbers of fish we had to catch in the ten simulated trials before having one of each kind were \(4, 5, 18, 7, 4, 14, 3, 3, 3, \text{ and } 4\text{.}\) Use the mean and standard deviation to further analyze this data.

Solution

We saw in Example 2.1.29 that the mean was \(\overline{x} = 6.5\) fish. Using a calculator or spreadsheet program, we find that the standard deviation of this sample is

\begin{equation*} s = \sqrt{\frac{\sum (x-\overline{x})^2}{n-1}} = \sqrt{\frac{246.5}{9}} \approx 5.2 \text{ fish}. \end{equation*}

The standard deviation is almost as large as the mean, so there is a lot of variation in the number of fish needed to complete a trial. The two largest values stand out: the 18 has a z-score of \(z = \frac{18-6.5}{5.2} \approx 2.21\text{,}\) and the 14 has a z-score of \(z = \frac{14-6.5}{5.2} \approx 1.44\text{.}\) Both are relatively large.

If we decide that, with this much variation, a mean based on only 10 trials is not “precise” enough, what can we do? There are several ways to improve a simulation.

  • Ensure that the underlying assumptions are accurate.

  • Improve the randomness of the simulation.

  • Increase the number of trials; usually at least 20 should be conducted.

Our underlying assumptions seem reasonable, and we did use a random number table to generate randomness. However, we only did 10 trials. For a better simulation, we should do at least 20 trials. Running trials by hand can be very time-consuming; using a spreadsheet program, we could get our results much more quickly.
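As a rough illustration of letting a computer do the work, here is a Python sketch (not the spreadsheet approach the text mentions) that runs many trials of the fishing simulation using the same digit assignment as the hand simulation: 0 and 1 for catfish, 2 through 4 for bass, 5 through 8 for trout, and 9 skipped. The function names, the seed, and the choice of 10,000 trials are arbitrary assumptions.

```python
import random
import statistics
from typing import Iterable, Iterator

# Illustrative sketch; names, seed, and trial count are hypothetical choices.
# Digit assignment from the hand simulation; 9 is deliberately absent.
FISH_FOR_DIGIT = {0: "catfish", 1: "catfish",
                  2: "bass", 3: "bass", 4: "bass",
                  5: "trout", 6: "trout", 7: "trout", 8: "trout"}


def run_trial(digits: Iterable[int]) -> int:
    """Read digits until all three fish types appear; return fish caught."""
    types_seen = set()
    fish_caught = 0
    for digit in digits:
        fish = FISH_FOR_DIGIT.get(digit)
        if fish is None:  # the unassigned digit 9: not a fish, so skip it
            continue
        fish_caught += 1
        types_seen.add(fish)
        if len(types_seen) == 3:
            return fish_caught
    raise ValueError("ran out of digits before all three types were caught")


def random_digits(rng: random.Random) -> Iterator[int]:
    """An endless stream of pseudo-random digits 0-9, like a long table row."""
    while True:
        yield rng.randrange(10)


def simulate(num_trials: int, seed: int = 2024) -> list[int]:
    """Run `num_trials` trials and return the response variable for each."""
    rng = random.Random(seed)
    return [run_trial(random_digits(rng)) for _ in range(num_trials)]


results = simulate(10_000)
print(f"mean: {statistics.mean(results):.2f} fish")
print(f"standard deviation: {statistics.stdev(results):.2f} fish")
```

Because `run_trial` accepts any sequence of digits, it can also replay a trial from the random number table: `run_trial(int(d) for d in "2217726304")` returns 4, matching the first hand-simulated trial. With thousands of trials the mean settles down, while the sample standard deviation stays roughly the same, since it describes how much individual trials vary.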

Be very cautious when conducting and reporting on simulations. In particular, make sure the simulation models the outcomes correctly, run enough trials, and do not overstate your conclusions.

Figure 2.1.38. Simulation Cautions I
Figure 2.1.39. Simulation Cautions II

Seventy-four employees at a firm with limited free parking participate in a “parking lottery” to receive one of 23 free parking spaces. Seventeen of those whose names were in the lottery were upper management. When all 17 upper management employees were “randomly” selected to receive free parking, the other employees complained that the lottery had been rigged. You decide to use a random number table to simulate the situation. You assign the management employees numbers 00-16, the other employees numbers 17-73, and skip numbers 74-99. You then simulate 10 trials using a random number table and find that on average 15.2 of the 17 upper management employees were selected. You therefore conclude that the lottery must be fair.

Question: What, if any, mistake did you make?

Answer

Overstated your Case

Seventy-four employees at a firm with limited free parking participate in a “parking lottery” to receive one of 23 free parking spaces. Seventeen of those whose names were in the lottery were upper management. When all 17 upper management employees were “randomly” selected to receive free parking, the other employees complained that the lottery had been rigged. You decide to use a random number table to simulate the situation. You assign the management employees numbers 00-16, the other employees numbers 17-73, and skip numbers 74-99. You then simulate 100 trials using a random number table and find that an average of 5.4 upper management employees were selected. You therefore conclude that it is unlikely this was a fair lottery.

Question: What, if any, mistake did you make?

Answer

Made No Mistakes--Your Conclusion is Sound

Seventy-four employees at a firm with limited free parking participate in a “parking lottery” to receive one of 23 free parking spaces. Seventeen of those whose names were in the lottery were upper management. When all 17 upper management employees were “randomly” selected to receive free parking, the other employees complained that the lottery had been rigged. You decide to use a random number table to simulate the situation. You assign the management the even digits (0, 2, 4, 6, 8) and the other employees the odd digits (1, 3, 5, 7, 9). You then simulate 100 trials using a random number table and find that on average 8.2 of the 17 upper management employees were selected. You therefore conclude that all 17 being selected is not very likely if the lottery is indeed fair.

Question: What, if any, mistake did you make?

Answer

Didn't Model Outcomes Correctly
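The last exercise goes wrong because assigning even and odd digits gives management a 50% chance on every pick, far more than their actual share of the 74 names. A correct model draws 23 different names out of 74: a component is one draw, a trial is 23 draws with no repeats, and the response variable is the number of managers drawn. The Python sketch below uses `random.sample`, which draws without replacement, in place of a random number table; the names, seed, and number of trials are illustrative assumptions.

```python
import random
import statistics

# Illustrative sketch; names, seed, and trial count are hypothetical choices.
NUM_EMPLOYEES = 74
NUM_MANAGERS = 17  # employees 0-16 are upper management
NUM_SPACES = 23


def managers_selected(rng: random.Random) -> int:
    """One trial: draw 23 different employees and count the managers."""
    winners = rng.sample(range(NUM_EMPLOYEES), NUM_SPACES)
    return sum(1 for employee in winners if employee < NUM_MANAGERS)


def run_lottery_simulation(num_trials: int, seed: int = 0) -> list[int]:
    """Run many fair lotteries and record the managers selected in each."""
    rng = random.Random(seed)
    return [managers_selected(rng) for _ in range(num_trials)]


results = run_lottery_simulation(10_000)
print(f"average managers selected: {statistics.mean(results):.2f}")
print(f"trials where all 17 were selected: {results.count(NUM_MANAGERS)}")
```

In a run like this the average number of managers selected sits a little above 5, in line with the second exercise, and trials in which all 17 managers win essentially never occur, which supports the conclusion reached there.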