**Materials**: If you have not already done so, please download the lesson materials for this bootcamp, unzip, then go to the folder `control-flow`

, and open (double click) on the file `control-flow.Rproj`

to open Rstudio.

Often when we're coding we want to control the flow of our actions. Control flow (or flow of control) is simply the order in which we code and have our statements evaluated. That can be done by setting things to happen only if a condition or a set of conditions are met. Alternatively, we can also set an action to be computed for a particular number of times.

There are many ways these can be achieved in R. For conditional statements, the most commonly used approaches are the constructs:

```
if (and else)
while
```

Say, for example, that we want R to print a message depending on how a number x relates to 5:

```
set.seed(10)
x <- rpois(1, lambda=8)
if (x >= 5) {
"x is greater than or equals 5"
} else {
"x is smaller than 5"
}
```

```
## [1] "x is greater or equal than 5"
```

In the above case, the function `rpois`

generates a random number following a Poisson distribution with a mean (i.e. lambda) of 8. The function `set.seed`

guarantees that all machines that use the seed 10 as an input, will generate the exact same 'pseudo-random' number (more about pseudo-random numbers). Now, looking at at x we see that it takes the value 8 (you should get the exact same number).

```
x
```

```
## [1] 8
```

**Important:** when R evaluates the condition inside `if`

statements, it is looking for a logical element, i.e., `TRUE`

or `FALSE`

. This can cause some headache for beginners. For example:

```
x <- 4 == 3
if (x) {
"4 equals 3"
}
```

As we can see, the message was not printed because the vector x is `FALSE`

```
x <- 4 == 3
x
```

```
## [1] FALSE
```

Also, be careful if you a vector with more than one logical elements, the function `if`

will still run, but will only evaluate the condition of first element

```
set.seed(1)
x <- round(rnorm(30, mean=15, sd=15))
y <- x %in% c(3, 5, 16)
if(y) {
stop('at least one number in x was found within the sequence c(3, 5, 16)')
}
```

You can see that R does not return any result but instead prints a warning message.

```
## Warning message:
## In if (y) { :
## the condition has length > 1 and only the first element will be used
```

The reason for that is because the first element in `y`

, `y[1]`

is `FALSE`

(i.e. `x[1]`

is 6 and therefore is not contained within `c(3, 5, 16)`

), so the action within the condition is not evaluated. Such cases where multiple comparisons/logical statements are contained within one single vector are not uncommon, and I suspect you are probably looking for the functions `any`

or `all`

before running an `if`

statement. `any`

will return TRUE if at least one TRUE value is found within a vector, otherwise will return `FALSE`

. The function `all`

, as the name suggests, will only return `TRUE`

if all values in the vector are `TRUE`

. We can build on the above example to make this clear - suppose that we want to make sure that we avoid the warning message, and, on top of that, print a more informative message:

```
if(any(y)) {
stop(paste0(paste0(x[y], collapse=' and '), ' found in x'))
}
```

Notice that now returns an Error message

```
## Error: 3 and 16 found in x
```

The reason for that is because the function `stop`

was evaluated. This function is widely used in what's known as *defensive programming* - if you anticipate a potential problem in your data manipulation/analysis within a function, the function can stop if this certain problem occurs and print an informative error message. The same idea works for the function `warning`

. We could have also created the different statements with the function `all`

instead of `any`

. Try that by yourself.

Using the gapminder dataset, create a function called `subsetAndEvaluateLifeExp`

that subsets the data to a given year and continent, and return the average population size among countries if their life expectancy are all above 40 years, otherwise return a message saying what countries had their life expectancy below 40 years. Evaluate this function for Asia in 1967, 1977, 1987, 1997 and 2007. *Hint:* use the function `subset`

within your function to subset the data - type `?subset`

on the console for examples.

In what years did the condition *life expectancy > 40* occur?

When you mention looping, many people immediately reach for `for`

. Perhaps
that's because they are already familiar with these other languages,
like basic, python, perl, C, C++ or MATLAB. While `for`

is definitely the most
flexible of the looping options, we suggest you avoid it wherever you can in favor of functions in plyr and the like, for the following two reasons:

- It is not very expressive, i.e. takes a lot of code to do what you want.
- It permits you to write horrible code that is hard to read and debug.

In fact, we'd encourage you to avoid `for`

unless the order of iteration is important. A key feature of all the examples in the `plyr`

section is that **order of iteration is not important**. This is crucial. If each iteration is independent, then you can cycle through them in whatever order you like.

When the order of iteration **is** important, we can use loops. The basic structure of a `for`

loop is

```
for(iterator in set of values){
do a thing
}
```

For example:

```
for(i in 1:10){
print(i)
}
```

The `1:10`

bit is basically just creating a vector on the fly; you can iterate over any other vector as well.

```
v <- c("this", "is", "the", "world's", "greatest", "for", "loop")
for(i in v){
print(i)
}
```

One of the most important issues with `for`

loops is execution time. Some operations can really slow down a for loop if they're done in every iteration. For instance, take the following code:

```
vec <- numeric()
system.time(
for(i in seq_len(50000-1)) {
calculation <- sqrt(i + (i+1)/10)
vec <- c(vec, calculation)
}
)
```

```
user system elapsed
5.510 0.588 6.098
```

That takes about five and a half seconds to run. It turns out that most of that time is taken up with the concatenation step. If we get rid of that step by allocating a vector of the correct size to start with and accessing elements of that vector directly, it runs much faster.

```
iter <- 50000
vec <- numeric(length=iter)
system.time(
for(i in seq_len(iter-1)) {
calculation <- sqrt(i + (i+1)/10)
vec[i] <- calculation
}
)
```

```
user system elapsed
0.091 0.002 0.092
```

That took about 1/6 as long to do the same thing! We can shave off even more time by removing the `calculation`

variable and allocating the results of each step directly to the vector with no intermediaries.

```
iter <- 50000
vec <- numeric(length=iter)
system.time(
for(i in seq_len(iter-1)) {
vec[i] <- sqrt(i + (i)/10)
}
)
```

```
user system elapsed
0.077 0.001 0.078
```

This version even compares favorably with the execution time we achieve using apply!

```
system.time(
sapply(seq_len(iter), function(x)sqrt(x + (x)/10))
)
```

```
user system elapsed
0.071 0.001 0.072
```

We can use a for loop within another for loop to iterate over two things at once (e.g., rows and columns of a matrix).

```
for(i in 1:5){
for(j in 1:5){
print(paste(i,j))
}
}
```

Note the variable scoping - `i`

is visible from within the `j`

loop. What happens when we try to print `j`

from outside the `j`

loop?

Sometimes you will find yourself needing to repeat an operation until a certain condition is met, rather than doing it for a specific number of times. In some cases you might be able to hack something together using a `for`

loop, but usually you'd be much better off using a `while`

loop instead. `While`

loops look and act a lot like `for`

loops, but instead of saying:

```
for(iterator in set of values){
do a thing
}
```

You instead say:

```
while(this condition is true){
do a thing
}
```

Let's try an example, shall we? We'll try to come up with some simple code that generates random numbers between 0 and 1 until it gets one that's less than 0.1.

```
while(z > 0.1){
z <- runif(1)
print(z)
}
```

But wait, that doesn't work! What's the problem?

The problem is that we haven't defined `z`

, and so the very first time the while loop's condition is checked (`z > 0.1`

), `while`

just says "Okay, that's not true so I'm not going to execute this block of code". The same thing would have happened if we defined `z`

to be anything less than 0.1. Let's fix it.

```
z <- 1
while(z > 0.1){
z <- runif(1)
print(z)
}
```

Let's say you want to know how many random numbers you need to draw until you get one that's below a given threshold (i.e., "how many random numbers before I get one less than 0.1?") Let's create a simulation of that procedure!

Create a function

`randbelow`

that takes a threshold and uses a`while`

loop to draw random (`runif`

) numbers until it gets a number below the threshold, counting each time it draws a number.Create a function that takes a threshold and the number of reps and uses a

`for`

loop to call`randbelow`

. You'll need to define an empty numeric vector, e.g.,`results <- numeric()`

and append results to it, e.g.,`results <- c(results, thisresult)`

each iteration.Finally, use the results vector from that function to draw and return a histogram.

Let's experiment! See what the distribution looks like if you try to draw numbers under 0.1 for 100 repetitions. What about 0.01? Keep adding zeros (0.001, 0.0001, 0.00001). What happens to the time it takes to execute your simulation? Why?

This demonstrates an important part of using loops in code - if you don't think clearly about what you're doing, they can take a long time to execute, or can even go on forever!

One place where `for`

loops shine is in writing simulations; if one iteration depends on the value of a previous iteration, then a `for`

loop is probably the best way of repeating things.

In an (unbiased) random walk, each time step we move left or right with probability 0.5. R has lots of random number generation functions. The `runif`

function generates random numbers uniformly on `[0,1]`

so we can draw random steps directions like this:

```
set.seed(1)
for (i in 1:10){
step <- 0
if(runif(1) < 0.5){
step <- 1
}
else{
step <- -1
}
print(step)
}
```

```
## [1] 1
## [1] 1
## [1] -1
## [1] -1
## [1] 1
## [1] -1
## [1] -1
## [1] -1
## [1] -1
## [1] 1
```

Or we could take the `sign`

of a standard normal:

```
set.seed(1)
for (i in 1:10){
print(sign(rnorm(n=1, mean=0)))
}
```

```
## [1] -1
## [1] 1
## [1] -1
## [1] 1
## [1] 1
## [1] -1
## [1] 1
## [1] 1
## [1] 1
## [1] -1
```

The implementation does not matter. Now let's wrap that in a function.

```
random.step <- function() {
sign(rnorm(n=1, mean=0))
}
```

We can then use this to step 20 steps:

```
set.seed(1)
x <- 0
for (i in 1:20){
x <- x + random.step()
}
x
```

```
## [1] 4
```

which will end up somewhere between -20 and 20, but with a mean of 0.

We want to track the entire trajectory:

```
set.seed(1)
nsteps <- 200
x <- numeric(nsteps + 1) # space to store things in
x[1] <- 0 # start at 0
for (i in seq_len(nsteps)) {
x[i+1] <- x[i] + random.step()
}
plot(x, type="l", xlab="Step number", ylab="Position")
```

We might want to run that lots of times, so put it into a function:

```
random.walk <- function(nsteps, x0=0) {
x <- numeric(nsteps + 1)
x[1] <- x0
for (i in seq_len(nsteps)) {
x[i+1] <- x[i] + random.step()
}
x
}
```

which is now much easier to use:

```
set.seed(1)
plot(random.walk(400), type="l", xlab="Step number", ylab="Position")
```

```
set.seed(1)
nsteps <- 200
nrep <- 40
cols <- rainbow(nrep)
plot(NA, xlim=c(1, nsteps+1), ylim=c(-30, 30), xlab="Step number", ylab="Position")
for (i in 1:nrep){
lines(random.walk(200), col=cols[i])
}
```

This material was developed by Daniel Falster and Rich FitzJohn and modified by Diego Barneche and Dan Warren. Based on material prepared by Karthik Ram and Hadley Wickam.