1-10: For Loops
0.1 Changes…
- Put find difference in vector values application here.
1 Purpose
- Show how to perform if-else statements on multiple values in a vector using for loops
2 Questions about the material…
The files for this lesson:
Script: you can download the script here
If you have any questions about the material in this lesson, feel free to email them to the instructor, Charlie Belinsky, at belinsky@msu.edu.
3 Four structures in programming
As mentioned in the second lesson on variables, there are basically 4 main structures that cover almost every aspect of programming:
Variables
If-Else Statements
For Loops
Functions
In the next three lessons we cover for loops. for loops allow you to write code once and execute the code on multiple values.
4 Conditional statements on multiple values
In the last couple of lessons, we used conditional statements on values in a vector (e.g., asking about the weather conditions or the high temperatures on a specific day) and then output a message to the Console based on the results of the conditional statement.
But, when working with data, we rarely look at just one value – usually, we are looking at a vector of values (e.g., 365 high temperatures for a year) and we want to perform the same conditional statement on each of the values. In this lesson and the next lesson, we will learn to use conditional statements on multiple values – using for loops with embedded if-else statements.
5 Repeating commands
A for loop is used whenever you want to execute a codeblock multiple times. The most basic example is repeating the exact same output a certain number of times:
cat("---------\nRepeating the same code multiple times:\n");
for(i in 1:5) # repeat 5 times
{
cat("Hello, World\n");
}And the Console tab will show 5 Hellos:
Repeating the same code multiple times:
Hello, World
Hello, World
Hello, World
Hello, World
Hello, World5.1 The codeblock attached to the for loop
Just like if-else statements, for loops have an attached codeblock, designated by curly brackets ( { } ). If-else statements execute the codeblock conditionally based on the conditional statement in the parentheses and for loops executes a codeblock multiple times. The number of times for loops execute the attached codeblock is dependent upon the sequence inside the parentheses. In Figure 1, the sequence is 1:5, and since the sequence has 5 values, the codeblock attached will execute 5 times.
Note: if the sequence was 15:19, which has 5 values, the attached codeblock will execute 5 times.
6 Repeated code on values
Figure 1 repeats the exact same code 5 times. However, you rarely want to repeat the exact same code – usually, you want to repeat similar code. For instance, you might want to say hello to 5 different people given by a vector:
helloVector = c("Ann", "Bob", "Charlie", "Dave", "Eve");To do this we cycle through every value in the vector, which is helloVector[1], helloVector[2], helloVector[3], helloVector[4], and helloVector[5].
To cycle through the values we set the index variable (often named i) in the for loop to the values 1 through 5 and then use i inside the codeblock for the for loop. i, in this example, indexes helloVector each time the for loop cycles.
cat("---------\nGoing through each value in a vector by changing the index value:\n");
for(i in 1:5)
{
# i takes on the values 1 through 5 through the 5 cycles of the for loop
cat("Hello," helloVector[i], "\n");
}Going through each value in a vector by changing the index value:
Hello, Ann
Hello, Bob
Hello, Charlie
Hello, Dave
Hello, Eve 6.1 Cycling methods
There are two methods of cycling through a for loop: by value and by index. By value is more commonly taught but this class emphasizes by index. For a detailed explanation go to: Extension: Two methods of cycling
7 The index variable in a for loop
The true power of the for loops lies in the indexing variable, i. Note: the indexing variable can be given any name – it’s just common to name it i. The indexing variable cycles through the sequence each time the for loop executes. This means the indexing variable can be used to modify the code every time the codeblock is executed.
7.1 The index variable as a counter
Let’s look at the simplest of examples – using the indexing variable, i, as a counter:
cat("---------\nUsing the indexing variable as a counter:\n");
for(i in 1:5) # cycle 5 times
{
cat("The count is:", i, "\n");
}This outputs:
Using the indexing variable as a counter:
The count is: 1
The count is: 2
The count is: 3
The count is: 4
The count is: 5The number of times the for loop cycles is the same as the number of values in the sequence. So, we can use a different sequence of 5 values…
cat("---------\nAny sequence of 5 number cycles the for loop 5 times:\n");
for(i in 15:19) # repeat 5 times
{
cat("The count is:", i, "\n");
}…and the for loop will still cycle 5 time.
Any sequence of 5 number cycles the for loop 5 times:
The count is: 15
The count is: 16
The count is: 17
The count is: 18
The count is: 197.2 Taking a step back: How a for loop cycles
A for loops always consists of 3 things:
An indexing variable: for(i in 1:5)
A sequence of values: for(i in 1:5)
A codeblock to execute: encapsulated by the { }
The sequence of values determines how many times the codeblock will cycle. If there are 12 values in the sequence, then the for loop will cycle 12 times.
For each cycle, the indexing variable is set to a different value in the sequence. The value of the indexing variable take will go in the same order as the sequence.
So, if the sequence is 12:8 (a backwards sequence with 5 values), then:
i will be set to 12 the first time the codeblock cycles
i will be set to 11 the first time the codeblock cycles
i will be set to 10 the first time the codeblock cycles
i will be set to 9 the first time the codeblock cycles
i will be set to 8 the first time the codeblock cycles
7.3 Changing the sequence
The for loop executes for every value in the sequence, and the index variable takes on the values in order.
cat("---------\n#Sequences can go backwards -- length is still 5:\n");
for(i in 12:8) # 12, 11, 10, 9 ,8 (there are five values in the sequence)
{
cat("The count is:", i, "\n");
}In this case, the for loop will still execute 5 times, but i will take on values 12 down to 8:
Sequences can go backwards -- length is still 5:
The count is: 12
The count is: 11
The count is: 10
The count is: 9
The count is: 87.4 Indexing variables and more complicated sequence
You do not have to name the index i, you can name it whatever you want. For instance, we will be working with weather so day might be a more appropriate name for the indexing variable.
In this example we we use a more complicated sequence and change the name of the index variable to day:
cat("---------\nUsing a more complicated sequence with seven values:\n");
for(day in seq(from=20, to=2, by=-3)) # 20, 17, 14, 11, 8, 5, 2 (seven values)
{
cat("The count is:", day, "\n");
}Using a more complicated sequence with seven values:
The count is: 20
The count is: 17
The count is: 14
The count is: 11
The count is: 8
The count is: 5
The count is: 2So, in Figure 4, the codeblock cycles 7 times because there are 7 values in the sequence (20, 17, 14, 11, 8, 5, and 2). Each time the codeblock cycles, day is set to a different value in the sequence (going from the first value in the sequence to the last).
8 Indexing values inside a for loop
Most examples so far use the index variable as a counter. The real power comes when you use the indexing variable to index values in a vector as we did in Figure 2. This is how we perform an if-else statement on every value in a vector.
Let’s use the data from the previous lessons:
### read in data from twoWeekWeatherData.csv
weatherData = read.csv(file="data/twoWeekWeatherData.csv",
sep=",",
header=TRUE);
### Extract the highTemps column from the data frame -- save it to a variable
highTemps = weatherData$highTemp;
noonCond = weatherData$noonCondition;Since we want to go through every value in the vector, we need to know the size of the vector. The easiest way to find the size of a vector is to use the function length():
numDays = length(highTemps); # numDays is 14 (the number of values in highTemps)Now let’s use a for loop to output all of the highTemp values. To do this we use the indexing variable, day, to index the vector highTemp:
cat("---------\nThe 14 values in the vector highTemps in order:\n");
for(day in 1:numDays) # numDays, in this case, is 14 (sequence is 1:14)
{
cat(highTemp[day], "\n"); # day will take the values 1-14
}The 14 values in the vector highTemps in order:
57
50
54
40
39
58
60
53
55
44
39
54
61
75In the following example, we use the indexing variable day as both a counter and an index for noonCond:
cat("---------\nUsing indexing variable as a counter and an index for noonCond:\n");
for(day in 1:numDays)
{
cat(day, ") ", noonCond[day], "\n", sep="");
}Using indexing variable as a counter and an index for noonCond:
1) Cloudy
2) Cloudy
3) Sunny
4) Rain
5) Fog
6) Sunny
7) Sunny
8) Cloudy
9) Rain
10) Rain
11) Snow
12) Sunny
13) Sunny
14) Sunny8.1 Making code more extensible
In the code in Figure 5 and Figure 6, we already know that the number of values in the vector is 14 so we could just write:
for(day in 1:14)
{
cat(highTemp[day], "\n");
}But this code will only work if the data frame does not change size. In general, you want to make your code as generic as possible, which means using variables instead of fixed numbers. When you make your code generic, you do not have to change the code when your data changes. In Figure 5, numDays will automatically adjust to the size of the vector.
9 if-else within a for loop
We usually have a lot of questions that we want to ask of the values we are looking at. In other words, we want to use if-else statements within the for loops. Let’s first ask if each day was sunny or not and output the answer to the Console.
cat("---------\nUsing if-else statements within a for loop:\n");
for(i in 1:numDays)
{
if(noonCond[i] == "Sunny")
{
cat("day ", i, " was sunny\n", sep="");
}else
{
cat("day ", i, " was not sunny\n", sep="");
}
}Using if-else statements within a for loop:
day 1 was not sunny
day 2 was not sunny
day 3 was sunny
day 4 was not sunny
day 5 was not sunny
day 6 was sunny
day 7 was sunny
day 8 was not sunny
day 9 was not sunny
day 10 was not sunny
day 11 was not sunny
day 12 was sunny
day 13 was sunny
day 14 was sunny10 Checking a subset of values
We can use any vector of values to cycle a for loop. In this case, the vector has 3 values so the for loop will execute 3 times with i cycling through the values 8, 2, and 13:
cat("---------\nUsing a vector to subset the values:\n");
for(i in c(8,2,13))
{
if(noonCond[i] == "Sunny")
{
cat("day ", i, " was sunny\n", sep="");
}else
{
cat("day ", i, " was not sunny\n", sep="");
}
}Using a vector to subset the values:
day 8 was not sunny
day 2 was not sunny
day 13 was sunny11 State variables and for loops
The code in Figure 7 is displaying specific information about each value in a vector. This would get quickly out of hand if there are thousands of values! We usually want more general information about a vector – for instance, we might want to know how many days were sunny?
Finding more general information about the values in a vector requires the introduction of the last component of most for loops: the state variable. The state variable is a variable that maintains a running tab of some value through each cycle of the for loop.
Let’s count the number of sunny days using a for loop. To do this we need to:
create a state variable that holds the number of sunny days (this will start at 0)
have a for loop cycle through every value in the noonCond vector
include an if() statement asking if the day is sunny
have a command to increase the state variable by 1 if it is sunny
Note: The state variable must be declared outside of the for loop. Think about why this is true – it is the first question in the Application.
sunnyDays = 0; # state variable -- will hold the count of cloudy days
for(i in 1:numDays)
{
if(noonCond[i] == "Sunny") # was the day sunny
{
sunnyDays = sunnyDays +1; # it was -- increase sunnyDays by 1
}
# there is no else here -- we don't care about non-sunny days
}And we can look in the Environment tab to see what the final value of sunnyDays was:
sunnyDays: 6And if we look at the noonCond vector, we see there are 6 sunny days:
> noonCond
[1] "Cloudy" "Cloudy" "Sunny" "Rain" "Fog" "Sunny" "Sunny" "Cloudy"
[9] "Rain" "Rain" "Snow" "Sunny" "Sunny" "Sunny"11.1 The state variable
sunnyDays does a lot of heavy lifting in Figure 9. We initially set sunnyDays to 0 because 0 is the default value, or the value if the condition (noonCond[i] == "Sunny") fails every time.
Whenever the value in noonCond is “Sunny” the following code is executed:
sunnyDays = sunnyDays +1;This code works because the right side of the equation is always evaluated first.
So, R will first calculate: sunnyDays +1
- If sunnyDays is 0, then sunnyDays +1 evaluates to 1
Then R assigns the value on the right-side to the variable on the left, which is sunnyDays.
- sunnyDays is assigned the value of 1.
If sunnyDays is 4, then sunnyDays +1 evaluates to 5, and sunnyDays is assigned the value of 5.
12 Application:
1) In Comments at the top of your script:
Why must the state variable be declared outside the for loop as opposed to inside the for loop? Think about what happens to the state variable every time the for loop cycles if it is declared inside the for loop.
If you are using a for loop to add up all values in a vector then what value should your state variable start with? Why?
How many times will this for loop cycle?
for(i in seq(from=100, to=-100, by=-5)
2) Create one for loop that outputs the (a) square (2nd power), cube (3rd power), and cube root (1/3rd power) for the numbers 1 through 10.
3) Using a for loop, find out how many days had high temperatures less than 50.
- set up your for loop so that it could work on a vector of any size (not just 14)
4) Using a for loop, find out how many even days were cloudy
5) Using a for loop, find out how many of the last 8 days were cloudy.
6) Challenge: Using for loops, find the total precipitation (i.e., add up all the precipitation values). Hint: you keep adding each new precipitation value to the total precipitation.
Save the script as app1-10.r in your scripts folder and email your Project Folder to Charlie Belinsky at belinsky@msu.edu.
Instructions for zipping the Project Folder are here.
If you have any questions regarding this application, feel free to email them to Charlie Belinsky at belinsky@msu.edu.
12.1 Questions to answer
Answer the following in comments inside your application script:
What was your level of comfort with the lesson/application?
What areas of the lesson/application confused or still confuses you?
What are some things you would like to know more about that is related to, but not covered in, this lesson?
13 Extension: Two methods of cycling
In R, there are two common methods for cycling through a for loop: looping by value and looping by index.
By far, the most commonly taught method is looping by value:
temps = c(20, 30, 40, 50);
for (val in temps) # loops through values 20, 30, 40, and 50
{
if (val < 30)
{
cat("brr, that's cold\n");
}
}The other method is looping by index:
temps = c(20, 30, 40, 50);
for (i in 1:length(temps)) # loops through indexes 1, 2, 3, and 4
{
if (temps[i] < 30)
{
cat("brr, that's cold\n");
}
}At first glance, looping by value seems easier and more natural. However, looping by index is much more robust for data analysis because it allows easy access to other values that are stored in the same row. For example, when working with a data frame, you may want to look up the wind speed or humidity that corresponds to a particular temperature measurement.
Looping by value hides the row position, which makes it much harder to find related values across columns or to update the data frame. As tasks become more complex, this often leads to confusion and bugs.
For these reasons, I recommend that students new to programming use looping by index exclusively. Looping by index works in all situations, reinforces a clear mental model of how R stores and accesses data, and scales naturally to more complex problems.
I also discourage switching between looping by value and looping by index, as this is a common source of errors for beginners.