1-10: For Loops


1 Purpose

  • Show how to perform if-else statements on multiple values in a vector using for loops

2 Questions about the material…

The files for this lesson:

 

If you have any questions about the material in this lesson, feel free to email them to the instructor, Charlie Belinsky, at belinsky@msu.edu.

3 Four structures in programming

As mentioned in the second lesson on variables, there are basically 4 main structures that cover almost every aspect of programming:

  1. Variables

  2. If-Else Statements

  3. For Loops

  4. Functions

 

In the next three lessons we cover for loops.  For loops allow you to write code once and execute it on multiple values.

4 Conditional statements on multiple values

In the last couple of lessons, we used conditional statements on values in a vector (e.g., asking about the weather conditions or the high temperatures on a specific day) and then output a message to the Console based on the results of the conditional statement.

 

But, when working with data, we rarely look at just one value – usually, we are looking at a vector of values (e.g., 365 high temperatures for a year) and we want to perform the same conditional statement on each of the values. In this lesson and the next lesson, we will learn to use conditional statements on multiple values – using for loops with embedded if-else statements.

5 Repeating commands

A for loop is used whenever you want to execute a codeblock multiple times.  The most basic example is repeating the exact same output a certain number of times:

cat("---------\nRepeating the same code multiple times:\n");
for(i in 1:5)  # repeat 5 times
{
  cat("Hello, World\n");
}
Figure 1: The most basic for loop – repeating an output command multiple times

And the Console tab will show 5 Hellos:

Repeating the same code multiple times:
Hello, World
Hello, World
Hello, World
Hello, World
Hello, World

5.1 The codeblock attached to the for loop

Just like if-else statements, for loops have an attached codeblock, designated by curly brackets ( {  } ).  If-else statements execute the codeblock conditionally, based on the conditional statement in the parentheses, whereas for loops execute the codeblock multiple times.  The number of times a for loop executes the attached codeblock depends on the sequence inside the parentheses.  In Figure 1, the sequence is 1:5, and since the sequence has 5 values, the attached codeblock will execute 5 times.

 

Note: if the sequence were 15:19, which also has 5 values, the attached codeblock would still execute 5 times.

6 Repeated code on values

Figure 1 repeats the exact same code 5 times. However, you rarely want to repeat the exact same code – usually, you want to repeat similar code.  For instance, you might want to say hello to 5 different people given by a vector:

helloVector = c("Ann", "Bob", "Charlie", "Dave", "Eve");

To do this, we cycle through every value in the vector: helloVector[1], helloVector[2], helloVector[3], helloVector[4], and helloVector[5].

 

To cycle through the values we set the index variable (often named i) in the for loop to the values 1 through 5 and then use i inside the codeblock for the for loop. i, in this example, indexes helloVector each time the for loop cycles.

cat("---------\nGoing through each value in a vector by changing the index value:\n");
for(i in 1:5)
{
  # i takes on the values 1 through 5 through the 5 cycles of the for loop
  cat("Hello,", helloVector[i], "\n");
}
Figure 2: Cycling through values in a vector using their index number
Going through each value in a vector by changing the index value:
Hello, Ann 
Hello, Bob 
Hello, Charlie 
Hello, Dave 
Hello, Eve 

6.1 Cycling methods

There are two methods of cycling through a for loop: by value and by index. By value is more commonly taught but this class emphasizes by index. For a detailed explanation go to: Extension: Two methods of cycling

7 The index variable in a for loop

The true power of for loops lies in the indexing variable, i.  (Note: the indexing variable can be given any name; it is just common to name it i.)  The indexing variable is set to the next value in the sequence each time the for loop cycles. This means the indexing variable can be used to modify the code every time the codeblock is executed.

7.1 The index variable as a counter

Let’s look at the simplest of examples – using the indexing variable, i, as a counter:

cat("---------\nUsing the indexing variable as a counter:\n");
for(i in 1:5) # cycle 5 times
{
  cat("The count is:", i, "\n");
}

This outputs:

Using the indexing variable as a counter:
The count is: 1
The count is: 2
The count is: 3
The count is: 4
The count is: 5

The number of times the for loop cycles is the same as the number of values in the sequence. So, we can use a different sequence of 5 values…

cat("---------\nAny sequence of 5 numbers cycles the for loop 5 times:\n");
for(i in 15:19) # repeat 5 times
{
  cat("The count is:", i, "\n");
}

…and the for loop will still cycle 5 times.

Any sequence of 5 numbers cycles the for loop 5 times:
The count is: 15
The count is: 16
The count is: 17
The count is: 18
The count is: 19

7.2 Taking a step back: How a for loop cycles

A for loop always consists of 3 things:

  • An indexing variable: the i in for(i in 1:5)

  • A sequence of values: the 1:5 in for(i in 1:5)

  • A codeblock to execute: encapsulated by the { }

 

The sequence of values determines how many times the codeblock will cycle.  If there are 12 values in the sequence, then the for loop will cycle 12 times.

 

For each cycle, the indexing variable is set to a different value in the sequence.  The indexing variable takes on the values in the same order as they appear in the sequence.

 

So, if the sequence is 12:8 (a backwards sequence with 5 values), then:

  • i will be set to 12 the first time the codeblock cycles

  • i will be set to 11 the second time the codeblock cycles

  • i will be set to 10 the third time the codeblock cycles

  • i will be set to 9 the fourth time the codeblock cycles

  • i will be set to 8 the fifth time the codeblock cycles

7.3 Changing the sequence

The for loop executes for every value in the sequence, and the index variable takes on the values in order.

cat("---------\nSequences can go backwards -- length is still 5:\n");
for(i in 12:8) # 12, 11, 10, 9, 8 (there are five values in the sequence)
{
  cat("The count is:", i, "\n");
}
Figure 3: Using a sequence that does not start with 1 in a for loop

In this case, the for loop will still execute 5 times, but i will take on values 12 down to 8:

Sequences can go backwards -- length is still 5:
The count is: 12
The count is: 11
The count is: 10
The count is: 9
The count is: 8

7.4 Indexing variables and more complicated sequences

You do not have to name the index i; you can name it whatever you want.  For instance, we will be working with weather, so day might be a more appropriate name for the indexing variable.

 

In this example, we use a more complicated sequence and change the name of the index variable to day:

cat("---------\nUsing a more complicated sequence with seven values:\n");
for(day in seq(from=20, to=2, by=-3))  # 20, 17, 14, 11, 8, 5, 2 (seven values)
{
  cat("The count is:", day, "\n");
}
Figure 4: Using a more complicated sequence in a for loop
Using a more complicated sequence with seven values:
The count is: 20
The count is: 17
The count is: 14
The count is: 11
The count is: 8
The count is: 5
The count is: 2

So, in Figure 4, the codeblock cycles 7 times because there are 7 values in the sequence (20, 17, 14, 11, 8, 5, and 2).  Each time the codeblock cycles,  day is set to a different value in the sequence (going from the first value in the sequence to the last).
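
If you are unsure how many values a seq() call produces, one option is to save the sequence to a variable and inspect it before using it in a for loop. The sketch below assumes a variable named daySeq (a name chosen only for this example) and uses length(), which is introduced in the next section:

```r
# daySeq is a name chosen for this sketch -- it is not part of the lesson script
daySeq = seq(from=20, to=2, by=-3);

cat("The sequence is:", daySeq, "\n");           # shows the 7 values: 20 17 14 11 8 5 2
cat("Number of values:", length(daySeq), "\n");  # shows 7

for(day in daySeq)  # cycles once for each value in daySeq
{
  cat("The count is:", day, "\n");
}
```

The for loop here behaves the same as the one in Figure 4; the only difference is that the sequence is stored first so it can be checked.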

8 Indexing values inside a for loop

Most examples so far use the index variable as a counter. The real power comes when you use the indexing variable to index values in a vector as we did in Figure 2. This is how we perform an if-else statement on every value in a vector.

 

Let’s use the data from the previous lessons:

### read in data from twoWeekWeatherData.csv
weatherData = read.csv(file="data/twoWeekWeatherData.csv",
                       sep=",",
                       header=TRUE);

### Extract the highTemp and noonCondition columns from the data frame -- save them to variables
highTemps = weatherData$highTemp;
noonCond = weatherData$noonCondition;

Since we want to go through every value in the vector, we need to know the size of the vector.  The easiest way to find the size of a vector is to use the function length():

numDays = length(highTemps);  # numDays is 14 (the number of values in highTemps)

Now let’s use a for loop to output all of the highTemps values.  To do this we use the indexing variable, day, to index the vector highTemps:

cat("---------\nThe 14 values in the vector highTemps in order:\n");
for(day in 1:numDays)  # numDays, in this case, is 14 (sequence is 1:14)
{
  cat(highTemps[day], "\n");  # day will take the values 1-14
}
Figure 5: Using the indexing variable as an index for highTemps
The 14 values in the vector highTemps in order:
57
50
54
40
39
58
60
53
55
44
39
54
61
75

In the following example, we use the indexing variable day as both a counter and an index for noonCond:

cat("---------\nUsing indexing variable as a counter and an index for noonCond:\n");
for(day in 1:numDays)
{
  cat(day, ") ", noonCond[day], "\n", sep="");
}
Figure 6: Using indexing variable as a counter and an index for noonCond
Using indexing variable as a counter and an index for noonCond:
1) Cloudy
2) Cloudy
3) Sunny
4) Rain
5) Fog
6) Sunny
7) Sunny
8) Cloudy
9) Rain
10) Rain
11) Snow
12) Sunny
13) Sunny
14) Sunny

8.1 Making code more extensible

In the code in Figure 5 and Figure 6, we already know that the number of values in the vector is 14, so we could just write:

for(day in 1:14)
{
  cat(highTemps[day], "\n");
}

But this code will only work if the data frame does not change size.  In general, you want to make your code as generic as possible, which means using variables instead of fixed numbers. When you make your code generic, you do not have to change the code when your data changes.  In Figure 5, numDays will automatically adjust to the size of the vector.
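
To see why this matters, the sketch below runs the same for loop before and after the data grows. The names someTemps and numValues are chosen only for this example, and the values are the first ten high temperatures shown in the Figure 5 output:

```r
# someTemps and numValues are names chosen for this sketch
someTemps = c(57, 50, 54, 40, 39, 58, 60);  # first 7 high temperatures
numValues = length(someTemps);              # numValues is 7

for(day in 1:numValues)
{
  cat(day, ") ", someTemps[day], "\n", sep="");
}

# add three more values -- the for loop below is identical to the one above
someTemps = c(someTemps, 53, 55, 44);
numValues = length(someTemps);              # numValues is now 10

for(day in 1:numValues)
{
  cat(day, ") ", someTemps[day], "\n", sep="");
}
```

Only the data changed between the two loops. Because the sequence is 1:numValues rather than a fixed 1:7, it adjusts to 1:10 without any edit to the loop itself.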

9 if-else within a for loop

We usually have a lot of questions that we want to ask of the values we are looking at. In other words, we want to use if-else statements within the for loops.  Let’s first ask if each day was sunny or not and output the answer to the Console.

cat("---------\nUsing if-else statements within a for loop:\n");
for(i in 1:numDays)
{
  if(noonCond[i] == "Sunny")
  {
    cat("day ", i, " was sunny\n", sep="");
  }else
  {
    cat("day ", i, " was not sunny\n", sep="");
  }
}
Figure 7: Using if-else statements in a for loop
Using if-else statements within a for loop:
day 1 was not sunny
day 2 was not sunny
day 3 was sunny
day 4 was not sunny
day 5 was not sunny
day 6 was sunny
day 7 was sunny
day 8 was not sunny
day 9 was not sunny
day 10 was not sunny
day 11 was not sunny
day 12 was sunny
day 13 was sunny
day 14 was sunny

10 Checking a subset of values

We can use any vector of values to cycle a for loop. In the example below, the vector c(8,2,13) has 3 values, so the for loop will execute 3 times with i cycling through the values 8, 2, and 13:

cat("---------\nUsing a vector to subset the values:\n");
for(i in c(8,2,13))
{
  if(noonCond[i] == "Sunny")
  {
    cat("day ", i, " was sunny\n", sep="");
  }else
  {
    cat("day ", i, " was not sunny\n", sep="");
  }
}
Figure 8: Using the sequence to subset the values
Using a vector to subset the values:
day 8 was not sunny
day 2 was not sunny
day 13 was sunny
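
Following the advice about using variables instead of fixed numbers, the days to check can also be stored in a variable. This is a sketch, not part of the lesson script: daysToCheck is a hypothetical name, and noonCond is typed in by hand (using the 14 values from the lesson's data) so the example runs on its own:

```r
# noonCond typed in by hand so this sketch does not need the csv file
noonCond = c("Cloudy", "Cloudy", "Sunny", "Rain", "Fog", "Sunny", "Sunny",
             "Cloudy", "Rain", "Rain", "Snow", "Sunny", "Sunny", "Sunny");

daysToCheck = c(8, 2, 13);  # hypothetical name -- edit this vector to check other days

for(i in daysToCheck)  # i takes on the values 8, 2, and 13 (in that order)
{
  if(noonCond[i] == "Sunny")
  {
    cat("day ", i, " was sunny\n", sep="");
  }else
  {
    cat("day ", i, " was not sunny\n", sep="");
  }
}
```

To check a different set of days, only the daysToCheck line needs to change.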

11 State variables and for loops

The code in Figure 7 displays specific information about each value in a vector.  This would quickly get out of hand if there were thousands of values!  We usually want more general information about a vector; for instance, we might want to know how many days were sunny.

 

Finding more general information about the values in a vector requires the introduction of the last component of most for loops: the state variable.  The state variable is a variable that maintains a running tab of some value through each cycle of the for loop.

 

Let’s count the number of sunny days using a for loop.  To do this we need to:

  • create a state variable that holds the number of sunny days (this will start at 0)

  • have a for loop cycle through every value in the noonCond vector

  • include an if() statement asking if the day is sunny

  • have a command to increase the state variable by 1 if it is sunny

 

Note: The state variable must be declared outside of the for loop.  Think about why this is true – it is the first question in the Application.

 

sunnyDays = 0; # state variable -- will hold the count of sunny days
for(i in 1:numDays)
{
  if(noonCond[i] == "Sunny")  # was the day sunny
  {
    sunnyDays = sunnyDays +1;   # it was -- increase sunnyDays by 1
  }
  # there is no else here -- we don't care about non-sunny days
}
Figure 9: Using a state variable to calculate how many sunny days there were in noonCond

And we can look in the Environment tab to see what the final value of sunnyDays was:

sunnyDays:   6

And if we look at the noonCond vector, we see there are 6 sunny days:

> noonCond
 [1] "Cloudy" "Cloudy" "Sunny"  "Rain"   "Fog"    "Sunny"  "Sunny"  "Cloudy"
 [9] "Rain"   "Rain"   "Snow"   "Sunny"  "Sunny"  "Sunny"

11.1 The state variable

sunnyDays does a lot of heavy lifting in Figure 9.  We initially set sunnyDays to 0 because 0 is the default value, or the value if the condition (noonCond[i] == "Sunny") fails every time.

 

Whenever the value in noonCond is “Sunny” the following code is executed:

   sunnyDays = sunnyDays +1;

This code works because the right side of the assignment is always evaluated first.

 

So, R will first calculate: sunnyDays +1

  • If sunnyDays is 0, then sunnyDays +1 evaluates to 1

 

Then R assigns the value on the right-side to the variable on the left, which is sunnyDays

  • sunnyDays is assigned the value of 1.

 

If sunnyDays is 4, then sunnyDays +1 evaluates to 5, and sunnyDays is assigned the value of 5.
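
One way to watch the state variable change is to output its value at the end of every cycle. The sketch below uses a short made-up vector, conditions, and a state variable named sunnyCount; both names are assumptions for this example and are not part of the lesson's data:

```r
conditions = c("Sunny", "Rain", "Sunny", "Sunny");  # made-up values for this sketch
sunnyCount = 0;   # state variable -- starts at 0

for(i in 1:length(conditions))
{
  if(conditions[i] == "Sunny")
  {
    sunnyCount = sunnyCount + 1;  # right side is evaluated first, then assigned
  }
  # output the running count after each cycle
  cat("After day ", i, ", sunnyCount is ", sunnyCount, "\n", sep="");
}
# Console output:
#   After day 1, sunnyCount is 1
#   After day 2, sunnyCount is 1
#   After day 3, sunnyCount is 2
#   After day 4, sunnyCount is 3
```

The count stays the same on the cycle where the condition fails (day 2) and goes up by 1 on every cycle where it succeeds.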

12 Application:

1) In Comments at the top of your script:

  • Why must the state variable be declared outside the for loop as opposed to inside the for loop?  Think about what happens to the state variable every time the for loop cycles if it is declared inside the for loop.

  • If you are using a for loop to add up all values in a vector then what value should your state variable start with? Why?

  • How many times will this for loop cycle? for(i in seq(from=100, to=-100, by=-5))

 

2) Create one for loop that outputs the square (2nd power), cube (3rd power), and cube root (1/3 power) of each of the numbers 1 through 10.

 

3) Using a for loop, find out how many days had high temperatures less than 50.

  • set up your for loop so that it could work on a vector of any size (not just 14)

 

4) Using a for loop, find out how many even-numbered days were cloudy.

 

5) Using a for loop, find out how many of the last 8 days were cloudy.

 

6) Challenge: Using for loops, find the total precipitation (i.e., add up all the precipitation values).  Hint: you keep adding each new precipitation value to the total precipitation.

 

 

Save the script as app1-10.r in your scripts folder and email your Project Folder to Charlie Belinsky at belinsky@msu.edu.

 

Instructions for zipping the Project Folder are here.

 

If you have any questions regarding this application, feel free to email them to Charlie Belinsky at belinsky@msu.edu.

12.1 Questions to answer

Answer the following in comments inside your application script:

  1. What was your level of comfort with the lesson/application?

  2. What areas of the lesson/application confused or still confuse you?

  3. What are some things you would like to know more about that are related to, but not covered in, this lesson?

13 Extension: Two methods of cycling

In R, there are two common methods for cycling through a for loop: looping by value and looping by index.

 

By far, the most commonly taught method is looping by value:

temps = c(20, 30, 40, 50);

for (val in temps) # loops through values 20, 30, 40, and 50
{
  if (val < 30) 
  {
    cat("brr, that's cold\n");
  }
}

The other method is looping by index:

temps = c(20, 30, 40, 50);

for (i in 1:length(temps))  # loops through indexes 1, 2, 3, and 4
{
  if (temps[i] < 30) 
  {
    cat("brr, that's cold\n");
  }
}

At first glance, looping by value seems easier and more natural. However, looping by index is much more robust for data analysis because it allows easy access to other values that are stored in the same row. For example, when working with a data frame, you may want to look up the wind speed or humidity that corresponds to a particular temperature measurement.

 

Looping by value hides the row position, which makes it much harder to find related values across columns or to update the data frame. As tasks become more complex, this often leads to confusion and bugs.
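
A minimal sketch of that situation, assuming a second vector named windSpeeds that holds one made-up wind speed for each temperature (the vector and its values are hypothetical):

```r
temps = c(20, 30, 40, 50);
windSpeeds = c(12, 5, 20, 8);  # hypothetical values -- one for each temperature

for (i in 1:length(temps))  # loops through indexes 1, 2, 3, and 4
{
  if (temps[i] < 30)
  {
    # the same index, i, looks up the matching value in the other vector
    cat("Measurement ", i, " is cold (", temps[i],
        ") and the wind speed was ", windSpeeds[i], "\n", sep="");
  }
}
```

With looping by value, the codeblock would only have the temperature itself and no position to use when looking up the matching wind speed.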

 

For these reasons, I recommend that students new to programming use looping by index exclusively. Looping by index works in all situations, reinforces a clear mental model of how R stores and accesses data, and scales naturally to more complex problems.

 

I also discourage switching between looping by value and looping by index, as this is a common source of errors for beginners.