1-11: For Loops and State Variables
0.1 Issues
1 Purpose
- Show more example of using for loops with state variables
2 Questions about the material…
The files for this lesson:
Script: you can download the script here
If you have any questions about the material in this lesson, feel free to email them to the instructor, Charlie Belinsky, at belinsky@msu.edu.
3 For loops – a vital skill
In this lesson, we will be using for loops and if() statements to do common calculations on a vector (e.g., finding the sum, finding maximum value, checking if any value meets a condition). There are functions for all of these calculations in R that are easier to use than a for loop. Hoever, the best way to learn for loops is to use them in applications that are easy to understand. You are going to have many situations where you cannot rely on an R function or package – and for loops with embedded if-else statements are the best solution.
Understanding how to effectively use for loops is perhaps the most vital skill you will learn as a programmer.
4 Multiple state variables
We ended last lesson by introducing you to the concept of a state variable – a state variable is a variable that updates information (e.g., the state of some value) as the for loops cycles. The state variable is always declared outside of the for loop and modified within the for loop.
Note: if the state variable was declared inside the for loop then it would be re-initialized every time the for loop cycles.
The state variable needs to be initialized to some default value – or the value the state variable should be if the for loop has no cycles (an empty vector). In the last example from last lesson, the state variable sunnyDays was initialized to 0, which is the correct value if the condition inside the for loop is FALSE every cycle..
= 0; # state variable -- will hold the count of sunny days
sunnyDays
for(i in 1:numDays)
{if(noonCond[i] == "Sunny") # was the day sunny
{= sunnyDays +1; # it was -- increase sunnyDays by 1
sunnyDays
}# there is no else here -- we don't care about non-sunny days
}
Now, let’s expand this for loop to count for two different conditions: Sunny and Rain. We will now need two state variables – initializing both values to 0.
= 0; # state variable -- will hold the count of sunny days
sunnyDays = 0; # state variable -- will hold the count of rainy days
rainyDays
for(i in 1:numDays)
{if(noonCond[i] == "Sunny")
{= sunnyDays +1; # increases sunnyDays by 1
sunnyDays # We use else if here because we know "Sunny" and "Rain" are mutually exclusive
else if(noonCond[i] == "Rain")
}
{= rainyDays +1; # increases rainyDays by 1
rainyDays
} }
And we can look in the Environment tab to see what the final values of rainyDays and sunnyDays were:
: 3
rainyDays: 6 sunnyDays
You can double check the noonCond vector and see there are 6 sunny and 3 rain days.
> noonCond
1] "Cloudy" "Cloudy" "Sunny" "Rain" "Fog" "Sunny" "Sunny" "Cloudy" "Rain" "Rain"
[11] "Snow" "Sunny" "Sunny" "Sunny" [
4.1 The if-else statement
In Figure 2 we used an if-else structure to separate the two conditional statement (Sunny and Rain). We could have just used two if() statements. In other words, the 10th line could have been written:
if(noonCond[i] == "Rain")
But we would be checking for Rain on days that we already know is Sunny. noonCond on a specific day cannot be both Sunny and Rain (i.e., they are mutually exclusive) – so the extra checks are not needed. The code would still work, just not be as efficient. But, efficiency becomes more important as the number of values in your vector increases – and efficiency means quicker execution time.
5 State variables uses
We can also use state variables on numeric values to count how many times a condition is met.
In this case, we will look at how many time
the temperature was greater than 60
the temperature was less than 50.
We are looking at two situations so we need two state variables. Again, we initialize the values of the state variables to 0 because 0 would be the correct answer if the conditions inside the if-else statement were FALSE for every day.
= 0; # days with temperatures greater than 60
tempGT60 = 0; # days with temperatures less than 50
tempLT50
for(i in 1:numDays)
{if(highTemps[i] > 60) # high temp more than 60
{= tempGT60 +1;
tempGT60 else if(highTemps[i] < 50) # high temp less than 50
}
{= tempLT50 +1;
tempLT50
} }
And we can look in the Environment tab to see the final values of the state variables:
: 2
tempGT60: 4 tempLT50
Again, you can check highTemps in the Console to see this is correct:
> highTemps
1] 57 50 54 40 39 58 60 53 55 44 39 54 61 75 [
5.1 Sequences with the for loops
Perhaps you do not want to check every day – maybe you only want to check every-other day. In this case you use seq() in the for loop to subset the days:
= 0; # odd indexed days with temperatures greater than 60
tempGT60odd = 0; # odd indexed days with temperatures less than 50
tempLT50odd
for(i in «seq(from=1, to=numDays, by=2)») # every other day (1,3,5,7...)
{if(highTemps[i] > 60)
{= tempGT60odd +1;
tempGT60odd else if(highTemps[i] < 50)
}
{= tempLT50odd +1;
tempLT50odd
} }
And, again, the Environment tab will show the final values of the state variables:
: 1
tempGT60odd: 2 tempLT50odd
6 State variable to sum values
We have used state variables to count values – adding 1 to the state variable during each cycle of a for loop when a condition is TRUE.
Next we will use a state variable to add values. In this case, adding up precipitation values to get the total precipitation. Again, we initialize the state variable to default value of 0, because 0 is the answer if there is nothing to add.
The difference is that, instead of adding 1 to the state variable, we add the precipitation for that day, or precip[i] where i is the index variable.
Note: there are no if() statements in this for loop because we are unconditionally adding every days’ precipitation to the total precipitation
= 0;
totalPrecip for(i in 1:numDays)
{= totalPrecip + precip[i];
totalPrecip }
7 if-any checks
Now we are going to use a state variable to see if any value in the vector meets a condition.
We are going to initialize the state variable to FALSE and if any variable meets the condition, the state variable will be changed to TRUE. The condition we will look is whether a temperature is less than 40.
Trap: Using T and F to represent TRUE and FALSE is a bad idea
= FALSE;
anyDayLT40
for(i in 1:numDays)
{if(highTemps[i] < 40)
{= TRUE;
anyDayLT40
} }
After executing the script, we see that anyDayLT40 in TRUE in the Environment. You can easily test other scenarios by changing the if() condition. For example:
if(highTemps[i] < 30) # this will produce a FALSE result
7.1 The break statement
In Figure 5, the for loop is inefficient because it will continue to check every temperature value even after one of them passes the condition of being greater that 40. This is not necessary because we only care if at least one value meets the condition. We do not care if more than one meets the condition
To make the code more efficient, we can add a break statement to the codeblock attached to the if(). break is a command that tells R to immediately exit the for loop. In other words, if the 5th value passes the condition, the for loop will end and not cycle through values 6-14.
# A more efficient check if any value in highTemps is less than 40
= FALSE; # initialize state variable to FALSE
anyDayLT40_2 for(i in 1:numDays)
{if(highTemps[i] < 40)
{= TRUE; # found a value -- change state variable to TRUE
anyDayLT40_2 # exits the for loop immediately
«break;»
} }
A break statement in the above code will not meaningfully speed up this script, but it will speed up script that is cycling through 1000s or 10000s of times through a for loop.
8 Finding the highest (or lowest) value
For the last example we are going to find the highest temperature in a vector. We do this by cycling though each value in the vector highTemps and comparing it to the current highest temperature. If the new value is higher than the highest temperature, then we set that to be the highest temperature.
Here is the code – we will go over it in more detail in a bit:
= 0; # initialize the state variable to 0
highestTemp
for (i in 1:numDays)
{if(highTemps[i] > highestTemp) # is this day's value grater than the current high
{# this day's value is higher -- set highestTemp to this value
= highTemps[i];
highestTemp # browser(); # uncomment to pause the script's execution at this point
} }
8.1 Using browser() to pause your script
The for loop above is going through every value in the highTemps in order. Each time a new high temperature is found, highestTemp is set top that value. So, highestTemp is going to start at 0 (the initial value), and change multiple times.
If we put highTemps in the Console, we can see each time a new highest temperature will occur:
> highTemps
1] «57» 50 54 40 39 «58» «60» 53 55 44 39 54 «61» «75» [
A new highest temperature will be discovered 5 times: 57, 58, 60, 61, and 75 (cycles 1, 6 , 7, 13, and 14).
We can use browser() to see this, too. browser() is an instruction to pause the script in the middle of execution and put it in debug mode.
If we uncomment the browser() line in Figure 6:
# browser(); # uncomment to pause the script's execution at this point
Then the script will pause at that line when the condition if(highestTemp > highTemps[i]) is TRUE, which happens 5 times (out of 14 cycles) when i = 1, 6, 7, 13, and 14.
browser() will initially pause the script when i = 1 and you can click Continue to pause the script when i = 6, 7, 13, and 14. For now, the only other button you should know is Stop, which ends debug mode.

Debug mode is a complicated topic that will be more fully explored in future lessons.
8.2 Initializing the state variable
I initialized the state variable, highestTemp, to 0 – and that works in this case because we know there are temperatures greater than 0. But, if all the temperatures were negative, then this would not work (I will let you answer why this would not work in the Application).
A better solution is to set the state variable to the first value in the vector (highTemps[1]). This makes the first value in the highTemps vector the default answer. In other words, if every condition check is FALSE (i.e., no temperature is greater than the first) then the first temperature is the highest temperature.
Note: this means you could start the indexing variable i at 2 instead of one. You do not need to test the 1st value against the 1st value – there is no harm but it is a tiny bit inefficient.
= highTemps[1]; # set the state variable to the first value in the vector
highestTemp2
for(i in 1:numDays) # could be (i in 2:numDays)
{if(highTemps[i] > highestTemp2) # is this day's value greater than the current high
{= highTemps[i]; # set the state variable to the current day's value
highestTemp2 ### another way to debug your script...
cat("Day ", i, " the highest temp changed to ", highTemps[i], "\n", sep="");
} }
Whenever you are looking for the most extreme value in a vector, it is best to set the state variable to the first value in the vector. By doing this, you do not have to guess what kind of values you are going to get.
8.3 Debug with cat()
Using cat() to output information to the Console is a quick-and-dirty, if not the most robust, way to debug your script. In this case, cat() shows the four time the highest temperature was changed in the 14 cycles of the for loop.
Day 6 the highest temp changed to 58
Day 7 the highest temp changed to 60
Day 13 the highest temp changed to 61 Day 14 the highest temp changed to 75
Remember to remove or comment out your cat() statements that are used for debugging before sharing your script.
9 Application
1) In comments answer: What happens if you set the state variable to 0 when trying to find the highest temperature if all temperatures are negative.
2) Find out if any of the even days were less than 40 degrees. Hint: use seq()
add code to exit the for loop as soon as the condition is met
add code that outputs to Console the first day that met the condition
3) Find the mean low temperature using for loops: get the total and divide by the number
4) In one for loop find out how many days had:
1 inch or more rain
between 0.1 and 1 inch of rain (not inclusive of 0.1 or 1)
0.1 inches or less rain
5) Find out the lowest low temp and output to the Console the lowest temperature and the date it occurred on.
6) On days that were cloudy: find the highest temperature and the mean temperature .
Save the script as app1-11.r in your scripts folder and email your Project Folder to Charlie Belinsky at belinsky@msu.edu.
Instructions for zipping the Project Folder are here.
If you have any questions regarding this application, feel free to email them to Charlie Belinsky at belinsky@msu.edu.
9.1 Questions to answer
Answer the following in comments inside your application script:
What was your level of comfort with the lesson/application?
What areas of the lesson/application confused or still confuses you?
What are some things you would like to know more about that is related to, but not covered in, this lesson?
10 Trap: Using T and F to represent TRUE and FALSE is a bad idea
The terms TRUE and FALSE are reserved keywords in R (just like if, for, else…) – this means that TRUE and FALSE are predefined and cannot be used as variable names in R.
If you try to assign a value to a “variable” named TRUE or FALSE, you will get an invalid (do_set) left-hand side to assignment error (Figure 9). This is the same error you get if you try to assign a number to a number (i.e., 10 = 5).

You will often see R scripts that use T and F as shortcuts for TRUE and FALSE. R accepts T and F as shortcuts for TRUE and FALSE but you should not use T and F to represent TRUE and FALSE because T and F are not protected keywords. T and F can be overwritten as shown in Figure 9.
If T or F get overwritten then your code will produce unexpected results. It is best to stick with the reserved (and protected) keywords TRUE and FALSE.