1-12: Multiple Conditions

0.1 Changes

1 Purpose

  • Check for multiple conditions on variables

2 Questions about the material…

The files for this lesson:

 

The data for this lesson has two new columns compared to the previous lesson's data, so it has 14 obs. of 7 variables (instead of 5). The new columns (noonCondMessy and precipBad) are copies of previous columns with some issues added, and this lesson addresses those issues.

Figure 1: The new weatherData data frame with two new columns

 

If you have any questions about the material in this lesson, feel free to email them to the instructor, Charlie Belinsky, at belinsky@msu.edu.

3 Logical Operators: AND (&), OR (|)

In this lesson we are going to learn how to combine conditional statements in if() statements using two new operators: AND and OR. The symbol for the AND operator is the ampersand ( & ) and the symbol for the OR operator is the vertical pipe ( | ). R also has double versions of these operators (&& and ||). When you are checking one value at a time, the single and double operators have the same functionality. We will cover the situation where they differ (checking many values at once) later in this lesson.

 

Extension: & vs && and | vs ||

 

The AND operator combines two conditional statements into one conditional statement with the rule:

  • if both of the conditional statements are TRUE then the whole conditional statement is TRUE
if (condition1 & condition2)
{
   # Execute the code between these curly brackets
   #   if both condition1 and condition2 are TRUE
}

The OR operator combines two conditional statements into one conditional statement with the rule:

  • if at least one of the conditional statements is TRUE then the whole conditional statement is TRUE
if (condition1 | condition2)
{
   # Execute the code between these curly brackets
   #   if either condition1 or condition2 is TRUE
}
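
As a quick sketch of the two rules above, you can combine TRUE and FALSE values directly in the Console. The variable names and values here (sampleTemp, sampleCond, and the result variables) are made up for illustration and are not part of the lesson data:

```r
# Each comment shows the value that gets stored
andBothTrue = (TRUE & TRUE);    # TRUE:  both sides are TRUE
andOneFalse = (TRUE & FALSE);   # FALSE: one side is FALSE
orOneTrue   = (TRUE | FALSE);   # TRUE:  at least one side is TRUE
orBothTrue  = (TRUE | TRUE);    # TRUE:  OR is also TRUE when both sides are TRUE
orBothFalse = (FALSE | FALSE);  # FALSE: neither side is TRUE

# The same rules apply to real comparisons (hypothetical values)
sampleTemp = 65;
sampleCond = "Sunny";
warmAndSunny = (sampleTemp > 60 & sampleCond == "Sunny");   # TRUE
hotOrRainy   = (sampleTemp > 70 | sampleCond == "Rain");    # FALSE
print(warmAndSunny);
print(hotOrRainy);
```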

3.1 Operators we have learned so far

Table 1: All of the operators taught in the class so far

Operator Type   Purpose                                               R Symbols
Assignment      Assign a value to a variable                          =, <-, ->
Mathematical    Perform a mathematical operation on a numeric value   +, -, *, /, ^
Subset          Subset a vector                                       [ ], $
Conditional     Compare two values                                    ==, !=, >, <, >=, <=
Logical         Combine conditions                                    &, &&, |, ||

4 Combining conditional statements (logical operators)

In the last lesson we started by asking two questions about noonCond that each had one condition:

  1. Is the day sunny? 

  2. Is the day rainy?

#### From last lesson
sunnyDays = 0; # state variable -- will hold the count of sunny days
rainyDays = 0; # state variable -- will hold the count of rainy days

for(i in 1:numDays)
{
  if(noonCond[i] == "Sunny")
  {
    sunnyDays = sunnyDays +1; # increases sunnyDays by 1
  # We use else if here because we know "Sunny" and "Rain" are mutually exclusive
  }else if(noonCond[i] == "Rain")
  {
    rainyDays = rainyDays +1; # increases rainyDays by 1
  }
}

And the answer was:

sunnyDays:   6
rainyDays:   3

4.1 Using the OR operator

We can also combine the two conditional statements and ask: Is the day sunny OR rainy?

 

Just replace OR with the symbol that represents OR, which is |:

  if(noonCond[i] == "Sunny" | noonCond[i] == "Rain")

Put this in the script and it will count both sunny and rainy days:

sunnyOrRainyDays = 0
for(i in 1:numDays)
{
  if(noonCond[i] == "Sunny" | noonCond[i] == "Rain")
  {
    sunnyOrRainyDays = sunnyOrRainyDays +1;
  }
}
Figure 2: Using logical operator | to test two conditions

sunnyOrRainyDays is the sum of the sunny (6) and rainy (3) days:

sunnyOrRainyDays:   9

4.2 Conditions must be explicit

In programming, we need to be explicit when using multiple conditional statements:

  if(noonCond[i] == "Sunny" | noonCond[i] == "Rain")  # this is correct

In English, it makes sense to ask: Is the day sunny or rainy?

 

And, naively, the code for this sentence would look like this:

  if(noonCond[i] == "Sunny" | "Rain")  # this is incorrect

If you do this, you will get the error:

Error in noonCond[i] == "Sunny" | "Rain" : invalid ‘y’ type in ‘x | y’

where x represents noonCond[i] == "Sunny" and y represents "Rain".

 

The error is telling you that "Rain" (the invalid ‘y’) is not a conditional statement. Both sides of the | (x and y) must be written out explicitly as conditional statements (i.e., each must have a conditional operator with a value on both sides).
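
Here is a small sketch you can run to see the difference, assuming a single hypothetical value todayCond standing in for noonCond[i]. It uses tryCatch() only so the script keeps running after the error; the exact wording of the error message may differ between R versions:

```r
todayCond = "Rain";   # hypothetical single value standing in for noonCond[i]

# Correct: both sides of | are complete conditional statements
isSunnyOrRain = (todayCond == "Sunny" | todayCond == "Rain");
print(isSunnyOrRain);   # TRUE

# Incorrect: the right side of | is only a string, so R stops with an error.
# tryCatch() catches the error and returns its message instead of halting.
badResult = tryCatch(todayCond == "Sunny" | "Rain",
                     error = function(e) conditionMessage(e));
print(badResult);       # the error message (wording depends on your R version)
```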

4.3 More than 2 conditions

You can also use the OR operator to string together multiple conditional statements.

 

We will use the new column in our data, called noonCondMessy, in which the weather conditions are not consistently spelled (as often happens when people enter data manually):

noonCondMess = weatherData$noonCondMessy;

Looking at the values in noonCondMess shows that there are multiple versions of “sunny”:

> noonCondMess
[1] "clouds" "Cloudy" "sun" "rainy" " fog" "Sunny" "sunny"
[8] "cloudy" "Rain" "RAIN" "Snow" "SUNNY" "Sun" "sun"

We will use multiple OR operators to check for the five different versions of “sunny” in the data:

sunnyDaysMess = 0; # state variable -- will hold the count of sunny days

for(i in 1:numDays)
{
  # check for different spellings
  if(noonCondMess[i] == "Sunny" | noonCondMess[i] == "sunny" |
     noonCondMess[i] == "sun" | noonCondMess[i] == "Sun" |
     noonCondMess[i] == "SUNNY")
  {
    sunnyDaysMess = sunnyDaysMess +1;
  }
}
Figure 3: Using OR operator to check for multiple spellings

And, once again, we get six:

sunnyDaysMess:  6

Note: This is a brute-force method for finding multiple spellings. There are more robust ways to check for different spellings using substrings and pattern matching (e.g., substr() and grep()), topics we will touch on later in this class.

5 The AND operator

The OR operator says: if at least one condition is TRUE, then the whole condition is TRUE.

The AND operator says that ALL conditions need to be TRUE for the whole condition to be TRUE.

 

The AND operator can be used to check conditions in two different weather columns.

 

For example, you might want to know which days were warmer than 60 AND Sunny.

goOutDay = 0;

for(i in 1:numDays)
{
  if(highTemps[i] > 60 & noonCond[i] == "Sunny") 
  {
    goOutDay = goOutDay +1;
  }
}
Figure 4: Using AND operator to check conditions in two different columns

There were 2 days that were both over 60 and sunny:

goOutDay:   2

Note: day 7 is 60 degrees and sunny, but it is not counted here because the condition ( > ) means greater than (but not equal to) 60.

5.1 The NOT EQUAL operator

We can reverse the conditions and check for non-sunny days that were colder than (or equal to) 50.

 

In other words, we want highTemps <= 50 AND noonCond != "Sunny":

stayInDay = 0;

for(i in 1:numDays)
{
  if(highTemps[i] <= 50 & noonCond[i] != "Sunny") 
  {
    stayInDay = stayInDay +1;
  }
}
Figure 5: Reversing the conditional statements from the last loop

There were 5 days with a high at or below 50 that were not sunny:

stayInDay:    5

5.2 Mutually exclusive if() statement

Since the two if() statements above (Figure 4 and Figure 5) have mutually exclusive conditions (i.e., there are no situations where both can be TRUE), we can (and should) combine them into one if-else-if statement. The following code checks the same two conditions, printing the matching days instead of counting them, and it is more efficient (i.e., faster) because the else if() condition is only checked on days when the first condition is FALSE:

for(i in 1:numDays)
{
  if(highTemps[i] > 60 & noonCond[i] == "Sunny")
  {
    cat("day", i, "good day to go out\n");
  }else if(highTemps[i] <= 50 & noonCond[i] != "Sunny")
  {
    cat("day", i, "good day to stay in\n");
  }
}
Figure 6: Checking mutually exclusive multiple conditions with an if-else-if structure

The 2 days that are good for going out and the 5 days that are good for staying in:

day 2 good day to stay in
day 4 good day to stay in
day 5 good day to stay in
day 10 good day to stay in
day 11 good day to stay in
day 13 good day to go out
day 14 good day to go out

6 Finding Ranges of numbers

So far, we have used conditional operators on numbers in ranges that are bounded on one side but go off to infinity on the other:

  • highTemps > 60 means anything above 60, up to positive infinity

  • highTemps < 50 means anything below 50, down to negative infinity

 

But we often want to limit the range we are checking so that it is bounded on both sides!

 

For instance, we might want all values between 50 and 60 (in this case, we will include 50 but not 60).

 

In other words, we want:

  • values greater than or equal to 50 (highTemps >= 50)

  • AND values less than 60 (highTemps < 60)

 

In order for the value to be between 50 and 60, both of the above conditional statements must be TRUE, so we use AND to combine them:

for(i in 1:numDays)
{
  # the number is both greater than (or equal to) 50 and less than 60
  if(highTemps[i] >= 50 & highTemps[i] < 60)
  {
    cat("It was", highTemps[i], "degrees on day", i, "\n");
  }
}
Figure 7: Using AND operator to limit the range to values between two numbers

The 7 days between 50 and 60 (includes 50 but not 60):

It was 57 degrees on day 1
It was 50 degrees on day 2
It was 54 degrees on day 3
It was 58 degrees on day 6
It was 53 degrees on day 8
It was 55 degrees on day 9
It was 54 degrees on day 12
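
To see exactly where the boundaries fall, here is a sketch that runs the same AND condition on a few made-up temperatures placed on and around 50 and 60 (testTemps and inRangeCount are hypothetical names, not part of weatherData):

```r
# Hypothetical temperatures placed on and around the boundaries 50 and 60
testTemps = c(49, 50, 55, 59, 60, 61);
inRangeCount = 0;

for(i in 1:length(testTemps))
{
  # TRUE only when BOTH conditions are TRUE
  if(testTemps[i] >= 50 & testTemps[i] < 60)
  {
    cat(testTemps[i], "is in the range\n");
    inRangeCount = inRangeCount +1;
  }else
  {
    cat(testTemps[i], "is outside the range\n");
  }
}
cat("values in the range:", inRangeCount, "\n");
```

50 is counted because >= includes it, while 60 is not counted because < excludes it, so only 50, 55, and 59 are in the range.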

6.1 Values outside a range

Sometimes we want to check for values outside an expected range – often to check for errors.  For instance, precipBad has some values that seem to be in error:

> precipBad
[1] 0.010 0.005 0.040 1.110 0.120 0.000 -0.005 49.000 0.450 0.300
[11] 1.130 0.004 0.000 0.000

Since rain is in inches, we are going to assume that any negative value (less than 0) or value above 10 is in error.

 

In this case, we are looking for precipBad values that are less than 0 OR greater than 10:

precipBad = weatherData$precipBad;
for(i in 1:numDays)
{
  # precipBad values less than 0 or greater than 10
  if(precipBad[i] < 0 | precipBad[i] > 10)
  {
    cat("Day", i, "has a value of", precipBad[i], "\n");
  }
}
Figure 8: Checking for values beyond the expected range

And we see the two days in precipBad that are in error:

Day 7 has a value of -0.005
Day 8 has a value of 49
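
A sketch using a few made-up readings (testPrecip is a hypothetical vector) shows why OR, not AND, is the right operator for values outside a range: no number can be both below 0 and above 10, so the AND version never finds anything:

```r
# Hypothetical precipitation readings (inches), including the boundary values 0 and 10
testPrecip = c(-0.5, 0, 0.25, 10, 10.5);

# OR: failing either check is enough to flag the value
outsideCount = 0;
for(i in 1:length(testPrecip))
{
  if(testPrecip[i] < 0 | testPrecip[i] > 10)
  {
    cat(testPrecip[i], "is outside the expected range\n");
    outsideCount = outsideCount +1;
  }
}

# AND would be wrong here: this condition is FALSE for every value
andCount = 0;
for(i in 1:length(testPrecip))
{
  if(testPrecip[i] < 0 & testPrecip[i] > 10)
  {
    andCount = andCount +1;
  }
}
cat("OR flagged:", outsideCount, " AND flagged:", andCount, "\n");
```

The boundary values 0 and 10 are not flagged, because < 0 and > 10 both exclude their endpoints.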

7 Boolean vectors

Up until this point, we have been using conditional operators to check values individually. We can also use conditional operators to check every value in a vector at once and create a TRUE/FALSE (Boolean) vector from the result.

 

For instance, we might only care about sunny days:

sunnyDayBool = (noonCond == "Sunny");

Or, whether the day was both sunny and warm:

niceDayBool = (highTemps > 60 & noonCond == "Sunny");

Or, whether there was rain or snow:

precipBool = (noonCond == "Rain" | noonCond == "Snow");

The result for all three of these commands is a 14-value Boolean (also called logical) vector:

sunnyDayBool  logi [1:14] FALSE FALSE TRUE FALSE...
niceDayBool   logi [1:14] FALSE FALSE FALSE FALSE...
precipBool    logi [1:14] FALSE FALSE FALSE TRUE...

We can look at sunnyDayBool and see that the indices of the TRUE values match the indices of the "Sunny" days in noonCond (values 3, 6, 7, 12, 13, and 14):

> sunnyDayBool
 [1] FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE
[12] TRUE  TRUE  TRUE
> noonCond
 [1] "Cloudy" "Cloudy" "Sunny"  "Rain"   "Fog"    "Sunny"  "Sunny"  "Cloudy"
 [9] "Rain"   "Rain"   "Snow"  "Sunny"  "Sunny"  "Sunny" 

Or, that only the last two days were both Sunny and over 60 degrees:

> niceDayBool
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[12] FALSE  TRUE  TRUE

Or, that 4 of the 14 days had rain or snow at noon:

> precipBool
 [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE
[12] FALSE FALSE FALSE
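
One way to see what the one-line vector comparison does is to build the same Boolean vector with a loop that checks one value at a time, as in earlier sections. This sketch uses a short hypothetical vector, noonCondSample, in place of the real 14-value noonCond:

```r
# Hypothetical 5-day vector; the real noonCond has 14 values
noonCondSample = c("Cloudy", "Sunny", "Rain", "Sunny", "Snow");

# Vector version: one comparison checks every value at once
sunnyBool = (noonCondSample == "Sunny");

# Loop version: check one value at a time and store each result
sunnyBoolLoop = logical(length(noonCondSample));  # starts as all FALSE
for(i in 1:length(noonCondSample))
{
  sunnyBoolLoop[i] = (noonCondSample[i] == "Sunny");
}

print(sunnyBool);                            # FALSE TRUE FALSE TRUE FALSE
print(identical(sunnyBool, sunnyBoolLoop));  # TRUE: both approaches give the same vector
```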

7.1 Masking a dataframe

A common use for a Boolean vector is to reduce (or mask) a dataframe to rows that meet a condition. We can create a dataframe that contains only the rows with sunny days using the Boolean vector sunnyDayBool.

### Using a Boolean vector to "mask" a dataframe 
sunnyDayWD = weatherData[sunnyDayBool,];
Figure 9: Masking a dataframe using [X,Y] subset notation

In Figure 9, we use [X,Y] notation to subset the dataframe with:

  • sunnyDayBool subsets X (the rows) – so only rows where sunnyDayBool is TRUE will be in the subsetted dataframe

  • nothing subsets Y (the columns) – so all columns will be in the subsetted dataframe

 

The subsetted dataframe, sunnyDayWD has 6 rows, representing the 6 of the 14 rows (rows 3, 6, 7, 12, 13, and 14) from the original dataframe where noonCondition was Sunny:

> sunnyDayWD
    date highTemp lowTemp precipitation noonCondition noonCondMessy precipBad
3  Mar29       54      46         0.040         Sunny           sun     0.040
6   Apr1       58      45         0.000         Sunny         Sunny     0.000
7   Apr2       60      32         0.005         Sunny         sunny    -0.005
12  Apr7       54      43         0.004         Sunny         SUNNY     0.004
13  Apr8       61      45         0.000         Sunny           Sun     0.000
14  Apr9       75      63         0.000         Sunny           sun     0.000
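
A Boolean vector built with & or | can mask a dataframe the same way. This sketch uses a small hypothetical dataframe, weatherSample, with made-up values rather than the real weatherData, and keeps only the rows that are both warm and sunny:

```r
# Hypothetical 5-row dataframe standing in for weatherData
weatherSample = data.frame(date = c("Day1", "Day2", "Day3", "Day4", "Day5"),
                           highTemp = c(48, 66, 58, 71, 62),
                           noonCondition = c("Rain", "Sunny", "Sunny", "Cloudy", "Sunny"));

# Boolean vector: TRUE for rows that are both warm (> 60) AND sunny
warmSunnyBool = (weatherSample$highTemp > 60 & weatherSample$noonCondition == "Sunny");

# Keep only the TRUE rows; leaving Y empty keeps every column
warmSunnyWD = weatherSample[warmSunnyBool, ];
print(warmSunnyWD);   # rows 2 and 5
```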

7.2 Masking rows and columns

We can also use the Y part of the [X,Y] notation to subset columns. The following code keeps only the sunny rows and also removes the precipitation column, which is column 4 (a negative index removes that column):

### Masking rows and columns
sunnyDayWD2 = weatherData[sunnyDayBool, c(-4)];  # remove precipitation column (4)

And in the Console, we see the same data with the precipitation column removed:

> sunnyDayWD2
    date highTemp lowTemp noonCondition noonCondMessy precipBad
3  Mar29       54      46         Sunny           sun     0.040
6   Apr1       58      45         Sunny         Sunny     0.000
7   Apr2       60      32         Sunny         sunny    -0.005
12  Apr7       54      43         Sunny         SUNNY     0.004
13  Apr8       61      45         Sunny           Sun     0.000
14  Apr9       75      63         Sunny           sun     0.000
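
Several negative indices can be combined with c() to remove more than one column at a time. Here is a sketch on a hypothetical 3-row dataframe (weatherSample2 and its values are made up for illustration):

```r
# Hypothetical 3-row dataframe standing in for weatherData
weatherSample2 = data.frame(date = c("Day1", "Day2", "Day3"),
                            highTemp = c(48, 66, 58),
                            lowTemp = c(35, 50, 41),
                            noonCondition = c("Rain", "Sunny", "Sunny"));

sunnyBoolSample = (weatherSample2$noonCondition == "Sunny");

# Rows: only sunny days.  Columns: remove column 2 (highTemp) and column 3 (lowTemp)
sunnyNoTemps = weatherSample2[sunnyBoolSample, c(-2, -3)];
print(sunnyNoTemps);
print(colnames(sunnyNoTemps));   # "date" "noonCondition"
```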

8 Application

A) Create one if-else-if structure that checks for:

  • Sunny days greater than 54

  • Non-Sunny days less than or equal to 54

  • Sunny days less than or equal to 54

  • Non-Sunny days greater than 54

In comments, answer: Why do this as one if-else-if structure instead of 4 separate if statements?

 

B) Create a Boolean (logical) vector that finds all cloudy days in noonCondMessy (note the different spellings)

 

C) Create a cloudyDays dataframe that:

  • Only has the rows from weatherData where weather conditions were cloudy

  • Removes the last 2 columns from weatherData (noonCondMessy and precipBad)

 

D) Create an if() that combines all rainy, cloudy, and snowy days from noonCondition.

 

E) Find which days meet all three of these conditions:

  • lowTemps > 40

  • highTemps < 60

  • Sunny

 

F) Use the following random number generator:

randomTemp = sample(0:100, size=1);  # pick 1 random integer from 0 to 100

and create one if-else-if structure that outputs:

  • “error” if randomTemp is less than 20 or greater than 80

  • “very cold” if randomTemp is 20-30

  • “cold” if randomTemp is 30-45

  • “nice” if randomTemp is 45-60

  • “unusually warm” if randomTemp is 60-80

 

Save the script as app1-12.r in your scripts folder and email your Project Folder to Charlie Belinsky at belinsky@msu.edu.

 

Instructions for zipping the Project Folder are here.

 

If you have any questions regarding this application, feel free to email them to Charlie Belinsky at belinsky@msu.edu.

8.1 Questions to answer

Answer the following in comments inside your application script:

  1. What was your level of comfort with the lesson/application?

  2. What areas of the lesson/application confused or still confuse you?

  3. What are some things you would like to know more about that are related to, but not covered in, this lesson?

9 Extension: & vs && and | vs ||

R has two AND operators (& and &&) and two OR operators (| and ||), which I will refer to as the singles and doubles operators.

 

The big difference is that singles work on one or multiple values, whereas doubles only work on one value. In other words, you can replace singles with doubles for the whole lesson except in Section 7, where we are working with vectors of values. In Section 7, switching the singles with doubles will cause an error (in R 4.3.0 and later; older versions of R may instead give a warning or silently use only the first value of each vector).

 

At this point it seems there is no reason to use doubles if singles do everything and more. And for a beginner, this is enough information. However, as you get into more advanced programming, there are benefits to using doubles.
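
The following sketch shows the difference on two short hypothetical vectors, a and b. The single & compares them element by element. The double && expects one value on each side, so with vectors it fails; tryCatch() is used only to capture what happens so the script keeps running:

```r
a = c(TRUE, TRUE, FALSE);
b = c(TRUE, FALSE, FALSE);

# Single &: compares the vectors element by element
singleResult = (a & b);
print(singleResult);   # TRUE FALSE FALSE

# Double &&: expects ONE value on each side.
# In R 4.3.0 and later this is an error; older versions may warn
# or silently use only the first values.
doubleResult = tryCatch(a && b,
                        error = function(e) "error",
                        warning = function(w) "warning");
print(doubleResult);
```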

9.1 doubles are more efficient

In the following code (inside a loop over i), & will always check both whether highTemps[i] > 50 and whether noonCond[i] is "Sunny":

  if(highTemps[i] > 50 & noonCond[i] == "Sunny")

But if highTemps[i] is not greater than 50, there is no point in checking noonCond[i] because the whole condition is already FALSE.

 

&& will only check what is necessary to determine the result:

if(highTemps[i] > 50 && noonCond[i] == "Sunny")

If highTemps[i] is not greater than 50, && will not bother to check noonCond[i].
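
You can watch this happen with a helper that counts how many times it is run. In this sketch, checkSunny(), evalCount, and sampleHigh are hypothetical names made up for illustration; checkSunny() stands in for the noonCond check:

```r
# Hypothetical counter: how many times the right-hand side gets evaluated
evalCount = 0;

# Hypothetical helper standing in for the noonCond check;
# it adds 1 to evalCount every time R actually runs it
checkSunny = function(cond)
{
  evalCount <<- evalCount + 1;
  return(cond == "Sunny");
}

sampleHigh = 45;   # hypothetical value, so sampleHigh > 50 is FALSE

# Single &: both sides are evaluated, so checkSunny() runs
resultSingle = (sampleHigh > 50 & checkSunny("Sunny"));
countAfterSingle = evalCount;   # 1

# Double &&: the left side is FALSE, so checkSunny() is skipped
resultDouble = (sampleHigh > 50 && checkSunny("Sunny"));
countAfterDouble = evalCount;   # still 1

cat("evaluations after &: ", countAfterSingle, "\n");
cat("evaluations after &&:", countAfterDouble, "\n");
```

Both versions give the same answer (FALSE); the only difference is that && skipped work it did not need to do.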

9.2 doubles can be used to avoid errors

Another common usage of doubles is to check if a variable exists before checking the value of the variable. Here we first check whether highTemps exists (note that exists() takes the variable name as a string), and only then check whether the value is greater than 50:

if( exists("highTemps") && highTemps[i] > 50 )  # will avoid errors

Because we are using &&, if exists("highTemps") is FALSE, the value of highTemps will not be checked.

 

If we use &, then the value of highTemps will be checked even if highTemps does not exist, causing an error in your script.

 if( exists("highTemps") & highTemps[i] > 50 ) # error if highTemps does not exist
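
Here is a runnable sketch of both versions, assuming missingHigh is a hypothetical variable name that has not been created anywhere in your script. tryCatch() is used only so the & version reports its error instead of stopping the script:

```r
# "missingHigh" is a hypothetical name that has NOT been created
safeResult = (exists("missingHigh") && missingHigh > 50);
print(safeResult);   # FALSE (the right side is never evaluated)

# With &, R evaluates missingHigh > 50 anyway and stops with an "object not found" error
unsafeResult = tryCatch(exists("missingHigh") & missingHigh > 50,
                        error = function(e) conditionMessage(e));
print(unsafeResult);
```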

9.3 singles and doubles in other programming languages

In many other languages (for example, C, Java, and JavaScript), && and || are the usual logical operators, while & and | perform bitwise operations. So, where you would use singles in R, you are likely to use doubles in those languages.