Hello, today I am going to teach you R from scratch. In the next part, I will also show you robotic automation using R.
After going through this course, you will be able to manipulate data using R, and later you will be able to automate reports, cutting the time they take by up to 99%.
1+1 #add
## [1] 2
2*2 #multiply
## [1] 4
4-1 #subtract
## [1] 3
25/5 #division
## [1] 5
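The same operators can be combined in a single expression. The short sketch below (the variable names are only illustrative) suggests how R follows the usual order of operations, and how parentheses change it.

```r
# Multiplication and division are evaluated before addition and subtraction
no_parens <- 1 + 2 * 3       # 2 * 3 is evaluated first, then 1 is added
with_parens <- (1 + 2) * 3   # the parentheses are evaluated first

no_parens     # 7
with_parens   # 9
```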
We are saving a number, i.e. 1, into a vector named a.
a = 1
This creates the vector but does not show the values stored in a. To see them, write the vector name again and execute it (by pressing Ctrl+R in the Windows R GUI, or Ctrl+Enter in RStudio).
a = 1
a
## [1] 1
Please remember that R is case sensitive: if we write capital A, R will show an error, since capital A has not been defined. We need to call small a, since that is the name we defined.
We may also save multiple consecutive numbers. Here the new values will replace the previous values in vector a.
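As a small illustration of case sensitivity, the sketch below defines small a and then tries to look up capital A. It assumes nothing named A exists in your session; tryCatch is used only so that the error is captured instead of stopping the script.

```r
a <- 1

# Looking up A (capital) fails because only a (small) has been defined.
# tryCatch() captures the error object so the script keeps running.
lookup <- tryCatch(A, error = function(e) e)

inherits(lookup, "error")   # TRUE when A has not been defined
```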
a = 1:3
a
## [1] 1 2 3
If you want to save multiple numbers that are not consecutive, use c, which is short for combine. Please note: R is case sensitive, so c must be written as a small letter; writing it in capital will cause an error.
a = c(1,3,5)
a
## [1] 1 3 5
As you can see, the = sign works perfectly fine so far. However, most R users prefer <- (the arrow sign) for assignment. For simple assignments like these, it is just a substitute for the = sign and gives the same result. From here on, I will also be using <-.
a <- c(1,3,5)
a
## [1] 1 3 5
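To check that both assignment styles really store the same thing, here is a minimal sketch. The names a_arrow, a_equals and avg are made up for this illustration.

```r
a_arrow <- c(1, 3, 5)    # arrow assignment
a_equals = c(1, 3, 5)    # equals-sign assignment

# identical() returns TRUE when two objects are exactly the same
same <- identical(a_arrow, a_equals)
same

# Inside a function call, = names an argument rather than assigning a variable
avg <- mean(x = c(1, 3, 5))
avg   # 3
```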
Now let's save characters in a vector named b. Remember to put quotes around the text; otherwise R will look for the vector a and save its values instead of the character a. Characters are always put within quotes.
b = "a"
b
## [1] "a"
Each vector can hold only one type of value, for example numbers (called numeric) or text (called character). How do we know which type of data is stored in a vector?
The class function shows the property, i.e. which type of values the vector holds. a has a numeric property.
class(a)
## [1] "numeric"
Similarly, b has a character property
class(b)
## [1] "character"
What happens if we mix numbers and characters in the same vector? It doesn’t give any errors.
p <- c(1,2,3,"Joyita Sanyal")
p
## [1] "1" "2" "3" "Joyita Sanyal"
Now what will be the property of p? Whenever there is a character value, the entire vector turns into character, since character takes priority over numeric.
class(p)
## [1] "character"
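The same priority rule applies to other mixes. The sketch below is only an illustration (the vector q is not part of the course data): it mixes a logical value with numbers, where numeric takes priority.

```r
p <- c(1, 2, 3, "Joyita Sanyal")
class(p)              # "character": the numbers became text

q <- c(TRUE, 2, 3)    # a logical value mixed with numbers
class(q)              # "numeric": TRUE is stored as 1
q
```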
We can change the property of a vector. For example, we can save a, which was numeric, as character.
a <- as.character(a)
class(a)
## [1] "character"
Now we can convert vector a, which has become character, back to numeric.
a <- as.numeric(a)
class(a)
## [1] "numeric"
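Converting back to numeric only works for text that looks like a number. As a hedged sketch (the vector x is invented for this example), text that cannot be read as a number becomes NA, and R normally prints a warning, which is silenced here with suppressWarnings.

```r
x <- c("10", "20", "thirty")

# "thirty" cannot be read as a number, so it becomes NA
nums <- suppressWarnings(as.numeric(x))

nums          # 10 20 NA
is.na(nums)   # FALSE FALSE TRUE
```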
We can also save a vector as a factor. When the class is factor, R treats the variable as a categorical variable. The benefit of factors will be explained later on, when we do Machine Learning.
a <- as.factor(c(1,1,2,1,3,3,2,4))
class(a)
## [1] "factor"
a
## [1] 1 1 2 1 3 3 2 4
## Levels: 1 2 3 4
The levels show the unique values in the vector a.
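If you want to work with the levels directly, base R has a few helper functions. The sketch below uses levels, nlevels and table on the same factor; the name counts is chosen only for this example.

```r
a <- as.factor(c(1, 1, 2, 1, 3, 3, 2, 4))

levels(a)    # the unique values, stored as text: "1" "2" "3" "4"
nlevels(a)   # how many levels there are: 4

# table() counts how often each level occurs
counts <- table(a)
counts
```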
Let’s create 3 vectors first
name <- c("A","B","C","D","E","F","G","H","I","J")
age <- c(22,9,5,39,50,17,26,33,43,48)
sex <- c("F","M","M","F","F","M","F","M","F","M")
Now we will put all the 3 vectors together to create a data frame named undertaker
undertaker <- data.frame(name,age,sex,stringsAsFactors = FALSE)
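In R versions before 4.0.0, the default rule was that a character vector was changed into a factor when a data frame was created from it. Since R 4.0.0, the default is stringsAsFactors = FALSE.
Writing “stringsAsFactors = FALSE” explicitly ensures that character vectors are not changed into factors and remain character, whichever R version you use.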
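To see the effect of this argument, the sketch below builds the same small data frame twice, once with each setting. The names as_char and as_fact, and the shortened vectors, are made up for this illustration.

```r
name <- c("A", "B", "C")
sex <- c("F", "M", "M")

as_char <- data.frame(name, sex, stringsAsFactors = FALSE)
as_fact <- data.frame(name, sex, stringsAsFactors = TRUE)

class(as_char$sex)   # "character"
class(as_fact$sex)   # "factor"
```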
Now let’s see what undertaker looks like!
undertaker
## name age sex
## 1 A 22 F
## 2 B 9 M
## 3 C 5 M
## 4 D 39 F
## 5 E 50 F
## 6 F 17 M
## 7 G 26 F
## 8 H 33 M
## 9 I 43 F
## 10 J 48 M
If you want to create a copy of undertaker simply save it as kane
kane <- undertaker
kane
## name age sex
## 1 A 22 F
## 2 B 9 M
## 3 C 5 M
## 4 D 39 F
## 5 E 50 F
## 6 F 17 M
## 7 G 26 F
## 8 H 33 M
## 9 I 43 F
## 10 J 48 M
Now if we want to view only a single column of kane, we write the column index within square brackets.
Suppose we want to view only the age column (which is the 2nd column in kane); then write kane and, within square brackets, 2.
kane[2]
## age
## 1 22
## 2 9
## 3 5
## 4 39
## 5 50
## 6 17
## 7 26
## 8 33
## 9 43
## 10 48
If you want the 1st and 3rd columns, then write 1,3 and use “c”, since there is more than one value.
kane[c(1,3)]
## name sex
## 1 A F
## 2 B M
## 3 C M
## 4 D F
## 5 E F
## 6 F M
## 7 G F
## 8 H M
## 9 I F
## 10 J M
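Note that single square brackets return a data frame, even for one column. The sketch below rebuilds kane so that it runs on its own, and also shows double square brackets, which return the column's values as a plain vector. The names one_col and age_values are illustrative.

```r
kane <- data.frame(
  name = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
  age  = c(22, 9, 5, 39, 50, 17, 26, 33, 43, 48),
  sex  = c("F", "M", "M", "F", "F", "M", "F", "M", "F", "M"),
  stringsAsFactors = FALSE
)

one_col <- kane[2]         # single brackets keep the data frame structure
class(one_col)             # "data.frame"

age_values <- kane[[2]]    # double brackets return the column itself
class(age_values)          # "numeric"
```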
If you want to filter columns and rows, then within square bracket put a comma.
Left hand side signifies rows
Right hand side signifies columns
Leaving one side blank means all values
kane[row index , column index]
Suppose we want first 4 rows for 1st & 3rd column
kane[1:4,c(1,3)]
## name sex
## 1 A F
## 2 B M
## 3 C M
## 4 D F
Now suppose we want the first 4 rows for all columns.
Leaving the column index blank means all columns.
kane[1:4,]
## name age sex
## 1 A 22 F
## 2 B 9 M
## 3 C 5 M
## 4 D 39 F
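The row index does not have to start at 1. As a sketch under the same row-and-column rule (kane is rebuilt here so the example runs on its own, and the result names are illustrative), this picks the 4th to 7th rows for all columns and checks the size of an earlier selection with dim.

```r
kane <- data.frame(
  name = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
  age  = c(22, 9, 5, 39, 50, 17, 26, 33, 43, 48),
  sex  = c("F", "M", "M", "F", "F", "M", "F", "M", "F", "M"),
  stringsAsFactors = FALSE
)

rows_4_to_7 <- kane[4:7, ]        # rows 4 to 7, all columns
rows_4_to_7$name                  # "D" "E" "F" "G"

first_4 <- kane[1:4, c(1, 3)]     # first 4 rows, 1st and 3rd columns
dim(first_4)                      # 4 rows and 2 columns
```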
Let’s look at a trickier situation! We know there are columns named name and age, but we don’t know their positions, e.g. whether they are the 1st column or the 5th column.
We may select the columns even by their names.
kane[1:4,c("name","age")]
## name age
## 1 A 22
## 2 B 9
## 3 C 5
## 4 D 39
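If you do want to find out where a column sits, the names function lists the column names in order. The sketch below (with kane rebuilt so it runs on its own, and age_position as an illustrative name) looks up the position of age and confirms that selecting by position and by name give the same values.

```r
kane <- data.frame(
  name = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
  age  = c(22, 9, 5, 39, 50, 17, 26, 33, 43, 48),
  sex  = c("F", "M", "M", "F", "F", "M", "F", "M", "F", "M"),
  stringsAsFactors = FALSE
)

names(kane)                                    # "name" "age" "sex"

# which() returns the position where the condition is TRUE
age_position <- which(names(kane) == "age")
age_position                                   # 2

identical(kane[[age_position]], kane[["age"]]) # TRUE
```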
We want to filter the people whose age is less than 20. Remember, whenever we filter, we always filter the rows.
So the filter condition is put on the left side of the comma.
We now refer to the age column of the data frame kane. It can be written in the following format: dataframe$column
dataframe is the name of the data (in this case kane) and column is the variable within the data frame (in this case the age column). Then we put the condition, i.e. less than 20.
kane[kane$age < 20,]
## name age sex
## 2 B 9 M
## 3 C 5 M
## 6 F 17 M
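It may help to look at the condition on its own. In the sketch below (kane is rebuilt so the example is self-contained, and is_under_20 is an illustrative name), the condition produces one TRUE or FALSE per row, and only the TRUE rows are kept.

```r
kane <- data.frame(
  name = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
  age  = c(22, 9, 5, 39, 50, 17, 26, 33, 43, 48),
  sex  = c("F", "M", "M", "F", "F", "M", "F", "M", "F", "M"),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE value per row
is_under_20 <- kane$age < 20
is_under_20

sum(is_under_20)        # number of TRUE values: 3

# Rows where the value is TRUE are kept
kane[is_under_20, ]
```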
The different signs and their meaning are given below:
< less than
<= less than or equal to
> greater than
>= greater than or equal to
== equal to (always put a double equals sign)
!= not equal to (the ! mark means not)
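Here is a small sketch of these signs applied to a short vector. The vector ages and the result names are invented for this example, and 20 is chosen so that the difference between < and <= is visible.

```r
ages <- c(9, 20, 33)

lt  <- ages < 20     # TRUE FALSE FALSE
lte <- ages <= 20    # TRUE TRUE FALSE
gt  <- ages > 20     # FALSE FALSE TRUE
gte <- ages >= 20    # FALSE TRUE TRUE
eq  <- ages == 20    # FALSE TRUE FALSE
neq <- ages != 20    # TRUE FALSE TRUE

lte
```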
Let’s look at some other filter examples:
We want to filter all females. Put F in quotes since it’s a character value.
kane[kane$sex == "F",]
## name age sex
## 1 A 22 F
## 4 D 39 F
## 5 E 50 F
## 7 G 26 F
## 9 I 43 F
We want to filter everyone except the person whose name is A.
Here we have everybody except A.
kane[kane$name != "A",]
## name age sex
## 2 B 9 M
## 3 C 5 M
## 4 D 39 F
## 5 E 50 F
## 6 F 17 M
## 7 G 26 F
## 8 H 33 M
## 9 I 43 F
## 10 J 48 M
We may also combine multiple conditions.
**and condition (& is used for and): all conditions must be true**
If we want females whose age is greater than or equal to 20:
kane[kane$sex == "F" & kane$age >= 20,]
## name age sex
## 1 A 22 F
## 4 D 39 F
## 5 E 50 F
## 7 G 26 F
## 9 I 43 F
We may have any number of conditions:
Suppose we want people whose age is between 20 and 40, but whose name is not D.
kane[kane$age >= 20 & kane$age <= 40 & kane$name != "D",]
## name age sex
## 1 A 22 F
## 7 G 26 F
## 8 H 33 M
**or condition (| is used for or): at least one of the given conditions must be true**
We want people whose age is less than or equal to 20, or who are female.
kane[kane$age <= 20 | kane$sex == "F",]
## name age sex
## 1 A 22 F
## 2 B 9 M
## 3 C 5 M
## 4 D 39 F
## 5 E 50 F
## 6 F 17 M
## 7 G 26 F
## 9 I 43 F
Here we get everyone who is female. The only people in the result who are not female are included because their age is less than or equal to 20 (which satisfies the other condition).
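When & and | appear in the same filter, R evaluates & before |, so parentheses are a safe way to say what you mean. The sketch below is only an illustration (kane is rebuilt so it runs on its own, and the filter itself is made up): it looks for males who are either younger than 10 or older than 40.

```r
kane <- data.frame(
  name = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
  age  = c(22, 9, 5, 39, 50, 17, 26, 33, 43, 48),
  sex  = c("F", "M", "M", "F", "F", "M", "F", "M", "F", "M"),
  stringsAsFactors = FALSE
)

# With parentheses: male AND (younger than 10 OR older than 40)
with_parens <- kane[kane$sex == "M" & (kane$age < 10 | kane$age > 40), ]
with_parens$name   # "B" "C" "J"

# Without parentheses R reads it as: (male AND younger than 10) OR older than 40
no_parens <- kane[kane$sex == "M" & kane$age < 10 | kane$age > 40, ]
no_parens$name     # "B" "C" "E" "I" "J"
```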