Machine Learning with R

About the Course

The world is quietly being reshaped by machine learning. We no longer need to teach computers how to perform complex tasks like image recognition or text translation; instead, we build systems that let them learn to do it themselves. R is powerful open-source software and one of the leading tools for data analytics and machine learning, used by large corporations including Google. In this course you will learn how to use R and machine learning algorithms to solve business problems and extract insights that help companies stay one step ahead of their competitors.

Course Overview

Getting the Hang of R
The R Website
Downloading and Installing R from CRAN
Running the R Program
The interpreter and the console
Tools to work efficiently with R
Finding Your Way with R
Getting Help via the CRAN Website and the Internet
The Help Command in R
Anatomy of a Help Item in R
Command Packages
Standard Command Packages
What Extra Packages Can Do for You
How to Get Extra Packages of R Commands

Special values
Creating a vector and accessing its properties
Data frames
Creating a data frame and accessing its properties
Creating Data Frames
Accessing Data Frames
Extracting Subdata Frames
More on Treatment of NA Values
Using the rbind() and cbind() Functions and Alternatives
Applying apply()
Extended Example: WWE Case Study

Efficient Data Frames
Tidying Data with tidyr and Regular Expressions
Make Wide Tables Long with gather()
Split Joint Variables with separate()
Other tidyr Functions
Regular Expressions
Merging Data Frames
Extended Example: An Employee Database
Efficient Data Processing with dplyr
Renaming Columns
Changing Column Classes
Filtering Rows
Chaining Operations
Data Aggregation
Combining Datasets
Working with Databases
Data Processing with data.table

The if() Versus ifelse() Functions
If… else conditionals
The use of the if conditional statement
Extra trick of the if conditional statement
Sorting and Ordering
Reversing Elements
Which Indices are TRUE?
Converting Factors to Numerics
Logical AND and OR
Row and Column Operations and anyNA()
The cut() Function

Writing your first function in R
Writing functions with multiple arguments and use of default values
Handling data types in input arguments
Producing different output types and return values
Making a recursive call to a function
Handling exceptions and error messages

Text functions
Data cleaning with efficient text functions
Inbuilt Numeric functions of R
Inbuilt String functions of R
Other inbuilt functions of R
nchar(), paste(), substr(), strsplit(), etc.

Pivot Table of Excel in R
Table function
Count function of plyr package
Learning SQL queries using R
Grouping numeric data
User defined functions (Macros) in R
Visualizing Data
Date functions with the lubridate package
Apply functions

Box-whisker Plots
Basic Boxplots
Customizing Boxplots
Horizontal Boxplots
Scatter Plots
Basic Scatter Plots
Adding Axis Labels
Plotting Symbols
Setting Axis Limits
Line Charts
Line Charts Using Numeric Data
Line Charts Using Categorical Data
Pie Charts
Bar Charts
Single-Category Bar Charts
Multiple Category Bar Charts
Horizontal Bars
Bar Charts from Summary Data

Means: The Lure of Averages
The Average in R: mean()
Medians: Caught in the Middle
The Median in R: median()
Statistics à la Mode
The Mode in R
Deviating from the Average
Measuring Variation
Back to the Roots: Standard Deviation
Standard Deviation in R
Conditions, Conditions, Conditions …
Meeting Standards and Standings
Catching Some Z’s

How Many?
The High and the Low
Living in the Moments
Tuning in the Frequency
Summarizing a Data Frame

Hitting the Curve
Working with Normal Distributions
A Distinguished Member of the Family
Drawing Conclusions from Data

Understanding Sampling Distributions
An EXTREMELY Important Idea: The Central Limit Theorem
Confidence: It Has Its Limits!
Fit to a t

Hypotheses, Tests, and Errors
Hypothesis Tests and Sampling Distributions
Catching Some Z's Again
Z Testing in R
t for One
t Testing in R
Working with t-Distributions
Visualizing t-Distributions
Testing a Variance
Working with Chi-Square Distributions
Visualizing Chi-Square Distributions

Hypotheses Built for Two
Sampling Distributions Revisited
t for Two
Like Peas in a Pod: Equal Variances
t-Testing in R
A Matched Set: Hypothesis Testing for Paired Samples
Paired Sample t-testing in R
Testing Two Variances

Testing More Than Two
Another Kind of Hypothesis, Another Kind of Test
Getting Trendy
Trend Analysis in R

Cracking the Combinations
Two-Way ANOVA in R
Two Kinds of Variables … at Once
After the Analysis

Uses and abuses of machine learning
Machine learning successes
How machines learn
Machine learning in practice
Machine learning with R

Understanding regression
Simple linear regression
Ordinary least squares estimation
Multiple Linear Regression
Regression: What a Line!
Linear Regression in R
Juggling Many Relationships at Once: Multiple Regression
Exploring and preparing the data
ANOVA: Another Look
Formulae and Linear Models
Model Building
Training a model on the data
Evaluating model performance
Improving model performance
Goodness of Fit with Data—The Perils of Overfitting
Root-Mean-Square Error
Model Simplicity and Goodness of Fit
Assumption checking
Assumption checking using packages
Case studies of Linear Regression
Estimating the quality of wines
Price prediction of real estate
Movie popularity prediction
Retail sales prediction

Understanding logistic regression
The logit model
Generalized Linear Model
Simple logistic regression
Multiple logistic regression
Customer satisfaction analysis with the multiple logistic regression
Multiple logistic regression with categorical data
The Dataset and the Data Dictionary
Data Import in R
EDD in R
Outlier Treatment in R
Missing Value treatment in R
Variable transformation and Deletion in R
Dummy variable creation in R
Automatic dummy variable creation
Formulae and Logistic Models
Model Building
Training a model on the data
Evaluating model performance
Improving model performance
Goodness of Fit with Data—The Perils of Overfitting
Confusion Matrix
Creating a Confusion Matrix in R

Introduction to Time Series Data
Notation for Time Series Data
Peculiarities of Time Series Data
Setting the Frequency
Treatment of missing values
White Noise
Correlation Between Past and Present Values
The Autocorrelation Function (ACF)
The Partial Autocorrelation Function (PACF)
Picking the Correct Model
The Autoregressive (AR) Model

Unsupervised Learning & Clustering: theory
K-Means Clustering: Theory
Example K-Means Clustering in R
Visualize K-Means Results in R
Model-based Unsupervised Clustering in R
How to assess the clustering tendency of a dataset
Selecting the number of clusters for unsupervised Clustering methods (K-Means)
Assessing the performance of unsupervised learning (clustering) algorithms
How to compare the performance of different unsupervised clustering algorithms?

A Simple Tree Model
Deciding How to Split Trees
The stopping criteria for controlling tree growth
Tree Entropy and Information Gain
Pros and Cons of Decision Trees
Tree Overfitting
Pruning Trees
Decision Trees for Classification
Conditional Inference Trees
Conditional Inference Tree Classification
Building a decision tree in R
Model Validation
Model Improvement
Model Interpretation
Ensemble technique
Random Forest Classification
Splitting Data into Test and Train Set in R
Choose the number of trees
Model Validation
Model Improvement
Model Interpretation
Accuracy of the model
Decision Tree vs. Random Forest