Every day, around the United States, more than 36,000 weather forecasts are calculated. They gather all 36,000 forecasts, put them in a database, and compare them to the actual conditions encountered in that location on that day. All that collection, analysis, and reporting take a lot of heavy analytical horsepower and it is done with one programming language: Python. Over 40% of all data scientists use Python in their day to day work. Python has long been known as a simple programming language to pick up, which has propelled it to be the most preferred tool for a Data Scientist. In this course you will learn how to use the power of Python to analyze data, create beautiful visualizations, and use powerful machine learning algorithms to formulate business strategies.

Python—The Programming Language

Installing Python

Anaconda

Spyder

Jupyter notebook

IDLE (Integrated DeveLopment Environment)

Implement the Code Using an IDE

Interact with Python

Writing Python Code

Make Calculations

Import New Libraries and Functions

Import additional libraries using pip install

Import msgpack to satisfy basic requirement

NumPy

Pandas

matplotlib

Pandas Data Structures

Introduction

Creating Your Own Data

Types of Data

The dtype Option

The Series

The list

The tupple

Difference between list & tupple

The DataFrame

Making Changes to Series and DataFrames

Exporting and Importing Data

CSV

Excel

Jason

Aggregate Functions

Indexing, Slicing, and Iterating

Indexing

Slicing

Iterating an Array

Conditions and Boolean Arrays

Shape Manipulation

The Index Objects

Other Functionalities on Indexes

Reindexing

Dropping

Arithmetic and Data Alignment

Operations between Data Structures

Flexible Arithmetic Methods

Operations between DataFrame and Series

Function Application and Mapping

Functions by Element

Functions by Row or Column

Statistics Functions

Sorting and Ranking

“Not a Number” Data

Assigning a NaN Value

Working With Missing Data

Filtering Out NaN Values

Filling in NaN Occurrences

Hierarchical Indexing and Leveling

Reordering and Sorting Levels

Summary Statistic by Level

Data Preparation

Merging

Merging Multiple Data Sets

Concatenating

Combining

Pivoting

Removing

Data Transformation

Tidy Data

Removing Duplicates

Mapping

Discretization and Binning

Detecting and Filtering Outliers

Permutation

String Manipulation

More String Methods

Built-in Methods for Manipulation of Strings

Regular Expressions

Data Aggregation

Group By

A Practical Example

Hierarchical Grouping

Group Iteration

Chain of Transformations

Functions on Groups

Advanced Data Aggregation

Introduction

Aggregate

Transform

Filter

The pandas.core.groupby .DataFrameGroupBy Object

Working With a MultiIndex

Introduction

Python’s datetime Object

Converting to datetime

Loading Data That Include Dates

Extracting Date Components

Date Calculations and Timedeltas

Datetime Methods

Subsetting Data Based on Dates

Date Ranges

Shifting Values

Resampling

The matplotlib Library

Installation

matplotlib Architecture

Backend Layer

Artist Layer

Scripting Layer (pyplot)

pyplot

Line chart

Scatter plot

Annotations: Add Text

Annotations: Properties

A Simple Interactive Chart

Set the Properties of the Plot

Working with Multiple Figures and Axes

Adding Further Elements to the Chart

Adding Text

Adding a Legend

Legends: Properties

Saving Your Charts

Saving the Code

Saving Your Chart Directly as an Image

Line Chart

Line Charts with pandas

Histogram

Means: The Lure of Averages

The Average in Python: mean()

Medians: Caught in the Middle

The Median in Python: median()

Statistics à la Mode

The Mode in Python

Deviating from the Average

Measuring Variation

Back to the Roots: Standard Deviation

Standard Deviation in Python

Conditions, Conditions, Conditions …

Meeting Standards and Standings

Catching Some Z's

How Many?

The High and the Low

Living in the Moments

Tuning in the Frequency

Summarizing a Data Frame

Hitting the Curve

Working with Normal Distributions

A Distinguished Member of the Family

Drawing Conclusions from Data

Understanding Sampling Distributions

An EXTREMELY Important Idea: The Central Limit Theorem

Confidence: It Has Its Limits!

Fit to a t

Hypotheses, Tests, and Errors

Hypothesis Tests and Sampling Distributions

Catching Some Z's Again

Z Testing in Python

t for One

t Testing in Python

Working with t-Distributions

Visualizing t-Distributions

Testing a Variance

Working with Chi-Square Distributions

Visualizing Chi-Square Distributions

Hypotheses Built for Two

Sampling Distributions Revisited

t for Two

Like Peas in a Pod: Equal Variances

t-Testing in Python

A Matched Set: Hypothesis Testing for Paired Samples

Paired Sample t-testing in Python

Testing Two Variances

Testing More Than Two

ANOVA in Python

Another Kind of Hypothesis, Another Kind of Test

Getting Trendy

Trend Analysis in Python

Cracking the Combinations

Two-Way ANOVA in Python

Two Kinds of Variables … at Once

After the Analysis

Uses and abuses of machine learning

Machine learning successes

How machines learn

Machine learning in practice

Machine learning with Python

Understanding regression

Simple linear regression

Ordinary least squares estimation

Multiple Linear Regression

Regression: What a Line!

Linear Regression in Python

Juggling Many Relationships at Once: Multiple Regression

exploring and preparing the data

ANOVA: Another Look

Formulae and Linear Models

Model Building

training a model on the data

evaluating model performance

improving model performance

Goodness of Fit with Data—The Perils of Overfitting

Root-Mean-Square Error

Model Simplicity and Goodness of Fit

Assumption checking

Assumption checking using packages

Case studies of Linear Regression

Estimation the quality of wines

Price prediction of real estate

Movie popularity prediction

Retail sales prediction

Understanding logistic regression

The logit model

Generalized Linear Model

Simple logistic regression

Multiple logistic regression

Customer satisfaction analysis with the multiple logistic regression

Multiple logistic regression with categorical data

The Dataset and the Data Dictionary

Data Import in Python

EDD in Python

Outlier Treatment in Python

Missing Value treatment in Python

Variable transformation and Deletion in Python

Dummy variable creation in Python

Automatic dummy variable creation

Formulae and Logistic Models

Model Building

training a model on the data

evaluating model performance

improving model performance

Goodness of Fit with Data—The Perils of Overfitting

Confusion Matrix

Creating Confusion Matrix in Python

Introduction to Time Series Data

Notation for Time Series Data

Peculiarities of Time Series Data

Setting the Frequency

Treatment of missing values

White Noise

Stationarity

Seasonality

Correlation Between Past and Present Values

The Autocorrelation Function (ACF)

The Partial Autocorrelation Function (PACF)

Picking the Correct Model

The Autoregressive (AR) Model

ARMA

ARIMA

Automatic ARIMA

Unsupervised Learning & Clustering: theory

K-Means Clustering: Theory

Example K-Means Clustering in Python

Visualize K-Means Results in Python

Model-based Unsupervised Clustering in Python

How to assess a Clustering Tendency of the dataset

Selecting the number of clusters for unsupervised Clustering methods (K-Means)

Assessing the performance of unsupervised learning (clustering) algorithms

How to compare the performance of different unsupervised clustering algoritms?

A Simple Tree Model

Deciding How to Split Trees

The stopping criteria for controlling tree growth

Tree Entropy and Information Gain

Pros and Cons of Decision Trees

Tree Overfitting

Pruning Trees

Decision Trees for Classification

Conditional Inference Trees

Conditional Inference Tree Classification

Building a decision tree in Python

Model Validation

Model Improvement

Model Interpretation

Ensemble technique

Random Forest Classification

Splitting Data into Test and Train Set in Python

Choose the number of trees

Model Validation

Model Improvement

Model Interpretation

Accuracy of the model

Decision Vs Random Forest