Machine Learning

Machine Learning using Python

This course introduces you to the world of Machine Learning. The examples are drawn from the life sciences, but the methods can be applied to any domain.

Course Content

Introduction to Machine Learning

Why and How of Machine Learning
Applications in general and in life sciences
Supervised vs Unsupervised
Introduction to different ML classifiers

Introduction to Python

Quick primer to Python
Installation of Python and modules (NumPy, Pandas, Matplotlib, Seaborn, scikit-learn)

Introduction to NumPy

Create 1D and 2D arrays
Fetch elements
Array operations
Shape, size, axis concept
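
As an illustration of the NumPy topics above, here is a minimal sketch. The array values are made up for the example and are not taken from the course material.

```python
import numpy as np

# Create a 1D and a 2D array (toy values, chosen only for illustration)
a = np.array([1, 2, 3])
m = np.array([[1, 2, 3],
              [4, 5, 6]])      # 2 rows x 3 columns

# Fetch elements by position
first = a[0]                   # first element of the 1D array
cell = m[1, 2]                 # row 1, column 2 of the 2D array

# Array operations are applied element by element
doubled = a * 2

# Shape, size and axis
shape = m.shape                # (rows, columns)
size = m.size                  # total number of elements
col_sums = m.sum(axis=0)       # axis=0 collapses the rows: one sum per column
row_sums = m.sum(axis=1)       # axis=1 collapses the columns: one sum per row
```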

Introduction to Pandas

Series
DataFrames
Create series from ndarray/dict
Fetch elements using Position/Label
Vectorized operations and label alignment with Series
DataFrames from Series/dict of dicts/dict of lists
Read from csv/tsv file
Column selection, addition, deletion
Indexing / selection
Concepts of loc and iloc
Data alignment and arithmetic
Merge tables
Play with IRIS dataset
Groupby
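
A few of the Pandas topics above can be sketched as follows. The table is a tiny hand-written example with IRIS-like column names, not the real dataset.

```python
import pandas as pd

# Series from dicts; arithmetic aligns on labels, not on position
s1 = pd.Series({"a": 1, "b": 2, "c": 3})
s2 = pd.Series({"b": 10, "c": 20, "d": 30})
aligned = s1 + s2              # labels present in only one Series become NaN

# DataFrame from a dict of lists (toy values, for illustration only)
df = pd.DataFrame({
    "species": ["setosa", "setosa", "virginica"],
    "petal_length": [1.4, 1.6, 5.5],
})

# loc selects by label, iloc selects by integer position
by_label = df.loc[0, "petal_length"]
by_position = df.iloc[2, 1]

# Column addition and deletion
df["petal_length_mm"] = df["petal_length"] * 10
df = df.drop(columns=["petal_length_mm"])

# Groupby: mean petal length per species
means = df.groupby("species")["petal_length"].mean()
```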

Introduction to Matplotlib

Scatter plot
Histogram
Sub plots
Line plot
Bar plot
Box plot

Introduction to Seaborn

Visualizing statistical relationships
relplot / Scatter plot
Line Plot
Faceting
Categorical scatterplots
catplot
Boxplot
Beeswarm plot
Boxen plot
Violinplots
Count plot
Bar plots
Wide/Long format data
Pair plot
Hexbin plots
Kernel density estimation based plots

Explore inbuilt Datasets

Explore datasets package
Explore IRIS dataset
Understand bunch (Features, Targets)
Visualize data using seaborn

Prepare Training and Test Datasets

Explore the model_selection.train_test_split function
Prepare training and test datasets
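
A minimal sketch of this step, using the IRIS dataset bundled with scikit-learn. The 80/20 split, the fixed random_state and the stratified sampling are assumptions made for the example, not course requirements.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# load_iris returns a Bunch: .data holds the features, .target the class labels
iris = load_iris()
X, y = iris.data, iris.target

# Hold out 20% of the samples for testing (split ratio is an assumption).
# stratify=y keeps the class proportions the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
```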

Data Preprocessing

Standardization (Mean removal and variance scaling)
Understand why standardization is important
Explore StandardScaler class
Scaling features to fall within a range
MinMaxScaler
MaxAbsScaler
Scaling data with outliers
Non-linear transformation
Mapping to a Uniform distribution
Mapping to a Gaussian distribution
Compare the effect of different scalers on data with outliers
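
Two of the scalers listed above can be compared with a short sketch. The input matrix is made up so that the two features sit on very different scales.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Toy data: the second feature is about 100 times larger than the first
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Standardization: each column ends up with mean 0 and unit variance
standardized = StandardScaler().fit_transform(X)

# Range scaling: each column is mapped onto [0, 1]
ranged = MinMaxScaler().fit_transform(X)
```

In practice the scaler is usually fitted on the training data only and then applied to the test data with transform(), so that no information from the test set leaks into training.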

Feature Selection

Removing features with low variance
Explore the VarianceThreshold class
Univariate feature selection
SelectKBest
SelectPercentile
SelectFdr
SelectFwe
SelectFpr
Univariate scoring methods
chi2
f_classif
mutual_info_classif
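
Univariate selection can be sketched as below. Keeping k=2 features and scoring with f_classif are arbitrary choices for the example.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Score every feature against the class labels with the ANOVA F-test
# and keep the two best-scoring ones (k=2 is an assumption)
selector = SelectKBest(score_func=f_classif, k=2)
X_reduced = selector.fit_transform(X, y)

kept = selector.get_support()   # boolean mask over the original features
scores = selector.scores_       # one F-score per original feature
```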

Simple model building process

Define model/estimator
Fit the model to the training dataset
Test using test dataset
Evaluate Model performance
- score() function of estimator
- confusion_matrix
- Accuracy
- precision_score
- recall_score
- f1_score
- classification_report
- confusion matrix plot
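
The steps above can be sketched end to end. The choice of a k-nearest-neighbors classifier, n_neighbors=5 and the 80/20 split are assumptions for illustration; any scikit-learn estimator follows the same define, fit, predict, evaluate pattern.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# 1. Define the model/estimator
model = KNeighborsClassifier(n_neighbors=5)

# 2. Fit it to the training data
model.fit(X_train, y_train)

# 3. Predict on the held-out test data
y_pred = model.predict(X_test)

# 4. Evaluate
accuracy = accuracy_score(y_test, y_pred)       # same value as model.score(X_test, y_test)
cm = confusion_matrix(y_test, y_pred)           # rows: true class, columns: predicted class
report = classification_report(y_test, y_pred)  # precision, recall and F1 per class
print(report)
```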

Explore classifiers

Nearest Neighbors
Linear SVM
RBF SVM
DecisionTreeClassifier
RandomForestClassifier
LogisticRegression
Neural Net
AdaBoost
Naive Bayes
Gaussian Process

Explore cross validation strategy

Explore the model_selection.cross_val_score function
k-fold cross-validation
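
A minimal sketch of 5-fold cross-validation. The estimator (logistic regression) and the number of folds are assumptions for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# cv=5 splits the data into 5 folds; each fold serves once as the test set
# while the model is trained on the other 4
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

mean_accuracy = scores.mean()   # average accuracy over the 5 folds
```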

Tuning the hyper-parameters of an estimator

Exhaustive Grid Search
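
An exhaustive grid search can be sketched as below. The estimator (SVC) and the candidate values in the grid are hypothetical choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Every combination of these values is tried (3 x 2 = 6 candidates),
# each one evaluated with 5-fold cross-validation
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

best_params = search.best_params_   # best-scoring combination
best_score = search.best_score_     # its mean cross-validated accuracy
```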

Clustering

k-Means Clustering
Hierarchical Clustering
Silhouette Coefficient
Homogeneity, Completeness
Optimal number of clusters
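
One way to look for a suitable number of clusters is to compare silhouette coefficients across several values of k. The sketch below assumes k-Means on the IRIS features and a candidate range of 2 to 5 clusters; the silhouette coefficient is only one of several possible criteria.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score

X, _ = load_iris(return_X_y=True)   # class labels are ignored: clustering is unsupervised

scores = {}
for k in range(2, 6):               # candidate range is an assumption
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Silhouette coefficient lies in [-1, 1]; higher means better-separated clusters
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)   # k with the highest silhouette coefficient
```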

Regression

Introduction to Regression
Simple Linear Regression
Model Evaluation in Regression Models
Evaluation Metrics in Regression Models
Multiple Linear Regression
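
Simple linear regression and two common evaluation metrics can be sketched as follows. The data are synthetic and noise-free (y = 2x + 1), made up so that the fitted coefficients are easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic data following y = 2x + 1 exactly (for illustration only)
x = np.arange(10, dtype=float).reshape(-1, 1)   # one feature, 10 samples
y = 2.0 * x.ravel() + 1.0

model = LinearRegression().fit(x, y)
slope = model.coef_[0]          # estimated slope
intercept = model.intercept_    # estimated intercept

y_pred = model.predict(x)
mse = mean_squared_error(y, y_pred)   # mean squared error, lower is better
r2 = r2_score(y, y_pred)              # coefficient of determination, 1.0 is a perfect fit
```

Multiple linear regression uses the same estimator; the input simply has more than one column.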

Dimensionality Reduction

Principal component analysis (PCA)
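
A minimal PCA sketch on the IRIS features. Standardizing first and keeping two components are assumptions for the example.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# PCA is sensitive to feature scale, so standardize the features first
X_scaled = StandardScaler().fit_transform(X)

# Project the 4 original features onto the 2 directions of largest variance
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

explained = pca.explained_variance_ratio_   # fraction of variance captured per component
```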

Case study

Breast cancer classification based on omics data from TCGA
