Machine Learning using Python
This course introduces you to the world of Machine Learning. The examples are drawn from the life sciences, but the methods can be applied to any domain.
Course Content
Introduction to Machine Learning
Introduction to Python
Introduction to Numpy
Introduction to Matplotlib
Explore inbuilt Datasets
Prepare Training and Test Datasets
Data Preprocessing
Simple model building process
Explore classifiers
Explore cross validation strategy
Dimensionality Reduction
Introduction to Machine Learning
- Why and How of Machine Learning
- Applications in general and in life sciences
- Supervised vs Unsupervised
- Introduction to different ML classifiers
Introduction to Python
- Quick primer to Python
- Installation of Python and modules (Numpy, Pandas, Matplotlib, Seaborn, scikit-learn)
Introduction to Numpy
- Create 1D and 2D array
- Fetch elements
- Array operations
- Shape, size, axis concept
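The Numpy items above can be sketched in a few lines. This is a minimal illustration, not course material; the array values and variable names are made up for the example.

```python
import numpy as np

# Create a 1D and a 2D array
a = np.array([1, 2, 3, 4])
m = np.array([[1, 2, 3],
              [4, 5, 6]])

# Fetch elements by position and by slice
first = a[0]       # single element of the 1D array
row = m[1]         # second row of the 2D array
cell = m[1, 2]     # row index 1, column index 2
col = m[:, 0]      # first column

# Element-wise array operations
doubled = a * 2
total = a + a

# Shape, size and axis
shape = m.shape              # (rows, columns)
size = m.size                # total number of elements
col_sums = m.sum(axis=0)     # sum down the rows: one value per column
row_sums = m.sum(axis=1)     # sum across the columns: one value per row

print(shape, size, col_sums, row_sums)
```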
Introduction to Pandas
- Series
- DataFrames
- Create series from ndarray/dict
- Fetch elements using Position/Label
- Vectorized operations and label alignment with Series
- Dataframes from Series/Dict of Dicts/Dict of lists
- Read from csv/tsv file
- Column selection, addition, deletion
- Indexing / selection
- Concepts of Loc, iloc
- Data alignment and arithmetic
- Merge tables
- Play with IRIS dataset
- Groupby
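A short sketch of the Series and DataFrame items above. The toy table is hypothetical (a few iris-like measurements typed in by hand), and reading from a csv/tsv file is left out so the example stays self-contained.

```python
import numpy as np
import pandas as pd

# Series from an ndarray (with labels) and from a dict
s1 = pd.Series(np.array([1.0, 2.0, 3.0]), index=["a", "b", "c"])
s2 = pd.Series({"b": 10.0, "c": 20.0, "d": 30.0})

# Fetch elements by label and by position
by_label = s1["b"]
by_position = s1.iloc[0]

# Vectorized arithmetic aligns on labels; labels present in only one Series give NaN
aligned = s1 + s2

# DataFrame from a dict of lists (hypothetical toy measurements)
df = pd.DataFrame({
    "species": ["setosa", "setosa", "virginica", "virginica"],
    "petal_length": [1.4, 1.3, 6.0, 5.1],
})

# Column addition, selection and deletion
df["petal_length_mm"] = df["petal_length"] * 10
lengths = df["petal_length"]
df = df.drop(columns="petal_length_mm")

# loc selects by label, iloc by integer position
by_loc = df.loc[2, "species"]
by_iloc = df.iloc[2, 0]

# Merge with a second table on a shared key column
info = pd.DataFrame({"species": ["setosa", "virginica"], "code": [0, 2]})
merged = df.merge(info, on="species")

# Groupby: mean petal length per species
means = df.groupby("species")["petal_length"].mean()
print(means)
```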
Introduction to Matplotlib
- Scatter plot
- Histogram
- Sub plots
- Line plot
- Bar plot
- Box plot
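The plot types listed above can be drawn side by side with sub plots. This is a sketch on synthetic data; the random values and the bar heights are invented for illustration, and the `Agg` backend is chosen only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, synthetic data for illustration only
x = rng.normal(size=100)
y = 2 * x + rng.normal(scale=0.5, size=100)

# Sub plots: a grid of 2 rows and 3 columns
fig, axes = plt.subplots(2, 3, figsize=(12, 6))

axes[0, 0].scatter(x, y)
axes[0, 0].set_title("Scatter plot")

axes[0, 1].hist(x, bins=15)
axes[0, 1].set_title("Histogram")

axes[0, 2].plot(np.sort(x), np.cumsum(np.sort(x)))
axes[0, 2].set_title("Line plot")

axes[1, 0].bar(["A", "B", "C"], [3, 7, 5])
axes[1, 0].set_title("Bar plot")

axes[1, 1].boxplot([x, y])
axes[1, 1].set_title("Box plot")

axes[1, 2].axis("off")  # unused panel

fig.tight_layout()
```

To view the result, call `fig.savefig("plots.png")` (the file name is arbitrary) or `plt.show()` in an interactive session.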
Introduction to Seaborn
- Visualizing statistical relationships
- relplot / Scatter plot
- Line Plot
- Faceting
- Categorical scatterplots
- catplot
- Boxplot
- Beeswarm plot
- Boxen plot
- Violinplots
- Count plot
- Bar plots
- Wide/Long format data
- Pair plot
- Hexbin plots
- Kernel density estimation based plots
Explore inbuilt Datasets
- Explore datasets package
- Explore IRIS dataset
- Understand bunch (Features, Targets)
- Visualize data using seaborn
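Loading a built-in dataset returns a Bunch, a dictionary-like object holding the features and targets. A minimal sketch with the IRIS dataset (variable names are illustrative):

```python
from sklearn.datasets import load_iris

iris = load_iris()            # returns a Bunch (dictionary-like container)

keys = list(iris.keys())      # includes data, target, feature_names, target_names, DESCR
X = iris.data                 # feature matrix, shape (n_samples, n_features)
y = iris.target               # integer class labels, shape (n_samples,)

print(X.shape, y.shape)
print(iris.feature_names)
print(iris.target_names)
```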
Prepare Training and Test Datasets
- Explore the model_selection.train_test_split function
- Prepare training and test dataset
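A minimal sketch of splitting the IRIS data. The 25% test size, the stratified split, and the seed are choices made for this example, not values prescribed by the course.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 25% of the samples for testing; stratify keeps class proportions similar
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
```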
Data Preprocessing
- Standardization (Mean removal and variance scaling)
- Understand why standardization is important
- Explore StandardScaler class
- Scaling features to fall within a range
- MinMaxScaler
- MaxAbsScaler
- Scaling data with outliers
- Non-linear transformation
- Mapping to a Uniform distribution
- Mapping to a Gaussian distribution
- Compare the effect of different scalers on data with outliers
- Removing features with low variance
- Explore the VarianceThreshold class
- Univariate feature selection
- SelectKBest
- SelectPercentile
- SelectFdr
- SelectFwe
- SelectFpr
- Univariate scoring methods
- chi2
- f_classif
- mutual_info_classif
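The scaling and feature selection steps above might look like the sketch below on the IRIS data. `RobustScaler` and `QuantileTransformer` are not named in the outline; they are assumed here as the scikit-learn classes for the outlier and non-linear transformation items. The variance threshold and `k=2` are arbitrary example values.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, VarianceThreshold, chi2, f_classif
from sklearn.preprocessing import (MaxAbsScaler, MinMaxScaler, QuantileTransformer,
                                   RobustScaler, StandardScaler)

X, y = load_iris(return_X_y=True)

# Standardization: zero mean and unit variance per feature
X_std = StandardScaler().fit_transform(X)

# Scaling features to a range
X_minmax = MinMaxScaler().fit_transform(X)   # each feature mapped to [0, 1]
X_maxabs = MaxAbsScaler().fit_transform(X)   # each feature divided by its max absolute value

# Scaling data with outliers (assumption: RobustScaler, based on median and IQR)
X_robust = RobustScaler().fit_transform(X)

# Non-linear transformation (assumption: QuantileTransformer)
X_uniform = QuantileTransformer(n_quantiles=100, random_state=0).fit_transform(X)
X_gauss = QuantileTransformer(n_quantiles=100, output_distribution="normal",
                              random_state=0).fit_transform(X)

# Remove features whose variance falls below an example threshold
X_var = VarianceThreshold(threshold=0.2).fit_transform(X)

# Univariate feature selection: keep the 2 best features under two scoring methods
X_f = SelectKBest(f_classif, k=2).fit_transform(X, y)
X_chi2 = SelectKBest(chi2, k=2).fit_transform(X, y)  # chi2 needs non-negative features

print(X_std.mean(axis=0).round(6), X_std.std(axis=0).round(6))
print(X_var.shape, X_f.shape, X_chi2.shape)
```

In a real workflow the scaler is fitted on the training split only and then applied to the test split, so that no information from the test data leaks into the model.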
Simple model building process
- Define model/estimator
- Fit the model to the training dataset
- Test using test dataset
- Evaluate Model performance
- score() function of estimator
- confusion_matrix
- Accuracy
- precision_score
- recall_score
- f1_score
- classification_report
- confusion matrix plot
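The model building steps above, sketched end to end. `LogisticRegression` is used only as an example estimator, and the macro averaging for the multi-class scores is a choice made for this illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report, confusion_matrix,
                             f1_score, precision_score, recall_score)
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# 1. Define the model/estimator
model = LogisticRegression(max_iter=1000)

# 2. Fit the model to the training data
model.fit(X_train, y_train)

# 3. Test on the held-out data
y_pred = model.predict(X_test)

# 4. Evaluate model performance
score = model.score(X_test, y_test)          # mean accuracy for classifiers
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)        # rows: true class, columns: predicted class
precision = precision_score(y_test, y_pred, average="macro")
recall = recall_score(y_test, y_pred, average="macro")
f1 = f1_score(y_test, y_pred, average="macro")

print(cm)
print(classification_report(y_test, y_pred))
```

For the confusion matrix plot, recent scikit-learn versions provide `ConfusionMatrixDisplay.from_predictions(y_test, y_pred)`, which draws the matrix with Matplotlib.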
Explore classifiers
- Nearest Neighbors
- Linear SVM
- RBF SVM
- DecisionTreeClassifier
- RandomForestClassifier
- LogisticRegression
- Neural Net
- AdaBoost
- Naive Bayes
- Gaussian Process
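The classifiers listed above share the same fit/score interface, so they can be compared in a loop. This is a sketch; the hyperparameters are example values (mostly defaults), and each model is wrapped in a pipeline with `StandardScaler`, which is an assumption of this example rather than part of the outline.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

classifiers = {
    "Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Linear SVM": SVC(kernel="linear"),
    "RBF SVM": SVC(kernel="rbf"),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Neural Net": MLPClassifier(max_iter=2000, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "Gaussian Process": GaussianProcessClassifier(random_state=0),
}

scores = {}
for name, clf in classifiers.items():
    pipe = make_pipeline(StandardScaler(), clf)   # scale, then classify
    pipe.fit(X_train, y_train)
    scores[name] = pipe.score(X_test, y_test)     # accuracy on the test split

for name, value in scores.items():
    print(f"{name:20s} {value:.3f}")
```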
Explore cross validation strategy
- Explore the model_selection.cross_val_score function
- k fold cross validation
- Exhaustive Grid Search
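A sketch of k fold cross validation and an exhaustive grid search on the IRIS data. The SVM estimator, the 5 folds, and the parameter grid are example choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# k fold cross validation with k = 5; shuffle because the iris rows are sorted by class
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(SVC(), X, y, cv=cv)
print(scores, scores.mean())

# Exhaustive grid search: every combination in the grid is cross-validated
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=cv)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```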
Clustering
- k-Means Clustering
- Hierarchical Clustering
- Silhouette Coefficient
- Homogeneity, Completeness
- Optimal number of clusters
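The clustering items above, sketched on the IRIS features. Choosing the number of clusters by the highest silhouette coefficient is one common heuristic, assumed here for illustration; the range of k values is arbitrary.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import completeness_score, homogeneity_score, silhouette_score

X, y = load_iris(return_X_y=True)

# k-Means and hierarchical (agglomerative) clustering with 3 clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
hier = AgglomerativeClustering(n_clusters=3).fit(X)

# Silhouette coefficient needs only the data and the cluster labels
sil = silhouette_score(X, kmeans.labels_)

# Homogeneity and completeness compare cluster labels with the known classes
hom = homogeneity_score(y, kmeans.labels_)
com = completeness_score(y, kmeans.labels_)
print(sil, hom, com)

# Optimal number of clusters: pick the k with the highest silhouette coefficient
sil_by_k = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    sil_by_k[k] = silhouette_score(X, labels)
best_k = max(sil_by_k, key=sil_by_k.get)
print(sil_by_k, best_k)
```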
Regression
- Introduction to Regression
- Simple Linear Regression
- Model Evaluation in Regression Models
- Evaluation Metrics in Regression Models
- Multiple Linear Regression
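A sketch of simple and multiple linear regression with common evaluation metrics. The data are synthetic: the coefficients 3, -2 and the intercept 5 are invented for this example so the fitted values can be compared with known ones.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: y depends linearly on two features plus Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(scale=1.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Simple linear regression: one feature only
simple = LinearRegression().fit(X_train[:, [0]], y_train)
pred_simple = simple.predict(X_test[:, [0]])

# Multiple linear regression: both features
multiple = LinearRegression().fit(X_train, y_train)
pred_multiple = multiple.predict(X_test)

# Evaluation metrics for regression models
mae = mean_absolute_error(y_test, pred_multiple)
mse = mean_squared_error(y_test, pred_multiple)
rmse = np.sqrt(mse)
r2_simple = r2_score(y_test, pred_simple)
r2_multiple = r2_score(y_test, pred_multiple)

print(multiple.coef_, multiple.intercept_)
print(mae, rmse, r2_simple, r2_multiple)
```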
Dimensionality Reduction
- Principal component analysis (PCA)
- Breast cancer classification based on omics data from TCGA
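A sketch of PCA followed by classification. The TCGA omics data are not bundled with scikit-learn, so this example substitutes the built-in Wisconsin breast cancer dataset (`load_breast_cancer`) as a stand-in; the number of components and the classifier are likewise example choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset (NOT TCGA omics data): 30 numeric features, binary label
X, y = load_breast_cancer(return_X_y=True)

# PCA is sensitive to feature scale, so standardize first
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=5).fit(X_scaled)
explained = pca.explained_variance_ratio_   # share of variance per component
print(explained, explained.sum())

# Scale, reduce to 5 components, then classify; evaluated with 5-fold cross validation
pipe = make_pipeline(StandardScaler(), PCA(n_components=5),
                     LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```

Putting the scaler and PCA inside the pipeline means they are refitted on each training fold, which keeps the cross validation estimate free of information from the held-out fold.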