
Machine Learning Tutorials (Part 1):
Machine learning is a subcategory of AI. Its primary goal is to enable a computer to learn from data and make predictions without being explicitly programmed. For beginners, this tutorial provides a solid understanding of what machine learning is, along with its types, algorithms, tools, and practical applications.
Module 1: Introduction to Machine Learning
Types of Machine Learning
Machine learning can be broadly categorized into three types:
Supervised Learning: Trains models on labeled data to predict or classify new, unseen data.
Unsupervised Learning: Finds patterns or groups in unlabeled data, like clustering or dimensionality reduction.
Reinforcement Learning: Learns through trial and error to maximize rewards, ideal for decision-making tasks.
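To make the first two types concrete, here is a minimal sketch (assuming scikit-learn and synthetic blob data) contrasting supervised and unsupervised learning on the same points:

```python
# Supervised vs. unsupervised learning on the same toy data.
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = make_blobs(n_samples=200, centers=3, random_state=42)

# Supervised: the labels y are given to the model during training.
clf = LogisticRegression().fit(X, y)
print("supervised predictions:", clf.predict(X[:5]))

# Unsupervised: only X is given; the model discovers groups itself.
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print("discovered clusters:", km.labels_[:5])
```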
Machine Learning Pipeline
Machine learning fundamentally depends on data, which is used to train and test models. Data consists of inputs (features) and outputs (labels). A model learns patterns during training and is then evaluated on unseen data to measure its performance and generalization. The steps involved in producing a prediction with a machine learning model are covered in the topics below, followed by a minimal pipeline sketch.
ML workflow
Data Cleaning
Feature Scaling
Data Preprocessing in Python
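As a minimal end-to-end sketch of this workflow (assuming scikit-learn and its bundled Iris dataset as a stand-in for real data):

```python
# Minimal train/test workflow: split, preprocess, fit, evaluate.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)  # features X, labels y

# Hold out unseen data to estimate generalization.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Chaining scaling and the model ensures preprocessing statistics are
# learned only from the training split (no data leakage).
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```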
Module 2: Supervised Learning
Supervised learning algorithms are generally categorized into two main types:
Classification – where the goal is to predict discrete labels or categories
Regression – where the aim is to predict continuous numerical values.
There are many supervised learning algorithms, each suited to different kinds of problems. The most commonly used ones include:
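The two problem types differ only in what they predict, as this minimal sketch shows (assuming scikit-learn's toy data generators):

```python
# Classification predicts discrete labels; regression predicts
# continuous values. The fit/predict pattern is the same in both.
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LogisticRegression, LinearRegression

# Classification: targets are class labels (0 or 1 here).
Xc, yc = make_classification(n_samples=100, random_state=0)
print(LogisticRegression().fit(Xc, yc).predict(Xc[:3]))  # discrete labels

# Regression: targets are real numbers.
Xr, yr = make_regression(n_samples=100, random_state=0)
print(LinearRegression().fit(Xr, yr).predict(Xr[:3]))  # continuous values
```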
1. Linear Regression
Introduction to Linear Regression
Gradient Descent in Linear Regression
Linear regression (Python Implementation from scratch)
Linear regression implementation using sklearn
Ridge Regression
Lasso Regression
Elastic Net Regression
Implementation of Lasso, Ridge and Elastic Net
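The from-scratch topics above boil down to a short loop; here is a minimal sketch of simple linear regression trained with batch gradient descent (NumPy only, on synthetic data with a known slope and intercept):

```python
# Fit y ≈ w*x + b by gradient descent on the mean squared error.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=100)
y = 3.0 * X + 4.0 + rng.normal(0, 1, size=100)  # true w=3, b=4, plus noise

w, b, lr = 0.0, 0.0, 0.01
for _ in range(2000):
    y_hat = w * X + b
    # Gradients of MSE = mean((y_hat - y)^2) with respect to w and b.
    grad_w = 2 * np.mean((y_hat - y) * X)
    grad_b = 2 * np.mean(y_hat - y)
    w -= lr * grad_w
    b -= lr * grad_b

print(f"learned w={w:.2f}, b={b:.2f}")  # should approach 3 and 4
```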
2. Logistic Regression
Understanding Logistic Regression
Cost function in Logistic Regression
Logistic regression Implementation from scratch
Heart Disease Prediction – Project
Breast Cancer Wisconsin Diagnosis – Project
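For the Breast Cancer Wisconsin project listed above, a minimal sketch could start from scikit-learn's bundled copy of the dataset (the full project adds exploration and proper validation):

```python
# Logistic regression on the Breast Cancer Wisconsin dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Scaling helps the solver converge; the model then estimates class
# probability as the sigmoid of a linear score.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```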
3. Decision Trees
Decision Tree in Machine Learning
Feature selection using Decision Tree
Decision Tree – Regression (Implementation)
Decision tree – Classification (Implementation)
Types of Decision tree algorithms
4. Support Vector Machines (SVM)
Understanding SVMs
Support Vector Machines (SVMs) implementation
SVM Hyperparameter Tuning – GridSearchCV
Non-Linear SVM
Implementing SVM on non-linear dataset
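A minimal sketch of the GridSearchCV tuning topic above (assuming scikit-learn and Iris; the RBF kernel covers the non-linear case):

```python
# Grid-search SVM hyperparameters with 5-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# C controls regularization strength; the RBF kernel (with gamma)
# produces non-linear decision boundaries.
param_grid = {"C": [0.1, 1, 10],
              "kernel": ["linear", "rbf"],
              "gamma": ["scale", 0.1]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print("best params:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
```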
5. k-Nearest Neighbors (k-NN)
Introduction to KNN
Decision Boundaries in K-Nearest Neighbors (KNN)
Implementation from scratch
KNN classifier – Project
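The from-scratch implementation listed above reduces to distances plus a majority vote; a minimal NumPy sketch on toy points:

```python
# k-NN from scratch: predict the majority label among the k nearest
# training points (Euclidean distance).
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)  # distance to every point
    nearest = np.argsort(dists)[:k]              # indices of k closest
    return Counter(y_train[nearest]).most_common(1)[0][0]  # majority vote

X_train = np.array([[1, 1], [1, 2], [8, 8], [9, 8]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.5, 1.5])))  # -> 0
print(knn_predict(X_train, y_train, np.array([8.5, 8.0])))  # -> 1
```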
6. Naive Bayes
Introduction to Naive Bayes
Naive Bayes Scratch Implementation
Gaussian Naive Bayes
Implementation of Gaussian Naive Bayes
Multinomial Naive Bayes
Bernoulli Naive Bayes
Complement Naive Bayes
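As a minimal sketch of the Gaussian variant (assuming scikit-learn and Iris; the other variants swap in the appropriate feature distribution):

```python
# Gaussian Naive Bayes: model each continuous feature with one
# Gaussian per class, then classify via Bayes' rule.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

gnb = GaussianNB().fit(X_train, y_train)
print("test accuracy:", gnb.score(X_test, y_test))
# MultinomialNB suits count features (e.g. word frequencies),
# BernoulliNB suits binary features, and ComplementNB helps with
# imbalanced classes.
```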
Introduction to Ensemble Learning
Ensemble learning combines the predictions of several different models for the same target; aggregating their forecasts typically yields a smaller average error than any single model achieves on its own.
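A tiny NumPy demo of that claim, with illustrative numbers only (five unbiased but noisy estimators of the same value):

```python
# Averaging independent noisy estimates shrinks the error variance.
import numpy as np

rng = np.random.default_rng(0)
truth = 10.0

# Five "models", each an unbiased estimate with noise std 2 (var 4).
predictions = truth + rng.normal(0, 2, size=(5, 10000))

single_mse = np.mean((predictions[0] - truth) ** 2)
ensemble_mse = np.mean((predictions.mean(axis=0) - truth) ** 2)
print(f"single model MSE: {single_mse:.2f}")    # ~4.0
print(f"ensemble MSE:     {ensemble_mse:.2f}")  # ~0.8 (variance / 5)
```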
Advanced Supervised Learning Algorithms:
7. Random Forest (Bagging Algorithm)
Introduction to Random Forest
Random Forest Classifier using Scikit-learn
Random Forest Regression in Python
Hyperparameter Tuning in Random Forest
Credit Card Fraud Detection – Random Forest Classifier
Voting Classifier
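A minimal sketch of both ideas in this section (assuming scikit-learn and Iris): a bagged Random Forest, then a soft-voting ensemble over unrelated model types:

```python
# Random Forest (bagging) and a soft-voting ensemble.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

rf = RandomForestClassifier(n_estimators=100, random_state=42)
print("random forest CV accuracy:", cross_val_score(rf, X, y, cv=5).mean())

# Soft voting averages the class probabilities of its member models.
vote = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", rf),
                ("nb", GaussianNB())],
    voting="soft")
print("voting ensemble CV accuracy:",
      cross_val_score(vote, X, y, cv=5).mean())
```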
8. Boosting Algorithms
Gradient Boosting in ML
XGBoost (Extreme Gradient Boosting)
LightGBM (Light Gradient Boosting Machine)
CatBoost
AdaBoost
Gradient Boosting regressor and XGBoost implementation from scratch
Calories Burnt Prediction – Project
Tuning Hyperparameters in Gradient Boosting
Box Office Revenue Prediction – Project
Medical Insurance Price Prediction – Project
Train a model using LightGBM
Train a model using CatBoost
E-commerce product recommendations using CatBoost – Project
Implementing the AdaBoost Algorithm
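A minimal boosting sketch using scikit-learn's built-in estimators (XGBoost, LightGBM, and CatBoost follow the same fit/predict pattern but live in their own packages):

```python
# Gradient boosting fits each new tree to the ensemble's residual
# errors; AdaBoost instead reweights misclassified samples each round.
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                random_state=42).fit(X_train, y_train)
print("gradient boosting:", gb.score(X_test, y_test))

ada = AdaBoostClassifier(n_estimators=100,
                         random_state=42).fit(X_train, y_train)
print("AdaBoost:", ada.score(X_test, y_test))
```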
Moreover, stacking is an ensemble learning technique that trains several base models and combines their predictions through a meta-model, which learns how best to weight the individual models' outputs.
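A minimal stacking sketch (assuming scikit-learn and Iris; a logistic regression meta-model combines two base models):

```python
# Stacking: base models' cross-validated predictions become the
# training features for a final meta-model.
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=42)),
                ("svc", SVC(probability=True, random_state=42))],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5)  # meta-model sees out-of-fold predictions, not training fits
print("stacking CV accuracy:", cross_val_score(stack, X, y, cv=5).mean())
```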
Module 3: Unsupervised Learning
Unsupervised learning is divided into three main categories based on purpose: Clustering, Association Rule Mining, and Dimensionality Reduction. We will first look at algorithms for clustering, then dimensionality reduction, and lastly association rule mining.
1. Clustering
Clustering algorithms group data points into clusters based on their similarities or differences. They are distinguished by the techniques they use to form groups: centroid-based methods, distribution-based methods, connectivity-based methods, and density-based methods. Let's characterize each one (a minimal K-Means sketch follows the centroid-based topics below):
Centroid-based Methods: Represent clusters using central points, such as centroids or medoids.
K-Means clustering
Elbow Method for optimal value of k in KMeans
Clustering Text Documents using K-Means in Scikit Learn – Project
Image Segmentation using K Means Clustering – Project
KMeans Clustering and PCA on Wine Dataset – Project
Modified versions of the K-Means algorithm:
K-Means++ clustering
K-Mode clustering: Theory and implementation
Fuzzy C-Means (FCM) Clustering
Image Segmentation Using Fuzzy C-Means Clustering
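A minimal centroid-based sketch covering K-Means and the elbow method (assuming scikit-learn and synthetic blobs with three true clusters):

```python
# K-Means with the elbow method: inertia (within-cluster sum of
# squares) drops sharply until k reaches the true cluster count.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    print(f"k={k}  inertia={km.inertia_:.1f}")  # look for the 'elbow'

labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
print("first ten cluster assignments:", labels[:10])
```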
Distribution-based Methods:
Gaussian mixture models (GMMs)
Implementation using GMM
Expectation-Maximization Algorithm
Dirichlet process mixture models (DPMMs)
Connectivity-based Methods:
Hierarchical clustering
Agglomerative Clustering
Divisive clustering
Implementing Agglomerative Clustering
Affinity propagation
Density-based Methods:
DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
OPTICS (Ordering Points To Identify the Clustering Structure)
2. Dimensionality Reduction
Dimensionality reduction is used to simplify datasets by reducing the number of features while retaining the most important information.
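As a minimal sketch of the idea (assuming scikit-learn and Iris: project four features onto two principal components):

```python
# PCA: rotate the data onto orthogonal axes of maximal variance,
# then keep only the first few components.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print("reduced shape:", X_2d.shape)  # (150, 2)
print("variance explained per component:", pca.explained_variance_ratio_)
# If the ratios sum close to 1, little information was lost by
# dropping the remaining dimensions.
```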
Principal Component Analysis (PCA)
Feature Importance in PCA
Dimensionality Reduction with PCA: Implementation
t-distributed Stochastic Neighbor Embedding (t-SNE)
Non-negative Matrix Factorization (NMF)
Handling Missing Values in NMF: Implementation
Independent Component Analysis (ICA)
Speech Separation Based On Fast ICA – Project
FastICA on 2D Point Clouds in Scikit Learn – Project
Isomap
Locally Linear Embedding (LLE)
Latent Semantic Analysis (LSA)
Autoencoders
3. Association Rule
In market basket analysis, the goal is to find associations between items in large datasets (e.g., discovering that customers who buy bread also tend to buy butter). Association rule mining identifies these patterns purely from how frequently items appear in the dataset, individually or together; a minimal support-counting sketch follows the algorithm list below.
Apriori algorithm
Implementing the Apriori algorithm
FP-Growth (Frequent Pattern-Growth)
ECLAT (Equivalence Class Clustering and bottom-up Lattice Traversal)
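All three algorithms start from the same primitive, counting itemset support; here is a from-scratch sketch of that step on toy transactions (illustrative data, not a full Apriori implementation):

```python
# Count how often item pairs co-occur and keep those whose support
# (fraction of baskets containing the pair) clears a threshold.
from itertools import combinations
from collections import Counter

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
    {"milk", "bread"},
]
min_support = 0.4  # pair must appear in at least 40% of baskets

pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

n = len(transactions)
for pair, count in sorted(pair_counts.items()):
    if count / n >= min_support:
        print(pair, f"support={count / n:.2f}")
```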