Machine Learning Tutorials


Machine Learning Tutorials – Part 1

Machine learning is a subfield of AI. Its primary goal is to enable a computer to learn from data and make predictions without being explicitly programmed. For beginners, this tutorial provides a solid understanding of what machine learning is, along with its types, algorithms, tools, and practical applications.

Module 1: Introduction to Machine Learning

Types of Machine Learning

Machine learning can be broadly categorized into three types:

Supervised Learning: Trains models on labeled data to predict or classify new, unseen data.

Unsupervised Learning: Finds patterns or groups in unlabeled data, like clustering or dimensionality reduction.

Reinforcement Learning: Learns through trial and error to maximize rewards, ideal for decision-making tasks.
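
As a minimal sketch of the first two types (the Iris dataset and these particular scikit-learn models are illustrative assumptions, not choices prescribed by this tutorial):

```python
# Minimal sketch: supervised vs. unsupervised learning in scikit-learn.
# The Iris dataset and these particular models are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised: the model trains on features X together with labels y.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("Supervised predictions:", clf.predict(X[:3]))

# Unsupervised: the model sees only X and discovers groups on its own.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("Cluster assignments:", km.labels_[:3])
```

Reinforcement learning does not reduce to a one-liner like these, since it needs an environment that hands out rewards in response to actions.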

Machine Learning Pipeline

Machine learning fundamentally depends on data, which is the foundation for training and testing models. Data consists of inputs (features) and outputs (labels). A model learns patterns during training and is then evaluated on unseen data to measure its performance and generalization. The steps involved in producing a predictive machine learning model are illustrated below.

ML workflow

Data Cleaning

Feature Scaling

Data Preprocessing in Python
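
To make the workflow concrete, here is a minimal end-to-end sketch; the dataset and model are illustrative assumptions, not steps mandated by the tutorial:

```python
# Minimal ML pipeline sketch: split data, train on one part, test on the other.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)           # features (inputs) and labels (outputs)

# Hold out unseen data to measure generalization.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)                  # learn patterns from training data
preds = model.predict(X_test)                # predict on unseen data
print("Test accuracy:", accuracy_score(y_test, preds))
```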

Module 2: Supervised Learning

Supervised learning algorithms are generally categorized into two main types: 

Classification – where the goal is to predict discrete labels or categories.

Regression – where the aim is to predict continuous numerical values.
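
A small sketch of the distinction, using synthetic scikit-learn datasets as an illustrative assumption:

```python
# Classification predicts discrete labels; regression predicts continuous values.
# Synthetic datasets are used here purely for illustration.
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LogisticRegression, LinearRegression

# Classification: the target is a category (0 or 1).
Xc, yc = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(Xc, yc)
print("Predicted class:", clf.predict(Xc[:1]))

# Regression: the target is a real number.
Xr, yr = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)
reg = LinearRegression().fit(Xr, yr)
print("Predicted value:", reg.predict(Xr[:1]))
```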

There are numerous algorithms used in supervised learning, each suited to different kinds of problems. The most commonly used supervised learning algorithms include:

1. Linear Regression

Introduction to Linear Regression

Gradient Descent in Linear Regression

Linear regression (Python Implementation from scratch)

Linear regression implementation using sklearn

Ridge Regression

Lasso Regression

Elastic Net Regression

Implementation of Lasso, Ridge and Elastic Net

2. Logistic Regression

Understanding Logistic Regression

Cost function in Logistic Regression

Logistic regression Implementation from scratch

Heart Disease Prediction – Project

Breast Cancer Wisconsin Diagnosis – Project

3. Decision Trees

Decision Tree in Machine Learning

Feature selection using Decision Tree

Decision Tree – Regression (Implementation)

Decision tree – Classification (Implementation)

Types of Decision tree algorithms

4. Support Vector Machines (SVM)

Understanding SVMs

Support Vector Machines (SVMs) implementation

SVM Hyperparameter Tuning – GridSearchCV

Non-Linear SVM

Implementing SVM on non-linear dataset

5. k-Nearest Neighbors (k-NN)

Introduction to KNN

Decision Boundaries in K-Nearest Neighbors (KNN)

Implementation from scratch

KNN classifier – Project

6. Naive Bayes

Introduction to Naive Bayes

Naive Bayes Scratch Implementation

Gaussian Naive Bayes

Implementation of Gaussian Naive Bayes

Multinomial Naive Bayes

Bernoulli Naive Bayes

Complement Naive Bayes

Introduction to Ensemble Learning

Ensemble learning combines the predictions of many different models so that the aggregated prediction for a given target has a smaller average error than the predictions of the individual models.
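
As a minimal sketch of the idea, a majority-vote ensemble over a few diverse models (the particular estimators and synthetic data below are assumptions for illustration):

```python
# Ensemble sketch: combining several models' votes tends to reduce error.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Three diverse base models, combined by majority vote.
ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("nb", GaussianNB()),
])
ensemble.fit(X_train, y_train)
print("Ensemble accuracy:", ensemble.score(X_test, y_test))
```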

Advanced Supervised Learning Algorithms:

7. Random Forest (Bagging Algorithm)

Introduction to Random forest

Random Forest Classifier using Scikit-learn

Random Forest Regression in Python

Hyperparameter Tuning in Random Forest

Credit Card Fraud Detection – Random Forest Classifier

Voting Classifier

8. Boosting Algorithms

Gradient Boosting in ML

XGBoost (Extreme Gradient Boosting)

LightGBM (Light Gradient Boosting Machine)

CatBoost

AdaBoost

Gradient Boosting regressor and XGBoost implementation from scratch

Calories Burnt Prediction – Project

Tuning Hyperparameters in Gradient Boosting

Box Office Revenue Prediction – Project

Medical Insurance Price Prediction – Project

Train a model using LightGBM

Train a model using CatBoost

E-commerce product recommendations using CatBoost – Project

Implementing the AdaBoost Algorithm

In addition, stacking is an ensemble learning technique that trains several base models and combines their predictions using a meta-model, which is designed to learn how best to weight the outputs of the individual models.
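
A minimal stacking sketch with scikit-learn's StackingClassifier (the choice of base models, meta-model, and synthetic data below is an illustrative assumption):

```python
# Stacking sketch: base models feed their predictions to a meta-model.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("svc", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(),   # meta-model learns from base outputs
)
stack.fit(X_train, y_train)
print("Stacked accuracy:", stack.score(X_test, y_test))
```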

Module 3: Unsupervised learning

Unsupervised learning is divided into three main categories based on purpose: Clustering, Association Rule Mining, and Dimensionality Reduction. We will first look at algorithms for clustering, then dimensionality reduction, and lastly association rules.

1. Clustering

Clustering algorithms group data points into clusters based on their similarities or differences. They are distinguished by the kinds of techniques they use to group data: centroid-based methods, distribution-based methods, connectivity-based methods, and density-based methods. Let's characterize each one:

Centroid-based Methods: Represent clusters using central points, such as centroids or medoids.

K-Means clustering

Elbow Method for optimal value of k in KMeans

Clustering Text Documents using K-Means in Scikit Learn – Project

Image Segmentation using K Means Clustering – Project

KMeans Clustering and PCA on Wine Dataset – Project

Modified versions of the K-Means algorithm:

K-Means++ clustering

K-Mode clustering: Theory and implementation

Fuzzy C-Means (FCM) Clustering

Image Segmentation Using Fuzzy C-Means Clustering
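
Wrapping up the centroid-based methods, here is a minimal K-Means sketch that also shows the elbow heuristic mentioned in the links above (the synthetic blob data is an illustrative assumption):

```python
# Centroid-based clustering sketch: K-Means plus the elbow heuristic
# (inertia vs. k) for choosing the number of clusters.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(f"k={k}  inertia={km.inertia_:.1f}")   # look for the 'elbow' in this curve
```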

Distribution-based Methods:

Gaussian mixture models (GMMs)

Implementation using GMM

Expectation-Maximization Algorithm

Dirichlet process mixture models (DPMMs)

Connectivity-based Methods:

Hierarchical clustering

Agglomerative Clustering

Divisive clustering

Implementing Agglomerative Clustering

Affinity propagation

Density-based Methods:

DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

OPTICS (Ordering Points To Identify the Clustering Structure)

2. Dimensionality Reduction

Dimensionality reduction is used to simplify datasets by reducing the number of features while retaining the most important information.
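
As a quick illustration of the idea before the individual techniques below, here is a minimal PCA sketch (the Iris dataset is an illustrative assumption):

```python
# Dimensionality reduction sketch: PCA projects 4-D Iris data down to 2-D
# while keeping as much variance as possible.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print("Original shape:", X.shape)           # (150, 4)
print("Reduced shape:", X_reduced.shape)    # (150, 2)
print("Variance retained:", pca.explained_variance_ratio_.sum())
```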

Principal Component Analysis (PCA)

Feature Importance in PCA

Dimensionality Reduction with PCA: Implementation

t-distributed Stochastic Neighbor Embedding (t-SNE)

Non-negative Matrix Factorization (NMF)

Handling Missing Values in NMF: Implementation

Independent Component Analysis (ICA)

Speech Separation Based On Fast ICA – Project

FastICA on 2D Point Clouds in Scikit Learn – Project

Isomap

Locally Linear Embedding (LLE)

Latent Semantic Analysis (LSA)

Autoencoders

3. Association Rule

In market basket analysis, the goal is to find associations between different items in large datasets (e.g., discovering that customers who purchase bread also tend to buy butter). The patterns are identified purely from how frequently items appear in the dataset, individually or together.
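
To show the core counting idea in miniature (the transactions below are made-up illustrative data, not from any real dataset):

```python
# Association-rule sketch: count how often item pairs appear together in
# transactions -- the frequency counting at the heart of Apriori.
from itertools import combinations
from collections import Counter

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support = fraction of transactions containing the pair.
for pair, count in pair_counts.most_common(3):
    print(pair, "support =", count / len(transactions))
```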

Apriori algorithm

Implementing apriori algorithm

FP-Growth (Frequent Pattern-Growth)

ECLAT (Equivalence Class Clustering and bottom-up Lattice Traversal)
