Understanding Data in Data Science: Formats and Classifications Made Easy

Marya, August 18, 2025

Understanding data in data science is the foundation of machine learning, analytics, and data-driven decision-making. It begins with knowing the types, formats, and classifications of data used in real-world applications.

What is Data?

Everything in the field of data science starts with data. Whether you're building dashboards, training machine learning models, or making business decisions, it is crucial to know what data is and the many forms and formats it can take.

Data refers to raw facts, numbers, or symbols gathered through measurements, observations, or interactions. On its own, data carries little meaning. Once it has been processed and examined, however, it becomes information that can be used to make decisions and identify trends.

Example:
Raw data: “Maria, 28, Karachi”
Processed information: “Maria is a 28-year-old from Karachi.”
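
The step from raw data to information can be sketched in a few lines of Python. The field order (name, age, city) and the helper name describe are assumptions made for illustration; the raw record itself carries no labels.

```python
# Raw data: a bare string with no labels attached to its parts.
raw = "Maria, 28, Karachi"


def describe(record: str) -> str:
    """Turn a raw 'name, age, city' record into a readable sentence.

    The field order is an assumption: nothing in the raw string says
    which value is the name, the age, or the city.
    """
    name, age, city = [part.strip() for part in record.split(",")]
    return f"{name} is a {int(age)}-year-old from {city}."


info = describe(raw)
print(info)  # Maria is a 28-year-old from Karachi.
```

The processing here is trivial, but it shows the idea: meaning comes from the context we attach to the values, not from the values alone.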

Types of Data

Data can be numbers, text, images, audio, or video, collected from many different sources. In its unprocessed form it provides little value, but with the right techniques it becomes information that drives decision-making. A key part of working with data is recognizing how much structure it has, because structure largely determines how the data can be stored and analyzed.

There are three main types of data based on their structure and organization:

1. Structured Data

This data is arranged neatly into rows and columns according to a predetermined structure (like spreadsheets or SQL databases).

Examples:

Excel spreadsheets
SQL databases
CSV files with columns like Name, Age, City

Key Features:

Easily searchable
Stored in relational databases
Best suited for data analytics

2. Unstructured Data

This kind of data has no set structure or format. It does not fit neatly into the rows and columns of a conventional relational database.

Examples:

Images and videos
Emails
Social media posts
Scanned documents
PDFs

Key Features:

Complex and harder to analyze
Requires techniques like NLP or image processing
Grows rapidly in today's digital age (commonly cited estimates put 80 to 90% of all data in this category)
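
As a very small illustration of pulling structure out of unstructured text, the sketch below counts hashtags in a made-up social media post using only Python's standard library. Real NLP pipelines are far more involved; the post text and the extract_hashtags helper are hypothetical.

```python
import re
from collections import Counter

# Hypothetical social media post: free text with no predefined fields.
post = "Loving the new #Python release! #DataScience folks, try it. #python"


def extract_hashtags(text: str) -> Counter:
    """Count hashtags in free text, ignoring letter case."""
    tags = re.findall(r"#(\w+)", text)
    return Counter(tag.lower() for tag in tags)


counts = extract_hashtags(post)
print(counts.most_common())  # [('python', 2), ('datascience', 1)]
```

The result is a small table of tag and count, which is structured data derived from unstructured input.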

3. Semi-Structured Data

This falls between structured and unstructured data. It uses tags or markers to distinguish individual pieces of data rather than adhering strictly to a tabular format.

Examples:

JSON files
XML files
NoSQL databases (like MongoDB)
HTML documents

Key Features:

Flexible and adaptable
Contains metadata
Easily converted into structured format for processing
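
One way to picture the conversion into a structured format is to flatten nested records into rows with a fixed set of columns. The sketch below is only one possible approach; the sample records, the dotted column names, and the flatten helper are assumptions for illustration.

```python
import json

# Hypothetical semi-structured records: fields vary and can be nested.
raw = """
[
  {"name": "Maria", "age": 28, "address": {"city": "Karachi"}},
  {"name": "Ahsan", "address": {"city": "Lahore", "country": "Pakistan"}}
]
"""


def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in record.items():
        full_key = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{full_key}."))
        else:
            flat[full_key] = value
    return flat


records = [flatten(r) for r in json.loads(raw)]

# Use the union of all keys as columns so every row has the same shape;
# fields a record lacks become None.
columns = sorted({key for r in records for key in r})
rows = [[r.get(col) for col in columns] for r in records]

print(columns)
for row in rows:
    print(row)
```

Missing fields show up as None, which is the kind of gap a tabular format makes visible.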

Common Data Formats

Let’s now look at popular formats used to store and share data in data science projects:

1. CSV (Comma-Separated Values)

Simple text format
Columns separated by commas
Great for tabular data
Easily opened in Excel or loaded into Python with pandas

Example:

Name,Age,City
Rafay,28,Karachi
Ahsan,32,Lahore
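
A minimal sketch of reading this sample with Python's built-in csv module follows. The data is held in a string so the example is self-contained; in practice it would usually come from a file. Converting Age to an integer is a choice made here, since CSV itself stores everything as text.

```python
import csv
import io

# The sample data, kept in a string instead of a file.
csv_text = """Name,Age,City
Rafay,28,Karachi
Ahsan,32,Lahore
"""

# skipinitialspace=True also tolerates an optional space after each comma.
reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)

# CSV has no types, so Age arrives as a string and is converted explicitly.
people = [
    {"Name": row["Name"], "Age": int(row["Age"]), "City": row["City"]}
    for row in reader
]

average_age = sum(p["Age"] for p in people) / len(people)
print(people)
print(average_age)  # 30.0
```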

2. JSON (JavaScript Object Notation)

Lightweight data-interchange format
Often used in web APIs
Supports nested data structures

{
  "name": "Maria",
  "age": 28,
  "city": "Karachi"
}
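
The sketch below shows how JSON text maps onto Python objects with the built-in json module, including the nested structures mentioned above. The address and skills fields are illustrative additions, not part of the original example.

```python
import json

# Hypothetical API response; "address" and "skills" are added here
# to show nesting.
text = (
    '{"name": "Maria", "age": 28, '
    '"address": {"city": "Karachi"}, "skills": ["Python", "SQL"]}'
)

person = json.loads(text)          # JSON text -> Python dict
city = person["address"]["city"]   # nested objects become nested dicts
first_skill = person["skills"][0]  # JSON arrays become Python lists

person["age"] += 1
round_trip = json.dumps(person, indent=2)  # Python dict -> JSON text

print(city, first_skill)  # Karachi Python
print(round_trip)
```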

3. Excel (XLS/XLSX)

Microsoft Excel format
Can include formulas, charts, and multiple sheets
Popular among non-programmers

Pros:

User-friendly
Great for manual data entry and small-scale analysis

Cons:

Not ideal for automation or large datasets

4. SQL (Structured Query Language)

Not a file format, but a language used to interact with structured databases
Data stored in relational tables
Allows querying, updating, inserting, and deleting data

Example SQL query:

SELECT * FROM customers WHERE city = 'Karachi';
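
To try the query without installing a database server, one option is Python's built-in sqlite3 module with an in-memory database. The table layout and the sample rows below are assumptions made for this sketch.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, age INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, age, city) VALUES (?, ?, ?)",
    [("Maria", 28, "Karachi"), ("Rafay", 28, "Karachi"), ("Ahsan", 32, "Lahore")],
)

# The same filter as the query above, with the city passed as a parameter
# and an ORDER BY added so the result order is predictable.
karachi_customers = conn.execute(
    "SELECT * FROM customers WHERE city = ? ORDER BY name", ("Karachi",)
).fetchall()
conn.close()

print(karachi_customers)  # [('Maria', 28, 'Karachi'), ('Rafay', 28, 'Karachi')]
```

Passing the city as a parameter instead of building the query string by hand is a common habit that avoids SQL injection.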

Quick Comparison Table

Type	Examples	Ideal Use
Structured	SQL, CSV, Excel	Traditional analysis, queries
Unstructured	Images, Videos, PDFs	NLP, computer vision, storage
Semi-Structured	JSON, XML, HTML	APIs, flexible storage formats

Why It Matters

Models built without a solid understanding of the underlying data can rest on flawed assumptions and produce unreliable or misleading results. Knowing your data helps you:

Choose the right model
Clean and preprocess data properly
Avoid bias and errors
Interpret results accurately
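
A quick profile of the data before modeling often surfaces the problems that cleaning has to address. The sketch below is one simple way to do it in plain Python; the sample records and the profile helper are hypothetical.

```python
# Hypothetical records with typical problems: a missing age, an age stored
# as text, and inconsistent spelling of a city.
records = [
    {"name": "Maria", "age": 28, "city": "Karachi"},
    {"name": "Rafay", "age": None, "city": "karachi "},
    {"name": "Ahsan", "age": "32", "city": "Lahore"},
]


def profile(rows: list[dict]) -> dict:
    """Count missing values and collect the Python types seen per column."""
    report = {}
    for row in rows:
        for column, value in row.items():
            entry = report.setdefault(column, {"missing": 0, "types": set()})
            if value is None:
                entry["missing"] += 1
            else:
                entry["types"].add(type(value).__name__)
    return report


report = profile(records)
for column, entry in report.items():
    print(column, entry["missing"], sorted(entry["types"]))

# Normalizing text values is a typical follow-up cleaning step.
cities = {row["city"].strip().title() for row in records}
print(sorted(cities))  # ['Karachi', 'Lahore']
```

Here the age column mixes integers and strings and has one missing value, both of which would need attention before any model sees the data.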

Summary:

Understanding your data shows you how to organize and use it effectively.
Data is the backbone of data science: raw facts that become valuable when processed.
The three main types are structured, unstructured, and semi-structured data.
Formats and tools such as CSV, JSON, Excel, and SQL store and manage data in different ways.

Understanding these distinctions is the first step toward becoming a data expert.
