What types of data exist in data science and
how do we classify them?
In data science, there are two main types
of data: categorical data and numerical data.
This is the most common way of classifying
the various types of data, and you’ll encounter
both types quite frequently, so it’s important
that you clearly understand the distinction
between the two.
So, let’s spend some time learning about each
of them in more detail.
Categorical data represent named qualities
of an observed phenomenon.
This includes using words to describe the
names or properties of objects, like their
color, shape, and texture.
For example, the color of an apple is red.
The word “red” describes the quality of the
color of the apple.
In data science, we also refer to categorical data
as “qualitative data” since they describe
the quality of the thing they represent.
However, most beginners find the term
“categorical” more intuitive than “qualitative”,
so we’ll continue referring to this type of
data as “categorical” data.
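To make this concrete, here is a minimal sketch in Python, assuming a small made-up basket of apples. The list of colors and the variable names are hypothetical and only there for illustration. It shows that with categorical values, the natural things to do are to compare labels and count how often each one occurs.

```python
from collections import Counter

# Hypothetical observations: the color of each apple in a small basket.
apple_colors = ["red", "green", "red", "yellow", "red"]

# Categorical values are labels, so the natural operations are
# comparing them for equality and counting how often each one occurs.
color_counts = Counter(apple_colors)
most_common_color, count = color_counts.most_common(1)[0]

print(color_counts["red"])   # how many apples are labeled "red"
print(most_common_color)     # the label that occurs most often
```

Notice that it would make no sense to add or average these labels. We can only ask which category an apple falls into.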
Numerical data represent measured quantities
of an observed phenomenon.
This includes using numbers to describe the
measurements of objects,
like their size, weight, and velocity.
For example, the price of 6 apples is $2.00.
“Six” represents the quantity of apples and
“$2.00” represents the price of the apples.
In data science, we also refer to numerical data
as “quantitative data” since they describe
the quantity of the thing they represent.
However, because beginners often confuse
the terms “qualitative” and “quantitative”,
we’re going to continue referring to this
type of data as “numerical” data.
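As a rough sketch of the same idea in Python, we can take the apple example, 6 apples for $2.00, and do arithmetic with it. The variable names here are hypothetical and chosen only for illustration.

```python
# Values from the apple example: 6 apples cost $2.00 in total.
apple_count = 6       # a counted quantity
total_price = 2.00    # a measured amount, in dollars

# Numerical values support arithmetic, which labels like "red" do not.
price_per_apple = total_price / apple_count
price_for_a_dozen = price_per_apple * 12

print(round(price_per_apple, 2))
print(round(price_for_a_dozen, 2))
```

This is the practical difference between the two types: we can divide and multiply numerical values and get a meaningful result.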
Categorical and numerical data can each be further
divided into two subtypes, for a total of four.
Categorical data can be divided into nominal
and ordinal data.
And numerical data can be divided into interval
and ratio data.
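One way to picture this classification is as a simple lookup table. The following is only a sketch in Python, and the names DATA_TYPES and main_type_of are hypothetical, made up for this illustration.

```python
# The two main types and their subtypes, written as a lookup table.
DATA_TYPES = {
    "categorical": ["nominal", "ordinal"],
    "numerical": ["interval", "ratio"],
}


def main_type_of(subtype: str) -> str:
    """Return the main type that a given subtype belongs to."""
    for main_type, subtypes in DATA_TYPES.items():
        if subtype in subtypes:
            return main_type
    raise ValueError(f"unknown subtype: {subtype}")


print(main_type_of("ordinal"))
print(main_type_of("ratio"))
```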
We’ll take a look at each of these four subtypes
of data next.