Types of Data

What types of data exist in data science, and how do we classify them? In data science, there are two main types of data: categorical data and numerical data. This is the most common way of classifying data, and you’ll encounter both types frequently, so it’s important that you clearly understand the distinction between them. Let’s spend some time learning about each of them in more detail.

Categorical data represent named qualities of an observed phenomenon. This includes using words to describe the names or properties of objects, such as their color, shape, and texture. For example, the color of an apple is red. The word “red” describes a quality of the apple: its color. In data science, categorical data are also called “qualitative data” since they describe the qualities of the thing they represent. However, most beginners find the term “categorical” more intuitive than “qualitative”, so we’ll continue referring to this type of data as “categorical” data.

Numerical data represent measured quantities of an observed phenomenon. This includes using numbers to describe measurements of objects, such as their size, weight, and velocity. For example, the price of 6 apples is $2.00. “Six” represents the quantity of apples and “$2.00” represents the price of the apples. In data science, numerical data are also called “quantitative data” since they describe the quantities of the thing they represent. However, because beginners often confuse the terms “qualitative” and “quantitative”, we’re going to continue referring to this type of data as “numerical” data.

Categorical and numerical data can each be divided further, giving four subtypes in total. Categorical data can be divided into nominal and ordinal data, and numerical data can be divided into interval and ratio data. We’ll take a look at each of these four subtypes of data next.
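
To make the distinction concrete, here is a minimal Python sketch that labels each property of an apple as categorical or numerical. The `apple` record and the `classify` helper are hypothetical names made up for this illustration, and the rule it uses (numbers are numerical, everything else is categorical) is a deliberate simplification: in real data sets, numbers are sometimes used as labels, such as postal codes, and those are categorical despite being written with digits.

```python
from numbers import Number

# Hypothetical observation of some apples (illustrative values only).
apple = {
    "color": "red",       # a named quality
    "texture": "smooth",  # a named quality
    "quantity": 6,        # a measured quantity
    "price_usd": 2.00,    # a measured quantity
}


def classify(value):
    """Label a value as 'categorical' or 'numerical' based on its Python type.

    This is a simplifying assumption for illustration, not a general rule.
    """
    # bool is a subclass of int in Python, so handle it before the Number check.
    if isinstance(value, bool):
        return "categorical"
    if isinstance(value, Number):
        return "numerical"
    return "categorical"


# Map each property name to the kind of data it holds.
kinds = {name: classify(value) for name, value in apple.items()}

for name, kind in kinds.items():
    print(f"{name}: {kind}")
```

In this sketch, the words describing the apple (“red”, “smooth”) come out as categorical, while the quantity and the price come out as numerical, matching the examples in the text.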