Tabular Data

Tabular data are the most common form of structured data that we use for analysis in data science. But what are tabular data, and how do we organize our data in this way?

Tabular data are data organized into a table, and the table provides the data with structure. A table is a two-dimensional grid of data. However, unlike a matrix, which we saw earlier, the elements in a table do not all need to be of the same data type. Rather, all data in each column must be of the same data type, which we refer to as homogeneous data. The data in a row, however, can have different data types from column to column, which we refer to as heterogeneous data.

For example, imagine we have a table of patients at a hospital. We would have a set of rows (one for each patient) and a set of columns (one for each attribute of the patient).

Each element of data in a column must be the same data type. For example:
– all of the names must be character strings
– all of the genders must be enumerations of male, female, or other genders
– all of the ages must be integers
– and so on

However, each row contains elements of various data types. For example:
– the name “Bill” is a character string
– the gender “Male” is an enumeration
– the age “21” is an integer

As we can see, each column contains only a single data type; however, each row can contain multiple data types.

In data science, tabular data can be broken down into three main components:
– Observations, which we locate on the rows of a table
– Variables, which we locate on the columns of a table
– Relationships, which connect data in one table to data in another table

We’ll discuss each of these components in more detail next.
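To make the patient example concrete, here is a minimal sketch in Python using only the standard library. The names Gender, Patient, column, column_types, and row_types are hypothetical and chosen for illustration. The patients other than “Bill” are made-up sample rows, not data from the lecture.

```python
from dataclasses import dataclass
from enum import Enum


class Gender(Enum):
    """Hypothetical enumeration for the gender column."""
    MALE = "Male"
    FEMALE = "Female"
    OTHER = "Other"


@dataclass
class Patient:
    """One row of the table: a single patient (an observation)."""
    name: str       # character string
    gender: Gender  # enumeration
    age: int        # integer


# Illustrative rows only; "Ana" and "Sam" are invented for this sketch.
patients = [
    Patient("Bill", Gender.MALE, 21),
    Patient("Ana", Gender.FEMALE, 34),
    Patient("Sam", Gender.OTHER, 47),
]


def column(rows, field):
    """Return every value of one attribute (a variable) across all rows."""
    return [getattr(row, field) for row in rows]


def column_types(rows, field):
    """Return the set of Python types found in one column."""
    return {type(value) for value in column(rows, field)}


def row_types(row):
    """Return the type of each element in one row, in column order."""
    return [type(value) for value in (row.name, row.gender, row.age)]


# A column is homogeneous: it contains exactly one type.
print(column_types(patients, "age"))
# A row is heterogeneous: its types differ from column to column.
print(row_types(patients[0]))
```

Reading down a column yields a single type, while reading across a row yields a mix of types, which is the distinction between homogeneous columns and heterogeneous rows.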