Data science is founded upon making observations
of the world around us.
But what are observations and how do we record
them in tabular data?
An observation is a recording of the qualities
and quantities of an observable phenomenon
in the natural world.
This includes what we can see, hear, feel,
or measure with sensors.
In data science, we record observations on
the rows of a table.
The rows are the horizontal groups of data
that are contained within the table.
For example, imagine that we are recording
the vital signs of a patient at a hospital.
For each observation, we would record: the
date of the observation, the patient’s heart
rate, their temperature, and other vitals.
Each of these observations would be recorded
on a separate row.
What is important to note is that all of the
elements in a row of data belong to the same observation.
For example, a row of data can be an observation
of a person, a place, a thing, or a set of
sensor readings at a specific time.
In data science, we want each row to contain
one and only one observation.
Essentially, each row should record one, and
only one, person, place, or thing begin observed
at a given time.
Outside of data science, the rows of a table
of data go by various names.
First, you may hear them simply referred to
as “rows”, for, well, obvious reasons, I guess.
In computer science, they are often referred
to as a tuples (or a tupples), which is a
mathematical term for a list of data.
Or you may often hear them referred to as
“records”, because they store a recording
of an observation, an entity, or a transaction
of some kind.
No matter what they are called, observations
should always be represented as rows in tabular data.