In data science, we have several common composite
data types.
We’re going to learn about the ones that you’re
most likely to encounter.
First, we have the composite data types that
represent homogenous data.
First, we have a vector, also known as an
array, which is a one-dimensional sequence
of homogenous data.
Vectors are used to store a list of elements
that are all of the same data type.
For example, character strings, which we discussed
earlier, are a sequence of characters stored
in a vector.
Second, we have a matrix, which is a two-dimensional
grid of homogenous data.
Matrices are typically used to store and process
groups of related numbers using a set of mathematical
operations known as matrix algebra.
Finally, we have a tensor, which is a three-dimensional
cube or n-dimensional hypercube of homogenous data.
Tensors are typically used to create deep
neural networks in machine learning, which
is where the deep-learning framework TensorFlow
gets its name.
Next, we have composite data types that represent
tabular data.
First, we have a dictionary, which is a two-column
table that stores a list of key-value pairs.
A dictionary, also known as a look-up table,
is used to quickly retrieve data by a unique identifier.
Second, we have a table.
A table stores data as a set of rows and columns.
Tables are the most common way you will encounter
structured data in data science.
So we’ll discuss tabular data in much more
detail in the next module.
Next, we have composite data types that represent
semi-structured data.
First, we have a tree, which organizes data
as a set of nodes and branches.
Trees are used to represent hierarchical data
(i.e. data that are organized into parent-child relationships).
Second, we have a graph, which organizes data
as a set of nodes and edges.
Graphs are used to represent a network of
data, representing each item as a node and
each relationship as an edge.
Finally, we have composite data types to represent
multimedia data.
For example:
– A body of text is essentially a long vector
of characters.
– Images are represented as a two-dimensional
matrix of pixels.
– Audio is represented as a one-dimensional
vector of sound waves over time.
– Video is represented as sound and a set
of images moving frame-by-frame over time.
– and shape data is a graph of points, lines,
and polygons used to construct geometric structures
like maps and 3D objects.
Many more composite data types exist in data
science; however, these are the ones that
you are most likely to encounter.