pandas Series
The pandas library comes with two types of classes to handle data. The first that we will learn about is the pandas Series. A pandas Series can be visualized as an individual column within a spreadsheet. The column typically has a name and an index for each row associated with that column.
A pandas Series can hold any data type, and all values within a Series are associated with a labeled index. The index can be either label-based or integer-based. An example of a label-based index would be groceries or movies, while an integer-based index would be numeric (1, 2, 3, etc.).
While a Series is capable of holding any type of data, it usually holds only one.
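To make the point about data types concrete, here is a small sketch (the variable names and values are illustrative assumptions). It shows that a Series of one type gets a specific dtype, while mixed values fall back to the generic object dtype:

```python
import pandas as pd

# All values are floats, so pandas stores them with a float dtype.
prices = pd.Series([1.5, 2.25, 3.0])
print(prices.dtype)

# Mixed types are allowed, but the Series falls back to the object dtype.
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)
```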
If you would like to change the code within the blocks of code below to view different outputs and work with the code on your own, you can access them by opening up data-analysis-projects/eda-with-pandas/reading-examples in Jupyter Notebook for this section and the following sections!
Creating a Series
Let’s take a look at the syntax for creating a Series using lists, dictionaries, and tuples.
Using a List
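As a minimal sketch of the kind of code this subsection describes (the variable names come from the description in this subsection; the list values are assumptions chosen for illustration):

```python
import pandas as pd

# Create a Series directly from a list literal (values are illustrative).
example_list = pd.Series(["apple", "banana", "avocado", "honey dew"])

# Create a Series from a list that already exists.
pre_existing_list = [10, 20, 30, 40]
series_from_existing_list = pd.Series(pre_existing_list)

print(example_list)
print(series_from_existing_list)
```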
The above code block accomplishes the following:
- Imports pandas as pd.
- Creates a pandas Series called example_list by providing a list of values.
- Creates a pandas Series called series_from_existing_list by using the already existing list pre_existing_list and passing it in as a parameter to the .Series() function.
Using a Dictionary
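As a minimal sketch of the kind of code this subsection describes (the variable names come from the description in this subsection; the dictionary contents are assumptions chosen for illustration). Note that the dictionary keys become the index labels:

```python
import pandas as pd

# Create a Series directly from a dictionary literal; keys become labels.
example_dictionary = pd.Series({"groceries": 120.5, "movies": 30.0})

# Create a Series from a dictionary that already exists.
pre_existing_dictionary = {"red": "apple", "yellow": "banana", "green": "avocado"}
series_from_existing_dictionary = pd.Series(pre_existing_dictionary)

print(example_dictionary)
print(series_from_existing_dictionary)
```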
The above code block accomplishes the following:
- Imports pandas as pd.
- Creates a pandas Series called example_dictionary by providing a dictionary.
- Creates a pandas Series called series_from_existing_dictionary by using the already existing dictionary pre_existing_dictionary and passing it in as a parameter to the .Series() function.
Using a Tuple
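As a minimal sketch of the kind of code this subsection describes (the variable names come from the description in this subsection; the tuple values are assumptions chosen for illustration):

```python
import pandas as pd

# Create a Series directly from a tuple literal (values are illustrative).
example_tuple = pd.Series(("apple", "banana", "avocado"))

# Create a Series from a tuple that already exists.
pre_existing_tuple = (1, 2, 3)
series_from_existing_tuple = pd.Series(pre_existing_tuple)

print(example_tuple)
print(series_from_existing_tuple)
```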
The above code block accomplishes the following:
- Imports pandas as pd.
- Creates a pandas Series called example_tuple by providing a tuple with values.
- Creates a pandas Series called series_from_existing_tuple by using an already existing tuple pre_existing_tuple and passing it in as a parameter to the .Series() function.
Indexing and Naming
When creating a pandas Series you have the ability to add custom index labels and a name (sometimes also referred to as a label) for the column associated with the Series.
If you do not add custom index labels to a Series and none already exist, the index defaults to the integer range 0, 1, 2, 3, and so on. As for the column name, if you do not provide one, it defaults to None.
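A quick way to check these defaults yourself is to inspect the index and name of a freshly created Series. This is a small sketch; the variable name fruit is an assumption used only for illustration:

```python
import pandas as pd

# No index or name is supplied, so pandas fills in the defaults.
fruit = pd.Series(["apple", "banana", "avocado", "honey dew"])

print(fruit.index)  # default integer index starting at 0
print(fruit.name)   # no name has been set
```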
To add custom index labels, pass the index parameter when creating the Series:
custom_index_labels = pd.Series(["apple", "banana", "avocado", "honey dew"], index = ["red", "yellow", "green", "green"])
To add a custom column name, pass the name parameter when creating the Series:
custom_index_labels = pd.Series(["apple", "banana", "avocado", "honey dew"], index = ["red", "yellow", "green", "green"], name = "fruit")
You could also set the .name attribute after the Series has been created:
custom_index_labels = pd.Series(["apple", "banana", "avocado", "honey dew"], index = ["red", "yellow", "green", "green"])
custom_index_labels.name = "fruit"
You can also store index labels inside of a variable as shown below:
fruit_color = ["red", "yellow", "green", "green"]
custom_fruit_labels = pd.Series(["apple", "banana", "avocado", "honey dew"], index = fruit_color)
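Once a Series has custom labels, those labels can be used to look up values. The sketch below reuses the fruit example and uses .loc for label-based lookup; it is meant as an illustration, not the only way to access values. Because "green" appears twice in the index, looking it up returns both matching rows:

```python
import pandas as pd

fruit_color = ["red", "yellow", "green", "green"]
custom_fruit_labels = pd.Series(
    ["apple", "banana", "avocado", "honey dew"],
    index=fruit_color,
    name="fruit",
)

# A label that appears once returns a single value.
print(custom_fruit_labels.loc["red"])

# A label that appears more than once returns a Series of all matches.
print(custom_fruit_labels.loc["green"])
```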
Subsetting a Series
pandas allows you to use slicing to subset a Series. You can accomplish this using bracket notation and specifying an index range. Let’s take a look at how we can do this using the dictionary created above as an example.
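A minimal sketch of this kind of slicing follows. The data is an assumption: a dictionary whose integer keys and fruit values match the labels and values in the Output shown in this section. With integer bounds on an integer index like this one, a bracket slice selects by position and excludes the stop position, so 1:4 returns the elements at positions 1, 2, and 3. pandas also prints a dtype line after each Series.

```python
import pandas as pd

# Assumed data: integer keys become the index labels 0 through 3.
fruit = pd.Series({0: "apple", 1: "banana", 2: "avocado", 3: "honey dew"})

# The first two elements (positions 0 and 1).
first_two = fruit[0:2]
print(first_two)

# Elements at positions 1 through 3 (the stop value 4 is excluded).
print("Subset elements from index 1 to 3")
one_to_three = fruit[1:4]
print(one_to_three)
```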
Output
0 apple
1 banana
Subset elements from index 1 to 3
1 banana
2 avocado
3 honey dew
Check Your Understanding
What does the index default to if no labels are provided?
What type of data is a pandas Series capable of holding?