pandas Series
The pandas library comes with two main classes for handling data. The first that we will learn about is the pandas Series. A pandas Series can be visualized as an individual column within a spreadsheet. The column typically has a name and an index label for each row.
A pandas Series can hold any data type, and every value within a Series is associated with an index label. The index can be either label-based or integer-based. A label-based index uses names, such as groceries or movies, while an integer-based index is numeric (1, 2, 3, etc.).
Note
While a Series is capable of holding any type of data, a single Series usually holds only one type.
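As a minimal sketch of the note above, the snippet below compares a Series built from one type of value with a Series built from mixed values. The variable names and values are assumptions chosen for illustration.

```python
import pandas as pd

# A Series built from values of a single type gets a specific dtype.
whole_numbers = pd.Series([10, 20, 30])
print(whole_numbers.dtype)  # an integer dtype

# Mixed Python types are allowed, but pandas falls back to the generic object dtype.
mixed_values = pd.Series([10, "twenty", 30.5])
print(mixed_values.dtype)  # object
```

Keeping one type per Series lets pandas store the values with a dtype that matches them instead of the generic object dtype.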
Note
If you would like to change the code in the blocks below to view different outputs and work with it on your own, open data-analysis-projects/eda-with-pandas/reading-examples in Jupyter Notebook for this section and the following sections.
Creating a Series
Let’s take a look at the syntax for creating a Series using lists, dictionaries, and tuples.
Using a List
The code for this example does the following:
- Imports pandas as pd.
- Creates a pandas Series called example_list by providing a list of values.
- Creates a pandas Series called series_from_existing_list by passing the already existing list pre_existing_list as a parameter to the pd.Series() constructor.
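The steps listed above can be sketched as follows. The variable names come from the list; the values stored in each list are assumptions chosen for illustration, not the original example data.

```python
import pandas as pd

# Create a Series directly from a list literal (values are illustrative).
example_list = pd.Series(["apple", "banana", "avocado", "honey dew"])

# Create a Series from a list that already exists.
pre_existing_list = [2, 4, 6, 8]
series_from_existing_list = pd.Series(pre_existing_list)

print(example_list)
print(series_from_existing_list)
```

Because no index labels are supplied, both Series receive the default integer index starting at 0.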
Using a Dictionary
The code for this example does the following:
- Imports pandas as pd.
- Creates a pandas Series called example_dictionary by providing a dictionary.
- Creates a pandas Series called series_from_existing_dictionary by passing the already existing dictionary pre_existing_dictionary as a parameter to the pd.Series() constructor.
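A minimal sketch of the dictionary steps above is shown here. The variable names come from the list; the keys and values are assumptions chosen for illustration.

```python
import pandas as pd

# Create a Series directly from a dictionary literal (contents are illustrative).
example_dictionary = pd.Series({"red": "apple", "yellow": "banana", "green": "avocado"})

# Create a Series from a dictionary that already exists.
pre_existing_dictionary = {"milk": 2, "eggs": 12, "bread": 1}
series_from_existing_dictionary = pd.Series(pre_existing_dictionary)

print(example_dictionary)
print(series_from_existing_dictionary)
```

When a Series is built from a dictionary, the keys become the index labels and the dictionary values become the Series values.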
Using a Tuple
The code for this example does the following:
- Imports pandas as pd.
- Creates a pandas Series called example_tuple by providing a tuple of values.
- Creates a pandas Series called series_from_existing_tuple by passing the already existing tuple pre_existing_tuple as a parameter to the pd.Series() constructor.
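The tuple steps above can be sketched in the same way. The variable names come from the list; the tuple contents are assumptions chosen for illustration.

```python
import pandas as pd

# Create a Series directly from a tuple literal (values are illustrative).
example_tuple = pd.Series(("Jaws", "Alien", "Up"))

# Create a Series from a tuple that already exists.
pre_existing_tuple = (1975, 1979, 2009)
series_from_existing_tuple = pd.Series(pre_existing_tuple)

print(example_tuple)
print(series_from_existing_tuple)
```

A tuple behaves like a list here: the values keep their order and receive the default integer index.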
Indexing and Naming
When creating a pandas Series you have the ability to add custom index labels and a name (sometimes also referred to as a label) for the column associated with the Series.
If you do not add custom index labels to a Series and the data does not already supply any (as dictionary keys do), the index defaults to the integer range 0, 1, 2, 3, and so on. Likewise, if you do not give the Series a name, the name defaults to None.
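To see these defaults, the short sketch below creates a Series without an index or a name and inspects both. The variable name fruits is an assumption used only for this illustration.

```python
import pandas as pd

# No index or name is supplied, so pandas applies its defaults.
fruits = pd.Series(["apple", "banana", "avocado", "honey dew"])

print(list(fruits.index))  # [0, 1, 2, 3]
print(fruits.name)         # None
```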
Example
To add custom index labels, pass the index parameter when creating the Series:
custom_index_labels = pd.Series(["apple", "banana", "avocado", "honey dew"], index=["red", "yellow", "green", "green"])
Example
If you would like to add a custom column name, pass the name parameter when creating the Series:
custom_index_labels = pd.Series(["apple", "banana", "avocado", "honey dew"], index=["red", "yellow", "green", "green"], name="fruit")
You could also set the .name attribute after the Series has been created:
custom_index_labels = pd.Series(["apple", "banana", "avocado", "honey dew"], index=["red", "yellow", "green", "green"])
custom_index_labels.name = "fruit"
Tip
You can also store index labels inside of a variable as shown below:
fruit_color = ["red", "yellow", "green", "green"]
custom_fruit_labels = pd.Series(["apple", "banana", "avocado", "honey dew"], index=fruit_color)
Subsetting a Series
pandas allows you to use slicing to subset a Series. You can accomplish this using bracket notation and specifying an index range. Let’s take a look at how we can do this using the dictionary created above as an example.
Example
Output
0 apple
1 banana
Subset elements from index 1 to 3
1 banana
2 avocado
3 honey dew
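The code that produced this output is not shown, so the following is only a sketch of slicing that would print the same rows. It assumes a fruit Series with the default integer index (the name fruits is hypothetical) and uses .iloc to make the position-based slicing explicit.

```python
import pandas as pd

# Assumed data: a fruit Series with the default integer index 0..3.
fruits = pd.Series(["apple", "banana", "avocado", "honey dew"])

# Positions 0 and 1; the stop position (2) is excluded.
first_two = fruits.iloc[0:2]
print(first_two)

print("Subset elements from index 1 to 3")
# Positions 1, 2, and 3; the stop position (4) is excluded.
one_to_three = fruits.iloc[1:4]
print(one_to_three)
```

As with Python lists, a position-based slice includes the start position and stops just before the end position, which is why 1:4 is needed to include the element at index 3.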
Check Your Understanding
Question
What do the index labels default to if none are provided?
Question
What type of data is a pandas Series capable of holding?