Get all rows that contain a substring in Pandas DataFrame
A Pandas DataFrame is a two-dimensional data structure that can store data of various types. It is similar to a SQL table or an Excel spreadsheet. We can use the Pandas DataFrame to find all rows that contain a substring.
import pandas as pd
# create a dataframe
df = pd.DataFrame({
'subject': ['Physics', 'Chemistry', 'English', 'Arts', 'Math', 'Ecomomics'],
'score': [90, 89, 93, 76, 82, 87],
'department': ['science', 'science', 'literature', 'arts', 'science', 'commerce']
})
print('ORIGINAL DATAFRAME')
print(df)
# get all rows that contain a substring - sci
df = df[df['department'].str.contains('sci')]
print('RESULT DATAFRAME')
print(df)
Output
ORIGINAL DATAFRAME
╒════╤═══════════╤═════════╤══════════════╕
│ │ subject │ score │ department │
╞════╪═══════════╪═════════╪══════════════╡
│ 0 │ Physics │ 90 │ science │
├────┼───────────┼─────────┼──────────────┤
│ 1 │ Chemistry │ 89 │ science │
├────┼───────────┼─────────┼──────────────┤
│ 2 │ English │ 93 │ literature │
├────┼───────────┼─────────┼──────────────┤
│ 3 │ Arts │ 76 │ arts │
├────┼───────────┼─────────┼──────────────┤
│ 4 │ Math │ 82 │ science │
├────┼───────────┼─────────┼──────────────┤
│ 5 │ Ecomomics │ 87 │ commerce │
╘════╧═══════════╧═════════╧══════════════╛
RESULT DATAFRAME
╒════╤═══════════╤═════════╤══════════════╕
│ │ subject │ score │ department │
╞════╪═══════════╪═════════╪══════════════╡
│ 0 │ Physics │ 90 │ science │
├────┼───────────┼─────────┼──────────────┤
│ 1 │ Chemistry │ 89 │ science │
├────┼───────────┼─────────┼──────────────┤
│ 4 │ Math │ 82 │ science │
╘════╧═══════════╧═════════╧══════════════╛
To do this, we will first create a DataFrame with some dummy data. We will then use the Pandas "str" function to find all rows that contain the substring "sci".
The code that is used to get the rows that contain a substring is as below:
df[df['department'].str.contains('sci')]
The code above returns a DataFrame of all rows where the department column contains the string 'sci'.
Use DataFrame.str.contains() function to get all the rows contains a substring
The DataFrame.str.contains() function is a convenient way to get all the rows that contain a substring. This can be useful when you want to find all the rows that contain a certain word or phrase.
Code example
import pandas as pd
# create a dataframe
df = pd.DataFrame({
'subject': ['Physics', 'Chemistry', 'English', 'Arts', 'Math', 'Ecomomics'],
'score': [90, 89, 93, 76, 82, 87],
'department': ['science', 'science', 'literature', 'arts', 'science', 'commerce']
})
print('ORIGINAL DATAFRAME')
print(df)
# get all rows that contain a substring - art
df = df[df['department'].str.contains('art')]
print('RESULT DATAFRAME')
print(df)
Output
ORIGINAL DATAFRAME
╒════╤═══════════╤═════════╤══════════════╕
│ │ subject │ score │ department │
╞════╪═══════════╪═════════╪══════════════╡
│ 0 │ Physics │ 90 │ science │
├────┼───────────┼─────────┼──────────────┤
│ 1 │ Chemistry │ 89 │ science │
├────┼───────────┼─────────┼──────────────┤
│ 2 │ English │ 93 │ literature │
├────┼───────────┼─────────┼──────────────┤
│ 3 │ Arts │ 76 │ arts │
├────┼───────────┼─────────┼──────────────┤
│ 4 │ Math │ 82 │ science │
├────┼───────────┼─────────┼──────────────┤
│ 5 │ Ecomomics │ 87 │ commerce │
╘════╧═══════════╧═════════╧══════════════╛
RESULT DATAFRAME
╒════╤═══════════╤═════════╤══════════════╕
│ │ subject │ score │ department │
╞════╪═══════════╪═════════╪══════════════╡
│ 3 │ Arts │ 76 │ arts │
╘════╧═══════════╧═════════╧══════════════╛
Using the above code example we are getting the rows where the department column value contains the substring art.
Get all rows where a column contains substring 1 or substring 2
If you're working with a Pandas DataFrame and you need to get all of the rows where a column contains either substring 1 or substring 2, you can use the | inside the str.contains() function to achieve this. Takes a look at the below code example to understand this.
import pandas as pd
# create a dataframe
df = pd.DataFrame({
'subject': ['Physics', 'Chemistry', 'English', 'Arts', 'Math', 'Ecomomics'],
'score': [90, 89, 93, 76, 82, 87],
'department': ['science', 'science', 'literature', 'arts', 'science', 'commerce']
})
print('ORIGINAL DATAFRAME')
print(df)
# get all rows that contain a substring - art or comm
df = df[df['department'].str.contains('art|comm')]
print('RESULT DATAFRAME')
print(df)
The code will get the row where the department column contains either art or comm substrings.
df[df['department'].str.contains('art|comm')]
Output
ORIGINAL DATAFRAME
╒════╤═══════════╤═════════╤══════════════╕
│ │ subject │ score │ department │
╞════╪═══════════╪═════════╪══════════════╡
│ 0 │ Physics │ 90 │ science │
├────┼───────────┼─────────┼──────────────┤
│ 1 │ Chemistry │ 89 │ science │
├────┼───────────┼─────────┼──────────────┤
│ 2 │ English │ 93 │ literature │
├────┼───────────┼─────────┼──────────────┤
│ 3 │ Arts │ 76 │ arts │
├────┼───────────┼─────────┼──────────────┤
│ 4 │ Math │ 82 │ science │
├────┼───────────┼─────────┼──────────────┤
│ 5 │ Ecomomics │ 87 │ commerce │
╘════╧═══════════╧═════════╧══════════════╛
RESULT DATAFRAME
╒════╤═══════════╤═════════╤══════════════╕
│ │ subject │ score │ department │
╞════╪═══════════╪═════════╪══════════════╡
│ 3 │ Arts │ 76 │ arts │
├────┼───────────┼─────────┼──────────────┤
│ 5 │ Ecomomics │ 87 │ commerce │
╘════╧═══════════╧═════════╧══════════════╛
We can also perform one or multiple substring searches in a column of Pandas DataFrame using the code examples explained in this post.
- Get a column rows as a List in Pandas Dataframe
- Get the count of rows and columns of a Pandas DataFrame
- Create pandas DataFrame and add columns and rows to it
- Loop through DataFrame rows in python pandas
- Sort a DataFrame by rows and columns in Pandas
- Pandas - Change rows order of a DataFrame using index list
- Pandas - Delete multiple rows from DataFrame using index list