python

Get all rows that contain a substring in Pandas DataFrame

A Pandas DataFrame is a two-dimensional data structure that can store data of various types. It is similar to a SQL table or an Excel spreadsheet. We can use the Pandas DataFrame to find all rows that contain a substring.

import pandas as pd

# create a dataframe
df = pd.DataFrame({
    'subject': ['Physics', 'Chemistry', 'English', 'Arts', 'Math', 'Ecomomics'],
    'score': [90, 89, 93, 76, 82, 87],
    'department': ['science', 'science', 'literature', 'arts', 'science', 'commerce']
})

print('ORIGINAL DATAFRAME')
print(df)

# get all rows that contain a substring - sci
df = df[df['department'].str.contains('sci')]

print('RESULT DATAFRAME')
print(df)

Output

ORIGINAL DATAFRAME
╒════╤═══════════╤═════════╤══════════════╕
│    │ subject   │   score │ department   │
╞════╪═══════════╪═════════╪══════════════╡
│  0 │ Physics   │      90 │ science      │
├────┼───────────┼─────────┼──────────────┤
│  1 │ Chemistry │      89 │ science      │
├────┼───────────┼─────────┼──────────────┤
│  2 │ English   │      93 │ literature   │
├────┼───────────┼─────────┼──────────────┤
│  3 │ Arts      │      76 │ arts         │
├────┼───────────┼─────────┼──────────────┤
│  4 │ Math      │      82 │ science      │
├────┼───────────┼─────────┼──────────────┤
│  5 │ Ecomomics │      87 │ commerce     │
╘════╧═══════════╧═════════╧══════════════╛

RESULT DATAFRAME
╒════╤═══════════╤═════════╤══════════════╕
│    │ subject   │   score │ department   │
╞════╪═══════════╪═════════╪══════════════╡
│  0 │ Physics   │      90 │ science      │
├────┼───────────┼─────────┼──────────────┤
│  1 │ Chemistry │      89 │ science      │
├────┼───────────┼─────────┼──────────────┤
│  4 │ Math      │      82 │ science      │
╘════╧═══════════╧═════════╧══════════════╛

To do this, we will first create a DataFrame with some dummy data. We will then use the Pandas "str" function to find all rows that contain the substring "sci".

The code that is used to get the rows that contain a substring is as below:

df[df['department'].str.contains('sci')]

The code above returns a DataFrame of all rows where the department column contains the string 'sci'.

Use DataFrame.str.contains() function to get all the rows contains a substring

The DataFrame.str.contains() function is a convenient way to get all the rows that contain a substring. This can be useful when you want to find all the rows that contain a certain word or phrase.

Code example

import pandas as pd

# create a dataframe
df = pd.DataFrame({
    'subject': ['Physics', 'Chemistry', 'English', 'Arts', 'Math', 'Ecomomics'],
    'score': [90, 89, 93, 76, 82, 87],
    'department': ['science', 'science', 'literature', 'arts', 'science', 'commerce']
})

print('ORIGINAL DATAFRAME')
print(df)

# get all rows that contain a substring - art
df = df[df['department'].str.contains('art')]

print('RESULT DATAFRAME')
print(df)

Output

ORIGINAL DATAFRAME
╒════╤═══════════╤═════════╤══════════════╕
│    │ subject   │   score │ department   │
╞════╪═══════════╪═════════╪══════════════╡
│  0 │ Physics   │      90 │ science      │
├────┼───────────┼─────────┼──────────────┤
│  1 │ Chemistry │      89 │ science      │
├────┼───────────┼─────────┼──────────────┤
│  2 │ English   │      93 │ literature   │
├────┼───────────┼─────────┼──────────────┤
│  3 │ Arts      │      76 │ arts         │
├────┼───────────┼─────────┼──────────────┤
│  4 │ Math      │      82 │ science      │
├────┼───────────┼─────────┼──────────────┤
│  5 │ Ecomomics │      87 │ commerce     │
╘════╧═══════════╧═════════╧══════════════╛

RESULT DATAFRAME
╒════╤═══════════╤═════════╤══════════════╕
│    │ subject   │   score │ department   │
╞════╪═══════════╪═════════╪══════════════╡
│  3 │ Arts      │      76 │ arts         │
╘════╧═══════════╧═════════╧══════════════╛

Using the above code example we are getting the rows where the department column value contains the substring art.

Get all rows where a column contains substring 1 or substring 2

If you're working with a Pandas DataFrame and you need to get all of the rows where a column contains either substring 1 or substring 2, you can use the | inside the str.contains() function to achieve this. Takes a look at the below code example to understand this.

import pandas as pd

# create a dataframe
df = pd.DataFrame({
    'subject': ['Physics', 'Chemistry', 'English', 'Arts', 'Math', 'Ecomomics'],
    'score': [90, 89, 93, 76, 82, 87],
    'department': ['science', 'science', 'literature', 'arts', 'science', 'commerce']
})

print('ORIGINAL DATAFRAME')
print(df)

# get all rows that contain a substring - art or comm
df = df[df['department'].str.contains('art|comm')]

print('RESULT DATAFRAME')
print(df)

The code will get the row where the department column contains either art or comm substrings.

df[df['department'].str.contains('art|comm')]

Output

ORIGINAL DATAFRAME
╒════╤═══════════╤═════════╤══════════════╕
│    │ subject   │   score │ department   │
╞════╪═══════════╪═════════╪══════════════╡
│  0 │ Physics   │      90 │ science      │
├────┼───────────┼─────────┼──────────────┤
│  1 │ Chemistry │      89 │ science      │
├────┼───────────┼─────────┼──────────────┤
│  2 │ English   │      93 │ literature   │
├────┼───────────┼─────────┼──────────────┤
│  3 │ Arts      │      76 │ arts         │
├────┼───────────┼─────────┼──────────────┤
│  4 │ Math      │      82 │ science      │
├────┼───────────┼─────────┼──────────────┤
│  5 │ Ecomomics │      87 │ commerce     │
╘════╧═══════════╧═════════╧══════════════╛

RESULT DATAFRAME
╒════╤═══════════╤═════════╤══════════════╕
│    │ subject   │   score │ department   │
╞════╪═══════════╪═════════╪══════════════╡
│  3 │ Arts      │      76 │ arts         │
├────┼───────────┼─────────┼──────────────┤
│  5 │ Ecomomics │      87 │ commerce     │
╘════╧═══════════╧═════════╧══════════════╛

We can also perform one or multiple substring searches in a column of Pandas DataFrame using the code examples explained in this post.

Was this helpful?