DataFrame.where() function in Pandas

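The DataFrame.where() function keeps the values of a DataFrame where a Boolean condition is True and replaces all other values with NaN (or with a replacement value you supply). Unlike Boolean indexing, it does not drop anything: the result always has the same shape and index as the original. In this article, we'll show you how to use this function to mask the rows of a DataFrame that do not meet one or more conditions.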

import pandas as pd

# create a dataframe
df = pd.DataFrame({
    'subject': ['Math', 'Physics', 'Chemistry', 'English', 'Computer Science'],
    'score': [90, 70, 94, 67, 69]
})

print("DataFrame: ")
print(df)

# use where() to keep rows with score > 80; values in all other rows become NaN
result = df.where(df['score'] > 80)

print("Result: ")
print(result)

Output

DataFrame: 
╒════╤══════════════════╤═════════╕
│    │ subject          │   score │
╞════╪══════════════════╪═════════╡
│  0 │ Math             │      90 │
├────┼──────────────────┼─────────┤
│  1 │ Physics          │      70 │
├────┼──────────────────┼─────────┤
│  2 │ Chemistry        │      94 │
├────┼──────────────────┼─────────┤
│  3 │ English          │      67 │
├────┼──────────────────┼─────────┤
│  4 │ Computer Science │      69 │
╘════╧══════════════════╧═════════╛

Result: 
╒════╤═══════════╤═════════╕
│    │ subject   │   score │
╞════╪═══════════╪═════════╡
│  0 │ Math      │      90 │
├────┼───────────┼─────────┤
│  1 │ nan       │     nan │
├────┼───────────┼─────────┤
│  2 │ Chemistry │      94 │
├────┼───────────┼─────────┤
│  3 │ nan       │     nan │
├────┼───────────┼─────────┤
│  4 │ nan       │     nan │
╘════╧═══════════╧═════════╛

The DataFrame.where() function takes a Boolean condition as its first argument. Here the condition is a Boolean Series with one value per row, so it is applied row by row: rows where it is True keep their values, and every value in the remaining rows is replaced with NaN. No rows are removed. Because NaN is a floating-point value, the integer score column is converted to float in the result.
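To make the difference between masking and filtering concrete, here is a minimal sketch that applies the same condition both ways. It reuses the DataFrame from the example above; the variable names mask, masked, and filtered are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'subject': ['Math', 'Physics', 'Chemistry', 'English', 'Computer Science'],
    'score': [90, 70, 94, 67, 69]
})

# Boolean Series: True for rows with a score above 80
mask = df['score'] > 80

# where() keeps all five rows and fills the failing rows with NaN
masked = df.where(mask)

# Boolean indexing drops the failing rows instead
filtered = df[mask]

print(masked.shape)           # (5, 2)
print(filtered.shape)         # (2, 2)
print(masked['score'].dtype)  # float64, because NaN forces a float column
```

Use where() when the result must stay aligned with the original index, and Boolean indexing when you only want the matching rows.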

Syntax

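DataFrame.where(cond, other=nan)

The cond parameter is the Boolean condition. It can be a Boolean Series, DataFrame, or array, or a callable that returns one, and it evaluates to True or False for each row or value. The optional other parameter sets the replacement used wherever cond is False; it defaults to NaN.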
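As a short sketch of the second argument, the example below replaces failing scores with 0 instead of NaN. It works on the score column alone so that the replacement value matches the column type; the names scores and replaced are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'subject': ['Math', 'Physics', 'Chemistry', 'English', 'Computer Science'],
    'score': [90, 70, 94, 67, 69]
})

# Work on a one-column DataFrame so the replacement value fits the dtype
scores = df[['score']]

# Keep scores above 80 and replace every other score with 0
replaced = scores.where(scores > 80, other=0)

print(replaced['score'].tolist())  # [90, 0, 94, 0, 0]
```

Because the replacement is an integer, the column keeps its integer type here, unlike the default NaN replacement.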

In the code example, the condition checks whether each value in the score column is greater than 80. Where it is not, every value in that row is replaced with NaN.
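The same pattern extends to more than one condition. The sketch below, which assumes the same DataFrame, combines two comparisons with the & operator and then calls dropna() to discard the masked rows when only the matching rows are wanted. The thresholds 65 and 90 are arbitrary values chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'subject': ['Math', 'Physics', 'Chemistry', 'English', 'Computer Science'],
    'score': [90, 70, 94, 67, 69]
})

# Combine conditions with & (and) or | (or); each comparison needs parentheses
mask = (df['score'] > 65) & (df['score'] < 90)

# Rows failing either comparison are filled with NaN
result = df.where(mask)

# Drop the all-NaN rows to keep only the matching rows
selected = result.dropna()

print(selected['subject'].tolist())  # ['Physics', 'English', 'Computer Science']
```

Note that dropna() would also remove rows that already contained missing values before masking, so Boolean indexing with df[mask] is the safer choice when the original data may contain NaN.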
