python

Change column values condition based in Pandas DataFrame

A Pandas DataFrame is a two-dimensional data structure that can store data of many different types. In a Pandas DataFrame, each column can have a different data type, and you can change the values in a column based on a condition. In this post, we will describe the methods that can be used to change column values of a Pandas DataFrame based on a condition.

import pandas as pd
from tabulate import tabulate

# create a dataframe
df = pd.DataFrame({
    'subject': ['Math', 'Physics', 'Chemistry', 'Data Structue', 'History'],
    'score': [90, 80, 85, 40, 70],
    'status': ['unknown', 'unknown', 'unknown', 'unknown', 'unknown']
})

print(df)

# change 'status' column value based on 'score' value
df.loc[(df['score'] > 50), 'status'] = 'Pass'

print(df)

Output

The first method will show the below DataFrame

╒════╤═══════════════╤═════════╤══════════╕
│    │ subject       │   score │ status   │
╞════╪═══════════════╪═════════╪══════════╡
│  0 │ Math          │      90 │ unknown  │
├────┼───────────────┼─────────┼──────────┤
│  1 │ Physics       │      80 │ unknown  │
├────┼───────────────┼─────────┼──────────┤
│  2 │ Chemistry     │      85 │ unknown  │
├────┼───────────────┼─────────┼──────────┤
│  3 │ Data Structue │      40 │ unknown  │
├────┼───────────────┼─────────┼──────────┤
│  4 │ History       │      70 │ unknown  │

The second print method will show the below DataFrame

╒════╤═══════════════╤═════════╤══════════╕
│    │ subject       │   score │ status   │
╞════╪═══════════════╪═════════╪══════════╡
│  0 │ Math          │      90 │ Pass     │
├────┼───────────────┼─────────┼──────────┤
│  1 │ Physics       │      80 │ Pass     │
├────┼───────────────┼─────────┼──────────┤
│  2 │ Chemistry     │      85 │ Pass     │
├────┼───────────────┼─────────┼──────────┤
│  3 │ Data Structue │      40 │ unknown  │
├────┼───────────────┼─────────┼──────────┤
│  4 │ History       │      70 │ Pass     │
╘════╧═══════════════╧═════════╧══════════╛

The code that we are using to change the values of the 'status' column is as below:

df.loc[(df['score'] > 50), 'status'] = 'Pass'

Based on the above code, we are checking if the value in the 'score' column is greater than 50. If the value is greater than 50 the value in the 'status' column will be replaced by the string 'Pass'. You can apply your conditions on the DataFrame based on the requirements.

The solutions that can be used to change the DataFrame column values based on some condition are as below:

Solution 1: Using DataFrame.loc[]

There are times when we need to change the values of specific columns in our DataFrame, based on certain conditions. For instance, we might want to set a value in a column to 1 if the value in another column is greater than 6. We can do this using the DataFrame.loc[] method.

DataFrame.loc[] allows us to access and modify specific values in our DataFrame, based on either the index or column label. In this example, we'll use a column label. We can set a condition, such as a column B > 6, and then specify what we want to do with the values that meet that condition, such as setting the values in column C to 1.

This is a powerful method that can be used to clean and transform data in Pandas DataFrames.

Syntax

df.loc[condition, column_name] = value

Code example

import pandas as pd

# create a dataframe
df = pd.DataFrame({
    'A': ['a', 'b', 'c', 'd', 'e'],
    'B': [9, 10, 6, 2, 8],
    'C': [0, 0, 0, 0, 0]
})

# change 'C' column value based on 'B' value
df.loc[(df['B'] > 6), 'C'] = 1

print(df)

Output

╒════╤═════╤═════╤═════╕
│    │ A   │   B │   C │
╞════╪═════╪═════╪═════╡
│  0 │ a   │   9 │   1 │
├────┼─────┼─────┼─────┤
│  1 │ b   │  10 │   1 │
├────┼─────┼─────┼─────┤
│  2 │ c   │   6 │   0 │
├────┼─────┼─────┼─────┤
│  3 │ d   │   2 │   0 │
├────┼─────┼─────┼─────┤
│  4 │ e   │   8 │   1 │
╘════╧═════╧═════╧═════╛

Import the pandas library as pd.
Create a DataFrame named df that has three columns: 'A', 'B', and 'C'.
Populate the DataFrame df with some values.
Change the value of the 'C' column to 1 if the corresponding value in the 'B' column is greater than 6.

Solution 2: Using DataFrame.where() function

In Python, we can use the DataFrame.where() function to change column values based on a condition. For example, if we have a DataFrame with two columns, "A" and "B", and we want to set all the values in column "A" to 0 if the value in column "B" is less than 0, we can use the DataFrame.where() function like this:

df['A'].where(~(df['B'] < 0), 0, inplace=True)

This will change all the values in column "A" to 0 if the value in column "B" is less than 0.

Syntax

df['column_name'].where(~(condition), other=value, inplace=True)

Code example

import pandas as pd

# create a dataframe
df = pd.DataFrame({
    'A': ['a', 'b', 'c', 'd', 'e'],
    'B': [9, 10, 6, 2, 8],
    'C': [0, 0, 0, 0, 0]
})

# change 'C' column value based on 'B' value
df['C'].where(~(df['B'] > 6), other=1, inplace=True)

print(df)

Output

╒════╤═════╤═════╤═════╕
│    │ A   │   B │   C │
╞════╪═════╪═════╪═════╡
│  0 │ a   │   9 │   1 │
├────┼─────┼─────┼─────┤
│  1 │ b   │  10 │   1 │
├────┼─────┼─────┼─────┤
│  2 │ c   │   6 │   0 │
├────┼─────┼─────┼─────┤
│  3 │ d   │   2 │   0 │
├────┼─────┼─────┼─────┤
│  4 │ e   │   8 │   1 │
╘════╧═════╧═════╧═════╛

Solution 3: Use DataFrame.mask() function to change DataFrame column values condition based

The DataFrame.mask() function can be used to change the values in a DataFrame column based on a condition. This can be useful when you want to replace certain values in a column with a different value.

Syntax

df['column_name'].mask(condition, new_value, inplace=True)

Code example

import pandas as pd

# create a dataframe
df = pd.DataFrame({
    'A': ['a', 'b', 'c', 'd', 'e'],
    'B': [9, 10, 6, 2, 8],
    'C': [0, 0, 0, 0, 0]
})

# below line will change the column values - condition based
df['C'].mask(df['B'] > 6, 1, inplace=True)

print(df)

Output

╒════╤═════╤═════╤═════╕
│    │ A   │   B │   C │
╞════╪═════╪═════╪═════╡
│  0 │ a   │   9 │   1 │
├────┼─────┼─────┼─────┤
│  1 │ b   │  10 │   1 │
├────┼─────┼─────┼─────┤
│  2 │ c   │   6 │   0 │
├────┼─────┼─────┼─────┤
│  3 │ d   │   2 │   0 │
├────┼─────┼─────┼─────┤
│  4 │ e   │   8 │   1 │

The code above creates a DataFrame with three columns ('A', 'B', 'C'). The values in column 'C' are all initialized to 0. The code then uses the mask() method to update the values in column 'C'. The mask() method takes three arguments. The first argument is a condition - in this case, the condition is df['B'] > 6. The second argument is the value to use if the condition is True - in this case, the value is 1. The third argument is the value to use if the condition is False - in this case, the value is 0. So, the code above updates the values in column 'C' to 1 if the corresponding value in column 'B' is greater than 6, and updates the values in column 'C' to 0 if the corresponding value in column 'B' is less than or equal to 6.

Was this helpful?