Pandas - Delete multiple rows from DataFrame using index list

One of the most common data manipulation tasks is deleting a list of rows from a DataFrame. Pandas provides several options for deleting rows from a DataFrame, and this post highlights some of them.

import pandas as pd

# create a dataframe
df = pd.DataFrame({
  "id": [1, 2, 3, 4, 5],
  "name": ["Delhi", "New York", "Mumbai", "LA", "London"],
  "pincode": [100, 200, 300, 400, 500]
})

print(df)

# delete rows using index list
df.drop(df.index[[0,3,4]], inplace=True)

# print the dataframe
print(df)

Output

╒════╤══════╤══════════╤═══════════╕
│    │   id │ name     │   pincode │
╞════╪══════╪══════════╪═══════════╡
│  0 │    1 │ Delhi    │       100 │
├────┼──────┼──────────┼───────────┤
│  1 │    2 │ New York │       200 │
├────┼──────┼──────────┼───────────┤
│  2 │    3 │ Mumbai   │       300 │
├────┼──────┼──────────┼───────────┤
│  3 │    4 │ LA       │       400 │
├────┼──────┼──────────┼───────────┤
│  4 │    5 │ London   │       500 │
╘════╧══════╧══════════╧═══════════╛

╒════╤══════╤══════════╤═══════════╕
│    │   id │ name     │   pincode │
╞════╪══════╪══════════╪═══════════╡
│  1 │    2 │ New York │       200 │
├────┼──────┼──────────┼───────────┤
│  2 │    3 │ Mumbai   │       300 │
╘════╧══════╧══════════╧═══════════╛

There are multiple ways to delete one or more rows from a Pandas DataFrame. This post explains several ways to delete rows using an index list.

Delete rows from a DataFrame by index list using the drop() function

The drop() function of a DataFrame removes rows by index label. Here, df.index[[0,1,2]] converts the row positions in the list into their index labels, which are then passed to drop(). By default, the inplace parameter is False, so drop() returns a new DataFrame that you have to assign back; if you do not want to reassign, pass inplace=True to modify the DataFrame directly.

Syntax

df.drop(df.index[[0,1,2]], inplace=True)
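
The difference between the default behaviour and inplace=True can be checked with a minimal sketch like the one below. The sample data and the variable names trimmed and result are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, only for illustration
df = pd.DataFrame({"id": [1, 2, 3, 4, 5], "city": ["A", "B", "C", "D", "E"]})

# Default (inplace=False): drop() returns a new DataFrame; df is unchanged
trimmed = df.drop(df.index[[0, 1, 2]])
print(len(df), len(trimmed))  # 5 2

# inplace=True: df itself is modified and drop() returns None
result = df.drop(df.index[[0, 1, 2]], inplace=True)
print(result, len(df))  # None 2
```

Because drop() returns None when inplace=True, writing df = df.drop(..., inplace=True) would replace the DataFrame with None, so use either reassignment or inplace=True, not both.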

Let's understand it step by step

1. Create a DataFrame

First, create a DataFrame using the pd.DataFrame() constructor. There are several ways to create a DataFrame; you can read about them at the links below.

Create a DataFrame and add columns and rows to it

Create DataFrame from a Dictionary

Create DataFrame from a List

import pandas as pd

df = pd.DataFrame({
  "id": [1, 2, 3, 4, 5],
  "product": ["A", "B", "C", "D", "E"],
  "price": [100.0, 200.0, 150.0, 350.0, 600.0]
})

print(df)

Output

╒════╤══════╤═══════════╤═════════╕
│    │   id │ product   │   price │
╞════╪══════╪═══════════╪═════════╡
│  0 │    1 │ A         │     100 │
├────┼──────┼───────────┼─────────┤
│  1 │    2 │ B         │     200 │
├────┼──────┼───────────┼─────────┤
│  2 │    3 │ C         │     150 │
├────┼──────┼───────────┼─────────┤
│  3 │    4 │ D         │     350 │
├────┼──────┼───────────┼─────────┤
│  4 │    5 │ E         │     600 │
╘════╧══════╧═══════════╧═════════╛

2. Use the drop() function to remove rows using the index list

After creating the DataFrame, the next step is to delete rows from it. We create a list containing the positions of the rows to delete. The DataFrame above has five rows, with index values running from 0 to 4.

index_list = [2,4]

df.drop(df.index[index_list], inplace=True)

print(df)

Output

╒════╤══════╤═══════════╤═════════╕
│    │   id │ product   │   price │
╞════╪══════╪═══════════╪═════════╡
│  0 │    1 │ A         │     100 │
├────┼──────┼───────────┼─────────┤
│  1 │    2 │ B         │     200 │
├────┼──────┼───────────┼─────────┤
│  3 │    4 │ D         │     350 │
╘════╧══════╧═══════════╧═════════╛
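
Note that drop() works with index labels, not positions; df.index[index_list] is what translates the positions into labels. The difference only becomes visible when the index is not the default 0 to n-1. Here is a small sketch using a hypothetical string index (the labels r1 to r5 are invented for this example):

```python
import pandas as pd

# Hypothetical DataFrame with string labels instead of the default 0..n-1 index
df = pd.DataFrame(
    {"product": ["A", "B", "C", "D", "E"], "price": [100.0, 200.0, 150.0, 350.0, 600.0]},
    index=["r1", "r2", "r3", "r4", "r5"],
)

index_list = [2, 4]  # positions of the rows to delete

# df.index[index_list] turns the positions into the labels "r3" and "r5"
labels = df.index[index_list]
print(list(labels))  # ['r3', 'r5']

# drop() returns a new DataFrame without those labels
result = df.drop(labels)
print(list(result.index))  # ['r1', 'r2', 'r4']
```

Passing index_list directly, as in df.drop(index_list), would raise a KeyError here because 2 and 4 are not labels in this index.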

Remove DataFrame rows by index list using the take() function

If your DataFrame is large, removing rows with drop() may be slow. An alternative is df.take(), which selects rows by position: work out the positions of the rows you want to keep (everything except the rows to remove) and pass them to take().

The code example below shows this.

import pandas as pd

# create a dataframe
df = pd.DataFrame({
  "id": [1, 2, 3, 4, 5],
  "product": ["A", "B", "C", "D", "E"],
  "price": [100, 200, 150, 350, 600]
})

print(df)

# compute the positions to keep, then select them with take()
remove_indexes = [2, 4]
required_indexes = set(range(df.shape[0])) - set(remove_indexes)
# sort because sets do not guarantee order
df = df.take(sorted(required_indexes))

# print the dataframe
print(df)

Output

╒════╤══════╤═══════════╤═════════╕
│    │   id │ product   │   price │
╞════╪══════╪═══════════╪═════════╡
│  0 │    1 │ A         │     100 │
├────┼──────┼───────────┼─────────┤
│  1 │    2 │ B         │     200 │
├────┼──────┼───────────┼─────────┤
│  2 │    3 │ C         │     150 │
├────┼──────┼───────────┼─────────┤
│  3 │    4 │ D         │     350 │
├────┼──────┼───────────┼─────────┤
│  4 │    5 │ E         │     600 │
╘════╧══════╧═══════════╧═════════╛

╒════╤══════╤═══════════╤═════════╕
│    │   id │ product   │   price │
╞════╪══════╪═══════════╪═════════╡
│  0 │    1 │ A         │     100 │
├────┼──────┼───────────┼─────────┤
│  1 │    2 │ B         │     200 │
├────┼──────┼───────────┼─────────┤
│  3 │    4 │ D         │     350 │
╘════╧══════╧═══════════╧═════════╛

In the above code example, we:

  1. Created a new DataFrame named df.
  2. Created a list named remove_indexes that contains the positions of the rows to remove from the DataFrame.
  3. Used the Python set() and range() functions to get the positions of the rows we want to keep, and stored them in the variable required_indexes. Since we are removing the rows at positions 2 and 4, this gives the set {0, 1, 3}.
  4. Passed the required positions to df.take() to get the final DataFrame, and printed it.
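
The steps above can also be wrapped in a small helper. The sketch below is one possible variation, not the method from the example above: it assumes NumPy is available, uses np.delete() instead of set arithmetic to compute the positions to keep, and the function name drop_positions is hypothetical.

```python
import numpy as np
import pandas as pd


def drop_positions(frame, positions):
    """Return a new DataFrame without the rows at the given positions.

    Hypothetical helper, not part of pandas.
    """
    # np.delete returns the remaining positions, still in ascending order
    keep = np.delete(np.arange(len(frame)), positions)
    return frame.take(keep)


df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "product": ["A", "B", "C", "D", "E"],
})

result = drop_positions(df, [2, 4])
print(list(result["product"]))  # ['A', 'B', 'D']
```

Because the positions come from an ordered array, the row order of the original DataFrame is preserved without an explicit sort.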

Remove DataFrame rows using df.iloc and a list comprehension

We can also use df.iloc to remove rows from a DataFrame using an index list. As in the previous method, we first build the list of positions that should remain in the final DataFrame, and then pass that list to df.iloc[].

import pandas as pd

# create a dataframe
df = pd.DataFrame({
  "id": [1, 2, 3, 4, 5],
  "name": ["Joy", "Rick", "Carol", "Dumpty", "Clark"],
  "designation": ["Programmer", "Manager", "Admin", "Accountant", "Designer"]
})
print(df)

remove_indexes = [0, 2, 3]  # positions of the rows to remove

# iloc is position-based, so build the list from positions rather than index labels
required_indexes = [i for i in range(len(df)) if i not in remove_indexes]
df = df.iloc[required_indexes]

# print the final dataframe
print(df)

Output

╒════╤══════╤════════╤═══════════════╕
│    │   id │ name   │ designation   │
╞════╪══════╪════════╪═══════════════╡
│  0 │    1 │ Joy    │ Programmer    │
├────┼──────┼────────┼───────────────┤
│  1 │    2 │ Rick   │ Manager       │
├────┼──────┼────────┼───────────────┤
│  2 │    3 │ Carol  │ Admin         │
├────┼──────┼────────┼───────────────┤
│  3 │    4 │ Dumpty │ Accountant    │
├────┼──────┼────────┼───────────────┤
│  4 │    5 │ Clark  │ Designer      │
╘════╧══════╧════════╧═══════════════╛

╒════╤══════╤════════╤═══════════════╕
│    │   id │ name   │ designation   │
╞════╪══════╪════════╪═══════════════╡
│  1 │    2 │ Rick   │ Manager       │
├────┼──────┼────────┼───────────────┤
│  4 │    5 │ Clark  │ Designer      │
╘════╧══════╧════════╧═══════════════╛
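
In the list comprehension, the test i not in remove_indexes scans the whole list for every row, which may become slow when the list of rows to remove is long. One possible variation, sketched here under the assumption that NumPy is available, builds a boolean mask instead and passes it to df.iloc[]. The name keep_mask is made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Joy", "Rick", "Carol", "Dumpty", "Clark"],
})

remove_indexes = [0, 2, 3]  # positions of the rows to remove

# Start with "keep every row", then switch off the positions to remove
keep_mask = np.ones(len(df), dtype=bool)
keep_mask[remove_indexes] = False

# iloc accepts a boolean array with one entry per row
result = df.iloc[keep_mask]
print(list(result["name"]))  # ['Rick', 'Clark']
```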