import pandas as pd
# create a dataframe
df = pd.DataFrame({
"id": [1, 2, 3, 4, 5],
"name": ["Delhi", "New York", "Mumbai", "LA", "London"],
"pincode": [100, 200, 300, 400, 500]
})
print(df)
# delete rows using a list of row positions
df.drop(df.index[[0,3,4]], inplace=True)
# print the dataframe
print(df)
Output
╒════╤══════╤══════════╤═══════════╕
│ │ id │ name │ pincode │
╞════╪══════╪══════════╪═══════════╡
│ 0 │ 1 │ Delhi │ 100 │
├────┼──────┼──────────┼───────────┤
│ 1 │ 2 │ New York │ 200 │
├────┼──────┼──────────┼───────────┤
│ 2 │ 3 │ Mumbai │ 300 │
├────┼──────┼──────────┼───────────┤
│ 3 │ 4 │ LA │ 400 │
├────┼──────┼──────────┼───────────┤
│ 4 │ 5 │ London │ 500 │
╘════╧══════╧══════════╧═══════════╛
╒════╤══════╤══════════╤═══════════╕
│ │ id │ name │ pincode │
╞════╪══════╪══════════╪═══════════╡
│ 1 │ 2 │ New York │ 200 │
├────┼──────┼──────────┼───────────┤
│ 2 │ 3 │ Mumbai │ 300 │
╘════╧══════╧══════════╧═══════════╛
There are multiple ways to delete one or more rows from a Pandas DataFrame. In this post, we explain several ways to delete rows using an index list.
The drop() function of DataFrame can be used to remove rows using an index list. Passing df.index[index_list] converts the positions in the list into the corresponding index labels, which drop() then removes. By default, the inplace parameter is False, which means drop() returns a new DataFrame and leaves the original unchanged. If you want to modify the DataFrame itself instead of reassigning the result, pass inplace=True.
Syntax
df.drop(df.index[[0,1,2]], inplace=True)
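The effect of the inplace parameter described above can be sketched with a small hypothetical DataFrame: with the default inplace=False, drop() returns a new DataFrame and the original keeps all its rows, while inplace=True changes the DataFrame itself and drop() returns None.

```python
import pandas as pd

# A small example DataFrame (hypothetical data)
df = pd.DataFrame({"id": [1, 2, 3, 4, 5]})

# Default inplace=False: df is unchanged, so the result must be assigned
new_df = df.drop(df.index[[0, 3, 4]])
print(len(df), len(new_df))  # 5 2

# inplace=True: df itself is modified and drop() returns None
result = df.drop(df.index[[0, 3, 4]], inplace=True)
print(len(df), result)  # 2 None
```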
Let's understand it step by step
First, create a DataFrame using the pd.DataFrame() constructor. There are multiple ways to create a DataFrame; you can read about them in the links below.
Create a DataFrame and add columns and rows to it
Create DataFrame from a Dictionary
import pandas as pd
df = pd.DataFrame({
"id": [1, 2, 3, 4, 5],
"product": ["A", "B", "C", "D", "E"],
"price": [100.0, 200.0, 150.0, 350.0, 600.0]
})
print(df)
Output
╒════╤══════╤═══════════╤═════════╕
│ │ id │ product │ price │
╞════╪══════╪═══════════╪═════════╡
│ 0 │ 1 │ A │ 100 │
├────┼──────┼───────────┼─────────┤
│ 1 │ 2 │ B │ 200 │
├────┼──────┼───────────┼─────────┤
│ 2 │ 3 │ C │ 150 │
├────┼──────┼───────────┼─────────┤
│ 3 │ 4 │ D │ 350 │
├────┼──────┼───────────┼─────────┤
│ 4 │ 5 │ E │ 600 │
╘════╧══════╧═══════════╧═════════╛
After creating the DataFrame, the next step is to delete rows from it. We create a list containing the positions of the rows to remove. The DataFrame above has five rows with indexes running from 0 to 4, so index_list = [2,4] removes the third and fifth rows (products C and E).
index_list = [2,4]
df.drop(df.index[index_list], inplace=True)
print(df)
Output
╒════╤══════╤═══════════╤═════════╕
│ │ id │ product │ price │
╞════╪══════╪═══════════╪═════════╡
│ 0 │ 1 │ A │ 100 │
├────┼──────┼───────────┼─────────┤
│ 1 │ 2 │ B │ 200 │
├────┼──────┼───────────┼─────────┤
│ 3 │ 4 │ D │ 350 │
╘════╧══════╧═══════════╧═════════╛
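Because df.index[index_list] treats the numbers in the list as positions and converts them to index labels, positions and labels only coincide while the DataFrame keeps its default 0-based integer index. The sketch below uses a hypothetical string index to show the difference; drop(index=...) accepts labels directly, so both calls remove the same rows here.

```python
import pandas as pd

# Hypothetical DataFrame with a non-default (string) index
df = pd.DataFrame(
    {"product": ["A", "B", "C", "D", "E"]},
    index=["a", "b", "c", "d", "e"],
)

# Positions 2 and 4 are converted to the labels 'c' and 'e'
by_position = df.drop(df.index[[2, 4]])

# The same rows removed by passing the labels themselves
by_label = df.drop(index=["c", "e"])

print(by_position.equals(by_label))  # True
print(list(by_position.index))       # ['a', 'b', 'd']
```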
If your DataFrame holds a large amount of data, removing rows with drop() can be slow. An alternative is df.take(): instead of naming the rows to delete, we compute the positions of the rows to keep and pass them to take(), which returns a new DataFrame containing only those rows.
We can understand it using the below code example.
import pandas as pd
# create a dataframe
df = pd.DataFrame({
"id": [1, 2, 3, 4, 5],
"product": ["A", "B", "C", "D", "E"],
"price": [100, 200, 150, 350, 600]
})
print(df)
# positions of the rows to remove
remove_indexes = [2,4]
# keep every other position, in the original row order
required_indexes = [i for i in range(df.shape[0]) if i not in remove_indexes]
df = df.take(required_indexes)
# print the dataframe
print(df)
Output
╒════╤══════╤═══════════╤═════════╕
│ │ id │ product │ price │
╞════╪══════╪═══════════╪═════════╡
│ 0 │ 1 │ A │ 100 │
├────┼──────┼───────────┼─────────┤
│ 1 │ 2 │ B │ 200 │
├────┼──────┼───────────┼─────────┤
│ 2 │ 3 │ C │ 150 │
├────┼──────┼───────────┼─────────┤
│ 3 │ 4 │ D │ 350 │
├────┼──────┼───────────┼─────────┤
│ 4 │ 5 │ E │ 600 │
╘════╧══════╧═══════════╧═════════╛
╒════╤══════╤═══════════╤═════════╕
│ │ id │ product │ price │
╞════╪══════╪═══════════╪═════════╡
│ 0 │ 1 │ A │ 100 │
├────┼──────┼───────────┼─────────┤
│ 1 │ 2 │ B │ 200 │
├────┼──────┼───────────┼─────────┤
│ 3 │ 4 │ D │ 350 │
╘════╧══════╧═══════════╧═════════╛
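In the above code example, range(df.shape[0]) generates every row position, the positions listed in remove_indexes are filtered out, and the remaining positions are passed to df.take(). It returns a new DataFrame containing only those rows, which is assigned back to df.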
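For larger DataFrames, building the list of kept positions in a Python loop can itself take time. A minimal NumPy-based variant, assuming NumPy is installed (pandas depends on it): np.setdiff1d() returns the positions that are not in remove_indexes, already sorted, so take() keeps the original row order. Any speedup depends on your data size; this sketch is not a benchmark.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "product": ["A", "B", "C", "D", "E"],
    "price": [100, 200, 150, 350, 600],
})

remove_indexes = [2, 4]

# All row positions 0..n-1 minus the positions to remove (sorted result)
keep_positions = np.setdiff1d(np.arange(len(df)), remove_indexes)

# take() selects rows by position and returns a new DataFrame
df = df.take(keep_positions)
print(df)
```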
We can also use df.iloc[] to remove rows from a DataFrame using an index list. As in the previous method, we first build the list of row positions that should remain in the final DataFrame, then pass that list to df.iloc[].
import pandas as pd
# create a dataframe
df = pd.DataFrame({
"id": [1, 2, 3, 4, 5],
"name": ["Joy", "Rick", "Carol", "Dumpty", "Clark"],
"designation": ["Programmer", "Manager", "Admin", "Accountant", "Designer"]
})
print(df)
remove_indexes = [0,2,3] # positions of the rows to remove
# iloc works with positions, so build the kept positions from range(len(df))
required_indexes = [i for i in range(len(df)) if i not in remove_indexes]
df = df.iloc[required_indexes]
# print the final dataframe
print(df)
Output
╒════╤══════╤════════╤═══════════════╕
│ │ id │ name │ designation │
╞════╪══════╪════════╪═══════════════╡
│ 0 │ 1 │ Joy │ Programmer │
├────┼──────┼────────┼───────────────┤
│ 1 │ 2 │ Rick │ Manager │
├────┼──────┼────────┼───────────────┤
│ 2 │ 3 │ Carol │ Admin │
├────┼──────┼────────┼───────────────┤
│ 3 │ 4 │ Dumpty │ Accountant │
├────┼──────┼────────┼───────────────┤
│ 4 │ 5 │ Clark │ Designer │
╘════╧══════╧════════╧═══════════════╛
╒════╤══════╤════════╤═══════════════╕
│ │ id │ name │ designation │
╞════╪══════╪════════╪═══════════════╡
│ 1 │ 2 │ Rick │ Manager │
├────┼──────┼────────┼───────────────┤
│ 4 │ 5 │ Clark │ Designer │
╘════╧══════╧════════╧═══════════════╛
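Another option, sketched here under the same assumption that NumPy is available, is a boolean mask: np.isin() marks which row positions appear in remove_indexes, and negating it with ~ keeps the rest. df.iloc[] accepts a boolean array with one entry per row. The final reset_index(drop=True) call is optional and simply renumbers the remaining rows from 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Joy", "Rick", "Carol", "Dumpty", "Clark"],
    "designation": ["Programmer", "Manager", "Admin", "Accountant", "Designer"],
})

remove_indexes = [0, 2, 3]

# True for every row position that should be kept
keep_mask = ~np.isin(np.arange(len(df)), remove_indexes)

df = df.iloc[keep_mask]
print(list(df.index))  # [1, 4]

# Optional: renumber the remaining rows 0, 1, ...
df = df.reset_index(drop=True)
print(list(df.index))  # [0, 1]
```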