Count NaN values in a Pandas DataFrame
A DataFrame is a two-dimensional data structure in which data is arranged in rows and columns. Each row carries a label from the DataFrame's index, and each column can hold a different type of data. NaN, or "Not a Number", is a special floating-point value that pandas uses to represent missing data. NaN values typically appear when a value is absent from the source data or when an operation cannot produce a valid result.
In this post, we will explain how to get the count of NaN values in an entire DataFrame or in a column or row.
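As a quick illustration of how missing values end up as NaN, here is a minimal sketch. The Series name s is only for this example, and the snippet assumes NumPy is installed alongside pandas. Because NaN does not compare equal to itself, an equality test cannot find it, which is why the methods in this post rely on isna().

```python
import numpy as np
import pandas as pd

# None in numeric data is stored as NaN, so the Series becomes float64
s = pd.Series([1.0, None, np.nan])
print(s.dtype)

# NaN never compares equal to anything, including itself
print(np.nan == np.nan)

# isna() flags both the None entry and the np.nan entry
print(s.isna().tolist())
```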
Count NaN values in entire DataFrame
To count the NaN values in a pandas DataFrame, chain the isna() and sum() methods. isna() returns a DataFrame of booleans that is True wherever a value is missing. The first sum() counts the True values in each column, and a second sum() adds the per-column counts into a single total.
Code example 1 - using isna() and sum() functions
import pandas as pd
data = {
"col_1": [10, None, 13, None, None, 40],
"col_2": [5, 10, None, 15, 20, None],
"col_3": [20, 30, 50, None, None, None]
}
df = pd.DataFrame(data)
print(df)
# isna() marks each missing cell as True; the first sum() counts per column,
# and the second sum() adds those counts into one total for the DataFrame
result = df.isna().sum().sum()
print("Total NaN values:", result)
Output
   col_1  col_2  col_3
0   10.0    5.0   20.0
1    NaN   10.0   30.0
2   13.0    NaN   50.0
3    NaN   15.0    NaN
4    NaN   20.0    NaN
5   40.0    NaN    NaN
Total NaN values: 8
Explanation of the above code example
- We create a dictionary with three key-value pairs, where the keys are "col_1", "col_2", and "col_3" and the values are lists of numbers in which None marks a missing entry.
- We use the dictionary to create a pandas DataFrame called df.
- We print the DataFrame df.
- We call the DataFrame's isna() method to create a new DataFrame of booleans, then call sum() twice: the first call counts the True values in each column, and the second adds those counts together.
- We print the total number of NaN values.
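The chained call can be easier to follow when each step is stored separately. The sketch below reuses the same data; the names mask, per_column, and total are chosen only for this illustration.

```python
import pandas as pd

data = {
    "col_1": [10, None, 13, None, None, 40],
    "col_2": [5, 10, None, 15, 20, None],
    "col_3": [20, 30, 50, None, None, None]
}
df = pd.DataFrame(data)

# Step 1: boolean DataFrame, True where a value is missing
mask = df.isna()
print(mask)

# Step 2: summing booleans counts the True values in each column
per_column = mask.sum()
print(per_column)

# Step 3: summing the per-column counts gives the grand total
total = per_column.sum()
print("Total NaN values:", total)
```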
Code example 2 - using isnull() and sum() functions
isnull() is an alias of isna(), so this example returns the same result as the previous one.
import pandas as pd
data = {
"col_1": [10, None, 13, None, None, 40],
"col_2": [5, 10, None, 15, 20, None],
"col_3": [20, 30, 50, None, None, None]
}
df = pd.DataFrame(data)
# isnull() behaves exactly like isna()
result = df.isnull().sum().sum()
print("Total NaN values:", result)
Output
Total NaN values: 8
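Another way to reach the same total, shown here only as a cross-check and not as one of the original examples, is to subtract the number of non-missing values from the number of cells. DataFrame.count() returns the non-missing count for each column, and DataFrame.size is the total number of cells. The name total_missing is just for this sketch.

```python
import pandas as pd

data = {
    "col_1": [10, None, 13, None, None, 40],
    "col_2": [5, 10, None, 15, 20, None],
    "col_3": [20, 30, 50, None, None, None]
}
df = pd.DataFrame(data)

# size counts every cell; count() counts only the non-missing ones
total_missing = df.size - df.count().sum()
print("Total NaN values:", total_missing)
```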
Code example 3 - Using axis and sum() function
import pandas as pd
data = {
"col_1": [10, None, 13, None, None, 40],
"col_2": [5, 10, None, 15, 20, None],
"col_3": [20, 30, 50, None, None, None]
}
df = pd.DataFrame(data)
# Count NaN values per column (axis=0), then add up the per-column counts
res = df.isnull().sum(axis=0).sum()
print("Total NaN values, summed over columns:", res)
Output
Total NaN values, summed over columns: 8
# Count NaN values per row (axis=1), then add up the per-row counts
res = df.isnull().sum(axis=1).sum()
print("Total NaN values, summed over rows:", res)
Output
Total NaN values, summed over rows: 8
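Both axis values lead to the same grand total because the final sum() collapses the intermediate counts. To see the individual counts, the final sum() can be left off, as in this sketch (the names per_column and per_row are illustrative only):

```python
import pandas as pd

data = {
    "col_1": [10, None, 13, None, None, 40],
    "col_2": [5, 10, None, 15, 20, None],
    "col_3": [20, 30, 50, None, None, None]
}
df = pd.DataFrame(data)

# axis=0 sums down each column: one count per column
per_column = df.isnull().sum(axis=0)
print(per_column)

# axis=1 sums across each row: one count per row
per_row = df.isnull().sum(axis=1)
print(per_row)
```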
Count NaN values in a specific column of Dataframe
In pandas, isna() can also be called on a single column (a Series) to flag its missing values, and sum() then counts them. This section shows how to get the count of NaN values in a specific column.
Syntax
df['column_name'].isna().sum()
# or
df['column_name'].isnull().sum()
Code example
import pandas as pd
data = {
"col_1": [10, None, 13, None, None, 40],
"col_2": [5, 10, None, 15, 20, None],
"col_3": [20, 30, 50, None, None, None]
}
df = pd.DataFrame(data)
# select one column, flag its missing values, and count them
nan_count = df['col_2'].isna().sum()
print("Total NaN values:", nan_count)
Output
Total NaN values: 2
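The same pattern extends to several columns at once, or to the share of missing values rather than the count. The following is a small sketch under the same sample data; the names cols, counts, and fraction are assumptions made for the example.

```python
import pandas as pd

data = {
    "col_1": [10, None, 13, None, None, 40],
    "col_2": [5, 10, None, 15, 20, None],
    "col_3": [20, 30, 50, None, None, None]
}
df = pd.DataFrame(data)

# Passing a list of column names returns one count per selected column
cols = ["col_1", "col_2"]
counts = df[cols].isna().sum()
print(counts)

# The mean of a boolean Series is the fraction of True values
fraction = df["col_2"].isna().mean()
print("Share of NaN values in col_2:", fraction)
```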
Count NaN values in a specific row of DataFrame
To count the NaN values in a specific row of a DataFrame, first select the row by its index label with loc, then count the NaN values in that row.
Syntax
df.loc[row_index, :].isnull().sum()
# or
df.loc[row_index, :].isna().sum()
Code example - using row index
import pandas as pd
data = {
"col_1": [10, None, 13, None, None, 40],
"col_2": [5, 10, None, 15, 20, None],
"col_3": [20, 30, 50, None, None, None]
}
df = pd.DataFrame(data)
print(df)
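# index label 1 is the 2nd row because the default index starts at 0
res = df.loc[1, :].isnull().sum()
print("Total NaN values in 2nd row:", res)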
Output
   col_1  col_2  col_3
0   10.0    5.0   20.0
1    NaN   10.0   30.0
2   13.0    NaN   50.0
3    NaN   15.0    NaN
4    NaN   20.0    NaN
5   40.0    NaN    NaN
Total NaN values in 2nd row: 1
To get the count of NaN values in the 6th row, use the code below:
res = df.loc[5, :].isnull().sum()
print("Total NaN values in 6th row:", res)
Output
Total NaN values in 6th row: 2
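loc selects rows by index label, which matches the row position only while the DataFrame keeps its default integer index. The sketch below assumes a hypothetical string index ("a" to "f") to show the difference, and uses iloc for position-based selection. It also counts how many rows contain at least one NaN, which is a related but different question.

```python
import pandas as pd

data = {
    "col_1": [10, None, 13, None, None, 40],
    "col_2": [5, 10, None, 15, 20, None],
    "col_3": [20, 30, 50, None, None, None]
}
# Hypothetical string labels, used only to contrast loc and iloc
df = pd.DataFrame(data, index=["a", "b", "c", "d", "e", "f"])

# loc looks the row up by its label
by_label = df.loc["b", :].isna().sum()
print("NaN values in row labeled 'b':", by_label)

# iloc looks the row up by its zero-based position
by_position = df.iloc[1, :].isna().sum()
print("NaN values in row at position 1:", by_position)

# any(axis=1) is True for every row that has at least one NaN
rows_with_nan = df.isna().any(axis=1).sum()
print("Rows containing at least one NaN:", rows_with_nan)
```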