python

Counting rows in a Pandas Dataframe based on column values

If you have a pandas DataFrame with a column of values and you want to count the number of rows where the value in that column is equal to a certain value, this tutorial will show you how.

import pandas as pd
  
# create a DataFrame
df = pd.DataFrame({
    'name': ['Rachel', 'Jacob', 'Dom', 'Syral', 'Naboi'],
    'city': ['Japan', 'NewYork', 'London', 'Itly', 'France'],
    'status': ['active', 'inactive', 'active', 'active', 'inactive']
})
print(df)

# get count of rows which has value - active for status column
total_active_users = df[df['status']=='active'].shape[0]

print(f'Total active users are: {total_active_users}')

Output

Total active users are: 3

The code above creates a DataFrame with three columns ('name', 'city', and 'status').

The DataFrame df is then printed.

The code then gets the count of rows that have the value 'active' in the 'status' column.

This count is then printed.

Syntax

df[df['column_name'] condition].shape(0)

Another example is to count the number of rows based on some condition applied on column values. 

import pandas as pd
  
# creating a DataFrame
df = pd.DataFrame({
    'name': ['Rachel', 'Jacob', 'Dom', 'Syral', 'Naboi'],
    'city': ['Japan', 'NewYork', 'London', 'Itly', 'France'],
    'status': ['active', 'inactive', 'active', 'active', 'inactive'],
    'salary': [3000, 5000, 2000, 7000, 4000]
})
print(df)

# get the count of rows where salary is greater than 4500
result = df[df['salary'] > 4500].shape[0]

print(f'Total count: {result}')

Output

+----+--------+---------+----------+----------+
|    | name   | city    | status   |   salary |
|----+--------+---------+----------+----------|
|  0 | Rachel | Japan   | active   |     3000 |
|  1 | Jacob  | NewYork | inactive |     5000 |
|  2 | Dom    | London  | active   |     2000 |
|  3 | Syral  | Itly    | active   |     7000 |
|  4 | Naboi  | France  | inactive |     4000 |
+----+--------+---------+----------+----------+

Total count: 2

In the above code example, we are getting the count of rows where the 'salary' column has values greater than 4500.

If you want to get the count of all rows in a pandas DataFrame then you can use the below code.

df.shape[0]

Below is the example - we will get the count of rows where name column values contain a specific substring.

import pandas as pd
  
# creating a DataFrame
df = pd.DataFrame({
    'name': ['Rachel', 'Jacob', 'Dom', 'Syral', 'Naboi'],
    'city': ['Japan', 'NewYork', 'London', 'Itly', 'France'],
    'status': ['active', 'inactive', 'active', 'active', 'inactive'],
    'salary': [3000, 5000, 2000, 7000, 4000]
})

# get count - where name column values contain the substring 'o'
result = df[df['name'].str.contains('o')].shape[0]

print(f'Total count: {result}')

Output

Total count: 3
Was this helpful?