
Select specific columns from a Pandas DataFrame

This Pandas tutorial shows how to select specific columns from a DataFrame. You can select columns by name or by integer position.

A Pandas DataFrame is a structure that represents data in a tabular format. It contains rows and columns, and each column holds values of a single data type (different columns can have different types).
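As a quick illustration of that structure, the sketch below builds a small DataFrame and inspects its shape and per-column data types. The sample data is hypothetical and only mirrors the product table used later in this tutorial.

```python
import pandas as pd

# Hypothetical sample data, similar to the examples later in this tutorial
df = pd.DataFrame({
    'product_name': ['p1', 'p2'],
    'price': [100.0, 30.0],
    'quantity': [14, 67],
})

# shape is (number of rows, number of columns)
print(df.shape)

# dtypes lists the data type of each column
print(df.dtypes)
```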

Solution 1: Select specific columns using a list of column names

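You can select specific columns from a DataFrame by passing a list of column names inside the indexing brackets. For example, if you have a DataFrame with columns "A" and "B", df[['A']] returns a DataFrame containing only column "A", while df['A'] with a single name returns that column as a Series.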
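The difference between single and double brackets is easy to miss, so here is a minimal sketch of it. The DataFrame and the variable names single and subset are made up for illustration.

```python
import pandas as pd

# Hypothetical two-column DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

single = df['A']      # one column name returns a Series
subset = df[['A']]    # a list of names returns a DataFrame

print(type(single).__name__)
print(type(subset).__name__)
```

If you need the result to keep its tabular form (for example, to pass it to code that expects a DataFrame), the list form is usually the safer choice.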

Code example

import pandas as pd

# create a dataframe
df = pd.DataFrame({
    'product_name': ['p1', 'p2', 'p3', 'p4'],
    'price': [100.0, 30.0, 200.0, 140.0],
    'quantity': [14, 67, 90, 40]
})

print('Original DataFrame')
print(df)

# select specific columns
result = df[['price', 'quantity']]

print('DataFrame after selecting specific columns - price and quantity')
print(result)

Output

Original DataFrame
╒════╤════════════════╤═════════╤════════════╕
│    │ product_name   │   price │   quantity │
╞════╪════════════════╪═════════╪════════════╡
│  0 │ p1             │     100 │         14 │
├────┼────────────────┼─────────┼────────────┤
│  1 │ p2             │      30 │         67 │
├────┼────────────────┼─────────┼────────────┤
│  2 │ p3             │     200 │         90 │
├────┼────────────────┼─────────┼────────────┤
│  3 │ p4             │     140 │         40 │
╘════╧════════════════╧═════════╧════════════╛


DataFrame after selecting specific columns - price and quantity
╒════╤═════════╤════════════╕
│    │   price │   quantity │
╞════╪═════════╪════════════╡
│  0 │     100 │         14 │
├────┼─────────┼────────────┤
│  1 │      30 │         67 │
├────┼─────────┼────────────┤
│  2 │     200 │         90 │
├────┼─────────┼────────────┤
│  3 │     140 │         40 │
╘════╧═════════╧════════════╛

Walkthrough

  1. The code first creates a DataFrame called df with 4 rows and 3 columns: 'product_name', 'price', and 'quantity'.
  2. It then prints the original DataFrame.
  3. Next, it creates a new DataFrame called result that contains only the 'price' and 'quantity' columns from the original DataFrame.
  4. Finally, it prints the new DataFrame.

Solution 2: Select two columns using the DataFrame.loc[] indexer

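To select two columns from a Pandas DataFrame, you can use the .loc[] indexer, which selects by label. It takes a row selector and a column selector separated by a comma. Passing : for the rows keeps every row, and passing a list of column names returns a new DataFrame that contains only those columns.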

For example, if you have a DataFrame with columns ['A', 'B', 'C'], you can use .loc[] to select only columns 'A' and 'B':

df.loc[:,['A','B']]

This would return a new DataFrame with only columns 'A' and 'B'.
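Besides a list of names, .loc[] also accepts a label slice. The sketch below, using a hypothetical three-column DataFrame, shows both forms side by side; note that label slices include both endpoints.

```python
import pandas as pd

# Hypothetical DataFrame with columns A, B and C
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})

# Select columns by an explicit list of labels
by_list = df.loc[:, ['A', 'B']]

# Select columns by a label slice; 'B' is included in the result
by_slice = df.loc[:, 'A':'B']

print(by_list)
print(by_slice)
```

The list form lets you pick columns in any order, while the slice form is convenient for a contiguous run of columns.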

Code example

import pandas as pd

# create a dataframe
df = pd.DataFrame({
    'product_name': ['p1', 'p2', 'p3', 'p4'],
    'price': [100.0, 30.0, 200.0, 140.0],
    'quantity': [14, 67, 90, 40]
})

print('Original DataFrame')
print(df)

result = df.loc[:,['price','quantity']]

print('DataFrame after selecting specific columns - price and quantity')
print(result)

Output

Original DataFrame
╒════╤════════════════╤═════════╤════════════╕
│    │ product_name   │   price │   quantity │
╞════╪════════════════╪═════════╪════════════╡
│  0 │ p1             │     100 │         14 │
├────┼────────────────┼─────────┼────────────┤
│  1 │ p2             │      30 │         67 │
├────┼────────────────┼─────────┼────────────┤
│  2 │ p3             │     200 │         90 │
├────┼────────────────┼─────────┼────────────┤
│  3 │ p4             │     140 │         40 │
╘════╧════════════════╧═════════╧════════════╛


DataFrame after selecting specific columns - price and quantity
╒════╤═════════╤════════════╕
│    │   price │   quantity │
╞════╪═════════╪════════════╡
│  0 │     100 │         14 │
├────┼─────────┼────────────┤
│  1 │      30 │         67 │
├────┼─────────┼────────────┤
│  2 │     200 │         90 │
├────┼─────────┼────────────┤
│  3 │     140 │         40 │
╘════╧═════════╧════════════╛

Explanation of the above code line by line

  1. import pandas as pd - this imports the pandas library under the alias pd
  2. df = pd.DataFrame({...}) - this creates a dataframe using the given data
  3. print('Original DataFrame') - this prints the text "Original DataFrame"
  4. print(df) - this prints the dataframe
  5. result = df.loc[:,['price','quantity']] - this selects the columns "price" and "quantity" from the dataframe
  6. print('DataFrame after selecting specific columns - price and quantity') - this prints the text "DataFrame after selecting specific columns - price and quantity"
  7. print(result) - this prints the selected columns from the dataframe

Solution 3: Select specific columns from a DataFrame using the iloc[] indexer

The iloc[] indexer selects rows and columns by integer position rather than by label. Positions start at 0, so the first column is at position 0, the second at position 1, and so on.

For example, if you have a DataFrame with three columns, you can use iloc[] to select the second column like this:

df.iloc[:,1]

This would return the second column of the DataFrame as a Series. You can also use iloc[] to select multiple columns by passing in a list of column positions:

df.iloc[:, [0,2]]

This would return the first and third columns of the DataFrame as a new DataFrame.

The iloc[] indexer also accepts integer slices such as 0:2, which follow Python's slicing convention and exclude the end position.
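To make the positional forms above concrete, here is a small runnable sketch. The DataFrame and the variable names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical DataFrame with three columns
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})

# A single integer position returns a Series
second = df.iloc[:, 1]

# A list of positions returns a DataFrame
first_and_third = df.iloc[:, [0, 2]]

# Negative positions count from the end; the list keeps the result a DataFrame
last = df.iloc[:, [-1]]

print(second.name)
print(list(first_and_third.columns))
print(list(last.columns))
```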

Code example

import pandas as pd

# create a dataframe
df = pd.DataFrame({
    'product_name': ['p1', 'p2', 'p3', 'p4'],
    'price': [100.0, 30.0, 200.0, 140.0],
    'quantity': [14, 67, 90, 40]
})

print('Original DataFrame')
print(df)

# select first two columns
result = df.iloc[:, 0:2]

print('Selected first two columns')
print(result)

Output

Original DataFrame
╒════╤════════════════╤═════════╤════════════╕
│    │ product_name   │   price │   quantity │
╞════╪════════════════╪═════════╪════════════╡
│  0 │ p1             │     100 │         14 │
├────┼────────────────┼─────────┼────────────┤
│  1 │ p2             │      30 │         67 │
├────┼────────────────┼─────────┼────────────┤
│  2 │ p3             │     200 │         90 │
├────┼────────────────┼─────────┼────────────┤
│  3 │ p4             │     140 │         40 │
╘════╧════════════════╧═════════╧════════════╛


Selected first two columns
╒════╤════════════════╤═════════╕
│    │ product_name   │   price │
╞════╪════════════════╪═════════╡
│  0 │ p1             │     100 │
├────┼────────────────┼─────────┤
│  1 │ p2             │      30 │
├────┼────────────────┼─────────┤
│  2 │ p3             │     200 │
├────┼────────────────┼─────────┤
│  3 │ p4             │     140 │
╘════╧════════════════╧═════════╛

The Python code above creates a DataFrame using the pandas library. The DataFrame consists of three columns: 'product_name', 'price', and 'quantity'. The code then selects the first two columns (positions 0 and 1) using DataFrame.iloc[] and prints the result.

To select the price and quantity columns instead, use the slice 1:3, which covers positions 1 and 2:

result = df.iloc[:, 1:3]

Output

╒════╤═════════╤════════════╕
│    │   price │   quantity │
╞════╪═════════╪════════════╡
│  0 │     100 │         14 │
├────┼─────────┼────────────┤
│  1 │      30 │         67 │
├────┼─────────┼────────────┤
│  2 │     200 │         90 │
├────┼─────────┼────────────┤
│  3 │     140 │         40 │
╘════╧═════════╧════════════╛
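Hard-coded positions break if the column order changes. One possible way around this, sketched below, is to look the positions up from the column names with Index.get_indexer and pass them to iloc[]. The variable name positions is an assumption made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'product_name': ['p1', 'p2', 'p3', 'p4'],
    'price': [100.0, 30.0, 200.0, 140.0],
    'quantity': [14, 67, 90, 40]
})

# Look up the integer position of each column name.
# get_indexer returns -1 for any name that is not found,
# so check the result before using it with real data.
positions = df.columns.get_indexer(['price', 'quantity'])

# Select the columns by the positions that were looked up
result = df.iloc[:, positions]

print(result)
```

For this DataFrame the lookup yields positions 1 and 2, so the result matches df.iloc[:, 1:3].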