Select specific columns from a Pandas DataFrame
This tutorial shows how to select specific columns from a Pandas DataFrame, either by column name or by integer position.
A Pandas DataFrame is a two-dimensional structure that holds data in tabular form. It has rows and columns; each column has a name and stores values of a single data type (dtype), and different columns can have different dtypes.
Solution 1: Select specific columns using a list of column names
You can select specific columns by passing a list of column names inside the indexing brackets. For example, if a DataFrame has columns "A", "B", and "C", df[['A', 'B']] returns a new DataFrame that contains only columns "A" and "B".
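To make the point about per-column data types concrete, here is a minimal sketch. The column names and values are illustrative assumptions chosen to match the examples later in this tutorial.

```python
import pandas as pd

# Illustrative data; the names and values are assumptions for this sketch
df = pd.DataFrame({
    'product_name': ['p1', 'p2'],
    'price': [100.0, 30.0],
    'quantity': [14, 67],
})

# dtypes lists one data type per column:
# text for product_name, float for price, integer for quantity
print(df.dtypes)
```

Checking df.dtypes before selecting columns is a quick way to confirm what each column holds.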
Code example
import pandas as pd
# create a dataframe
df = pd.DataFrame({
    'product_name': ['p1', 'p2', 'p3', 'p4'],
    'price': [100.0, 30.0, 200.0, 140.0],
    'quantity': [14, 67, 90, 40]
})
print('Original DataFrame')
print(df)
# select specific columns
result = df[['price', 'quantity']]
print('DataFrame after selecting specific columns - price and quantity')
print(result)
Output
Original DataFrame
╒════╤════════════════╤═════════╤════════════╕
│ │ product_name │ price │ quantity │
╞════╪════════════════╪═════════╪════════════╡
│ 0 │ p1 │ 100 │ 14 │
├────┼────────────────┼─────────┼────────────┤
│ 1 │ p2 │ 30 │ 67 │
├────┼────────────────┼─────────┼────────────┤
│ 2 │ p3 │ 200 │ 90 │
├────┼────────────────┼─────────┼────────────┤
│ 3 │ p4 │ 140 │ 40 │
╘════╧════════════════╧═════════╧════════════╛
DataFrame after selecting specific columns - price and quantity
╒════╤═════════╤════════════╕
│ │ price │ quantity │
╞════╪═════════╪════════════╡
│ 0 │ 100 │ 14 │
├────┼─────────┼────────────┤
│ 1 │ 30 │ 67 │
├────┼─────────┼────────────┤
│ 2 │ 200 │ 90 │
├────┼─────────┼────────────┤
│ 3 │ 140 │ 40 │
╘════╧═════════╧════════════╛
Walkthrough
- The code first creates a DataFrame called df. This DataFrame has 4 rows and 3 columns. The columns are 'product_name', 'price', and 'quantity'.
- The code then prints the original DataFrame.
- The code then creates a new DataFrame called result. This DataFrame contains the columns 'price' and 'quantity' from the original DataFrame.
- The code then prints the new DataFrame.
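One detail the walkthrough does not spell out is why the column names sit inside a second pair of brackets. The sketch below, which reuses the same sample data, suggests the difference: a single name returns a Series, while a list of names returns a DataFrame whose columns follow the order of the list. The variable names as_series, as_frame, and reordered are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'product_name': ['p1', 'p2', 'p3', 'p4'],
    'price': [100.0, 30.0, 200.0, 140.0],
    'quantity': [14, 67, 90, 40]
})

as_series = df['price']     # a single name returns a Series
as_frame = df[['price']]    # a list with one name returns a DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame

# The result keeps the column order given in the list
reordered = df[['quantity', 'price']]
print(list(reordered.columns))   # ['quantity', 'price']
```

Using the list form even for one column is handy when later code expects a DataFrame rather than a Series.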
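Solution 2: Select two columns using the DataFrame.loc[] indexer
To select two columns from a Pandas DataFrame, you can use the .loc[] indexer. It takes a row selector and a column selector, separated by a comma. Passing : as the row selector keeps every row, and passing a list of column names as the column selector returns a new DataFrame that contains only those columns.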
For example, if you have a DataFrame with columns ['A', 'B', 'C'], you can use .loc[] to select only columns 'A' and 'B':
df.loc[:,['A','B']]
This would return a new DataFrame with only columns 'A' and 'B'.
Code example
import pandas as pd
# create a dataframe
df = pd.DataFrame({
    'product_name': ['p1', 'p2', 'p3', 'p4'],
    'price': [100.0, 30.0, 200.0, 140.0],
    'quantity': [14, 67, 90, 40]
})
print('Original DataFrame')
print(df)
result = df.loc[:,['price','quantity']]
print('DataFrame after selecting specific columns - price and quantity')
print(result)
Output
Original DataFrame
╒════╤════════════════╤═════════╤════════════╕
│ │ product_name │ price │ quantity │
╞════╪════════════════╪═════════╪════════════╡
│ 0 │ p1 │ 100 │ 14 │
├────┼────────────────┼─────────┼────────────┤
│ 1 │ p2 │ 30 │ 67 │
├────┼────────────────┼─────────┼────────────┤
│ 2 │ p3 │ 200 │ 90 │
├────┼────────────────┼─────────┼────────────┤
│ 3 │ p4 │ 140 │ 40 │
╘════╧════════════════╧═════════╧════════════╛
DataFrame after selecting specific columns - price and quantity
╒════╤═════════╤════════════╕
│ │ price │ quantity │
╞════╪═════════╪════════════╡
│ 0 │ 100 │ 14 │
├────┼─────────┼────────────┤
│ 1 │ 30 │ 67 │
├────┼─────────┼────────────┤
│ 2 │ 200 │ 90 │
├────┼─────────┼────────────┤
│ 3 │ 140 │ 40 │
╘════╧═════════╧════════════╛
Explanation of the above code line by line
- import pandas as pd - this imports the pandas library under the alias pd
- df = pd.DataFrame({...}) - this creates a dataframe using the given data
- print('Original DataFrame') - this prints the text "Original DataFrame"
- print(df) - this prints the dataframe
- result = df.loc[:,['price','quantity']] - this selects every row (the : before the comma) and only the columns "price" and "quantity" from the dataframe
- print('DataFrame after selecting specific columns - price and quantity') - this prints the text "DataFrame after selecting specific columns - price and quantity"
- print(result) - this prints the selected columns from the dataframe
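Because .loc[] takes both a row selector and a column selector, it can do more than pick columns from a list. The following sketch, under the assumption that the same sample data is used, shows a label slice (which includes both endpoints) and a row condition combined with a column list. The names by_slice and filtered, and the threshold 50, are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'product_name': ['p1', 'p2', 'p3', 'p4'],
    'price': [100.0, 30.0, 200.0, 140.0],
    'quantity': [14, 67, 90, 40]
})

# A label slice in .loc[] includes both the start and the end label
by_slice = df.loc[:, 'price':'quantity']
print(list(by_slice.columns))  # ['price', 'quantity']

# Filter rows and select columns in one step:
# keep rows where price is above 50, and only two of the columns
filtered = df.loc[df['price'] > 50, ['product_name', 'quantity']]
print(filtered)
```

The label slice is convenient when the wanted columns are adjacent; the list form is the safer choice when they are not.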
Solution 3: Select specific columns from a DataFrame using the iloc[] indexer
DataFrame.iloc[] is an indexer that selects rows and columns by integer position, counting from 0, rather than by label. Like .loc[], it takes a row selector and a column selector separated by a comma.
For example, if you have a DataFrame with three columns, you can use iloc[] to select the second column like this:
df.iloc[:,1]
This would return the second column of the DataFrame as a Series. You can also use iloc[] to select multiple columns by passing in a list of column positions:
df.iloc[:, [0,2]]
This would return the first and third columns of the DataFrame as a new DataFrame.
iloc[] also accepts slices of positions, such as 0:2. As with ordinary Python slicing, the start position is included and the end position is excluded.
Code example
import pandas as pd
# create a dataframe
df = pd.DataFrame({
    'product_name': ['p1', 'p2', 'p3', 'p4'],
    'price': [100.0, 30.0, 200.0, 140.0],
    'quantity': [14, 67, 90, 40]
})
print('Original DataFrame')
print(df)
# select first two columns
result = df.iloc[:, 0:2]
print('Selected first two columns')
print(result)
Output
Original DataFrame
╒════╤════════════════╤═════════╤════════════╕
│ │ product_name │ price │ quantity │
╞════╪════════════════╪═════════╪════════════╡
│ 0 │ p1 │ 100 │ 14 │
├────┼────────────────┼─────────┼────────────┤
│ 1 │ p2 │ 30 │ 67 │
├────┼────────────────┼─────────┼────────────┤
│ 2 │ p3 │ 200 │ 90 │
├────┼────────────────┼─────────┼────────────┤
│ 3 │ p4 │ 140 │ 40 │
╘════╧════════════════╧═════════╧════════════╛
Selected first two columns
╒════╤════════════════╤═════════╕
│ │ product_name │ price │
╞════╪════════════════╪═════════╡
│ 0 │ p1 │ 100 │
├────┼────────────────┼─────────┤
│ 1 │ p2 │ 30 │
├────┼────────────────┼─────────┤
│ 2 │ p3 │ 200 │
├────┼────────────────┼─────────┤
│ 3 │ p4 │ 140 │
╘════╧════════════════╧═════════╛
The Python code above creates a DataFrame with three columns: 'product_name', 'price', and 'quantity'. It then selects the first two columns using DataFrame.iloc[] and prints the result. The slice 0:2 covers positions 0 and 1, because the end position is excluded.
To select the price and quantity columns instead, use the slice 1:3, which covers positions 1 and 2:
result = df.iloc[:, 1:3]
Output
╒════╤═════════╤════════════╕
│ │ price │ quantity │
╞════╪═════════╪════════════╡
│ 0 │ 100 │ 14 │
├────┼─────────┼────────────┤
│ 1 │ 30 │ 67 │
├────┼─────────┼────────────┤
│ 2 │ 200 │ 90 │
├────┼─────────┼────────────┤
│ 3 │ 140 │ 40 │
╘════╧═════════╧════════════╛
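Hard-coded positions break if the column order changes. As a hedged sketch of two alternatives, the code below looks up positions from the column names with Index.get_indexer and also selects the last two columns with a negative slice. The variable names positions, by_position, and last_two are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'product_name': ['p1', 'p2', 'p3', 'p4'],
    'price': [100.0, 30.0, 200.0, 140.0],
    'quantity': [14, 67, 90, 40]
})

# Translate column names into integer positions (here: positions 1 and 2)
positions = df.columns.get_indexer(['price', 'quantity'])
by_position = df.iloc[:, positions]

# A negative slice counts from the end: the last two columns
last_two = df.iloc[:, -2:]

print(list(by_position.columns))  # ['price', 'quantity']
print(list(last_two.columns))     # ['price', 'quantity']
```

Both selections return the same two columns for this DataFrame; the name lookup keeps working if the columns are later reordered, while the negative slice always follows position.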