A Pandas DataFrame is a two-dimensional, tabular data structure with labeled rows and columns. Each column holds values of a single data type, and different columns can have different types.
You can select specific columns from a DataFrame by name. For example, if a DataFrame has columns "A" and "B", df['A'] returns column "A" as a Series, and df[['A', 'B']] returns both columns as a new DataFrame.
Code example
import pandas as pd
# create a dataframe
df = pd.DataFrame({
    'product_name': ['p1', 'p2', 'p3', 'p4'],
    'price': [100.0, 30.0, 200.0, 140.0],
    'quantity': [14, 67, 90, 40]
})
print('Original DataFrame')
print(df)
# select specific columns
result = df[['price', 'quantity']]
print('DataFrame after selecting specific columns - price and quantity')
print(result)
Output
Original DataFrame
╒════╤════════════════╤═════════╤════════════╕
│ │ product_name │ price │ quantity │
╞════╪════════════════╪═════════╪════════════╡
│ 0 │ p1 │ 100 │ 14 │
├────┼────────────────┼─────────┼────────────┤
│ 1 │ p2 │ 30 │ 67 │
├────┼────────────────┼─────────┼────────────┤
│ 2 │ p3 │ 200 │ 90 │
├────┼────────────────┼─────────┼────────────┤
│ 3 │ p4 │ 140 │ 40 │
╘════╧════════════════╧═════════╧════════════╛
DataFrame after selecting specific columns - price and quantity
╒════╤═════════╤════════════╕
│ │ price │ quantity │
╞════╪═════════╪════════════╡
│ 0 │ 100 │ 14 │
├────┼─────────┼────────────┤
│ 1 │ 30 │ 67 │
├────┼─────────┼────────────┤
│ 2 │ 200 │ 90 │
├────┼─────────┼────────────┤
│ 3 │ 140 │ 40 │
╘════╧═════════╧════════════╛
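The bracket syntax behaves differently depending on whether it receives one name or a list of names. The following is a minimal sketch of that difference, reusing the same sample data; the variable names price_series, price_frame, and reordered are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'product_name': ['p1', 'p2', 'p3', 'p4'],
    'price': [100.0, 30.0, 200.0, 140.0],
    'quantity': [14, 67, 90, 40],
})

# A single column name returns a one-dimensional Series
price_series = df['price']

# A list of names returns a DataFrame, even when the list has one item
price_frame = df[['price']]

# The order of the list decides the column order of the result
reordered = df[['quantity', 'price']]

print(type(price_series).__name__)
print(type(price_frame).__name__)
print(list(reordered.columns))
```

Asking for a name that is not a column raises a KeyError, so it can help to check df.columns first when the column names come from user input.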
Select columns using loc[]
To select two columns from a Pandas DataFrame by label, you can also use the .loc[] indexer. It takes a row selector followed by a column selector. Passing : as the row selector keeps every row, and passing a list of column names returns a new DataFrame that contains only those columns.
For example, if you have a DataFrame with columns ['A', 'B', 'C'], you can use .loc[] to select only columns 'A' and 'B':
df.loc[:,['A','B']]
This would return a new DataFrame with only columns 'A' and 'B'.
Code example
import pandas as pd
# create a dataframe
df = pd.DataFrame({
    'product_name': ['p1', 'p2', 'p3', 'p4'],
    'price': [100.0, 30.0, 200.0, 140.0],
    'quantity': [14, 67, 90, 40]
})
print('Original DataFrame')
print(df)
result = df.loc[:,['price','quantity']]
print('DataFrame after selecting specific columns - price and quantity')
print(result)
Output
Original DataFrame
╒════╤════════════════╤═════════╤════════════╕
│ │ product_name │ price │ quantity │
╞════╪════════════════╪═════════╪════════════╡
│ 0 │ p1 │ 100 │ 14 │
├────┼────────────────┼─────────┼────────────┤
│ 1 │ p2 │ 30 │ 67 │
├────┼────────────────┼─────────┼────────────┤
│ 2 │ p3 │ 200 │ 90 │
├────┼────────────────┼─────────┼────────────┤
│ 3 │ p4 │ 140 │ 40 │
╘════╧════════════════╧═════════╧════════════╛
DataFrame after selecting specific columns - price and quantity
╒════╤═════════╤════════════╕
│ │ price │ quantity │
╞════╪═════════╪════════════╡
│ 0 │ 100 │ 14 │
├────┼─────────┼────────────┤
│ 1 │ 30 │ 67 │
├────┼─────────┼────────────┤
│ 2 │ 200 │ 90 │
├────┼─────────┼────────────┤
│ 3 │ 140 │ 40 │
╘════╧═════════╧════════════╛
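Besides a list of names, .loc[] also accepts a label slice, and it can filter rows in the same call. The sketch below illustrates both with the same sample data; the threshold of 100 and the variable names are assumptions made for the illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'product_name': ['p1', 'p2', 'p3', 'p4'],
    'price': [100.0, 30.0, 200.0, 140.0],
    'quantity': [14, 67, 90, 40],
})

# A label slice in loc includes both endpoints, so this keeps
# 'price' and 'quantity'
by_slice = df.loc[:, 'price':'quantity']

# The same selection written as an explicit list of names
by_list = df.loc[:, ['price', 'quantity']]
print(by_slice.equals(by_list))

# Rows and columns can be selected together: keep only rows with a
# price above 100 (an arbitrary threshold), and two of the columns
expensive = df.loc[df['price'] > 100, ['product_name', 'price']]
print(expensive)
```

Unlike ordinary Python slices, the label slice 'price':'quantity' includes its end label.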
Select columns by position using iloc[]
Pandas is a Python data analysis toolkit that provides a wide range of tools for working with data. One of the most useful is the iloc[] indexer, which selects rows and columns from a DataFrame by integer position rather than by label. Positions start at 0.
For example, if you have a DataFrame with three columns, you can use iloc[] to select the second column like this:
df.iloc[:,1]
This would return the second column of the DataFrame as a Series. You can also use iloc[] to select multiple columns by passing in a list of column positions:
df.iloc[:, [0,2]]
This would return the first and third columns of the DataFrame as a new DataFrame.
The iloc[] indexer also accepts slices such as 0:2 and negative positions such as -1, so it is useful when you know where the columns are but not what they are called.
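The following short sketch shows the return type for each form of positional selection, using the same sample data as the other examples. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'product_name': ['p1', 'p2', 'p3', 'p4'],
    'price': [100.0, 30.0, 200.0, 140.0],
    'quantity': [14, 67, 90, 40],
})

# A single integer position returns a Series (here, the 'price' column)
second_column = df.iloc[:, 1]

# A list of positions returns a DataFrame with those columns
first_and_third = df.iloc[:, [0, 2]]

# Negative positions count from the end, so -1 is the last column
last_column = df.iloc[:, -1]

print(second_column.name)
print(list(first_and_third.columns))
print(last_column.name)
```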
Code example
import pandas as pd
# create a dataframe
df = pd.DataFrame({
    'product_name': ['p1', 'p2', 'p3', 'p4'],
    'price': [100.0, 30.0, 200.0, 140.0],
    'quantity': [14, 67, 90, 40]
})
print('Original DataFrame')
print(df)
# select first two columns
result = df.iloc[:, 0:2]
print('Selected first two columns')
print(result)
Output
Original DataFrame
╒════╤════════════════╤═════════╤════════════╕
│ │ product_name │ price │ quantity │
╞════╪════════════════╪═════════╪════════════╡
│ 0 │ p1 │ 100 │ 14 │
├────┼────────────────┼─────────┼────────────┤
│ 1 │ p2 │ 30 │ 67 │
├────┼────────────────┼─────────┼────────────┤
│ 2 │ p3 │ 200 │ 90 │
├────┼────────────────┼─────────┼────────────┤
│ 3 │ p4 │ 140 │ 40 │
╘════╧════════════════╧═════════╧════════════╛
Selected first two columns
╒════╤════════════════╤═════════╕
│ │ product_name │ price │
╞════╪════════════════╪═════════╡
│ 0 │ p1 │ 100 │
├────┼────────────────┼─────────┤
│ 1 │ p2 │ 30 │
├────┼────────────────┼─────────┤
│ 2 │ p3 │ 200 │
├────┼────────────────┼─────────┤
│ 3 │ p4 │ 140 │
╘════╧════════════════╧═════════╛
The Python code above creates a DataFrame using the pandas library. The DataFrame consists of three columns: 'product_name', 'price', and 'quantity'. The code then selects the first two columns with the DataFrame.iloc[] indexer and prints the result. The slice 0:2 covers positions 0 and 1; the end position is excluded.
To select the price and quantity columns by position, use the slice 1:3, which covers positions 1 and 2:
result = df.iloc[:, 1:3]
Output
╒════╤═════════╤════════════╕
│ │ price │ quantity │
╞════╪═════════╪════════════╡
│ 0 │ 100 │ 14 │
├────┼─────────┼────────────┤
│ 1 │ 30 │ 67 │
├────┼─────────┼────────────┤
│ 2 │ 200 │ 90 │
├────┼─────────┼────────────┤
│ 3 │ 140 │ 40 │
╘════╧═════════╧════════════╛
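As a quick cross-check, the sketch below selects price and quantity with all three approaches and compares the results using DataFrame.equals. It also shows one way to look up positions from names with df.columns.get_loc, which assumes the column names are unique. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'product_name': ['p1', 'p2', 'p3', 'p4'],
    'price': [100.0, 30.0, 200.0, 140.0],
    'quantity': [14, 67, 90, 40],
})

by_name = df[['price', 'quantity']]             # bracket syntax
by_label = df.loc[:, ['price', 'quantity']]     # label-based indexer
by_position = df.iloc[:, 1:3]                   # position-based indexer

# All three selections hold the same data
print(by_name.equals(by_label))
print(by_label.equals(by_position))

# Derive the positions from the names instead of hard-coding 1 and 3
# (get_loc returns an integer when the column name is unique)
start = df.columns.get_loc('price')
stop = df.columns.get_loc('quantity') + 1   # iloc slices exclude the end
derived = df.iloc[:, start:stop]
print(derived.equals(by_position))
```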