Filtering a Pandas DataFrame means selecting the rows that meet specific conditions. It is useful in many situations in Python. With time series data, we can filter rows to particular time intervals. To improve data quality, we can filter out rows with missing data before further analysis. In machine learning, we often filter rows on the basis of certain features to create subsets for testing.
In this article, we’ll look at five methods for filtering a Pandas DataFrame in Python so that you can use the option that best suits you. Let’s get started.
Methods for Filtering a Pandas DataFrame in Python
Before filtering, let’s first create a DataFrame from a CSV file.
Example:
import pandas as pd
df = pd.read_csv('recent-grads.csv')
print(df.shape)
df.head()
Here we first imported pandas as pd, then created the DataFrame from the CSV file recent-grads.csv. After that, we printed the shape of the DataFrame using df.shape and displayed its first five rows using df.head().
The methods for filtering a Pandas DataFrame in Python are given below:
- Filtering DataFrame Using Boolean Masking
- Filtering DataFrame Using loc[] Indexer
- Filtering DataFrame Using query() Method
- Filtering DataFrame Using iloc[] Indexer
- Filtering DataFrame Using isnull() Method
1. Filtering DataFrame Using Boolean Masking
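We can filter the DataFrame with a Boolean mask: a condition on a column produces a Series of True/False values, and indexing the DataFrame with that Series keeps only the rows where the value is True.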
Example 1:
import pandas as pd
df1 = df[ df['Major_category'] == 'Social Science' ]
print(df1.shape)
df1.head()
Here the condition df['Major_category'] == 'Social Science' is True wherever the column Major_category equals Social Science, and only those rows are returned. We saved the result in a new DataFrame df1 and then printed its shape and its first rows.
Example 2:
import pandas as pd
filter_criteria = (df['Major_category'] == 'Social Science') & (df['ShareWomen'] >= .5)
df1m = df[filter_criteria]
print(df1m.shape)
df1m.head()
Here the condition (df['Major_category'] == 'Social Science') & (df['ShareWomen'] >= .5) is True wherever the column Major_category equals Social Science and ShareWomen is greater than or equal to 0.5, and only those rows are returned. Each comparison must be wrapped in parentheses because & binds more tightly than == and >=.
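The same masking idea extends to OR, NOT, and membership tests. The sketch below uses a small stand-in DataFrame with invented values (not the recent-grads.csv data), and the variable names are only illustrative:

```python
import pandas as pd

# Hypothetical stand-in data; the values are invented for illustration only.
toy = pd.DataFrame({
    "Major": ["Economics", "Sociology", "Physics", "Nursing"],
    "Major_category": ["Social Science", "Social Science",
                       "Physical Sciences", "Health"],
    "ShareWomen": [0.31, 0.53, 0.45, 0.90],
})

# Each mask is a boolean Series aligned with the DataFrame's index.
is_social = toy["Major_category"] == "Social Science"
mostly_women = toy["ShareWomen"] >= 0.5

both = toy[is_social & mostly_women]         # AND: both conditions hold
either = toy[is_social | mostly_women]       # OR: at least one condition holds
neither = toy[~(is_social | mostly_women)]   # NOT: invert a mask with ~

# isin() tests membership in a list of allowed values.
picked = toy[toy["Major_category"].isin(["Health", "Physical Sciences"])]

print(both["Major"].tolist())
print(either["Major"].tolist())
print(neither["Major"].tolist())
print(picked["Major"].tolist())
```

Storing each mask in a named variable first tends to keep longer conditions readable and avoids missing parentheses.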
2. Filtering DataFrame Using loc[] Indexer
We can use the loc[] indexer and put the conditions inside it to filter the DataFrame. Its second argument lets us return only the specific columns that we want.
Example:
import pandas as pd
filter_criteria = (df['Major_category'] == 'Social Science') & (df['ShareWomen'] >= .5)
df2 = df.loc[filter_criteria,'Part_time']
df2.head()
Here we have returned only the Part_time column for the rows that meet the condition (df['Major_category'] == 'Social Science') & (df['ShareWomen'] >= .5). Because a single column label is passed, the result is a Series rather than a DataFrame.
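To keep more than one column, a list of labels can be passed as the second argument. This is a minimal sketch on invented stand-in data, so the numbers are not from the real file:

```python
import pandas as pd

# Hypothetical stand-in data; the values are invented for illustration only.
toy = pd.DataFrame({
    "Major": ["Economics", "Sociology", "Physics", "Nursing"],
    "Major_category": ["Social Science", "Social Science",
                       "Physical Sciences", "Health"],
    "ShareWomen": [0.31, 0.53, 0.45, 0.90],
    "Part_time": [100, 200, 50, 300],
})

mask = (toy["Major_category"] == "Social Science") & (toy["ShareWomen"] >= 0.5)

# A single column label returns a Series.
one_col = toy.loc[mask, "Part_time"]

# A list of column labels returns a DataFrame with just those columns.
two_cols = toy.loc[mask, ["Major", "Part_time"]]

print(type(one_col).__name__)
print(two_cols)
```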
3. Filtering DataFrame Using query() Method
We can also use the query() method, in which the conditions for filtering the DataFrame are written as a string.
Example:
import pandas as pd
df3 = df.query("Major_category == 'Social Science' & ShareWomen >= .5")
print(df3.shape)
df3.head()
Here df.query("Major_category == 'Social Science' & ShareWomen >= .5") returns the rows where the column Major_category equals Social Science and ShareWomen is greater than or equal to 0.5.
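Inside a query string, the words and / or can be used in place of & / |, and a Python variable can be referenced with the @ prefix. The sketch below assumes a small invented DataFrame and a hypothetical threshold variable named min_share:

```python
import pandas as pd

# Hypothetical stand-in data; the values are invented for illustration only.
toy = pd.DataFrame({
    "Major": ["Economics", "Sociology", "Physics", "Nursing"],
    "Major_category": ["Social Science", "Social Science",
                       "Physical Sciences", "Health"],
    "ShareWomen": [0.31, 0.53, 0.45, 0.90],
})

min_share = 0.5  # hypothetical threshold kept in an ordinary Python variable

# @min_share pulls the value of the Python variable into the query string.
result = toy.query("Major_category == 'Social Science' and ShareWomen >= @min_share")

# The equivalent Boolean-mask form, for comparison.
expected = toy[(toy["Major_category"] == "Social Science")
               & (toy["ShareWomen"] >= min_share)]

print(result["Major"].tolist())
print(result.equals(expected))
```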
4. Filtering DataFrame Using iloc[] Indexer
We can filter the DataFrame with the iloc[] indexer, where we specify the integer positions of the rows and columns instead of their names.
Example:
import pandas as pd
df4 = df.iloc[ :10, 2:6]
print(df4.shape)
df4.head()
Here df.iloc[:10, 2:6] returns the first 10 rows and the columns at positions 2 through 5 into DataFrame df4. Positions start at 0 and the end of a slice is excluded, so the result has four columns.
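The end-exclusive slicing can be checked on a small stand-in DataFrame. The column names col0 to col6 below are made up for the sketch:

```python
import pandas as pd

# Hypothetical stand-in data: 12 rows and 7 columns named col0..col6.
toy = pd.DataFrame({f"col{i}": list(range(12)) for i in range(7)})

# Rows at positions 0..9 and columns at positions 2..5 (6 is excluded).
subset = toy.iloc[:10, 2:6]

# Negative positions count from the end: the last three rows.
last3 = toy.iloc[-3:]

print(subset.shape)
print(list(subset.columns))
print(last3.index.tolist())
```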
5. Filtering DataFrame Using isnull() Method
We can select the rows with missing data from the DataFrame with the help of the isnull() method.
Example:
import pandas as pd
df5 = df[df['Total'].isnull()]
print(df5.shape)
df5.head()
Here we returned those rows from the DataFrame where the Total column has missing values.
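To remove the rows with missing values instead of selecting them, the mask can be reversed with notnull(), or dropna() can be used. This is a minimal sketch on invented data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the values are invented for illustration only.
toy = pd.DataFrame({
    "Major": ["Economics", "Sociology", "Physics"],
    "Total": [1000.0, np.nan, 500.0],
})

missing = toy[toy["Total"].isnull()]     # rows where Total is missing
present = toy[toy["Total"].notnull()]    # rows where Total has a value
dropped = toy.dropna(subset=["Total"])   # same rows as `present`

print(missing["Major"].tolist())
print(present["Major"].tolist())
print(present.equals(dropped))
```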
Summary
DataFrame filtering is a fundamental operation in data analysis, used throughout data exploration, analysis, and visualization. In this tutorial, we have discussed five methods to filter a Pandas DataFrame, with examples. After reading this tutorial, we hope you can easily filter a Pandas DataFrame in Python.