Pandas query() Method: Querying DataFrame in Python

Python‘s Pandas library is well known for its ability to store and retrieve data efficiently. With the help of DataFrames, Pandas makes data visualization and manipulation easy as well. In this article, we’ll look at one of Pandas’ crucial functions, query() which allows us to filter and display certain records from the DataFrame according to our need, similar to using a Database system with a query language. We will be looking at Querying a DataFrame using query() with some examples.

Also Read: 4 Ways to Add a Column to DataFrame in Pandas (With Examples)

Pandas DataFrame query() Method

To put it simply, the query() function is used to ask queries to a DataFrame and obtain the corresponding data from the DataFrame. It helps us select, filter and do much more with data by passing a string (the query) into the query() method. Before we look at some examples, let’s understand the syntax of query().

Syntax:

DataFrame.query(expr, *, inplace=False, **kwargs)
  • expr: Takes in a string containing the query. We can use variables in our queries by prefixing them with an ‘@’ symbol. It is important to note that while using column names to query, they must not include Python keywords such as – list, and, for etc.
  • inplace: A boolean value that is used to indicate whether the data returned after querying should made into a new separate DataFrame or whether query() should modify the existing DataFrame.
  • **kwargs: Keyword arguments that are accepted query() which can be used with eval().

Now that we have understood what query() is, we’ll look at some examples.

Examples of Querying DataFrame Using query() Method

In this section, we’ll mainly focus on understanding query() with some examples of different queries we can input. We will start by importing pandas and creating a DataFrame to query.

Example DataFrame:

import pandas as pd

data = {'Id': [101,102,103,104],
        'Name': ['Tony', 'Natasha', 'Steve', 'Bruce'],
        'Marks': [89, 93, 72, 68],
        'Grade': ['B','A','C','D']
        }

df = pd.DataFrame(data)
print(df)

To create our DataFrame df, we first created a dictionary data containing four fields or keys which became the columns of our DataFrame and a list of values, each of length four, which became our rows in the DataFrame. We have converted the dictionary into a DataFrame using the DataFrame() method. For better visualisation, we print our DataFrame.

Output:

DataFrame df

Now we can look at some queries.

Example 1

Here we’ll look at a query to select rows containing marks greater than 70 from the DataFrame.

result = df.query("Marks > 70")
print(result)

Output:

Query To Select Marks Greater Than 70 From DataFrame
Query to select Marks greater than 70 from DataFrame

We can observe that the row containing Marks=68 has been omitted from the output.

Example 1.1: Empty DataFrame

In the event that the query passed has no matching results in the table, an empty DataFrame is returned

result = df.query("Marks < 60")
print(result)

Output:

Query With No Matching Results In DataFrame

Since all entries of the Marks field are greater than 60, no results match the query.

Example 2: Using the @ symbol

The @ symbol is used as a prefix on variable names which helps us directly use the variables we’ve declared while querying the DataFrame.

name = "Natasha"
result = df.query("Name == @name")
print(result)

Here we have directly used the variable name in our query by prefixing it with an @ symbol.

Output:

Using Variables In A Query By Prefixing Them With @

Example 3

Here we’ll look at a more challenging query to deepen our understanding of query(). We will write a query to select rows containing names that start with N or end with e.

result = df.query("Name.str.startswith('N') or Name.str.endswith('e')")
print(result)

Here we have used the in-built Python functions startswith() and endswith() which return the strings that start and end with the substrings passed into these functions, respectively. Note that the substrings entered are case-sensitive. We use the or keyword in the query to display the data if either of the two conditions is true.

Output:

Query Using String Methods For Filtering Names From The DataFrame

Here, the row containing the name Natasha is returned as it starts with the substring N and rows containing the names Bruce and Steve are returned as they end with the substring e.

Conclusion

Querying a DataFrame can be an efficient way to filter out data and modify our DataFrame. In this article, we have seen how we can select different rows based on single and multiple conditions with some supporting examples for better understanding. We have also seen how we can use variables to query a DataFrame. These simple methods can prove useful when it comes to data visualization and working with complex DataFrames.

Reference

https://stackoverflow.com/questions/71503386/query-data-frame-in-python-pandas-cant-save-query/71503434

Nandana Pradosh
Nandana Pradosh
Articles: 27