The Pandas DataFrame is a two-dimensional, tabular data structure for working with large datasets, such as data loaded from CSV or Excel files.
Because a DataFrame can hold a lot of data, we often need to identify the unique values in a dataset that may contain duplicate or redundant entries. pandas solves this with the pandas.unique() function and the equivalent Series.unique() method, both of which return the unique values of a column.
In this tutorial, we will see how to get a list of the unique values in one or more Pandas DataFrame columns. We will also see how to count the number of unique values and how to get the frequency of each unique value in a column.
Creating a Pandas DataFrame
Before getting unique values, let's first create a DataFrame from a dictionary.
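import pandas as pd

data = {'a': [1, 2, 1, 2],
        'b': [3, 4, 3, 5],
        'c': [1, 2, 3, 4]}
df = pd.DataFrame(data)  # each key becomes a column name
df  # in a notebook this displays the DataFrame; use print(df) in a script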
Here, the dictionary data contains three key-value pairs: each key becomes a column name and each list holds that column's values. We pass the dictionary to pd.DataFrame(), store the result in the df variable, and finally display the DataFrame.
Output:
   a  b  c
0  1  3  1
1  2  4  2
2  1  3  3
3  2  5  4
Get Unique Values in Column
Finding unique values in pandas is easy: there is a pandas.unique() function that can be applied directly to a column, and every column (a Series) also has an equivalent unique() method. Below, we select the column and then call its unique() method.
df['a'].unique()
Output:
array([1, 2])
In the output above, we can see that the code returns a NumPy array containing 1 and 2, because these are the only distinct values in column 'a' (each of them appears twice).
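As a quick cross-check, the sketch below compares the unique() method with the top-level pandas.unique() function, which accepts any 1-D array-like such as a Series. It rebuilds the example DataFrame so it runs on its own; the Series s with a missing value is a hypothetical addition used only to show that NaN is reported as one of the unique values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1, 2],
                   'b': [3, 4, 3, 5],
                   'c': [1, 2, 3, 4]})

# The Series method and the top-level function return the same NumPy array
via_method = df['a'].unique()
via_function = pd.unique(df['a'])
print(via_method, via_function)  # [1 2] [1 2]

# Hypothetical column with a missing value: NaN counts as one unique value
s = pd.Series([2.0, np.nan, 1.0, 2.0])
print(s.unique())
```

Which form to use is mostly a matter of taste; the method reads naturally when chaining, while pd.unique() also works on plain lists and NumPy arrays.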
Generate a List of Unique Values in Column
Method 1:
To generate a list of unique values, we can convert the array of unique values into a list directly with its tolist() method.
df['a'].unique().tolist()
Here, df['a'] selects the column and unique() returns the array of its unique values. Finally, tolist() converts that array into a Python list containing the same unique values.
Output:
[1, 2]
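Note that unique() keeps the values in the order in which they first appear in the column; the result is not sorted. If a sorted list is needed, one option is to sort after converting. The sketch below assumes a hypothetical Series s whose values are not already in order and shows two equivalent ways to sort.

```python
import numpy as np
import pandas as pd

# Hypothetical column whose values are not already in sorted order
s = pd.Series([3, 1, 3, 2])

# tolist() keeps the order of first appearance: 3, then 1, then 2
in_order = s.unique().tolist()

# Option 1: sort the Python list with the built-in sorted()
sorted_list = sorted(s.unique().tolist())

# Option 2: sort the NumPy array with np.sort(), then convert it
sorted_list_np = np.sort(s.unique()).tolist()

print(in_order)        # [3, 1, 2]
print(sorted_list)     # [1, 2, 3]
print(sorted_list_np)  # [1, 2, 3]
```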
Method 2:
We can accomplish the same thing with Python's built-in list() function. Instead of chaining tolist() onto unique(), we wrap the entire expression in list().
list(df['a'].unique())
In the code above, we pass the array of unique values returned by unique() to the list() function, which builds a list from its elements.
Output:
[1, 2]
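We get a list with the same values as with the tolist() function. There is one subtle difference, though: tolist() converts the elements to native Python int objects, whereas list() keeps them as NumPy scalars (numpy.int64). With NumPy 2.0 or later, this output is therefore displayed as [np.int64(1), np.int64(2)].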
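The difference in element types between the two methods can be checked directly. The sketch below rebuilds the example DataFrame and compares the lists produced by tolist() and list().

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1, 2],
                   'b': [3, 4, 3, 5],
                   'c': [1, 2, 3, 4]})

via_tolist = df['a'].unique().tolist()
via_list = list(df['a'].unique())

# The two lists compare equal element by element
print(via_tolist == via_list)  # True

# tolist() yields built-in ints; list() keeps NumPy integer scalars
print(type(via_tolist[0]))                # <class 'int'>
print(isinstance(via_list[0], np.integer))  # True
```

In practice this matters when the list is handed to code that expects built-in types; for example, json.dumps() raises a TypeError for NumPy integers, so tolist() is usually the safer choice.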
Count Unique Values in Column
Let's say we want to know how many unique values a specific Pandas DataFrame column contains. We can find out by passing the array of unique values to Python's built-in len() function.
len(df['a'].unique())
Output:
2
We can see that the column ‘a’ has two unique values.
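pandas also offers a dedicated nunique() method that returns this count directly, for a single column or for every column of a DataFrame at once. The sketch below rebuilds the example DataFrame; the Series s containing a NaN is a hypothetical addition showing where the two approaches differ: nunique() skips missing values by default, while len(unique()) counts NaN as a value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1, 2],
                   'b': [3, 4, 3, 5],
                   'c': [1, 2, 3, 4]})

# Number of distinct values in one column
print(df['a'].nunique())  # 2

# Number of distinct values in every column (returned as a Series)
print(df.nunique())

# Hypothetical column with a missing value
s = pd.Series([1.0, np.nan, 1.0])
print(len(s.unique()))          # 2  (NaN is included)
print(s.nunique())              # 1  (NaN is skipped by default)
print(s.nunique(dropna=False))  # 2  (NaN is counted)
```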
Get Unique Values in Multiple Columns
Sometimes we need the unique combinations of values across multiple columns. To get them, we can select those columns and call the DataFrame method drop_duplicates(), which removes every row that duplicates an earlier row. By default, it keeps the first occurrence of each combination.
df[['a', 'b']].drop_duplicates()
The code above returns a DataFrame containing the unique combinations of values in columns 'a' and 'b'.
Output:
   a  b
0  1  3
1  2  4
3  2  5
We can see that the row at index 2 has been dropped because it contains the same values as the row at index 0: 1 in column 'a' and 3 in column 'b'. The remaining rows are therefore the unique combinations across columns 'a' and 'b'.
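Two related tasks come up often here, and the sketch below shows one way to handle each using only standard pandas calls; it rebuilds the example DataFrame first. The subset parameter of drop_duplicates() checks only the listed columns for duplicates while keeping all other columns in the result. Separately, if "unique values in multiple columns" means the set of values appearing anywhere in those columns (their union), the selected columns can be flattened into one array and passed to pd.unique().

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1, 2],
                   'b': [3, 4, 3, 5],
                   'c': [1, 2, 3, 4]})

# Unique (a, b) combinations as a list of plain tuples
combos = list(df[['a', 'b']].drop_duplicates().itertuples(index=False, name=None))
print(combos)  # [(1, 3), (2, 4), (2, 5)]

# Check only 'a' and 'b' for duplicates but keep every column;
# keep='first' (the default) retains the first row of each combination
deduped = df.drop_duplicates(subset=['a', 'b'], keep='first')
print(deduped)

# Union of the values in 'a' and 'b', read row by row in order of first appearance
union = pd.unique(df[['a', 'b']].to_numpy().ravel())
print(union.tolist())  # [1, 3, 2, 4, 5]
```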
Frequency of Unique Values in Column
Another thing we may be curious about is how often each unique value occurs within a column. For this, we can call the value_counts() method on a DataFrame column, which returns the frequency of each unique value, sorted from most to least frequent.
df['b'].value_counts()
In the code above, we call the value_counts() method on column 'b' of the DataFrame df.
Output (pandas 2.0 or later):
b
3    2
4    1
5    1
Name: count, dtype: int64
In the output above, we can see that the value 3 occurs twice, while the values 4 and 5 occur once each.
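value_counts() also accepts a normalize parameter that returns relative frequencies (proportions) instead of raw counts, and its result can be reordered by value with sort_index(). The sketch below rebuilds the example DataFrame and shows both.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1, 2],
                   'b': [3, 4, 3, 5],
                   'c': [1, 2, 3, 4]})

# Raw counts, sorted from most to least frequent (the default)
counts = df['b'].value_counts()

# Relative frequencies: each count divided by the number of non-missing values
proportions = df['b'].value_counts(normalize=True)

# Order the result by the values themselves instead of by frequency
by_value = counts.sort_index()

print(by_value.index.tolist(), by_value.tolist())      # [3, 4, 5] [2, 1, 1]
print(proportions[3], proportions[4], proportions[5])  # 0.5 0.25 0.25
```

Like nunique(), value_counts() drops missing values by default; passing dropna=False includes NaN as its own entry.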
Summary
Getting unique values from columns in a Pandas DataFrame is a fundamental step in data analysis and preparation. It provides insight into how the data is distributed, helps with data cleaning, and supports various data manipulation tasks. We have discussed how to get a list of unique values, count the unique values, find the unique combinations across multiple columns, and get the frequency of each unique value in a Pandas DataFrame, with examples. After reading this tutorial, we hope you can easily get unique values from a Pandas DataFrame in Python.