Converting a Pandas DataFrame to a NumPy array is a common operation in data analysis, machine learning, and scientific computing.
Here are some use cases for converting a DataFrame to a NumPy array:
- Machine Learning Input: Many machine learning libraries and algorithms, such as scikit-learn, are built around NumPy arrays. Converting a DataFrame to a NumPy array allows us to integrate Pandas data with machine learning models.
- Efficient Numerical Operations: NumPy arrays offer efficient numerical operations and vectorized computations. Converting a DataFrame to a NumPy array is useful when we want to perform numerical calculations using optimized NumPy functions.
- Interfacing with Scientific Libraries: Libraries like SciPy and Matplotlib often work best with NumPy arrays. Converting a DataFrame to a NumPy array makes it easier to use these libraries for scientific computations, simulations, and visualization.
- Feature Engineering: Before applying machine learning algorithms, we might need to perform feature engineering, including scaling, transformation, or extraction of features. NumPy arrays provide a convenient format for applying such operations.
- Customized Data Manipulation: If we need to perform custom data manipulations or calculations that are easier to express with NumPy's array operations, converting a DataFrame to a NumPy array can simplify our code.
- Statistical Analysis: Some statistical functions or libraries work better with NumPy arrays. Converting a DataFrame to a NumPy array can be helpful when conducting statistical analyses or hypothesis testing.
- Integration with Deep Learning Libraries: When working with deep learning libraries like TensorFlow or PyTorch, input data is often prepared as NumPy arrays before being passed to the model. Converting a DataFrame to a NumPy array facilitates integration with these libraries.
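As a rough illustration of the feature-engineering and vectorized-computation use cases, the sketch below standardizes each column of a small numeric DataFrame with plain NumPy operations. The column names and values are hypothetical and chosen only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data, used only for illustration.
df = pd.DataFrame({"height_cm": [150.0, 160.0, 170.0],
                   "weight_kg": [50.0, 60.0, 70.0]})

# Convert the DataFrame to a 2-D NumPy array (rows x columns).
X = df.to_numpy()

# Standardize each column with vectorized NumPy operations:
# subtract the column mean and divide by the column standard deviation.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled.mean(axis=0))  # approximately 0 for each column
print(X_scaled.std(axis=0))   # approximately 1 for each column
```

No explicit loops are needed here; the arithmetic is applied to whole columns at once.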
Converting Pandas DataFrame to NumPy Array in Python
We will convert a Pandas DataFrame to a NumPy array in Python by following three steps:
- Import Pandas and NumPy
- Create a Pandas DataFrame
- Convert the DataFrame to a NumPy array
Let us look at each step briefly.
1. Import Pandas and NumPy
To convert a Pandas DataFrame to a NumPy array, we first import the Pandas library as pd and the NumPy library as np.
import pandas as pd
import numpy as np
2. Creating a Pandas DataFrame
In this step, we create a dictionary and then convert it into a DataFrame using the Pandas library.
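# Each key becomes a column name; each list holds that column's values.
data = {'Name': ['Erik', 'Lisa', 'John'],
        'Age': [38, 33, 44],
        'PhD': [3, 1, 10]}
df = pd.DataFrame(data)
# display() is available in Jupyter/IPython; use print(df) elsewhere.
display(df)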
Here we created a dictionary with three key-value pairs: 'Name': ['Erik', 'Lisa', 'John'], 'Age': [38, 33, 44], and 'PhD': [3, 1, 10]. We stored the dictionary in the data variable.
To convert the data dictionary into a DataFrame, we passed it to pd.DataFrame() and stored the result in the df variable. Finally, we displayed the DataFrame with display(df).
Output:
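For this data, the DataFrame should have three rows and three columns (Name, Age, and PhD). A minimal sketch for inspecting it, using print() instead of display() so that it also runs outside a notebook:

```python
import pandas as pd

data = {'Name': ['Erik', 'Lisa', 'John'],
        'Age': [38, 33, 44],
        'PhD': [3, 1, 10]}
df = pd.DataFrame(data)

print(df)         # plain-text rendering of the table
print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # one data type per column
```

Checking the column data types at this point is useful, because they determine the data type of the array produced by the conversion.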
3. Convert DataFrame to NumPy Array
Example 1:
Here we convert the entire DataFrame to an array.
np_array = df.to_numpy()
display(np_array)
We converted the DataFrame df to a NumPy array with df.to_numpy() and stored it in the variable np_array. Then we displayed the array with display(np_array).
Output:
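Because the Name column holds strings while Age and PhD hold integers, the result should be a 3×3 array whose data type is the generic object type rather than a numeric one. A minimal sketch that checks this, rebuilding the same DataFrame so that it runs on its own:

```python
import pandas as pd

data = {'Name': ['Erik', 'Lisa', 'John'],
        'Age': [38, 33, 44],
        'PhD': [3, 1, 10]}
df = pd.DataFrame(data)

np_array = df.to_numpy()

print(np_array.shape)  # rows and columns carried over from the DataFrame
print(np_array.dtype)  # object, since strings and integers are mixed
print(np_array[0])     # first row: name, age, and PhD value for Erik
```

An object array stores ordinary Python objects, so it does not benefit from NumPy's fast numeric routines. Selecting only the numeric columns before converting avoids this.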
Example 2:
Here we convert a single column.
df[['Age']].to_numpy()
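We passed the list ['Age'] inside the indexing brackets of df, which selects the Age column as a one-column DataFrame, and then converted it with to_numpy(). Because the selection is still a DataFrame, the result is a two-dimensional array with one column; df['Age'].to_numpy(), with single brackets, would give a one-dimensional array instead.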
Output:
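The shape of the result depends on how the column is selected. The sketch below compares double-bracket selection, which returns a DataFrame, with single-bracket selection, which returns a Series:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Erik', 'Lisa', 'John'],
                   'Age': [38, 33, 44],
                   'PhD': [3, 1, 10]})

col_2d = df[['Age']].to_numpy()  # DataFrame selection: 2-D array
col_1d = df['Age'].to_numpy()    # Series selection: 1-D array

print(col_2d.shape)  # (3, 1)
print(col_1d.shape)  # (3,)
```

Which form is preferable depends on the consumer: a feature matrix is usually expected to be two-dimensional, while a plain vector of values is one-dimensional.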
Example 3:
Here we convert multiple columns.
df[['Age','PhD']].to_numpy()
We selected the columns Age and PhD by passing them as a list to df, and then converted the selection to a NumPy array with to_numpy().
Output:
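Since both selected columns hold integers, the result should be a 3×2 integer array. As a small sketch, the code below checks this and also uses the optional dtype argument of to_numpy() to request floating-point values instead:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Erik', 'Lisa', 'John'],
                   'Age': [38, 33, 44],
                   'PhD': [3, 1, 10]})

arr = df[['Age', 'PhD']].to_numpy()
print(arr.shape)  # (3, 2)
print(arr.dtype)  # integer dtype shared by both columns

# Optionally ask for a specific dtype during the conversion.
arr_float = df[['Age', 'PhD']].to_numpy(dtype=float)
print(arr_float.dtype)  # float64
```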
Example 4:
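Here we select only the columns of a given data type and then convert them into an array. With include=np.int64, select_dtypes() keeps the integer columns Age and PhD and drops the Name column.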
df.select_dtypes(include=np.int64).to_numpy()
Output:
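The result should contain only the Age and PhD values. If the exact integer type is not known in advance, one option is to pass include='number', which selects all numeric columns. The sketch below compares the two calls on the same data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Erik', 'Lisa', 'John'],
                   'Age': [38, 33, 44],
                   'PhD': [3, 1, 10]})

# Select columns by an exact dtype, as in the example above.
int_array = df.select_dtypes(include=np.int64).to_numpy()

# Select every numeric column, whatever its exact dtype.
numeric_df = df.select_dtypes(include='number')
numeric_array = numeric_df.to_numpy()

print(list(numeric_df.columns))  # the numeric column names
print(numeric_array.shape)       # (3, 2)
```

For this DataFrame both calls select the same columns. They would differ if the data also contained, for example, float columns.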
Summary
Converting a DataFrame to a NumPy array is a versatile technique that improves compatibility with the libraries and tools commonly used in data analysis and machine learning workflows, and it can speed up numerical computations. In this tutorial, we learned how to convert a Pandas DataFrame to a NumPy array in three simple steps: load the needed libraries, create the DataFrame, and convert it to an array with the to_numpy() method. We also learned how to select specific columns and convert those to NumPy arrays. Lastly, we learned how to select columns of certain data types and convert them to a NumPy array.
Reference
https://stackoverflow.com/questions/13187778/convert-pandas-dataframe-to-numpy-array