How to Convert CSV to NumPy Array in Python

A CSV (Comma-Separated Values) file is a simple and widely used file format for storing tabular data. It is a plain text file where each line represents a row of data, and within each line, individual data elements are separated by commas (or other delimiters, such as tabs or semicolons).

In this tutorial, we are going to discuss converting CSV data into a NumPy array using NumPy's genfromtxt() and loadtxt() functions and the csv module's reader() function. Let us first see how we can create a CSV file in Python so that we can then convert it to a NumPy array.

Creating a CSV File Using Python csv Module

To create a CSV file in Python, you can use the built-in csv module. Since csv is part of the standard library, there is no need to install it. However, you must import it in any script that uses it.

Syntax of Importing CSV Module:

import csv

Let’s assume you have a list of lists, where each inner list represents a row of data. To create the CSV file from this data, open a new file and pass it to csv.writer(). You need to provide the filename and the file mode, such as ‘w’ for write, when opening the file. Then, use the writerows() method to write all the rows to the file at once (writerow() writes a single row).

Example of Creating CSV file using CSV Module:

import csv
data = [['item','cost'],['pen',10],['book',50]]
csv_file = "output.csv"

with open(csv_file, mode='w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(data)

In the above code, we import the csv module and prepare a list of rows: the first row holds the column names item and cost, and the remaining rows hold each item and its cost (pen and book). We open output.csv in write (‘w’) mode, which creates the file or overwrites it if it already exists. Passing newline='' lets the csv module control line endings itself, which avoids blank lines appearing between rows on Windows. Finally, writer.writerows(data) writes all the rows to the file.

After running the above code, you should have a new file named “output.csv” in the same directory as your Python script, which will be used further.
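To check what was actually written, you can read the file back as plain text. The following is a minimal sketch, not part of the original example: it writes the same data to a temporary directory (the tmp_dir and csv_path names are assumptions made for illustration) so that it does not touch your project files.

```python
import csv
import os
import tempfile

data = [['item', 'cost'], ['pen', 10], ['book', 50]]

# Hypothetical location: a temporary directory instead of the script folder.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "output.csv")

# Same write step as in the example above.
with open(csv_path, mode='w', newline='') as file:
    csv.writer(file).writerows(data)

# Read the raw text back and split it into lines.
with open(csv_path, newline='') as file:
    lines = file.read().splitlines()

for line in lines:
    print(line)
```

Each printed line should be one comma-separated row, with the numbers written as plain text.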

Convert CSV to NumPy Array in Python

Performing operations on data is generally simpler when it is in a structured format such as a NumPy array. NumPy provides functions that read data from a text file, such as a CSV file, and create a NumPy array from it. Since NumPy is not part of Python's standard library, we have to install it separately.

Pip is a package manager for Python which can be used to install NumPy by running the following command in your terminal.

Syntax of Installing NumPy Module:

pip install numpy

Once you have successfully installed NumPy, you can use its functions to convert CSV data to a NumPy array. Let’s see them one by one.

Using NumPy genfromtxt() Function

As we have created a CSV file named “output.csv” in the same directory as your Python script, let’s convert it into an Array using the genfromtxt() function of NumPy Library.

Syntax of using genfromtxt() Function:

numpy.genfromtxt(fname, dtype=float, delimiter=None, skip_header=0)

Parameters:

  • fname is the path to the CSV file,
  • dtype is the data type of the resulting array. The default is float; if None, the data type of each column is determined from the contents of the file,
  • delimiter is the string that separates the columns in the file. The default, None, splits on whitespace, so pass ',' for a CSV file,
  • skip_header is the number of lines to skip at the beginning of the file.

The function returns an ndarray containing the data from the CSV file. Only the most commonly used parameters are shown here.
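As a rough illustration of these parameters, the sketch below reads the same three rows in two ways. It assumes the data is held in an in-memory string (csv_text, wrapped in io.StringIO) standing in for output.csv; you could pass the filename instead. Note that in current NumPy releases the genfromtxt() parameter for skipping leading lines is called skip_header.

```python
import io
import numpy as np

# In-memory stand-in for output.csv (an assumption for this sketch).
csv_text = "item,cost\npen,10\nbook,50\n"

# Skip the header line and read only the second column as floats.
costs = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                      skip_header=1, usecols=1)
print(costs)

# names=True takes column names from the first line; dtype=None lets
# NumPy infer a type per column, giving a structured array.
table = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                      names=True, dtype=None, encoding='utf-8')
print(table['item'])
print(table['cost'])
```

With names=True, columns are accessed by name rather than by position, which can be convenient when the file mixes text and numbers.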

Example of genfromtxt() Function:

import numpy as np
data_numpy_array = np.genfromtxt('output.csv', delimiter=',', dtype='str')
print(data_numpy_array)
print(type(data_numpy_array))

Because our CSV file mixes text and numbers, we set dtype to ‘str’ so that every field is read as a string. With the default dtype of float, the text fields would be read as nan. For purely numeric data, you could use float or int instead.

Output:

Converting CSV to NumPy Array Using genfromtxt() Function

We can clearly see that the output array is a NumPy array.

Using NumPy loadtxt() Function

In the same way, we can also use the loadtxt() function to convert CSV to NumPy Array.

Syntax of loadtxt() Function:

np.loadtxt(fname, delimiter=',', dtype='str')

Example of loadtxt() Function:

import numpy as np
data_array2 = np.loadtxt('output.csv', delimiter=',', dtype='str')
print(data_array2)
print(type(data_array2))

Output:

Converting CSV to NumPy Array Using loadtxt() Function

The output is an ndarray, which is NumPy's array type.
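If you only need the numeric column, loadtxt() can skip the header row and select columns directly. The sketch below assumes the same three rows, held in an in-memory string (csv_text) in place of output.csv; skiprows and usecols are existing loadtxt() parameters.

```python
import io
import numpy as np

# In-memory stand-in for output.csv (an assumption for this sketch).
csv_text = "item,cost\npen,10\nbook,50\n"

# Skip the header row and load the cost column as floats (the default dtype).
costs = np.loadtxt(io.StringIO(csv_text), delimiter=',',
                   skiprows=1, usecols=1)

# Load the item column separately as strings.
items = np.loadtxt(io.StringIO(csv_text), delimiter=',',
                   skiprows=1, usecols=0, dtype=str)

print(items)
print(costs)
print(costs.mean())
```

Loading the numeric column as floats means you can apply arithmetic such as mean() directly, which is not possible on an array of strings.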

Using CSV reader() Function

We can also convert the CSV data to a list using csv.reader() and then the list to the NumPy Array using np.array().

Example:

The following code reads the CSV file and converts it into a list.

import numpy as np
import csv

csv_file = "output.csv"

data_list = []
with open(csv_file, "r") as file:
    csv_reader = csv.reader(file)
    for row in csv_reader:
        data_list.append(row)
print(data_list)
print(type(data_list))

In the above code, we import the required libraries, create an empty list to hold the CSV data, and open the file in read (‘r’) mode. The csv.reader() function then iterates over the file, and each row is appended to the list as a list of strings.

Python List

The result is a list of lists, which can be converted to a NumPy array with np.array() as shown in the code below.

numpy_array = np.array(data_list)
print(numpy_array)
print(type(numpy_array))

Output:

Converting CSV to NumPy Array Using CSV Reader
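Because csv.reader() returns every field as a string, the array built from the full list holds strings, not numbers. One possible way to get numeric values, sketched below under the assumption that the first row is a header and the second column is numeric, is to separate the header and convert the column explicitly. The csv_text string stands in for output.csv.

```python
import csv
import io
import numpy as np

# In-memory stand-in for output.csv (an assumption for this sketch).
csv_text = "item,cost\npen,10\nbook,50\n"

rows = list(csv.reader(io.StringIO(csv_text)))
header, records = rows[0], rows[1:]

# The whole table becomes an array of Unicode strings.
as_strings = np.array(rows)
print(as_strings.dtype.kind)  # 'U' means Unicode string

# Convert only the cost column to floats.
costs = np.array([record[1] for record in records], dtype=float)
print(header)
print(costs.sum())
```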

Finally, we have covered different methods to convert CSV data into a NumPy array.

Conclusion

Converting CSV data to a NumPy array gives you efficient, memory-compact storage, support for multidimensional data, broadcasting, easy indexing and slicing, access to NumPy’s extensive numerical functionality, and interoperability with other libraries in your data analysis pipeline.

In conclusion, if you have a simple text file with regular data and no missing values, and you just want to load the data into a NumPy array without column names, loadtxt() is a straightforward and efficient option. On the other hand, if you have more complex data with varying data types and missing values, or you want to work with named columns, genfromtxt() provides more flexibility and control over the loading process.
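The difference with missing values can be seen in a small sketch. It assumes a variant of the data in which the cost of the book is missing (the csv_text string is made up for illustration). genfromtxt() fills the gap, while loadtxt() is expected to raise a ValueError on the empty field.

```python
import io
import numpy as np

# Hypothetical data with a missing cost in the last row.
csv_text = "item,cost\npen,10\nbook,\n"

# genfromtxt() reads the empty field as nan by default.
with_nan = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                         skip_header=1, usecols=1)
print(with_nan)

# filling_values replaces missing fields with a value of your choice.
filled = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                       skip_header=1, usecols=1, filling_values=0)
print(filled)

# loadtxt() cannot convert the empty field to a float.
try:
    np.loadtxt(io.StringIO(csv_text), delimiter=',', skiprows=1, usecols=1)
    loadtxt_failed = False
except ValueError:
    loadtxt_failed = True
print(loadtxt_failed)
```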

Divya Maddipati