Export Dataframe Objects to Stata Format with Python Pandas

Datasets are central to machine learning applications. The format in which a dataset is stored depends on the application that will consume it. For instance, ‘.csv’ files can be opened in any spreadsheet software.

Similarly, a dataset that is to be used in the Stata software for further statistical analysis has to be stored with the ‘.dta’ extension. At times, the results of a data analysis also need to be saved as Stata files so that they can be shared with those who continue working with the data in Stata.

In this article, we shall explore how to store a dataset as a Stata file. Python supports working with Stata files through the Pandas library, which contains dedicated functions for the purpose. The best part is that these functions can also be integrated into a workflow built for handling or analysing large datasets.



Understanding the Syntax of StataWriter.write_file()

A dataframe object can be exported to a Stata file in Python using the StataWriter.write_file() function. It belongs to the StataWriter class within the pandas.io.stata module and writes the contents of a dataframe into a ‘.dta’ file at the specified file path.

Given below is its syntax along with its basic constituents.

pd.io.stata.StataWriter(dst_file, src_file).write_file()

where,

  • dst_file – used to specify the file path or the file name which is to be created in ‘.dta’ format
  • src_file – used to specify the dataframe which is to be converted into a Stata file
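As a rough sketch of this syntax in action, the snippet below writes a tiny dataframe into a temporary folder. The folder, the file name example.dta and the sample data are assumptions made purely for illustration. Note that in the Pandas signature itself the two arguments are named fname and data; dst_file and src_file are just the labels used in this article.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; these column names are already valid Stata variable names
src_file = pd.DataFrame({"brand": ["A", "B"], "price": [10, 20]})

# Write into a temporary folder so that the sketch runs on any machine
dst_file = os.path.join(tempfile.mkdtemp(), "example.dta")
pd.io.stata.StataWriter(dst_file, src_file).write_file()

# Confirm that a file was created at the destination path
print(os.path.exists(dst_file))
```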

How to Use the StataWriter.write_file() Function

Let us have a look at how this function can be used in Python. One can get started by importing the Pandas library using the following code.

import pandas as pd

This can then be followed by creating a dataset which shall then be converted into a Stata file.

ip = {'Brand':['Horlicks', 'Boost', 'Bournvita', 'Manna'],
      'Qty (kg)':['0.5','0.5','0.5','0.5'],
      'Price (INR)':[289, 245, 200, 189]}
df = pd.DataFrame(ip)
print(df)
Dataset For Conversion Into Stata File

With the dataset for the conversion ready to go, it is time to call the StataWriter class and its write_file() function as shown below.

pd.io.stata.StataWriter("E:\\Exporting dataframe object to stata using Pandas\\test.dta", df).write_file()

Execute the above code and Pandas should issue a warning (Python does not compile anything here, so this is not a compiler message). This is because some of the column headers are not valid Stata variable names, and trained eyes would have spotted this in the dataframe at first sight.

In case you have not figured it out yet, the culprits are the spaces and the round parentheses ( ) in the column headers. Pandas accepts only letters, digits and underscores in the variable names it writes, so it replaces each invalid character with an underscore, reports the replacements in the warning and gets the code running as shown below.

Python Correcting Mistakes In Dataframe
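One way to avoid the warning altogether is to rename the offending columns before exporting. The sketch below compares the two cases; the helper name_warnings_on_export, the replacement names qty_kg and price_inr, and the temporary file names are all hypothetical choices made for this illustration, not part of Pandas.

```python
import os
import tempfile
import warnings

import pandas as pd

df = pd.DataFrame({'Brand': ['Horlicks', 'Boost', 'Bournvita', 'Manna'],
                   'Qty (kg)': ['0.5', '0.5', '0.5', '0.5'],
                   'Price (INR)': [289, 245, 200, 189]})


def name_warnings_on_export(frame, file_name):
    """Export frame to a temporary .dta file and return the column-name warnings raised."""
    dst_file = os.path.join(tempfile.mkdtemp(), file_name)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeated ones
        pd.io.stata.StataWriter(dst_file, frame).write_file()
    # Keep only the warnings about invalid column names
    return [w for w in caught
            if issubclass(w.category, pd.io.stata.InvalidColumnName)]


# Hypothetical replacement names that use only letters, digits and underscores
clean_df = df.rename(columns={'Qty (kg)': 'qty_kg', 'Price (INR)': 'price_inr'})

original_warnings = name_warnings_on_export(df, "original.dta")
clean_warnings = name_warnings_on_export(clean_df, "clean.dta")

print(len(original_warnings), len(clean_warnings))
```

Renaming up front also means the names seen later in Stata are the ones you chose, rather than the underscore-padded versions that Pandas generates.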

To check whether the data has been exported as a Stata file to the destination path, one can execute the read_stata( ) function with the file path specified as shown below.

pd.read_stata("E:\\Exporting dataframe object to stata using Pandas\\test.dta")
Stata Data Successfully Stored In Destination Path
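The same check can also be done programmatically instead of by eye. The following is a minimal sketch that writes to a temporary folder (an assumption, so that it runs anywhere), uses column names that are already Stata compatible, and passes write_index=False so that the dataframe index is not stored as an extra column. It compares values rather than dtypes, since Pandas may store integers in a narrower type when writing.

```python
import os
import tempfile

import pandas as pd

# Column names here are assumed to be already Stata compatible
df = pd.DataFrame({'Brand': ['Horlicks', 'Boost', 'Bournvita', 'Manna'],
                   'qty_kg': ['0.5', '0.5', '0.5', '0.5'],
                   'price_inr': [289, 245, 200, 189]})

dst_file = os.path.join(tempfile.mkdtemp(), "test.dta")

# write_index=False keeps the dataframe index out of the exported file
pd.io.stata.StataWriter(dst_file, df, write_index=False).write_file()

# Read the file back and compare it with the original dataframe
restored = pd.read_stata(dst_file)
print(restored.shape == df.shape)
print(restored['Brand'].tolist() == df['Brand'].tolist())
print(restored['price_inr'].tolist() == df['price_inr'].tolist())
```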

To make the code comprehensible, one can also chunk it up a bit by using a two-step approach as given below.

writer = pd.io.stata.StataWriter("E:\\Exporting dataframe object to stata using Pandas\\test.dta", df)
writer.write_file()
Alternate Version Of Coding StataWriter Function
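For completeness, Pandas also offers the DataFrame.to_stata() method, which performs the export in a single call on the dataframe itself. The sketch below is an alternative to the approach covered in this article rather than a replacement for it; the temporary folder and the Stata-compatible column names are assumptions made for illustration.

```python
import os
import tempfile

import pandas as pd

# Column names here are assumed to be already Stata compatible
df = pd.DataFrame({'Brand': ['Horlicks', 'Boost', 'Bournvita', 'Manna'],
                   'qty_kg': ['0.5', '0.5', '0.5', '0.5'],
                   'price_inr': [289, 245, 200, 189]})

dst_file = os.path.join(tempfile.mkdtemp(), "test.dta")

# Single-call export; write_index=False leaves the dataframe index out of the file
df.to_stata(dst_file, write_index=False)

# Read the file back to confirm which columns were stored
exported_columns = pd.read_stata(dst_file).columns.tolist()
print(exported_columns)
```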


Conclusion

Now that we have reached the end of this article, we hope it has shown how to export a dataframe object to the Stata format using the StataWriter.write_file() function from the Pandas library in Python. Here’s another article that details how to format the floats before the decimal point in Python. There are numerous other enjoyable and equally informative articles in AskPython that might be of great help to those who are looking to level up in Python. Audere est facere!


Arulius Savio