Datasets are central to machine learning applications. The format in which a dataset is stored depends on the application that will consume it. For instance, ‘.csv’ files can be opened in any spreadsheet software.
Similarly, a dataset that is to be analysed in Stata must be stored with the ‘.dta’ extension. There are also times when the results of a data analysis need to be saved as Stata files so they can be shared with those who will continue working on them in Stata.
In this article, we shall explore how to store a dataset as a Stata file. Python supports working with Stata files through the Pandas library, which provides functions for both reading and writing them. These functions can also be integrated into a workflow built for handling or analysing large datasets.
Understanding the Syntax of StataWriter.write_file()
A dataframe object can be exported to a Stata file in Python using the StataWriter.write_file() function. It belongs to the StataWriter class within the pandas.io.stata module and writes the contents of a dataframe to a ‘.dta’ file at the specified file path.
Given below is its basic syntax.
pd.io.stata.StataWriter(dst_file, src_file).write_file()
where,
- dst_file – the file path or file name of the ‘.dta’ file to be created (Pandas names this parameter fname)
- src_file – the dataframe which is to be converted into a Stata file (Pandas names this parameter data)
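The syntax above can be sketched as a small runnable example. This is only an illustration under a few assumptions: the sample dataframe, the temporary directory and the file name example.dta are made up for the sketch, and write_index=False is an optional StataWriter argument used here to leave the dataframe index out of the file.

```python
import os
import tempfile

import pandas as pd
from pandas.io.stata import StataWriter

# Hypothetical sample data with Stata-friendly column names.
df = pd.DataFrame({"item": ["a", "b", "c"], "price": [10, 20, 30]})

# Write into a temporary directory so the sketch runs on any machine.
tmp_dir = tempfile.mkdtemp()
dst_file = os.path.join(tmp_dir, "example.dta")

# fname and data are the names Pandas gives the first two parameters.
# write_index=False keeps the dataframe index out of the Stata file.
writer = StataWriter(fname=dst_file, data=df, write_index=False)
writer.write_file()

# Confirm that a file was created at the destination path.
print(os.path.exists(dst_file))
```

Passing the two arguments by keyword is optional; the positional form shown in the syntax line behaves the same way.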
How to Use the StataWriter.write_file() Function
Let us have a look at how this function can be used in Python. One can get things started by importing the Pandas library using the following code.
import pandas as pd
This can then be followed by creating a dataset which shall then be converted into a Stata file.
ip = {'Brand':['Horlicks', 'Boost', 'Bournvita', 'Manna'],
'Qty (kg)':['0.5','0.5','0.5','0.5'],
'Price (INR)':[289, 245, 200, 189]}
df = pd.DataFrame(ip)
print(df)
With the dataset ready for conversion, it is time to create a StataWriter object and call its write_file() function as shown below.
pd.io.stata.StataWriter("E:\\Exporting dataframe object to stata using Pandas\\test.dta", df).write_file()
Execute the above code and it should run, but with a warning raised by Pandas. This is because some of the column headers are not valid Stata variable names, which trained eyes would have spotted in the dataframe at first sight.
In case you have not figured it out yet, the culprits are the spaces and the round parentheses ( ) in the column headers, since Stata variable names may contain only letters, digits and underscores. Pandas does not reject the dataframe, though. It replaces each invalid character with an underscore, reports the renamed columns in the warning, and writes the file anyway.
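To see this behaviour for yourself, here is a minimal sketch that records the warnings raised during the export and then prints the column names that actually ended up in the file. It assumes a temporary directory in place of the E: drive path used in this article, so that it can run on any machine.

```python
import os
import tempfile
import warnings

import pandas as pd

df = pd.DataFrame({
    "Brand": ["Horlicks", "Boost", "Bournvita", "Manna"],
    "Qty (kg)": ["0.5", "0.5", "0.5", "0.5"],
    "Price (INR)": [289, 245, 200, 189],
})

# Temporary location used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "test.dta")

# Record every warning raised while the file is being written.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pd.io.stata.StataWriter(path, df).write_file()

# Show the category of each recorded warning.
for w in caught:
    print(w.category.__name__)

# Read the file back to see the column names Pandas actually wrote.
written_columns = list(pd.read_stata(path).columns)
print(written_columns)
```

The printed list should contain no spaces or parentheses. It also includes an extra index column, because StataWriter writes the dataframe index by default.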
To verify that the data has been exported as a Stata file to the destination path, one can read it back with the read_stata() function, passing the same file path as shown below.
pd.read_stata("E:\\Exporting dataframe object to stata using Pandas\\test.dta")
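The same check can be done programmatically by comparing the loaded data with the original. The sketch below is one way to do it, with some assumptions: the columns are first renamed to the hypothetical Stata-safe names qty_kg and price_inr so that nothing is altered during the export, the index is left out with write_index=False, and a temporary directory stands in for the destination path.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "Brand": ["Horlicks", "Boost", "Bournvita", "Manna"],
    "Qty (kg)": ["0.5", "0.5", "0.5", "0.5"],
    "Price (INR)": [289, 245, 200, 189],
})

# Hypothetical Stata-safe column names chosen for this sketch.
clean = df.rename(columns={"Qty (kg)": "qty_kg", "Price (INR)": "price_inr"})

# Temporary location used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "test.dta")
pd.io.stata.StataWriter(path, clean, write_index=False).write_file()

# Load the file again and compare it with what was written.
loaded = pd.read_stata(path)

same_columns = list(loaded.columns) == list(clean.columns)
same_brands = loaded["Brand"].tolist() == clean["Brand"].tolist()
same_prices = loaded["price_inr"].tolist() == clean["price_inr"].tolist()
print(same_columns, same_brands, same_prices)
```

The values are compared as plain lists because Stata may store integers in a smaller type than Pandas does, so the column dtypes of the loaded dataframe need not match the original exactly.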
To make the code more readable, one can also split it into two steps as given below.
writer = pd.io.stata.StataWriter("E:\\Exporting dataframe object to stata using Pandas\\test.dta", df)
writer.write_file()
Conclusion
Now that we have reached the end of this article, we hope it has clarified how to export a dataframe object to the Stata format using the StataWriter.write_file() function from the Pandas library in Python. There are numerous other enjoyable and equally informative articles on AskPython that might be of great help to those who are looking to level up in Python. Audere est facere!