Executing Stata in R using the RStata package

Migrating from one statistical software package to another is, most of the time, an arduous task. It is common to hear of people who regularly use one package (e.g., Stata) and try to learn another (e.g., R), only to give up for lack of time to learn it properly. On the other hand, many users code in both languages depending on the task at hand, switching between scripts and interfaces and losing valuable time.

The main advantage of learning and using both languages is that you can pick whichever tool is more comfortable for each task. In my opinion, Stata is handier for data cleaning and for working with a single data frame. R, on the other hand, is far better for plotting with the ggplot2 package and for connecting to external APIs (e.g., Google Maps).

Nevertheless, using both languages and switching between them is time-consuming. While I used the two programs separately, I lost a lot of time and disk space writing files in Stata and loading them in R to make my plots. Fortunately, I found a solution to this issue.

The RStata package provides an excellent solution for using both languages from the same interface (in this case, R). See the package documentation on CRAN for details. First of all, we have to install the package from CRAN and load it.

install.packages('RStata')
library(RStata)

Secondly, we have to tell R where our Stata executable is and which version of Stata we have. For example, my path and version of Stata are:

options("RStata.StataPath" = "/Applications/Stata/StataMP.app/Contents/MacOS/stata-mp")
options("RStata.StataVersion" = 14)

On Windows, we have to drop the .exe extension from the path.
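
As a rough illustration of the Windows case, the sketch below strips the extension before setting the option. The install location is a made-up example, not a path from this post, so replace it with your own. Paths that contain spaces may need extra quoting, which is worth checking in the package documentation.

```r
# Hypothetical location of the Stata executable on Windows
# (assumption: adjust this to your own machine).
stata_exe <- "C:/Stata14/StataMP-64.exe"

# RStata expects the path without the .exe extension.
stata_path <- tools::file_path_sans_ext(stata_exe)

options("RStata.StataPath" = stata_path)
options("RStata.StataVersion" = 14)
```

RStata also provides chooseStataBin(), which lets you select the binary interactively instead of typing the path.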

Now we are able to run Stata commands from an R session. Let’s see some examples:

Running single-line commands

stata('di "Hello World"')
stata('di 2+2')
stata('clear all')

Running multi-line commands

command <- "sysuse auto, clear
  sum mpg, d"
stata(command)
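
If the indentation inside a long string gets awkward, one option is to keep each Stata command as an element of a character vector and join them with newlines. This is only a sketch of the same example, assuming the Stata path and version options are already set; the name stata_lines is mine.

```r
library(RStata)

# One Stata command per element; easier to edit than a single long string.
stata_lines <- c(
  "sysuse auto, clear",
  "sum mpg, d"
)

# Join the commands with newlines so Stata receives them as separate lines.
command <- paste(stata_lines, collapse = "\n")
stata(command)
```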

Running an entire .do file

stata("my_dofile.do")

Although nothing so far is very surprising, the real power of the package shows when you pass data into the function or get data back from it. As an example, imagine the following situation: you want to clean some data in Stata and then plot it in R (obviously using ggplot2). The workflow I followed for months was to write the data as a .dta file and then read it in R. This procedure filled my Google Drive with folders and files, making them difficult to keep track of.

To avoid this, you can use the data.out parameter. With data.out = TRUE, stata() returns the dataset that is in Stata's memory at the end of the run as an R data frame.

data = stata("data_creation.do",
      data.out = TRUE)
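
To connect this back to the plotting workflow, here is a minimal sketch that pulls Stata's bundled auto dataset into R and plots it with ggplot2. It uses sysuse auto instead of the data_creation.do file above so that it stands alone, and it assumes that the Stata path and version options are already set and that ggplot2 is installed.

```r
library(RStata)
library(ggplot2)

# Load the example dataset in Stata and return it to R as a data frame.
auto <- stata("sysuse auto, clear", data.out = TRUE)

# Plot straight from the returned data frame; no .dta file is saved by hand.
p <- ggplot(auto, aes(x = weight, y = mpg)) +
  geom_point() +
  labs(x = "Weight (lbs.)", y = "Mileage (mpg)")
print(p)
```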

You may also need to pass data that you have in R to Stata in order to run commands on it. In this case, we use the data.in parameter:

random_df = data.frame('column1'= rnorm(100))
stata("sum column1, d", data.in = random_df)

Finally, you can combine these two operations:

random_df = data.frame('column1'= rnorm(100))
random_df2 = stata("gen column2 = 2*column1 if column1>0 ", data.in = random_df, data.out = TRUE)
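
A quick way to see what came back is to inspect the new column. Because of the if condition, Stata leaves column2 missing for rows where column1 is not positive, and those missing values should show up as NA in R. The check below is a sketch under the same setup assumptions (Stata path and version already configured); set.seed is added only to make the run repeatable.

```r
library(RStata)

set.seed(1)  # only so the example is reproducible
random_df <- data.frame(column1 = rnorm(100))

# Send the data to Stata, create column2, and get the result back.
random_df2 <- stata("gen column2 = 2*column1 if column1>0",
                    data.in = random_df, data.out = TRUE)

head(random_df2)

# Rows that failed the condition should be NA in column2,
# so these two counts are expected to match.
sum(is.na(random_df2$column2))
sum(random_df$column1 <= 0)
```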

I hope this post helps Stata and/or R users work with both languages together more conveniently.


***********************
© Ignacio Riveros