Stargazer: A solution to produce amazing academic tables

Introduction

As an economist working as a research assistant, I find that producing tables of estimation results is one of the job’s main tasks. As a research project evolves, the results change and, with them, the tables. It is therefore crucial to keep the table pipeline automated. First, you avoid the transcription errors that come with typing numbers by hand and, most importantly, you spare yourself the chore of building each table manually. Second, when code generates the results for you, they are reproducible, which matters when you share your code with other researchers, as well as with your future self.

In this post, I will show how to produce amazing automated regression tables using the R package stargazer. This fantastic package (which shares its name with an incredible song by Rainbow) builds tables directly from the objects returned by your estimations. Almost every detail of the tables is customizable!

Using stargazer

Data set

According to the documentation, the swiss data set contains “Standardized fertility measure and socio-economic indicators for each of 47 French-speaking provinces of Switzerland at about 1888”. You can find more details in the data set’s help page (run ?swiss in R).

data('swiss')
head(swiss)
##              Fertility Agriculture Examination Education Catholic
## Courtelary        80.2        17.0          15        12     9.96
## Delemont          83.1        45.1           6         9    84.84
## Franches-Mnt      92.5        39.7           5         5    93.40
## Moutier           85.8        36.5          12         7    33.77
## Neuveville        76.9        43.5          17        15     5.16
## Porrentruy        76.1        35.3           9         7    90.57
##              Infant.Mortality
## Courtelary               22.2
## Delemont                 22.2
## Franches-Mnt             20.2
## Moutier                  20.3
## Neuveville               20.6
## Porrentruy               26.6

Basic Usage

We will use a linear regression model to explain how Fertility is affected by several variables, purely for illustrative purposes. Of course, stargazer supports a much broader set of models, including Instrumental Variables and Fixed Effects models, among many others. Check the list of supported models in the package documentation to see all the possibilities. Finally, I will format all the tables in html to show the outputs. In the last part of this post, I will show how to use LaTeX outputs.
Disclaimer: I intend to show how to draw tables using stargazer, so do not expect too much from these models.

model_1 <- lm(Fertility ~ Catholic + Infant.Mortality, data = swiss)
model_2 <-
  lm(Fertility ~ Catholic + Infant.Mortality + Agriculture, data = swiss)
model_3 <-
  lm(Fertility ~ Catholic + Infant.Mortality + Agriculture + Examination,
     data = swiss)

# Checking a summary
summary(model_1)
## 
## Call:
## lm(formula = Fertility ~ Catholic + Infant.Mortality, data = swiss)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -32.406  -4.336   2.036   5.317  20.421 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)   
## (Intercept)      35.59794   10.66161   3.339  0.00172 **
## Catholic          0.12071    0.03752   3.217  0.00243 **
## Infant.Mortality  1.48317    0.53719   2.761  0.00837 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.45 on 44 degrees of freedom
## Multiple R-squared:  0.3309,	Adjusted R-squared:  0.3005 
## F-statistic: 10.88 on 2 and 44 DF,  p-value: 0.0001447

The package is straightforward to use. As the example shows, just pass the fitted model objects to stargazer(), and you will get an excellent output.

# install.packages('stargazer')
library(stargazer)

stargazer(model_1, model_2, model_3, type = "html")
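                      Dependent variable:
                      Fertility
                      (1)                     (2)                     (3)
--------------------------------------------------------------------------------------------
Catholic              0.121***                0.088**                 0.026
                      (0.038)                 (0.040)                 (0.038)
Infant.Mortality      1.483***                1.633***                1.396***
                      (0.537)                 (0.526)                 (0.463)
Agriculture                                   0.142*                  -0.048
                                              (0.073)                 (0.080)
Examination                                                           -0.968***
                                                                      (0.253)
Constant              35.598***               26.748**                59.603***
                      (10.662)                (11.274)                (13.042)
--------------------------------------------------------------------------------------------
Observations          47                      47                      47
R²                    0.331                   0.386                   0.545
Adjusted R²           0.301                   0.343                   0.501
Residual Std. Error   10.448 (df = 44)        10.125 (df = 43)        8.820 (df = 42)
F Statistic           10.881*** (df = 2; 44)  9.007*** (df = 3; 43)   12.565*** (df = 4; 42)
--------------------------------------------------------------------------------------------
Note:                 *p<0.1; **p<0.05; ***p<0.01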
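
While iterating on a specification, it can be handy to preview the table in the R console before rendering it. The following is a minimal sketch, assuming the same swiss models as above and using stargazer's type = "text" option; the object name preview is only illustrative.

```r
library(stargazer)

data("swiss")
model_1 <- lm(Fertility ~ Catholic + Infant.Mortality, data = swiss)
model_2 <- lm(Fertility ~ Catholic + Infant.Mortality + Agriculture, data = swiss)

# type = "text" prints an ASCII version of the table to the console,
# which is convenient for a quick look before exporting to html or LaTeX.
# stargazer() also returns the printed lines invisibly as a character vector.
preview <- stargazer(model_1, model_2, type = "text")
```

Once the table looks right, switching type back to "html" or "latex" produces the final output without touching the rest of the call.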

Editing Headers

Of course, your requirements will vary depending on which results you want to show. Within this function, you have many options to format the header of your table. Here I present some of the arguments that you can modify (dep.var.caption, dep.var.labels, column.labels, dep.var.labels.include, model.numbers), but there are many more!

stargazer(
  model_1,
  model_2,
  model_3,
  type = "html",
  dep.var.caption  = "My new caption",
  dep.var.labels   = "Fertility (Live births per 1,000 inhabitants)",
  column.labels = c("OLS", "OLS", "OLS"),
  dep.var.labels.include = TRUE,
  # TRUE by default; set to FALSE to suppress the dependent variable labels.
  model.numbers = TRUE
  # TRUE by default; set to FALSE to suppress the model numbers.
)
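                      My new caption
                      Fertility (Live births per 1,000 inhabitants)
                      OLS                     OLS                     OLS
                      (1)                     (2)                     (3)
--------------------------------------------------------------------------------------------
Catholic              0.121***                0.088**                 0.026
                      (0.038)                 (0.040)                 (0.038)
Infant.Mortality      1.483***                1.633***                1.396***
                      (0.537)                 (0.526)                 (0.463)
Agriculture                                   0.142*                  -0.048
                                              (0.073)                 (0.080)
Examination                                                           -0.968***
                                                                      (0.253)
Constant              35.598***               26.748**                59.603***
                      (10.662)                (11.274)                (13.042)
--------------------------------------------------------------------------------------------
Observations          47                      47                      47
R²                    0.331                   0.386                   0.545
Adjusted R²           0.301                   0.343                   0.501
Residual Std. Error   10.448 (df = 44)        10.125 (df = 43)        8.820 (df = 42)
F Statistic           10.881*** (df = 2; 44)  9.007*** (df = 3; 43)   12.565*** (df = 4; 42)
--------------------------------------------------------------------------------------------
Note:                 *p<0.1; **p<0.05; ***p<0.01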

Multicolumns

You can make the labels in column.labels span several columns by using column.separate. For instance, in econometrics, it is common to use the column labels to indicate the estimation strategy behind each set of coefficients. Imagine that you are working on the same three models presented above, with the difference that you are estimating the third one with an Instrumental Variables (IV) approach instead of Ordinary Least Squares (OLS).

stargazer(
  model_1,
  model_2,
  model_3,
  type = "html",
  column.labels   = c("OLS", "IV"),
  column.separate = c(2, 1) 
  # First label for the first two columns and the second for the third one
)
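                      Dependent variable:
                      Fertility
                      OLS                                             IV
                      ----------------------------------------------  ----------------------
                      (1)                     (2)                     (3)
--------------------------------------------------------------------------------------------
Catholic              0.121***                0.088**                 0.026
                      (0.038)                 (0.040)                 (0.038)
Infant.Mortality      1.483***                1.633***                1.396***
                      (0.537)                 (0.526)                 (0.463)
Agriculture                                   0.142*                  -0.048
                                              (0.073)                 (0.080)
Examination                                                           -0.968***
                                                                      (0.253)
Constant              35.598***               26.748**                59.603***
                      (10.662)                (11.274)                (13.042)
--------------------------------------------------------------------------------------------
Observations          47                      47                      47
R²                    0.331                   0.386                   0.545
Adjusted R²           0.301                   0.343                   0.501
Residual Std. Error   10.448 (df = 44)        10.125 (df = 43)        8.820 (df = 42)
F Statistic           10.881*** (df = 2; 44)  9.007*** (df = 3; 43)   12.565*** (df = 4; 42)
--------------------------------------------------------------------------------------------
Note:                 *p<0.1; **p<0.05; ***p<0.01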

Changing Covariates’ Labels

Changing the covariate labels is also super easy. Just pass a vector with your desired titles to the covariate.labels parameter. My suggestion is to ALWAYS check the order of the covariates in the estimated models, because otherwise you could label a variable with the wrong name.

labels <- c(
  'Catholic (% as opposed to protestant)',
  'Infant Mortality (live births who live less than 1 year)',
  'Agriculture (% of males involved in agriculture as occupation)',
  'Examination (% draftees receiving highest mark on army examination)'
)

stargazer(model_1,
          model_2,
          model_3,
          type = "html",
          covariate.labels = labels)

                                                                     Dependent variable:
                                                                     Fertility
                                                                     (1)                     (2)                     (3)
-------------------------------------------------------------------------------------------------------------------------------------------
Catholic (% as opposed to protestant)                                0.121***                0.088**                 0.026
                                                                     (0.038)                 (0.040)                 (0.038)
Infant Mortality (live births who live less than 1 year)             1.483***                1.633***                1.396***
                                                                     (0.537)                 (0.526)                 (0.463)
Agriculture (% of males involved in agriculture as occupation)                               0.142*                  -0.048
                                                                                             (0.073)                 (0.080)
Examination (% draftees receiving highest mark on army examination)                                                  -0.968***
                                                                                                                     (0.253)
Constant                                                             35.598***               26.748**                59.603***
                                                                     (10.662)                (11.274)                (13.042)
-------------------------------------------------------------------------------------------------------------------------------------------
Observations                                                         47                      47                      47
R²                                                                   0.331                   0.386                   0.545
Adjusted R²                                                          0.301                   0.343                   0.501
Residual Std. Error                                                  10.448 (df = 44)        10.125 (df = 43)        8.820 (df = 42)
F Statistic                                                          10.881*** (df = 2; 44)  9.007*** (df = 3; 43)   12.565*** (df = 4; 42)
-------------------------------------------------------------------------------------------------------------------------------------------
Note:                                                                *p<0.1; **p<0.05; ***p<0.01
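
Because the labels are matched purely by position, a quick check of the coefficient order before writing them can save you from a mislabeled table. This is a small sketch using only base R; the object names coef_names and covariate_order are illustrative, and it assumes nested models like the ones in this post, where the largest model contains every covariate.

```r
data("swiss")
model_3 <- lm(Fertility ~ Catholic + Infant.Mortality + Agriculture + Examination,
              data = swiss)

# Coefficient names in estimation order; the intercept comes first.
coef_names <- names(coef(model_3))
print(coef_names)

# Drop the intercept to get the order in which the labels should be written
# (stargazer places the constant at the bottom of the table by default).
covariate_order <- setdiff(coef_names, "(Intercept)")
print(covariate_order)
```

The labels vector should then follow covariate_order element by element.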

Styles

This package has a style parameter that formats the table according to the aesthetics of several academic journals. For example, the table below uses the American Economic Review style. You can check all the available styles in the package documentation.

stargazer(model_1, model_2, model_3, type = "html", style='aer')
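                      Fertility
                      (1)                     (2)                     (3)
--------------------------------------------------------------------------------------------
Catholic              0.121***                0.088**                 0.026
                      (0.038)                 (0.040)                 (0.038)
Infant.Mortality      1.483***                1.633***                1.396***
                      (0.537)                 (0.526)                 (0.463)
Agriculture                                   0.142*                  -0.048
                                              (0.073)                 (0.080)
Examination                                                           -0.968***
                                                                      (0.253)
Constant              35.598***               26.748**                59.603***
                      (10.662)                (11.274)                (13.042)
--------------------------------------------------------------------------------------------
Observations          47                      47                      47
R²                    0.331                   0.386                   0.545
Adjusted R²           0.301                   0.343                   0.501
Residual Std. Error   10.448 (df = 44)        10.125 (df = 43)        8.820 (df = 42)
F Statistic           10.881*** (df = 2; 44)  9.007*** (df = 3; 43)   12.565*** (df = 4; 42)
--------------------------------------------------------------------------------------------
Notes:                ***Significant at the 1 percent level.
                      **Significant at the 5 percent level.
                      *Significant at the 10 percent level.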

Customizing Stats of the Table

You can customize how the coefficients and the statistical inference are presented using the report parameter. In my opinion, the syntax of this parameter is a little tricky. Here are two examples of how to modify this feature:

# rep_format <- "vcp*" # variable name, coefficient, p value and stars
rep_format <- "vct" # variable name, coefficient, t statistic without stars

stargazer(model_1,
          model_2,
          model_3,
          type = "html",
          report = rep_format) 
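                      Dependent variable:
                      Fertility
                      (1)                     (2)                     (3)
--------------------------------------------------------------------------------------------
Catholic              0.121                   0.088                   0.026
                      t = 3.217               t = 2.192               t = 0.679
Infant.Mortality      1.483                   1.633                   1.396
                      t = 2.761               t = 3.104               t = 3.018
Agriculture                                   0.142                   -0.048
                                              t = 1.962               t = -0.593
Examination                                                           -0.968
                                                                      t = -3.829
Constant              35.598                  26.748                  59.603
                      t = 3.339               t = 2.372               t = 4.570
--------------------------------------------------------------------------------------------
Observations          47                      47                      47
R²                    0.331                   0.386                   0.545
Adjusted R²           0.301                   0.343                   0.501
Residual Std. Error   10.448 (df = 44)        10.125 (df = 43)        8.820 (df = 42)
F Statistic           10.881*** (df = 2; 44)  9.007*** (df = 3; 43)   12.565*** (df = 4; 42)
--------------------------------------------------------------------------------------------
Note:                 *p<0.1; **p<0.05; ***p<0.01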
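
If you want to double-check what the "t" in the report string is showing, the t statistics can be reproduced by hand as the estimate divided by its standard error. This is a minimal base R sketch for the first model; coefs and t_by_hand are illustrative names.

```r
data("swiss")
model_1 <- lm(Fertility ~ Catholic + Infant.Mortality, data = swiss)

# Coefficient matrix with columns Estimate, Std. Error, t value and Pr(>|t|).
coefs <- summary(model_1)$coefficients

# t statistic = estimate / standard error, one value per coefficient.
t_by_hand <- coefs[, "Estimate"] / coefs[, "Std. Error"]
print(round(t_by_hand, 3))
```

The values should match the "t value" column of summary(model_1) shown earlier, and therefore the first column of the table above.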

Using personalized standard errors and p-values

In general, a good practice when estimating linear models is to use robust standard errors: they remain asymptotically valid under heteroskedasticity, which makes the statistical inference more reliable. For Stata users, adding the option , r at the end of a regression command does this. Unfortunately, in R the approach is not as direct as in Stata. Nonetheless, stargazer allows us to use customized standard errors and p-values in our tables.

# install.packages('lmtest')
# install.packages('sandwich')

library("lmtest") # coeftest
library("sandwich") # vcovHC

# Robust standard errors: 
# Check https://www.r-econometrics.com/methods/hcrobusterrors/ for more details.
inference_m1 <-
  coeftest(model_1, vcov = vcovHC(model_1, type = "HC0"))
inference_m2 <-
  coeftest(model_2, vcov = vcovHC(model_2, type = "HC0"))
inference_m3 <-
  coeftest(model_3, vcov = vcovHC(model_3, type = "HC0"))

stargazer(
  model_1,
  model_2,
  model_3,
  se = list(NULL, inference_m2[, 2], inference_m3[, 2]), # If NULL use model_* errors
  p = list(NULL, inference_m2[, 4], inference_m3[, 4]), # If NULL use model_* pvalues
  omit.stat = "f",
  type = "html"
) 
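                      Dependent variable:
                      Fertility
                      (1)                     (2)                     (3)
--------------------------------------------------------------------------------------------
Catholic              0.121***                0.088**                 0.026
                      (0.038)                 (0.037)                 (0.036)
Infant.Mortality      1.483***                1.633***                1.396***
                      (0.537)                 (0.433)                 (0.471)
Agriculture                                   0.142                   -0.048
                                              (0.090)                 (0.067)
Examination                                                           -0.968***
                                                                      (0.232)
Constant              35.598***               26.748**                59.603***
                      (10.662)                (11.190)                (12.262)
--------------------------------------------------------------------------------------------
Observations          47                      47                      47
R²                    0.331                   0.386                   0.545
Adjusted R²           0.301                   0.343                   0.501
Residual Std. Error   10.448 (df = 44)        10.125 (df = 43)        8.820 (df = 42)
--------------------------------------------------------------------------------------------
Note:                 *p<0.1; **p<0.05; ***p<0.01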
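
With more than a couple of models, repeating the coeftest() call for each one gets tedious. The sketch below wraps the same idea in lapply(); the names models, robust, robust_se and robust_p are illustrative. As an assumption that differs from the example above, it uses type = "HC1", which is the small-sample correction that Stata's , r option applies, so the numbers will not match the HC0 table exactly.

```r
library(sandwich)  # vcovHC
library(lmtest)    # coeftest
library(stargazer)

data("swiss")
models <- list(
  lm(Fertility ~ Catholic + Infant.Mortality, data = swiss),
  lm(Fertility ~ Catholic + Infant.Mortality + Agriculture, data = swiss)
)

# One coeftest per model, each with its own heteroskedasticity-robust
# covariance matrix (HC1 here; the example above uses HC0).
robust <- lapply(models, function(m) coeftest(m, vcov. = vcovHC(m, type = "HC1")))

# Pull the robust standard errors and p-values out by column name.
robust_se <- lapply(robust, function(ct) ct[, "Std. Error"])
robust_p  <- lapply(robust, function(ct) ct[, "Pr(>|t|)"])

# stargazer accepts a list of models, plus matching lists for se and p.
stargazer(models, se = robust_se, p = robust_p, omit.stat = "f", type = "text")
```

Selecting the columns by name rather than by position makes the code a bit more robust to changes in what coeftest() returns.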

Of course, you don’t always need to show the same indicators in every table. With the option keep.stat, you can specify a vector of statistics to show in your table. In this example, I only show the sample size and the R-squared. You can check the full list of statistic codes in the package documentation!

stargazer(
  model_1,
  model_2,
  model_3,
  type = "html",
  keep.stat = c("n", "rsq")
) 
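                      Dependent variable:
                      Fertility
                      (1)                     (2)                     (3)
--------------------------------------------------------------------------------------------
Catholic              0.121***                0.088**                 0.026
                      (0.038)                 (0.040)                 (0.038)
Infant.Mortality      1.483***                1.633***                1.396***
                      (0.537)                 (0.526)                 (0.463)
Agriculture                                   0.142*                  -0.048
                                              (0.073)                 (0.080)
Examination                                                           -0.968***
                                                                      (0.253)
Constant              35.598***               26.748**                59.603***
                      (10.662)                (11.274)                (13.042)
--------------------------------------------------------------------------------------------
Observations          47                      47                      47
R²                    0.331                   0.386                   0.545
--------------------------------------------------------------------------------------------
Note:                 *p<0.1; **p<0.05; ***p<0.01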

Adding new lines

In some contexts, it is helpful to add lines at the end of the table. For example, when using Fixed Effects, it is common to see a row indicating whether each estimation controls for a given Fixed Effect or not. From my perspective, the best way to achieve this is to use the add.lines option. Here, you specify a list of vectors, one for each line that you want to add.

stargazer(
  model_1,
  model_2,
  model_3,
  type = "html",
  keep.stat = c("n", "rsq"),
  add.lines = list(
    c("County Fixed Effect", "No", "Yes", 'Yes'),
    c("Time Fixed Effect", "Yes", "No", 'Yes')
  )
) 
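                      Dependent variable:
                      Fertility
                      (1)                     (2)                     (3)
--------------------------------------------------------------------------------------------
Catholic              0.121***                0.088**                 0.026
                      (0.038)                 (0.040)                 (0.038)
Infant.Mortality      1.483***                1.633***                1.396***
                      (0.537)                 (0.526)                 (0.463)
Agriculture                                   0.142*                  -0.048
                                              (0.073)                 (0.080)
Examination                                                           -0.968***
                                                                      (0.253)
Constant              35.598***               26.748**                59.603***
                      (10.662)                (11.274)                (13.042)
--------------------------------------------------------------------------------------------
County Fixed Effect   No                      Yes                     Yes
Time Fixed Effect     Yes                     No                      Yes
Observations          47                      47                      47
R²                    0.331                   0.386                   0.545
--------------------------------------------------------------------------------------------
Note:                 *p<0.1; **p<0.05; ***p<0.01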
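
When the number of models grows, typing "Yes" and "No" by hand becomes error-prone. One possible approach, sketched here in base R, is to keep one logical flag per model and build the rows from them. The flags and the helper fe_row() are hypothetical and only mirror the values used in the example above.

```r
# Hypothetical flags: which of the three models include each fixed effect.
county_fe <- c(FALSE, TRUE, TRUE)
time_fe   <- c(TRUE, FALSE, TRUE)

# Illustrative helper: the row label followed by "Yes"/"No" for each model.
fe_row <- function(label, flags) c(label, ifelse(flags, "Yes", "No"))

fe_lines <- list(
  fe_row("County Fixed Effect", county_fe),
  fe_row("Time Fixed Effect", time_fe)
)
str(fe_lines)
```

The resulting list can then be passed as add.lines = fe_lines in the stargazer() call.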

Customizing Notes

I am not a big fan of this option because the notes are placed in a multicolumn cell, so the table’s width changes with the notes’ length. Nevertheless, here is an example of how to customize them.

stargazer(
  model_1,
  model_2,
  model_3,
  type = "html",
  keep.stat = c("n", "rsq"),
  add.lines = list(
    c("County Fixed Effect", "No", "Yes", 'Yes'),
    c("Time Fixed Effect", "Yes", "No", 'Yes')
  ),
  notes.label = 'Comments', # Edit here the label
  notes = 'Own elaboration based on swiss data. Significance levels: * 90%, ** 95%, *** 99%.',
  notes.append = FALSE, # If TRUE, your notes are appended to the default significance note
  notes.align = 'l' # 'l' left, 'c' center, 'r' right
) 
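                      Dependent variable:
                      Fertility
                      (1)                     (2)                     (3)
--------------------------------------------------------------------------------------------
Catholic              0.121***                0.088**                 0.026
                      (0.038)                 (0.040)                 (0.038)
Infant.Mortality      1.483***                1.633***                1.396***
                      (0.537)                 (0.526)                 (0.463)
Agriculture                                   0.142*                  -0.048
                                              (0.073)                 (0.080)
Examination                                                           -0.968***
                                                                      (0.253)
Constant              35.598***               26.748**                59.603***
                      (10.662)                (11.274)                (13.042)
--------------------------------------------------------------------------------------------
County Fixed Effect   No                      Yes                     Yes
Time Fixed Effect     Yes                     No                      Yes
Observations          47                      47                      47
R²                    0.331                   0.386                   0.545
--------------------------------------------------------------------------------------------
Comments              Own elaboration based on swiss data. Significance levels: * 90%, ** 95%, *** 99%.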

Stargazer and LaTeX

stargazer and LaTeX make a great combination! Change the parameter type to "latex" and the package will do the work for you. You can also save the output to an external file by specifying the parameter out. Additionally, if you don’t want your table wrapped in the table environment (i.e. you just want the tabular), set the option float to FALSE. In future posts, I will go deeper into this topic.

stargazer(
  model_1,
  model_2,
  model_3,
  type = "latex",
  # out = 'path/of/your/table.tex', # Uncomment to also save the table to a file
  header = FALSE, # If TRUE, stargazer prints a header with the citation and package info
  float = TRUE # If FALSE, stargazer returns only the tabular, without the table environment
) 
## 
## \begin{table}[!htbp] \centering 
##   \caption{} 
##   \label{} 
## \begin{tabular}{@{\extracolsep{5pt}}lccc} 
## \\[-1.8ex]\hline 
## \hline \\[-1.8ex] 
##  & \multicolumn{3}{c}{\textit{Dependent variable:}} \\ 
## \cline{2-4} 
## \\[-1.8ex] & \multicolumn{3}{c}{Fertility} \\ 
## \\[-1.8ex] & (1) & (2) & (3)\\ 
## \hline \\[-1.8ex] 
##  Catholic & 0.121$^{***}$ & 0.088$^{**}$ & 0.026 \\ 
##   & (0.038) & (0.040) & (0.038) \\ 
##   & & & \\ 
##  Infant.Mortality & 1.483$^{***}$ & 1.633$^{***}$ & 1.396$^{***}$ \\ 
##   & (0.537) & (0.526) & (0.463) \\ 
##   & & & \\ 
##  Agriculture &  & 0.142$^{*}$ & $-$0.048 \\ 
##   &  & (0.073) & (0.080) \\ 
##   & & & \\ 
##  Examination &  &  & $-$0.968$^{***}$ \\ 
##   &  &  & (0.253) \\ 
##   & & & \\ 
##  Constant & 35.598$^{***}$ & 26.748$^{**}$ & 59.603$^{***}$ \\ 
##   & (10.662) & (11.274) & (13.042) \\ 
##   & & & \\ 
## \hline \\[-1.8ex] 
## Observations & 47 & 47 & 47 \\ 
## R$^{2}$ & 0.331 & 0.386 & 0.545 \\ 
## Adjusted R$^{2}$ & 0.301 & 0.343 & 0.501 \\ 
## Residual Std. Error & 10.448 (df = 44) & 10.125 (df = 43) & 8.820 (df = 42) \\ 
## F Statistic & 10.881$^{***}$ (df = 2; 44) & 9.007$^{***}$ (df = 3; 43) & 12.565$^{***}$ (df = 4; 42) \\ 
## \hline 
## \hline \\[-1.8ex] 
## \textit{Note:}  & \multicolumn{3}{r}{$^{*}$p$<$0.1; $^{**}$p$<$0.05; $^{***}$p$<$0.01} \\ 
## \end{tabular} 
## \end{table}
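
To illustrate the out and float options together, here is a minimal sketch that writes only the tabular to a file. The temporary file created with tempfile() is a stand-in for a real path in your project, and tex_file and tex_lines are illustrative names.

```r
library(stargazer)

data("swiss")
model_1 <- lm(Fertility ~ Catholic + Infant.Mortality, data = swiss)

# tempfile() stands in for a real path, such as a tables folder in your project.
tex_file <- tempfile(fileext = ".tex")

# float = FALSE drops the table environment and keeps only the tabular;
# out writes the LaTeX code to tex_file in addition to printing it.
tex_lines <- stargazer(model_1, type = "latex", header = FALSE,
                       float = FALSE, out = tex_file)

file.exists(tex_file)
```

In the LaTeX document, the saved file can then be pulled in with \input{} inside your own table environment, which lets you control the caption and label from the document rather than from R.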

Final Thoughts

I find stargazer a terrific tool for scientific research. It is intuitive to use and highly customizable. There is also a Python version under development!

I hope this post will help you with your tables! :)

***********************
© Ignacio Riveros