R packages

Hierarchical Dirichlet Process Generalized Linear Models (hdpGLM)

The package implements the hierarchical Dirichlet process Generalized Linear Models proposed in the paper Modeling Context-Dependent Latent Effect Heterogeneity. The model can be used to estimate latent heterogeneity in the marginal effects of GLM linear coefficients, to cluster data points based on that latent heterogeneity, and to investigate whether Simpson’s Paradox occurs due to latent or omitted features. It can also be used with hierarchical data to estimate the effect of upper-level covariates on the latent heterogeneity in the effects of lower-level features.
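To make the Simpson’s Paradox point concrete, here is a minimal base R sketch on simulated data. It does not use hdpGLM or its API. The variable names and the two-group setup are assumptions chosen for illustration, and the group labels are treated as known here, whereas in practice they would be latent.

```r
# Illustrative simulation only: base R, not the hdpGLM API.
set.seed(1)

# Hypothetical latent group indicator (unobserved in a real application).
d <- data.frame(
  x     = 1:20,
  group = rep(c("A", "B"), each = 10)
)
# Within each group y rises with x, but group A (low x) has a much higher baseline.
d$y <- d$x + ifelse(d$group == "A", 20, 0) + rnorm(20, sd = 0.5)

# Pooled regression ignores the latent group.
pooled   <- coef(lm(y ~ x, data = d))["x"]
# Group-specific regressions condition on it.
within_a <- coef(lm(y ~ x, data = d, subset = group == "A"))["x"]
within_b <- coef(lm(y ~ x, data = d, subset = group == "B"))["x"]

print(c(pooled = unname(pooled), A = unname(within_a), B = unname(within_b)))
```

The pooled slope is negative while both within-group slopes are positive, which is the sign reversal that a model of latent heterogeneity is meant to detect when the grouping is not observed.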

Cluster Estimated Standard Errors in R (ceser)

(with John Jackson) The package implements the Cluster Estimated Standard Errors (CESE) method proposed by Jackson (2019) to compute clustered standard errors of linear coefficients in regression models with grouped data. The package integrates with the standard R functions for estimating linear models (e.g., lm()). CESE produces more conservative confidence intervals than the classical clustered robust standard error (CRSE) method and avoids CRSE's downward bias, which leads to underestimated clustered standard errors. (See Jackson, J. (2019), Corrected standard errors with clustered data, Political Analysis, forthcoming.)
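For context on the baseline that CESE is compared against, the sketch below computes the classical CRSE by hand in base R, using the sandwich formula without a small-sample correction. It is not the CESE estimator and it does not use ceser's functions. The simulated data and all variable names are assumptions made for illustration.

```r
# Illustrative only: classical CRSE (uncorrected sandwich) in base R, not CESE.
set.seed(42)

G   <- 20                                  # number of clusters (assumed)
n_g <- 15                                  # observations per cluster (assumed)
cluster <- rep(seq_len(G), each = n_g)

# Regressor and error both share a cluster-level component.
x <- rnorm(G)[cluster] + rnorm(G * n_g)
u <- rnorm(G)[cluster] + rnorm(G * n_g)
y <- 1 + 0.5 * x + u

fit <- lm(y ~ x)
X   <- model.matrix(fit)
e   <- residuals(fit)

# Sandwich estimator: (X'X)^-1 [ sum_g X_g' e_g e_g' X_g ] (X'X)^-1
bread       <- solve(crossprod(X))
score_sums  <- rowsum(X * e, cluster)      # G x k matrix, row g is X_g' e_g
meat        <- crossprod(score_sums)
vcov_crse   <- bread %*% meat %*% bread

se_crse <- sqrt(diag(vcov_crse))
se_ols  <- sqrt(diag(vcov(fit)))
print(rbind(OLS = se_ols, CRSE = se_crse))
```

Comparing the two rows shows how much the standard errors change once within-cluster correlation is taken into account.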

Occupation and Class Scheme Classification in R (occupar)

The package occupar (Occupation Classification in R) provides (1) functions to convert between versions of the International Standard Classification of Occupations (ISCO), namely ISCO-68, ISCO-88, and ISCO-08, and (2) functions to compute class schemes (EGP, ISEI, ESeC, etc.) based on ISCO. The package benefited from Harry Ganzeboom’s work on ISCO and class schemes.

Election Forensics Package (eforensics)

(with Walter Mebane) The package can be used to estimate the probability of fraud in elections using finite mixture models. (Supported by NSF grant SES 1523355.)

Exploratory Data Analysis in R (edar)

A package that facilitates exploratory data analysis and visualization of model results, designed to fit the tidyverse and its pipe-based coding style.