Hierarchical Dirichlet Process Generalized Linear Models in R (hdpGLM)


This paper presents the R package hdpGLM, which implements the hierarchical Dirichlet process mixture of generalized linear models (hdpGLM) proposed in Ferrari (2020). The model extends the nonparametric Bayesian models of Hannah (2011), Mukhopadhyay and Gelfand (1997), and Heckman and Vytlacil (2007) to context-dependent cases. The package can be used to estimate latent heterogeneity in the marginal effects of GLM linear coefficients, to cluster data points based on that latent heterogeneity, and to investigate the occurrence of Simpson’s Paradox due to latent or omitted features. It can also be used with hierarchical data to estimate how upper-level features (e.g., levels of inequality, regional economic decline, institutional features) affect the latent heterogeneity in the effect of lower-level covariates (e.g., income, gender, party identification) on outcome variables (e.g., policy preferences, support for populism, vote intention). Such heterogeneity can be caused by omitted interactions in the model specification.
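To make the Simpson’s Paradox case concrete, the following is a minimal simulation sketch, not the hdpGLM estimator and not the package's API. It is written in Python with the standard library only, and every name in it (`ols_slope`, `simulate`, the cluster centers and intercepts) is a hypothetical choice made for illustration. It assumes two clusters that share the same positive slope but differ in intercept and in where their covariate values are centered. Here the cluster labels are known because the data are simulated; in the setting the package targets they would be latent and would have to be estimated.

```python
import random


def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate(n_per_cluster=500, seed=1):
    """Draw (cluster, x, y) triples from two clusters with the same slope (+1).

    The clusters differ only in intercept and in the center of x
    (illustrative values, not taken from the paper).
    """
    rng = random.Random(seed)
    settings = [(0.0, 5.0), (4.0, -5.0)]  # (center of x, intercept)
    data = []
    for k, (x_center, intercept) in enumerate(settings):
        for _ in range(n_per_cluster):
            x = rng.gauss(x_center, 1.0)
            y = intercept + 1.0 * x + rng.gauss(0.0, 0.5)
            data.append((k, x, y))
    return data


data = simulate()

# Slope when the cluster membership is ignored (the omitted feature).
pooled = ols_slope([x for _, x, _ in data], [y for _, _, y in data])

# Slopes estimated separately within each cluster.
within = {}
for k in (0, 1):
    xs = [x for c, x, _ in data if c == k]
    ys = [y for c, _, y in data if c == k]
    within[k] = ols_slope(xs, ys)

print("pooled slope:", round(pooled, 2))
for k, slope in within.items():
    print("cluster", k, "slope:", round(slope, 2))
```

Under these assumed settings the pooled slope comes out negative while both within-cluster slopes are positive, which is the sign reversal the abstract refers to: ignoring the cluster-defining feature reverses the apparent effect of the covariate.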

Journal of Statistical Software (in preparation for submission)