Software | Diogo Ferrari

Hierarchical Dirichlet Process Generalized Linear Models (hdpGLM)


	The package implements the hierarchical Dirichlet process Generalized Linear Models proposed in Ferrari (2020), "Modeling Context-Dependent Latent Effect Heterogeneity," which extend the non-parametric Bayesian models proposed in Mukhopadhyay and Gelfand (1997), Hannah (2011), and Heckman and Vytlacil (2007) to deal with context-dependent cases. The package can be used to estimate latent heterogeneity in the marginal effect of GLM linear coefficients, to cluster data points based on that latent heterogeneity, and to investigate the occurrence of Simpson’s Paradox due to latent or omitted features. It can also be used with hierarchical data to estimate the effect of upper-level features (e.g., levels of inequality, regional economic decline, institutional features) on the latent heterogeneity in the effect of lower-level covariates (e.g., income, gender, party identification) on outcome variables (e.g., policy preferences, support for populism, vote intention). Such heterogeneity can be caused by omitted interactions in the model specification.
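
	To make the Simpson’s Paradox point concrete, here is a minimal sketch in plain Python. It does not use the hdpGLM package or its API; the data, the two latent groups, and the helper `ols_slope` are all hypothetical and chosen only for illustration. Within each group the effect of x on y is positive, yet the pooled regression, which ignores the latent grouping, yields a negative slope.

```python
# Illustrative sketch only: hypothetical data, not the hdpGLM API.

def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs (with intercept)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Two latent groups. Inside each group, y rises by 1 per unit of x,
# but group B has larger x values and a much lower intercept.
x_a = [0, 1, 2, 3]
y_a = [10 + x for x in x_a]
x_b = [6, 7, 8, 9]
y_b = [-2 + x for x in x_b]

slope_a = ols_slope(x_a, y_a)                    # within-group effect, group A
slope_b = ols_slope(x_b, y_b)                    # within-group effect, group B
slope_pooled = ols_slope(x_a + x_b, y_a + y_b)   # grouping ignored

print(f"group A slope: {slope_a:.3f}")
print(f"group B slope: {slope_b:.3f}")
print(f"pooled slope:  {slope_pooled:.3f}")
```

	In this toy example the group labels are known. The package addresses the harder case in which the groups are latent and must be inferred from the data.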

Identification analysis and structural causal model estimation in R (idar)


	The package implements identification analysis and structural causal model (SCM) estimation in R. It is particularly useful when causal inference relies on selection on observables, making it easy to check whether a causal effect is identifiable under any given set of assumptions about the data generating process (DGP) encoded in a directed acyclic graph (DAG). If the effect is identifiable, the package provides an easy-to-use parametric estimation procedure based on a linear structural equations model. More specifically, it provides end-to-end estimation of SCMs: specification of the DGP using DAGs, identification analysis, selection of adjustment variables (selection on observables), estimation of causal effects, and computation of numeric and graphical summaries of the estimation results.
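
	The idea of selecting adjustment variables can be sketched in plain Python. This is not the idar API; the DAG (Z -> X, Z -> Y, X -> Y), the data, and the helpers `slope` and `residuals` are hypothetical. Under this assumed DAG, Z satisfies the back-door criterion, so adjusting for Z in a linear model recovers the causal effect of X on Y, while the unadjusted regression does not.

```python
# Illustrative sketch only: hypothetical linear SCM, not the idar API.
# Assumed DAG: Z -> X, Z -> Y, X -> Y (Z confounds the X-Y relation).

def slope(xs, ys):
    """Slope of the least-squares line of ys on xs (with intercept)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def residuals(ys, xs):
    """Residuals from regressing ys on xs (with intercept)."""
    b = slope(xs, ys)
    a = sum(ys) / len(ys) - b * sum(xs) / len(xs)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

z = [0, 1, 2, 3, 4, 5]                        # confounder
u = [1, -1, 0, 0, -1, 1]                      # variation in X unrelated to Z
x = [zi + ui for zi, ui in zip(z, u)]         # X := Z + U
y = [2 * xi + 3 * zi for xi, zi in zip(x, z)] # Y := 2 X + 3 Z, true effect is 2

# Unadjusted regression of Y on X mixes the causal and confounding paths.
naive = slope(x, y)

# Adjusting for Z: partial Z out of both X and Y, then regress the
# residuals (equivalent to the X coefficient in a regression on X and Z).
adjusted = slope(residuals(x, z), residuals(y, z))

print(f"unadjusted estimate: {naive:.3f}")
print(f"adjusted for Z:      {adjusted:.3f}")
```

	The outcome equation here is noise-free so that the adjusted estimate matches the true coefficient up to floating-point error; with real data the estimate would carry sampling uncertainty.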

Cluster Estimated Standard Errors in R (ceser)


	(with John Jackson) The package implements the Cluster Estimated Standard Errors (CESE) method proposed by Jackson (2019) to compute clustered standard errors of linear coefficients in regression models with grouped data. CESE produces more conservative confidence intervals, outperforms the classical clustered robust standard error (CRSE) method in various ways, and avoids the CRSE method's downward bias and underestimation of the clustered standard errors. (See Jackson, J. (2019), "Corrected standard errors with clustered data," Political Analysis.)

Occupation and Class Scheme Classification in R (occupar)


	The package occupar (Occupation Classification in R) provides: (1) a handful of functions to convert between different versions of the International Standard Classification of Occupations (ISCO): ISCO-68, ISCO-88, and ISCO-08; (2) a set of functions to compute class schemes (EGP, ISEI, ESeC, etc.) based on ISCO. The package benefited from Harry Ganzeboom’s work on ISCO and class schemes.

Election Forensics Package (eforensics)


	(with Walter Mebane) The package can be used to estimate the probability of fraud in elections using Finite Mixture Models. (Supported by NSF grant SES 1523355.)

Exploratory Data Analysis in R (edar)


	A package that facilitates exploratory data analysis and visualization of model results, aligned with the tidyverse and its pipe-based coding philosophy.