The goal of this project is to provide a series of tools to investigate *latent heterogeneity* in the effect of treatment variables or other observed covariates. Latent heterogeneity can occur because latent conditioning terms (i.e., interactive factors) are omitted in the empirical analysis.

In generalized linear models, omitting interactions can lead to latent occurrences of Simpson’s paradox, a long-standing problem in statistical analysis in general and in the social sciences in particular. Simpson’s paradox refers to the possibility that an effect found when data are aggregated is entirely different or even reversed when the data are separated and analyzed in groups. If these groups are latent, classical empirical approaches (GLM, mixed models, etc.) cannot detect or deal with them, so Simpson’s paradox goes unnoticed by the researcher. In practice, a researcher can conclude that an effect is positive when, in fact, it is positive only for one subgroup of the population and negative for another.
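
To make the point concrete, here is a minimal simulation sketch. It is not taken from the paper or from the hdpGLM package: the group sizes, slopes, noise level, and the helper `ols_slope` are all illustrative assumptions. It generates data from two latent groups with opposite effects of `x` on `y` and compares the pooled least-squares slope with the slopes within each group.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Latent group membership, unobserved by the researcher (assumed: ~30% in group 1).
z = rng.random(n) < 0.3

# Observed covariate and outcome. The effect of x is +1 in group 0 and -1 in
# group 1 (assumed values, chosen only for illustration).
x = rng.normal(size=n)
beta = np.where(z, -1.0, 1.0)
y = beta * x + rng.normal(scale=0.5, size=n)


def ols_slope(x, y):
    """Return the least-squares slope of y on x (with an intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


pooled = ols_slope(x, y)            # what a model ignoring the groups estimates
slope_g0 = ols_slope(x[~z], y[~z])  # effect within latent group 0
slope_g1 = ols_slope(x[z], y[z])    # effect within latent group 1

print(f"pooled slope:  {pooled:.2f}")
print(f"group 0 slope: {slope_g0:.2f}")
print(f"group 1 slope: {slope_g1:.2f}")
```

Under these assumptions the pooled slope is positive, even though the effect is negative for the minority group. Because `z` would not be observed in real data, the within-group regressions are available here only because the data are simulated.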

In comparative analysis, there is another level of complication: different countries can have different latent factors conditioning different observed covariates. This problem is not new, and many researchers have recognized its importance and its implications for both observational and experimental studies (see, for instance, Przeworski, 2007). It is impossible for researchers to know *a priori* whether interactions are omitted. Susan Stokes (2014), using different terms, argues that the omnipresent possibility of omitting relevant interactions in the analysis is a source of an attitude of *radical skepticism* regarding the results of observational and experimental empirical investigation in the social sciences. She says:

But from the standpoint of the radical skeptic, no research design can dispose of all potential interactions. Setting plausibility aside, if units have high dimensionality and if some confounders are unmeasurable, some unobserved trait is always likely to interact with the treatment. Faced with an experimental study that uncovers a causal effect, the radical skeptic should posit some unspecified subset of units whose response to treatment is at odds with the average response, potentially changing the theoretical implications of the study’s findings. If interactions can change the interpretation of experimental results, then the radical skeptic should be unnerved by their implication for experimental research. Because one can test only for interactions between treatments and observed factors, ungrounded skepticism implies that we will remain in the dark regarding the real findings of experimental studies.

Unobserved interactions [...] are omnipresent and inevitably limit the contribution of research to knowledge (Stokes, 2014, p. 46).

This project develops machine learning approaches and semi-parametric Bayesian (SPB) methods for dealing with those issues and for investigating whether interactions were omitted.

In a recent paper, “Modeling Context-Dependent Latent Effect Heterogeneity,” published in *Political Analysis*, I propose a hierarchical Dirichlet mixture of generalized linear models to deal with that problem. Using the model, researchers don’t need to specify all interactions explicitly. The model estimates marginal effects even when interactions are missing from the model specification. Moreover, contrary to previous approaches, my method allows researchers to investigate whether contextual features such as schools, hospitals, neighborhoods, and country-specific institutional settings are associated with the emergence of latent heterogeneity in the effect of observables. I illustrate the model’s contributions with applications in political science that investigate attitudes toward financial aid and the effect of inequality on beliefs about meritocracy. The method is implemented in R and is available in the R package Hierarchical Dirichlet Process Generalized Linear Models (hdpGLM).
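
As a rough orientation, a generic Dirichlet process mixture of GLMs can be written in stick-breaking form as below. This is a textbook-style sketch, not the exact specification from the paper: the symbols ($F$ for the outcome distribution, $g$ for the link function, $G_0$ for the base measure, $\alpha$ for the concentration parameter, $Z_i$ for the latent cluster indicator) are assumptions introduced here for illustration.

```latex
\begin{aligned}
y_i \mid x_i, Z_i = k &\sim F\!\left(g^{-1}(x_i^\top \beta_k)\right), \\
\beta_k &\sim G_0, \\
Z_i \mid \pi &\sim \mathrm{Categorical}(\pi), \\
\pi_k &= v_k \prod_{l<k} (1 - v_l), \qquad v_k \sim \mathrm{Beta}(1, \alpha).
\end{aligned}
```

Each latent cluster $k$ has its own coefficient vector $\beta_k$, so the effect of a covariate can differ across clusters without the researcher specifying the interaction that generates the difference. The hierarchical model described above goes further than this sketch by relating the latent heterogeneity to contextual features.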

I have used SPB models to study the latent structure of public support for welfare policies in OECD countries. I show that there is a hidden polarization among the observed socioeconomic groups in some countries but not others. My research indicates that one side effect of welfare policies in highly unequal societies with fragmented party systems is the existence of latent polarization in welfare policy preferences among individuals with similar observed socioeconomic characteristics. Countries that do not display such latent polarization (the USA, Japan, Australia, New Zealand) usually have comparatively smaller welfare states.

**Here are some related papers:**

(2019) Modeling Context-Dependent Latent Effect Heterogeneity. *Political Analysis*

Measuring Public Polarization and its Connection to the Determinants of Political Preferences

Identification Analysis and Multimodality of Posterior Distribution in Bayesian Models.

### References

Przeworski, A., Is the science of comparative politics possible?, In C. B. Boix, & S. C. Stokes (Eds.), The Oxford Handbook of Comparative Politics (2007). Oxford Handbooks Online.

Stokes, S. C., A defense of observational research, In D. L. Teele (Ed.), Field experiments and their critics: Essays on the uses and abuses of experimentation in the social sciences (pp. 33–57) (2014). New Haven, CT: Yale University Press.