Vikash Mansinghka
Massachusetts Institute of Technology
Objectives: To make it possible for non-technical HBGD stakeholders to ask and answer empirical questions, such as “Which countries are probably comparable to Bangladesh in their expression of malnutrition-related variables at the macro level?” The underlying goal is to empirically ground HBGD-relevant policy advocacy, initiative design, post-mortem analysis, and public debate. This requires an accessible interface, as well as ways to address data quality, model quality, model comparison, model combination, and collaborative analysis.
Methods: Longitudinal macro data were obtained from the Gapminder Foundation [1], including ~500 variables for ~250 countries, some spanning more than 100 years. The data were analyzed using BayesDB [2], a novel probabilistic programming platform that lets end users query the probable implications of the data without needing a conceptual or technical understanding of statistics. Bayesian ensembles of models were built from CrossCat [3], as well as from mixed-effects modeling and machine learning approaches.
Results: The output forms the basis for the BayesDB Macro Indicator Explorer, a web application that enables end users to browse the Gapminder data through the lens of multivariate “country phenotypes”. Users can identify the probable comparable countries for any target country and indicator of interest, and compare results across different modeling approaches.
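One common way to read "probable comparable countries" off a Bayesian clustering model is to count how often two countries land in the same cluster across posterior samples. The Python sketch below illustrates that idea only; it is not the BayesDB or CrossCat API, the function names `similarity` and `comparables` are hypothetical, and the cluster assignments are made-up toy values rather than results from the Gapminder data.

```python
def similarity(samples, a, b):
    """Fraction of posterior samples in which countries a and b share a cluster."""
    return sum(s[a] == s[b] for s in samples) / len(samples)


def comparables(samples, target, k=3):
    """Return the k countries most often co-clustered with the target."""
    others = [c for c in samples[0] if c != target]
    scored = [(c, similarity(samples, target, c)) for c in others]
    # Sort by descending similarity, breaking ties alphabetically.
    return sorted(scored, key=lambda cs: (-cs[1], cs[0]))[:k]


# Toy, invented posterior samples: each maps country -> cluster label.
samples = [
    {"Bangladesh": 0, "Nepal": 0, "India": 0, "Norway": 1},
    {"Bangladesh": 0, "Nepal": 0, "India": 1, "Norway": 2},
    {"Bangladesh": 1, "Nepal": 1, "India": 1, "Norway": 0},
    {"Bangladesh": 0, "Nepal": 0, "India": 1, "Norway": 1},
]

top = comparables(samples, "Bangladesh", k=2)
print(top)
```

Because the score is an average over samples, it carries the model's uncertainty: a country that is co-clustered with the target in only half the samples is reported as a weaker comparable rather than being included or excluded outright.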
Conclusions: Depending on the target country and indicator, the inferred set of comparables varies greatly. The ease of use suggests that extending the BayesDB MIE to answer a broader class of questions may be fruitful. The inferred dependencies between variables are not sparse; standard analyses that treat each variable independently may thus be inaccurate.
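The claim that dependencies are not sparse can be made concrete with a dependence probability: the fraction of models in an ensemble that place two variables in the same group of mutually dependent variables. The following Python sketch assumes that representation; the variable names, the function names, and the 0.5 threshold are hypothetical choices for illustration, and the groupings are toy values, not output from the analysis described here.

```python
from itertools import combinations


def dependence_probability(view_samples, x, y):
    """Fraction of ensemble members that group variables x and y together."""
    return sum(v[x] == v[y] for v in view_samples) / len(view_samples)


def dependent_pairs(view_samples, threshold=0.5):
    """List variable pairs whose dependence probability exceeds the threshold."""
    variables = sorted(view_samples[0])
    return [
        (x, y)
        for x, y in combinations(variables, 2)
        if dependence_probability(view_samples, x, y) > threshold
    ]


# Toy, invented ensemble: each member maps variable -> group label.
views = [
    {"child_stunting": 0, "gdp_per_capita": 0, "female_literacy": 0},
    {"child_stunting": 0, "gdp_per_capita": 0, "female_literacy": 1},
    {"child_stunting": 1, "gdp_per_capita": 1, "female_literacy": 1},
    {"child_stunting": 0, "gdp_per_capita": 1, "female_literacy": 1},
]

pairs = dependent_pairs(views)
print(pairs)
```

When most pairs clear the threshold, as in this toy ensemble, analyzing each variable in isolation discards information that the joint model would use.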
References:
[1] http://gapminder.org
[2] Mansinghka et al. BayesDB: a probabilistic programming platform for querying the probable implications of data. In review; preprint available at arXiv:1512.05006.
[3] Mansinghka et al. CrossCat: a fully Bayesian nonparametric method for analyzing heterogeneous high-dimensional data. In press at the Journal of Machine Learning Research (2016).
Reference: PAGE 25 (2016) Abstr 5999 [www.page-meeting.org/?abstract=5999]
Poster: Drug/Disease modeling - Other topics