Nikita Sakhanenko, Jamiul Jahid and David Galas
PNDRI
Background. The burden of childhood diarrhea and malnutrition remains high in South Asia due to inadequate household sanitation, lack of access to improved water and poor hygiene practices. An understanding of how impaired growth is causally related to pathogen-specific diarrhea and what household factors determine this relationship can aid development of interventions that more effectively reduce these co-morbidities.
Methods. Our method is model-free and identifies multivariable dependencies among variables. The dependency measures are significantly nonzero only if the subset of variables has an essential, collective dependency. Calculating dependency values for variable sets of large degree allows us to identify the dependent subsets, but can result in combinatorial explosion. We have taken advantage of the properties of the measure to avoid the combinatorial explosion by following the “shadows” that the multi-variable dependency casts onto smaller subsets.
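To illustrate what an essential, collective dependency looks like, the sketch below computes a standard information-theoretic quantity, the three-variable interaction information (co-information), on toy data. This is a stand-in chosen for illustration and is not necessarily the dependency measure used in this work; the function names and the XOR toy data are assumptions introduced here. In the toy data each pair of variables is independent, yet the three variables together are fully dependent.

```python
import math
from collections import Counter
from itertools import product


def entropy(columns):
    """Shannon entropy (in bits) of the joint distribution of the given columns."""
    rows = list(zip(*columns))
    n = len(rows)
    counts = Counter(rows)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """Pairwise dependency: I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy([x]) + entropy([y]) - entropy([x, y])


def interaction_information(x, y, z):
    """Three-variable co-information (illustrative stand-in, see lead-in).

    Sign convention used here:
    H(X) + H(Y) + H(Z) - H(X,Y) - H(X,Z) - H(Y,Z) + H(X,Y,Z)
    """
    return (entropy([x]) + entropy([y]) + entropy([z])
            - entropy([x, y]) - entropy([x, z]) - entropy([y, z])
            + entropy([x, y, z]))


# Hypothetical toy data: z is the XOR of x and y, with all (x, y) pairs equally frequent.
pairs = list(product([0, 1], repeat=2)) * 25
x = [a for a, b in pairs]
y = [b for a, b in pairs]
z = [a ^ b for a, b in pairs]

pairwise = mutual_information(x, y)             # no pairwise dependency
collective = interaction_information(x, y, z)   # nonzero three-way dependency
```

Under this sign convention the pairwise value is 0 bits and the three-way value is -1 bit, so the dependency is visible only when all three variables are considered together.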
Results. We analyzed a large, high-dimensional data set on the development of children in Singapore. This study collected a diverse range of information about children to capture a full view of child development. We considered three categories of phenotypes (anthropometric, neurological, and asthma/eczema) and their dependence on genetic variation. We identified a small set of strong two- and three-variable collective dependencies among phenotypes and SNPs. These dependencies formed interconnected networks of variables, allowing us to look for biological relationships among them and to form new hypotheses about the causal relationships.
Discussion. Applying our method to the Singapore data (GUSTO) shows promising initial results: we have identified dependencies in very large and heterogeneous data and generated hypotheses. We will now add other types of data to the analysis and integrate them into a single network.
Several lessons: Preprocessing data is extremely important. Missing data and noise strongly affect our ability to detect dependencies, and binning variable quantities is also key. Binning ranges from segmenting real values to complex groupings of time series or categorical values. Our method returns a set of candidate multi-variable dependencies, which is input to functional analysis. The SNP-phenotype dependencies and their networks suggest a number of involved biological pathways.
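As a minimal sketch of the simplest binning case mentioned above (segmenting real values), the following assigns each observed value to one of a few equal-frequency bins and leaves missing values unbinned. The function name, the choice of equal-frequency cut points, and the sample values are assumptions made for illustration; the binning scheme actually used in the analysis may differ.

```python
import bisect


def quantile_bin(values, n_bins=3):
    """Map each non-missing value to an equal-frequency bin index 0..n_bins-1.

    Missing values (None) are kept as None so they can be handled separately.
    """
    observed = sorted(v for v in values if v is not None)
    if not observed:
        return [None] * len(values)
    # Cut points taken at the k/n_bins positions of the sorted observed values.
    cuts = [observed[(len(observed) * k) // n_bins] for k in range(1, n_bins)]
    # A value equal to a cut point falls into the higher bin.
    return [None if v is None else bisect.bisect_right(cuts, v) for v in values]


# Hypothetical measurements with one missing entry.
measurements = [50.1, None, 62.3, 55.0, 70.2, 58.4, 66.7]
binned = quantile_bin(measurements, n_bins=3)
# binned == [0, None, 1, 0, 2, 1, 2]
```

Equal-frequency bins keep the number of samples per bin roughly balanced, which can help when dependency estimates rely on counts per bin; equal-width bins are an alternative when the scale of the variable itself is meaningful.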
Sponsored by the Bill & Melinda Gates Foundation, Healthy birth, growth and development initiative
Reference: PAGE 25 () Abstr 5724 [www.page-meeting.org/?abstract=5724]
Poster: Methodology - Other topics