In September 2023, BioMedware’s Chief Scientist, Dr. Pierre Goovaerts, received the grant “Geostatistical Software for Non-Parametric Geostatistical Modeling of Uncertainty”.  Funded by the National Library of Medicine of the National Institutes of Health, this new award will establish the feasibility of novel methods for highly accurate models of human exposure, along with their associated uncertainties.  Environmental exposures from air and water pollution, agriculture, industry, and the built environment impact human health through toxins, noise, and other factors.  The global burden of disease from preventable environmental exposures is enormous and growing.  According to the World Health Organization (A Prüss-Ustün, J Wolf et al. 2016), in 2012,  

“… 12.6 million deaths globally, representing 23% (95% CI: 13–34%) of all deaths, were attributable to the environment. When accounting for death and disability, the fraction of the global disease burden due to the environment is 22% (95% CI: 13–32%). In children under five years, up to 26% (95% CI: 16–38%) of all deaths could be prevented.” 

These figures are staggering and are rapidly increasing with global disruptions due to climate change. They also represent an enormous opportunity to improve global health, since these deaths are largely preventable. To prevent diseases caused by the environment, we must model environmental factors (e.g., air and water pollution) and their impact on people at both the individual and population levels. To illustrate the problem, consider an example taken from Dr. Goovaerts’ proposal.

A key component in any investigation of association and/or cause-effect relationships between the environment (e.g., air pollution, radon, water lead levels) and health outcomes is the availability of accurate models of exposure (both the estimate and an assessment of its uncertainty). In a bladder cancer case-control study to evaluate risks associated with exposure to low levels of arsenic (As) in drinking water (Meliker et al., 2007), the following information was available to model the spatial variability of arsenic in groundwater:

  • 9,188 observations collected at 8,212 private wells sampled between 1993 and 2002; 737 observations were below the detection limit, and 662 wells were sampled multiple times (2–14 times), with an average time interval of 14 days and no apparent temporal trend (Goovaerts et al., 2005),
  • multiple layers of secondary information: type of bedrock and unconsolidated deposits, and proximity of wells to the Marshall Sandstone subcrop where arsenic concentrations are higher. 

This diversity of data sources is typical, as exemplified in Figure 1, where data layers have different: 

  • measurement scales (continuous for arsenic vs categorical for geology), 
  • precision (hard data from a single measurement vs soft data: an interval for data below the detection limit, or a probability distribution for repeated measurements or calibration of secondary information),
  • sampling density (uneven and preferential sampling of wells with high levels of arsenic vs exhaustive coverage for geological layers),
  • spatial supports (point support for arsenic levels vs areal support for type of bedrock). 
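The distinction between hard and soft data can be made concrete with a small sketch. The class names, the `encode_well` helper, and the convention of recording a non-detect as `None` are all hypothetical and chosen for illustration only; they do not describe how Vesta or the cited studies store their data.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Optional, Sequence, Union


@dataclass(frozen=True)
class HardDatum:
    """A single measured concentration (ug/L)."""
    value: float


@dataclass(frozen=True)
class IntervalDatum:
    """Soft datum: the value is only known to lie in [lower, upper]."""
    lower: float
    upper: float


@dataclass(frozen=True)
class DistributionDatum:
    """Soft datum: repeated measurements summarised by mean and std."""
    mean: float
    std: float
    n: int


Datum = Union[HardDatum, IntervalDatum, DistributionDatum]


def encode_well(measurements: Sequence[Optional[float]],
                detection_limit: float) -> Datum:
    """Encode one well's record as hard or soft data.

    Assumption: a value of None means "below the detection limit".
    """
    if not measurements:
        raise ValueError("a well needs at least one measurement")
    detected = [m for m in measurements if m is not None]
    if not detected:
        # Every sample is a non-detect: only an interval is known.
        return IntervalDatum(0.0, detection_limit)
    if len(detected) != len(measurements):
        # Mixing detects and non-detects needs a censored-data model,
        # which this sketch deliberately leaves out.
        raise ValueError("mixed detects/non-detects are not handled here")
    if len(detected) == 1:
        return HardDatum(detected[0])
    return DistributionDatum(mean(detected), stdev(detected), len(detected))
```

For example, `encode_well([None], 1.0)` yields an interval from 0 to the detection limit, while `encode_well([4.0, 6.0], 1.0)` yields a distribution summary of the two repeated measurements.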

 

Processing and merging these data to estimate exposure to arsenic in water can be a daunting task for staff in public health agencies, even more so when the time dimension is taken into account.    

Figure 1. Some data layers available to model the spatial variability of groundwater [As] across 11 MI Counties: water samples collected at private wells (A) and bedrock map (B). Data were used to map the probability of exceeding the EPA standard of 10 μg/L (C). The sample distribution is skewed right with a range of spatial correlation of 2 km (D).  

One of the defining characteristics of environmental data, such as pollutant concentrations, is their structured distribution in space and time, which reflects the impact of various factors operating at different spatial and temporal scales (Goovaerts, 2011). For example, Figure 1D illustrates how arsenic levels measured at wells separated by 2 km or less are still correlated. Geostatistical spatio-temporal models provide a probabilistic framework for data analysis and prediction that builds on the joint spatial and temporal dependence between observations. Geostatistical tools are increasingly coupled with GIS capabilities for applications that characterize space-time structures, spatially interpolate scattered measurements to create spatially exhaustive layers of information, and assess the corresponding accuracy and precision. Of critical importance when coupling GIS data and environmental/exposure models is the issue of error propagation, i.e., how the uncertainty in input data (e.g., arsenic concentrations) translates into uncertainty about model outputs (e.g., risk of cancer). This critical need is seldom, if ever, addressed in contemporary research, a theoretical and analytical deficit that this research project aims to address.
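Error propagation can be illustrated with a minimal Monte Carlo sketch. It is not the method proposed in the grant: the lognormal model for the uncertain arsenic concentration, the `hypothetical_risk` exposure-response function and its coefficient, and the `propagate` helper are all assumptions made purely for illustration. The only figure taken from the text is the EPA standard of 10 μg/L.

```python
import math
import random
import statistics

EPA_STANDARD_UG_PER_L = 10.0  # threshold quoted in the Figure 1 caption


def hypothetical_risk(concentration: float) -> float:
    """Toy exposure-response curve (illustrative, not a real risk model)."""
    return 1.0 - math.exp(-0.001 * concentration)


def propagate(mu_log: float, sigma_log: float,
              n_draws: int = 10_000, seed: int = 0) -> dict:
    """Propagate input uncertainty to the output by simulation.

    Assumption: uncertainty about the concentration at one location is
    lognormal, with parameters mu_log and sigma_log on the log scale.
    """
    rng = random.Random(seed)
    # 1. Draw many plausible input values from the uncertainty model.
    draws = [rng.lognormvariate(mu_log, sigma_log) for _ in range(n_draws)]
    # 2. Push every draw through the downstream model.
    risks = sorted(hypothetical_risk(c) for c in draws)
    # 3. Summarise the spread of the outputs.
    return {
        "median_risk": statistics.median(risks),
        "lower_95": risks[int(0.025 * (n_draws - 1))],
        "upper_95": risks[int(0.975 * (n_draws - 1))],
        "p_exceed": sum(c > EPA_STANDARD_UG_PER_L for c in draws) / n_draws,
    }


if __name__ == "__main__":
    # Median concentration of 10 ug/L with moderate uncertainty.
    print(propagate(math.log(10.0), 0.5))
```

The width of the interval between `lower_95` and `upper_95` shows how much of the input uncertainty survives in the output; with `sigma_log` set to 0 the interval collapses to a single value.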

This SBIR Phase I project will evaluate the feasibility of these new techniques and their implementation in a user-friendly fashion in BioMedware’s Vesta software.  Once feasibility is demonstrated, an SBIR Phase II grant will undertake usability assessment, a detailed application study, and complete development of the techniques in BioMedware’s Vesta software. 

 

Acknowledgements 

Research reported in this blog post was supported by the National Library of Medicine of the National Institutes of Health under Award Number R43LM014519. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Bibliography 

A Prüss-Ustün, J Wolf, C Corvalán, R Bos and M Neira,  2016.  “Preventing disease through healthy environments: a global assessment of the burden of disease from environmental risks”.  World Health Organization.  ISBN 978 92 4 156519 6. 

Meliker, J., Slotnick, M., AvRuskin, G., Kaufmann, A., Fedawa, S.A., Goovaerts, P., Jacquez, G.M. and J. Nriagu. 2007. Individual lifetime exposure to inorganic arsenic using a space-time information system. International Archives of Occupational and Environmental Health, 80(3), 184-197.

Goovaerts, P., AvRuskin, G., Meliker, J., Slotnick, M., Jacquez, G.M., and J. Nriagu. 2005. Geostatistical modeling of the spatial variability of arsenic in groundwater of Southeast Michigan.  Water Resources Research, 41(7), W07013 10.1029. 

Goovaerts, P. 2011. Fate and Transport: Geostatistics and Environmental Contaminants. In: Nriagu JO (ed.) Encyclopedia of Environmental Health, volume 2, pp. 701–714 Burlington: Elsevier.