Current Research

Geostatistical Software for Non-Parametric Geostatistical Modeling of Uncertainty

SBIR Phase I from the National Library of Medicine of the National Institutes of Health
Principal Investigator: Pierre Goovaerts, PhD

 

Summary

A key component of any investigation into associations or cause-effect relationships between the environment (e.g., air pollution, radon, water lead levels) and health outcomes is the availability of accurate exposure models that provide both estimates and an assessment of their uncertainty.
 
One of the defining characteristics of environmental data, such as pollutant concentrations, is their structured distribution in space and time, which reflects the impact of various factors operating at different spatial and temporal scales (Goovaerts, 2011). Geostatistical spatio-temporal models provide a probabilistic framework for data analysis and prediction that builds on the joint spatial and temporal dependence between observations. Geostatistical tools are increasingly coupled with GIS capabilities in applications that characterize space-time structures, spatially interpolate scattered measurements to create spatially exhaustive layers of information, and assess the corresponding accuracy and precision. Of critical importance when coupling GIS data and environmental/exposure models is the issue of error propagation, i.e., how the uncertainty in input data (e.g., arsenic concentrations) translates into uncertainty about model outputs (e.g., risk of cancer). This need is seldom, if ever, addressed in contemporary research, a theoretical and analytical gap that this research project will address.
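The idea of error propagation can be illustrated with a small Monte Carlo sketch. It is not the project's method: the lognormal input distribution, the toy exposure-response model, and the `slope` parameter are all assumptions chosen for illustration, and the input values are made up. The function name `propagate_uncertainty` is likewise hypothetical.

```python
import numpy as np


def propagate_uncertainty(mean_conc, sd_conc, slope, n_draws=10_000, seed=0):
    """Monte Carlo propagation of input uncertainty through a toy risk model.

    mean_conc, sd_conc: mean and standard deviation of the uncertain input,
        e.g. an interpolated concentration and its standard error.
    slope: hypothetical risk increase per unit concentration (illustrative).
    """
    rng = np.random.default_rng(seed)

    # Lognormal parameters that reproduce the requested mean and standard
    # deviation, so that simulated concentrations stay positive.
    sigma2 = np.log(1.0 + (sd_conc / mean_conc) ** 2)
    mu = np.log(mean_conc) - 0.5 * sigma2
    conc = rng.lognormal(mean=mu, sigma=np.sqrt(sigma2), size=n_draws)

    # Assumed exposure-response model: risk = 1 - exp(-slope * concentration).
    risk = 1.0 - np.exp(-slope * conc)

    # Summarize the output uncertainty with the mean and a 95% interval.
    lower, upper = np.percentile(risk, [2.5, 97.5])
    return {"mean": float(risk.mean()), "lower": float(lower), "upper": float(upper)}


# Made-up input: concentration 10 +/- 4 (arbitrary units), slope 0.001.
result = propagate_uncertainty(mean_conc=10.0, sd_conc=4.0, slope=0.001)
print(result)
```

The width of the interval between `lower` and `upper` is the uncertainty in the model output that results from the uncertainty in the input; a larger `sd_conc` gives a wider interval.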
 
Phase I of this project will:
  1. Develop an indicator kriging (IK) alternative to Poisson and binomial kriging for filtering noise caused by the small numbers problem and for disaggregating areal rate data (Area-to-Point kriging), while avoiding the generation of negative kriged rate estimates.
  2. Implement simplicial indicator kriging for predicting the probability of occurrence of categorical data and, using the case of the composition of service lines (SL) in Flint, Michigan, compare the accuracy of this compositional approach to: (a) traditional indicator kriging, which can result in negative probabilities of occurrence and probabilities that do not sum to one, and (b) a combination of machine learning and Bayesian data analysis used by BlueConduit, a US leader in SL composition prediction.
  3. Develop and test a prototype module that will guide non-expert users through the soft indicator coding of information and variogram modeling, followed by the spatial interpolation and cross-validation based on BioMedware’s space-time visualization and analysis technology.
  4. Conduct a usability and user experience study.
This SBIR Phase I project will evaluate the feasibility of these new techniques and their implementation in a user-friendly fashion in BioMedware’s Vesta software. Once feasibility is demonstrated, an SBIR Phase II grant will support usability assessment, a detailed application study, and the complete development of the techniques in BioMedware’s Vesta software.
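The contrast in item 2 between traditional indicator kriging and a compositional approach can be sketched as follows. This is a simplified illustration under stated assumptions, not the Vesta implementation. The kriging weights are hypothetical and supplied by hand rather than obtained by solving a kriging system. The hard 0/1 indicators are softened with an assumed small value `eps` so that logarithms are defined. A single weight vector is applied to centered log-ratio (clr) coordinates, which is a simplification. The function names are illustrative.

```python
import numpy as np


def indicator_estimate(categories, weights, n_categories):
    """Traditional indicator approach: weighted average of 0/1 indicators."""
    indicators = np.eye(n_categories)[categories]  # shape (n_obs, K)
    return weights @ indicators                    # shape (K,)


def compositional_estimate(categories, weights, n_categories, eps=0.01):
    """Log-ratio approach: weighted average in clr space, then back-transform."""
    n_obs = len(categories)
    # Soften hard indicators into compositions (eps is an assumed value).
    comps = np.full((n_obs, n_categories), eps)
    comps[np.arange(n_obs), categories] = 1.0 - (n_categories - 1) * eps
    # Centered log-ratio transform of each composition.
    log_c = np.log(comps)
    clr = log_c - log_c.mean(axis=1, keepdims=True)
    # Weighted combination in log-ratio space.
    z = weights @ clr
    # Inverse clr: exponentiate and rescale so the parts sum to one.
    expz = np.exp(z - z.max())
    return expz / expz.sum()


# Hypothetical neighborhood: four observations in three categories
# (e.g. three service line materials), with one negative weight.
categories = np.array([0, 0, 1, 2])
weights = np.array([0.6, 0.5, 0.1, -0.2])  # sums to one

p_indicator = indicator_estimate(categories, weights, n_categories=3)
p_compositional = compositional_estimate(categories, weights, n_categories=3)
print(p_indicator)
print(p_compositional)
```

With these made-up weights, the indicator estimate for the third category is negative and the estimate for the first exceeds one, whereas the log-ratio estimate is strictly positive and sums to one by construction. When each category is kriged with its own weights, the indicator estimates may also fail to sum to one.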

Customer Endorsements

“I’m using SpaceStat to analyze the spatial structure of estuarine seagrass beds in Atlantic Canada for purposes of conservation and management and I’ve been able to do the type of analysis that I was after. The variogram modeling capability and LISA functions have especially been very useful.” 

Jeff Barrell, PhD Student
Dept. of Oceanography Dalhousie University