Current Research
Geostatistical Software for Non-Parametric Geostatistical Modeling of Uncertainty
SBIR Phase II from the National Library of Medicine of the National Institutes of Health
Principal Investigator: Pierre Goovaerts, PhD
Summary
A key component of any investigation into associations or cause-effect relationships between the environment and health outcomes is the availability of accurate and precise models of exposure. Because collecting field data is often prohibitively expensive, it is critical to incorporate every available source of secondary information to supplement sparse datasets. Secondary data can take many forms (e.g., continuous or categorical measurement scales) and display different levels of reliability: hard vs. soft data (e.g., interval-type data, probability distributions). Merging these data layers while accounting for their spatial patterns, their compositional nature (in the case of categorical attributes) and their local uncertainty is therefore challenging. With the advent of artificial intelligence (AI), particularly machine learning (ML), geostatistical predictive models have become more sophisticated and effective. Combining geostatistics and AI makes it possible to extract deeper insights from spatial datasets, supporting predictive modeling, risk assessment and optimized decision-making.
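The hard vs. soft distinction can be made concrete with indicator coding, the usual way non-parametric geostatistics brings different data types onto a common scale. At each threshold z_k, a hard datum z is coded 1 if z <= z_k and 0 otherwise; an interval datum [a, b] is coded 0 below a, 1 at or above b, and left undefined in between; a datum given as a local probability distribution is coded with its cumulative probability F(z_k). The sketch below follows that textbook scheme and is not BioMedware's implementation. The function names, the uniform-distribution helper and the use of `None` for undefined codes are illustrative assumptions.

```python
from typing import Callable, List, Optional

# Hypothetical helpers illustrating textbook indicator coding at a set of thresholds.

def code_hard(z: float, thresholds: List[float]) -> List[float]:
    """Hard datum: indicator is 1 if z <= z_k, else 0."""
    return [1.0 if z <= zk else 0.0 for zk in thresholds]

def code_interval(lower: float, upper: float,
                  thresholds: List[float]) -> List[Optional[float]]:
    """Interval datum [lower, upper]: 0/1 where the outcome is certain, None otherwise."""
    codes: List[Optional[float]] = []
    for zk in thresholds:
        if zk < lower:
            codes.append(0.0)   # the true value is certainly above z_k
        elif zk >= upper:
            codes.append(1.0)   # the true value is certainly at or below z_k
        else:
            codes.append(None)  # z_k falls inside the interval: indicator unknown
    return codes

def code_distribution(cdf: Callable[[float], float],
                      thresholds: List[float]) -> List[float]:
    """Soft datum given as a local cdf: indicator = Prob(Z <= z_k)."""
    return [cdf(zk) for zk in thresholds]

def uniform_cdf(a: float, b: float) -> Callable[[float], float]:
    """Assumed example distribution: uniform on [a, b]."""
    def cdf(z: float) -> float:
        if z < a:
            return 0.0
        if z >= b:
            return 1.0
        return (z - a) / (b - a)
    return cdf

if __name__ == "__main__":
    thresholds = [1.0, 2.0, 3.0, 4.0]
    print(code_hard(2.5, thresholds))                              # [0.0, 0.0, 1.0, 1.0]
    print(code_interval(1.5, 3.5, thresholds))                     # [0.0, None, None, 1.0]
    print(code_distribution(uniform_cdf(1.5, 3.5), thresholds))   # [0.0, 0.25, 0.75, 1.0]
```

The same interval datum appears twice in this example. Coded as an interval, it leaves two thresholds undefined. Coded as a uniform distribution over that interval, it supplies probabilities at every threshold. This shows how soft data of different reliability end up as indicators that a kriging or ML model can then combine.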
This SBIR project is developing the first commercial software to offer tools for soft indicator coding and non-parametric geostatistical modeling of uncertainty, leveraging AI to analyze, interpret and derive insights from spatial data. The resulting product will be a stand-alone desktop (ST) analysis and visualization tool, building on the legacy core software developed by BioMedware. These tools will also be suited to analyzing data outside the health sciences, such as in remote sensing, geochemistry, urban infrastructure or soil science, significantly broadening the commercial market for the end product.
- Conduct further research to: 1) extend the new approach developed in Phase I (quantile regression forest with a kriged data layer) by adding other ML algorithms (i.e., support vector machines, gradient boosting) and spatial data layers (e.g., eigenvectors of the distance matrix) to the comparison study, and 2) generalize cross-validation and performance measures to the multivariate case.
- Complete a fully functional and tested soft indicator coding and ML geostatistical interpolation software product ready for commercial distribution.
- Conduct a formal usability study to evaluate the design of the prototype, following usability protocols developed by the NIH and involving (i) expert evaluation by the firm Tec-Ed and (ii) usability testing with representative users.
Customer Endorsements
“I’m using SpaceStat to analyze the spatial structure of estuarine seagrass beds in Atlantic Canada for purposes of conservation and management and I’ve been able to do the type of analysis that I was after. The variogram modeling capability and LISA functions have especially been very useful.”
Jeff Barrell, PhD Student
Dept. of Oceanography, Dalhousie University