Dr. Pierre Goovaerts, Principal Investigator, and BioMedware have been awarded a new SBIR Phase I grant titled “Software for Geostatistical Smoothing and Joinpoint Regression Modeling of Time Series of Compositional Variables in Epidemiology.” Under this grant, the team will develop novel methods and software for working with variables whose parts always sum to a fixed total (i.e., compositional variables).

For example, consider breast cancer stage at diagnosis, which may be early-stage, mid-stage, or late-stage. Among women diagnosed in Washtenaw County in 2023, 27% of the diagnoses might be early-stage, 50% mid-stage, and 23% late-stage. These percentages sum to 100%, representing all of the diagnoses. Many variables have this property and are therefore compositional variables; however, few modeling techniques are specifically designed for them. This grant will develop geospatial modeling methods designed to handle compositional data, beginning with geostatistical interpolation and joinpoint regression.
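To make the sum constraint concrete, here is a minimal Python sketch using the stage-at-diagnosis percentages above. It applies the centered log-ratio (CLR) transform, one standard transform in compositional data analysis, and then maps the result back to percentages. The function names `clr` and `clr_inverse` and the size of the perturbation are illustrative assumptions; this is not the method or software developed under the grant.

```python
import math

def clr(parts):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def clr_inverse(coords, total=100.0):
    """Exponentiate and rescale so the parts sum to `total`."""
    exps = [math.exp(c) for c in coords]
    scale = total / sum(exps)
    return [e * scale for e in exps]

# Early-, mid-, and late-stage percentages from the example above.
stages = [27.0, 50.0, 23.0]
coords = clr(stages)  # unconstrained values that sum to zero

# Hypothetical stand-in for a modeling step: nudge the first coordinate.
shifted = [coords[0] + 0.2, coords[1], coords[2]]

# Back-transforming always yields positive parts that sum to 100.
back = clr_inverse(shifted)
print([round(b, 1) for b in back], round(sum(back), 6))
```

Because the back-transform rescales the parts, any adjustment made on the log-ratio scale still produces percentages that sum to 100%, which is not guaranteed when each percentage is modeled separately.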

Here is the abstract of the new grant:

Joinpoint regression, developed by the NCI Surveillance Research Program, is increasingly used to identify the timing and extent of changes in time series of health outcomes and to project future disease burden. Many analyses of population data (e.g., cancer stage at diagnosis, causes of death, and patterns of health behavior), including those often used in joinpoint regression, are based on percentages or proportions, as the focus is typically on relative, not absolute, frequencies. Such time series of compositional variables need to be modeled simultaneously to guarantee the coherency of the individual temporal trends; that is, the predicted percentages must sum to 100% at each time step. Analyzing temporal trends outside a spatial framework is also unsatisfactory because significant variation, even within a single state, is not accounted for. Modeling multivariate time series in a spatial framework is a significant theoretical, methodological, and computational challenge that is not tackled by the NCI software. This research will address this need using spatial compositional data analysis (CoDa), whereby geostatistical noise filtering and time trend modeling are conducted on log transforms of ratios.

This SBIR project is developing the first commercial software to offer tools for spatial geostatistical noise filtering and joinpoint regression analysis of time series of compositional variables in epidemiology. The research product will be a stand-alone desktop space-time (ST) analysis and visualization tool, building on the legacy core software developed by BioMedware. These tools will also be suited to the analysis of data outside the health sciences, such as in geochemistry, economics, or soil science, significantly broadening the commercial market for the end product. This project will accomplish three aims:

  1. Develop a simulation-based methodology to propagate the uncertainty caused by the small number problem through the computation of the main types of log-ratio transform available in the CoDa literature and compare the robustness of subsequent analysis with respect to this noise. The modeling of temporal trends by joinpoint regression will be adapted to the compositional nature of the data. These are all novel approaches that are currently not available in the statistical literature.
  2. Develop and test a prototype module that will integrate the novel methods developed under Aim 1 (propagation of uncertainty, modeling of multivariate temporal trends) into BioMedware’s space-time visualization and analysis technology (Vesta software).
  3. Conduct a usability and user experience study and identify additional methods and tools for Phase II work, including the first CoDa advisor to guide the user through the selection and interpretation of appropriate data representations based on the type of data (i.e., continuous vs count data, rounded vs true zeros).

These technological, scientific, and commercial innovations will enhance our ability to incorporate compositional data into any epidemiological problem with an underlying spatial or temporal reference.
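As a rough illustration of the small number problem named in the first aim, the Python sketch below simulates case counts for populations of different sizes and measures how much a log-ratio coordinate varies from one simulated dataset to the next. Everything here is an assumption made for illustration: the function `simulate_clr_spread`, the true proportions (borrowed from the Washtenaw County example), the pseudo-count of 0.5 added to avoid taking the log of zero, and the number of simulations. It is a naive Monte Carlo sketch, not the simulation-based methodology the project proposes.

```python
import math
import random
import statistics

def clr(parts):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def simulate_clr_spread(probs, n_cases, n_sims=500, pseudo=0.5, seed=0):
    """Standard deviation of the first CLR coordinate across simulated counts."""
    rng = random.Random(seed)
    categories = list(range(len(probs)))
    first_coord = []
    for _ in range(n_sims):
        # Draw n_cases diagnoses, each assigned to a stage with the given odds.
        draws = rng.choices(categories, weights=probs, k=n_cases)
        # Pseudo-count (an assumption) keeps zero counts out of the log.
        counts = [draws.count(c) + pseudo for c in categories]
        first_coord.append(clr(counts)[0])
    return statistics.stdev(first_coord)

# Assumed true proportions of early-, mid-, and late-stage diagnoses.
probs = [0.27, 0.50, 0.23]

spread_small = simulate_clr_spread(probs, n_cases=20)    # sparse area
spread_large = simulate_clr_spread(probs, n_cases=2000)  # populous area
print(spread_small, spread_large)
```

The transformed value is far noisier when only 20 cases are observed than when 2,000 are, which is why noise in sparsely populated areas needs to be accounted for before trends are modeled.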

The purpose of this Phase I grant is to assess the feasibility of the novel methods. Should the techniques prove successful, Dr. Goovaerts plans to submit a Phase II proposal to fully develop and widely disseminate commercial geospatial software for the analysis of compositional variables.