BioMedware Chief Scientist Dr. Pierre Goovaerts will present his talk, titled “Finding water service lines with hazardous material using a geostatistics-informed machine learning approach,” at the 2025 AAG Annual Meeting on March 24th at 10:10 AM EST. Learn more about the conference and register here.

Attendees will learn how water service lines (SL), which connect buildings to the public water supply, are often outdated and made of lead, presenting significant health risks to Americans. Machine learning (ML) approaches are increasingly used to identify which houses are most likely to have lines with hazardous material, based on features that combine property characteristics (e.g., year built, property tax assessment), historical data (e.g., plumbing permits, meter installation records, and SL installation, inspection, and maintenance records), administrative spatial data, and tap water quality samples. One strength of ML algorithms is their flexibility: they are not restricted to linear relationships between these predictors and the target values. Such classification algorithms, however, are unable to handle heterogeneity related to geographic location and similarity of homes.

Pierre will discuss the development of a hybrid geostatistical-machine learning approach to account for secondary information in spatial predictions. This GeoAI (Geographic Artificial Intelligence) approach combines the flexibility of machine learning algorithms (MLA) with the kriging algorithm. Spatial autocorrelation was incorporated by using geostatistics-informed data layers (i.e., kriging estimates) as additional predictors in the MLA, resulting in more accurate predictions and models of uncertainty than MLA or indicator kriging alone. The comparison study was based on a Flint dataset that includes observations (type of material) for 26,731 tax parcels and 72 covariates, notably construction year, type of the fire hydrant closest to the property, use type (residential, commercial, industrial), rental (yes/no), home value, USPS (US Postal Service), and a series of socio-demographic attributes at the census tract level.
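
For readers who want a feel for how a spatial data layer can be fed to a classifier as an extra predictor, here is a minimal sketch in Python. It is not the method presented in the talk: a leave-one-out, inverse-distance-weighted average of neighboring parcels' lead indicators stands in for the indicator kriging estimate (real kriging would derive its weights from a fitted variogram). The function name `loo_spatial_layer`, its parameters, and the toy parcels are all hypothetical.

```python
import math


def loo_spatial_layer(coords, is_lead, k=2, power=2.0):
    """Leave-one-out spatial estimate of the lead indicator at each parcel.

    Assumption: inverse-distance weighting is used here as a simple stand-in
    for indicator kriging, which would use variogram-based weights instead.
    """
    layer = []
    for i, (xi, yi) in enumerate(coords):
        # Distance and label of every OTHER parcel; the parcel itself is
        # excluded so its own label does not leak into its predictor.
        neighbors = sorted(
            (math.hypot(xi - xj, yi - yj), is_lead[j])
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )[:k]
        weights = [1.0 / max(d, 1e-9) ** power for d, _ in neighbors]
        weighted_sum = sum(w * label for w, (_, label) in zip(weights, neighbors))
        layer.append(weighted_sum / sum(weights))
    return layer


# Hypothetical toy data: two clusters of three parcels each.
coords = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
year_built = [1925, 1930, 1928, 1985, 1990, 1988]
is_lead = [1, 1, 1, 0, 0, 0]  # 1 = lead service line observed

# The spatial layer becomes one more column next to the ordinary covariates.
spatial = loo_spatial_layer(coords, is_lead)
features = [[year, s] for year, s in zip(year_built, spatial)]
```

The rows of `features` could then be passed to any classifier alongside the remaining covariates. In a real study the spatial layer would presumably be computed from training labels only, so that held-out parcels do not inform their own predictors.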