Maps play a crucial role in representing spatial information, allowing us to visualize and analyze data in a geospatial context. However, one aspect that strongly influences both the interpretation of a map and the accuracy of any analysis based on it is the map's level of detail, or scale.
Scale matters particularly in epidemiological studies, where data are typically available over a wide range of spatial scales, from the individual level to several levels of aggregation. For example, the percentage of prostate cancer patients diagnosed at a late stage was mapped over a region of Northern Florida using three types of administrative units: 25 counties, 273 ZIP codes, and 222 census tracts (Goovaerts, 2012).
Let’s further explore how scaling affects the level of detail shown on a map and how it can impact the accuracy of the spatial analysis.
Scale versus spatial scale
It’s essential to understand the distinction between scale and spatial scale.
The scale of a map is usually represented as a ratio or a bar scale (e.g., 50 km bar scale in the Florida example), indicating the relationship between the distance on the map and the actual distance on the ground. The scale is the same for all three maps above, but the accuracy or detail level differs.
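To make the ratio form concrete, here is a minimal sketch of the arithmetic behind a ratio scale. The function name and the 1:50,000 value are illustrative assumptions and are not taken from the Florida maps.

```python
CM_PER_KM = 100_000  # centimeters in one kilometer


def ground_distance_km(map_distance_cm, scale_denominator):
    """Convert a distance measured on the map (cm) to a ground distance (km).

    scale_denominator is the N in a 1:N ratio scale (hypothetical input).
    """
    return map_distance_cm * scale_denominator / CM_PER_KM


# Illustrative case: 2 cm measured on a 1:50,000 map.
example_km = ground_distance_km(2.0, 50_000)
print(f"2 cm on a 1:50,000 map is {example_km} km on the ground")
```

Note that this conversion says nothing about how finely the data are partitioned, which is why maps drawn at the same scale can still differ in detail.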
Spatial scale involves grain (the size of the smallest resolvable unit) and extent (the size of the study area). All three maps have the same extent, but ZIP codes and census tracts tend to be smaller than counties, so the two maps on the right have a finer or smaller spatial scale.
Choropleth versus isopleth maps
A second important distinction is between choropleth and isopleth maps. These two types of thematic maps are used in cartography to represent spatial patterns of data. They have distinct characteristics and are suitable for different types of data visualization.
- Choropleth maps, such as the ones shown above for Florida, display data by dividing a geographic region into distinct areas and coloring or shading those areas based on the data being represented.
- Isopleth maps, or contour maps, represent data that varies continuously across a geographic area. These maps use contour lines or shaded areas to indicate the distribution and intensity of a particular variable; see the example below for the same Florida data.
Spatial scale and map interpretation
Returning to the Florida example, you’ll notice that all three choropleth maps display different spatial patterns. In particular, zones with higher rates of late-stage diagnosis shift east as the size of geographical units decreases.
This influence of the aggregation level (i.e., county, ZIP code, or census tract) on the results illustrates the modifiable areal unit problem (MAUP), whereby the interpretation of a geographical phenomenon within a map depends on the scale and partitioning of the areal units that are imposed on the map (Openshaw, 1984).
The same data can look very different when aggregated over different sets of areal units. For instance, an area with very high disease incidence may be overlooked if a different set of district boundaries merges it with adjacent districts that have lower incidence.
Another common source of visual bias is the lower population density of large geographical units relative to smaller ones. Think of large rural counties versus small urban counties. On a choropleth map, the larger, sparsely populated units can receive a disproportionate amount of attention. These effects are particularly important for census tracts since they typically display a wide range of sizes and shapes.
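The effect of redrawing boundaries can be illustrated with a toy calculation. The sketch below is a minimal example under stated assumptions: the six blocks, their case counts, and the two partitions are all hypothetical and are not drawn from the Florida data.

```python
def aggregate_rates(cases, population, partition):
    """Return the rate (cases / population) of each district in a partition.

    partition is a list of tuples of block indices, one tuple per district.
    """
    rates = []
    for district in partition:
        district_cases = sum(cases[i] for i in district)
        district_pop = sum(population[i] for i in district)
        rates.append(district_cases / district_pop)
    return rates


# Hypothetical row of six blocks; blocks 2 and 3 form a high-incidence area.
block_cases = [10, 10, 90, 90, 10, 10]
block_population = [1000] * 6

# Two ways of grouping the same blocks into districts.
partition_a = [(0, 1), (2, 3), (4, 5)]   # hot spot kept in one district
partition_b = [(0, 1, 2), (3, 4, 5)]     # hot spot split across two districts

rates_a = aggregate_rates(block_cases, block_population, partition_a)
rates_b = aggregate_rates(block_cases, block_population, partition_b)

print("Partition A rates:", [round(r, 4) for r in rates_a])
print("Partition B rates:", [round(r, 4) for r in rates_b])
```

With the first partition the middle district clearly stands out; with the second, both districts share the same intermediate rate and the hot spot is no longer visible, even though the underlying block data are identical.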
Spatial scale and accuracy of data analysis
One way to deal with MAUP would be to analyze the original individual-level data rather than the aggregated ones. Yet, sensitive data such as disease cases are often not available or cannot be published for confidentiality reasons.
Using smaller areal units (e.g., ZIP codes rather than counties or block groups rather than census tracts) for data aggregation may decrease this MAUP effect while protecting confidentiality. Smaller geographical units, however, tend to have smaller population sizes, leading to potentially unreliable or unstable disease rates, a phenomenon known as the “small number problem”.
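The instability of rates computed from small populations can be quantified with the usual binomial standard error, sqrt(p(1 - p) / n). The following is a minimal sketch, assuming each person in a unit is an independent trial with the same probability; the rate and population sizes are hypothetical.

```python
import math


def rate_standard_error(rate, population):
    """Binomial standard error of an observed rate for a given population size."""
    return math.sqrt(rate * (1.0 - rate) / population)


# Same underlying rate observed in a small and in a large unit (hypothetical).
assumed_rate = 0.5
se_small = rate_standard_error(assumed_rate, 100)
se_large = rate_standard_error(assumed_rate, 10_000)

print(f"n = 100:    standard error = {se_small:.3f}")
print(f"n = 10,000: standard error = {se_large:.3f}")
```

Because the standard error shrinks with the square root of the population, a unit with 100 times fewer people has a rate that is about 10 times noisier.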
The MAUP is also closely related to the ecological fallacy: the false assumption that aggregated units are homogeneous, so that a relationship observed at the group level also holds for every individual in the group. Users should therefore be very cautious when extrapolating results obtained from aggregated data to individuals.
How to tackle the scaling problem?
Geostatistics offers tools to address the “small number problem” and the “modifiable areal unit problem”. The aims are twofold:
- Filter the noise attached to rates calculated from small populations, leading to more stable rate estimates
- Downscale data aggregated over different spatial supports to create a continuous map (i.e., isopleth) of the disease rate, thereby attenuating the visual bias associated with the interpretation of choropleth maps.
A measure of the variance of prediction errors is also available to identify large and sparsely populated areas where risk estimates are less reliable; see the right-hand-side isopleth map below. In the Florida example, ZIP code and census tract level data were combined, noise filtered and disaggregated using binomial kriging (Goovaerts, 2012). The resulting isopleth map exhibits a regional pattern that is closer to the one displayed by the more reliable county-level rates than by the original ZIP code-level and census tract-level data.
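The idea of noise filtering can be illustrated with a much simpler stand-in than binomial kriging. The sketch below is not the method of Goovaerts (2012): it is a non-spatial, method-of-moments shrinkage of each raw rate toward the global rate, in which units with small populations are pulled more strongly. The function name and the data are hypothetical.

```python
def shrink_rates(cases, population):
    """Shrink raw rates toward the global rate (simple, non-spatial sketch).

    Each unit's weight is between-unit variance / (between-unit variance +
    binomial sampling variance), so small populations get small weights.
    """
    total_pop = sum(population)
    global_rate = sum(cases) / total_pop
    raw = [c / n for c, n in zip(cases, population)]
    mean_pop = total_pop / len(population)

    # Population-weighted variance of raw rates around the global rate.
    weighted_var = sum(
        n * (r - global_rate) ** 2 for r, n in zip(raw, population)
    ) / total_pop
    # Remove the part attributable to sampling noise; floor at zero.
    noise_at_mean_pop = global_rate * (1.0 - global_rate) / mean_pop
    between_var = max(weighted_var - noise_at_mean_pop, 0.0)

    smoothed = []
    for r, n in zip(raw, population):
        sampling_var = global_rate * (1.0 - global_rate) / n
        denom = between_var + sampling_var
        weight = between_var / denom if denom > 0 else 0.0
        smoothed.append(global_rate + weight * (r - global_rate))
    return smoothed


# Hypothetical units: two tiny populations and two large ones.
unit_cases = [1, 3, 300, 700]
unit_population = [2, 20, 1000, 2000]
smoothed_rates = shrink_rates(unit_cases, unit_population)

for c, n, s in zip(unit_cases, unit_population, smoothed_rates):
    print(f"n = {n:5d}  raw = {c / n:.3f}  smoothed = {s:.3f}")
```

Unlike kriging, this shrinkage ignores where the units are located and cannot produce a continuous surface; it only shows why rates from small populations are trusted less than rates from large ones.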
The upcoming August 2023 release of Vesta includes techniques for tackling the scaling problem, including geostatistical techniques for upscaling, downscaling and sidescaling data. It also includes other advanced multivariate geostatistical methods as well as the first-ever software implementation of space-time joinpoint regression.
Sources:
- Goovaerts, P. 2012. Geostatistical analysis of health data with different levels of spatial aggregation. Spatial and Spatio-temporal Epidemiology, 3(1), 83-92.
- Openshaw, S. 1984. The modifiable areal unit problem. Concepts and Techniques in Modern Geography, No. 38. Geo Books, Norwich. http://qmrg.org.uk/files/2008/11/38-maup-openshaw.pdf