Problems with centroids

While point data is appropriate for many purposes, such as representing individual events or objects that occupy a small, precisely known location on the ground, centroids are an imperfect representation of areas.

Centroids have at least two limitations. First, they simplify the data. Second, a centroid can be calculated in several different ways, some of them complex, and the choice of method may influence the results. (Polygon data has limitations too, as another example will illustrate.)

Centroids are a loss of information

In this case, we know that the data for each ZIP code comes from a particular area on the ground. The spatial support for a measurement, in this case SMR, is the area to which the measurement pertains. By reducing that support from an area to a point, we lose information about adjacencies and relationships among neighboring areas (for instance, we would lose the information that ZIP 11758 is adjacent to 11701 in the illustration below). The map also carries less visual detail, which interferes with visual interpretation.

Some of the ZIP codes on Long Island form multiple unconnected polygons (see below). In that case, the point and polygon representations will be very different. Instead of three pieces with several different neighbors, ZIP 11758 will be represented by a single centroid whose location varies with the centroid calculation method.

Where is the best place to put a centroid of a multi-part polygon?

By one centroid calculation method, the centroid for 11758 is right on the eastern edge of the biggest polygon (below).

[Figure: 3_part_ZIP_centroid.gif]
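To make the dependence on method concrete, here is a minimal sketch in Python that compares two plausible centroid rules for a three-part area. The coordinates are hypothetical rectangles chosen for illustration, not the real boundaries of ZIP 11758, and the function names (`area_and_centroid`, `rect`) are our own rather than part of any GIS package.

```python
def area_and_centroid(vertices):
    """Return (area, (cx, cy)) of a simple polygon using the shoelace formula."""
    twice_area = sum_x = sum_y = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        sum_x += (x0 + x1) * cross
        sum_y += (y0 + y1) * cross
    centroid = (sum_x / (3.0 * twice_area), sum_y / (3.0 * twice_area))
    return abs(twice_area) / 2.0, centroid


def rect(xmin, ymin, xmax, ymax):
    """Hypothetical helper: a rectangle as a counterclockwise vertex list."""
    return [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]


# Three unconnected parts of one made-up area (illustrative coordinates only).
parts = [rect(0, 0, 4, 3), rect(6, 0, 10, 2), rect(6, 4, 8, 6)]
measured = [area_and_centroid(p) for p in parts]

# Rule 1: area-weighted mean of the part centroids (center of mass of all parts).
total_area = sum(area for area, _ in measured)
weighted = (
    sum(area * c[0] for area, c in measured) / total_area,
    sum(area * c[1] for area, c in measured) / total_area,
)

# Rule 2: centroid of the largest part only.
largest = max(measured, key=lambda item: item[0])[1]

print("area-weighted centroid:", weighted)
print("largest-part centroid: ", largest)
```

With these made-up shapes, the area-weighted centroid lands in the gap between the parts, so it lies inside none of them, while the largest-part rule places the point in the middle of the biggest polygon. Neither answer is wrong; they are different summaries of the same area.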

Centroid calculation can be complex for complex geographies

Polygon centroids can be calculated in several ways. The geometric centroid is the center of mass of the polygon, and for some shapes (a doughnut, for example) it lies outside the polygon itself (see below). Other methods constrain the centroid to fall within the polygon. Although all geographic coordinates carry some uncertainty, the uncertainty about the precise location of the centroid, or even about which centroid is most meaningful, makes results based on centroids harder to interpret.
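The following sketch, in Python, shows one common way to compute the geometric centroid (the shoelace formula) and checks whether the result falls inside the polygon using a ray-casting test. It assumes a simple, non-self-intersecting polygon in planar coordinates; the U-shaped polygon is a made-up example, and the function names are our own.

```python
def polygon_centroid(vertices):
    """Geometric centroid (center of mass) of a simple polygon, shoelace formula."""
    twice_area = sum_x = sum_y = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        sum_x += (x0 + x1) * cross
        sum_y += (y0 + y1) * cross
    return sum_x / (3.0 * twice_area), sum_y / (3.0 * twice_area)


def point_in_polygon(point, vertices):
    """Ray casting: count how many edges a ray going right from the point crosses."""
    x, y = point
    inside = False
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # edge straddles the horizontal line through the point
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


# A hypothetical U-shaped polygon: a 3 x 3 square with a notch cut from the top.
u_shape = [(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)]

centroid = polygon_centroid(u_shape)
print("centroid:", centroid)
print("inside polygon?", point_in_polygon(centroid, u_shape))
```

For this shape the centroid falls in the notch, so the inside test returns `False`. Methods that constrain the point to the polygon would return a different location for the same shape.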

The centroid of a polygon can be outside its area.
