Dr. Ningchuan Xiao, Associate Professor, Department of Geography
Rank at the time of award: Assistant Professor
A considerable amount of research in population and health relies on aggregated data sources, with the U.S. Census population data as a prominent example. In this type of data, original individual information is aggregated to a fixed set of spatial units, and a descriptive statistic (e.g., sum or mean) is often used to represent the population status of each unit. Researchers using such aggregated data must often confront the modifiable areal unit problem in geography and the ecological fallacy in other disciplines, as patterns or relationships obtained under one particular aggregation scheme (e.g., census blocks) may not hold under other schemes (e.g., census tracts).
Openshaw (1983) pointed out two specific issues caused by data aggregation: original data can be aggregated into different units and at different levels. Both issues arise in real data sets such as the census data.
To some extent, we can argue that census geographies such as blocks were arbitrarily created (see Figure 1 for an illustration). In many population-related studies, one often speculates about the impacts of these aggregations. Given the task of examining the relationship between two demographic variables, for example, a researcher might be interested in how much the result will differ depending on the way aggregation is carried out. Is it possible to obtain opposite results by changing the aggregation scheme?
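The last question can be illustrated with a small constructed example. The following is a minimal sketch, not part of the proposed method: the 4 x 4 grid, the two synthetic variables `x` and `y`, and the helper names `pearson` and `aggregate` are all assumptions made for illustration. The data are built so that the two variables are uncorrelated at the individual level, yet averaging them by rows yields a correlation of +1 and averaging them by columns yields a correlation of -1.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)

def aggregate(values, zones):
    """Mean of `values` within each zone label, ordered by label."""
    totals, counts = {}, {}
    for v, z in zip(values, zones):
        totals[z] = totals.get(z, 0.0) + v
        counts[z] = counts.get(z, 0) + 1
    return [totals[z] / counts[z] for z in sorted(totals)]

# Hypothetical individual-level data on a 4 x 4 grid of cells (i = row, j = column).
cells = [(i, j) for i in range(4) for j in range(4)]
x = [i + j for i, j in cells]  # synthetic variable 1
y = [i - j for i, j in cells]  # synthetic variable 2

# Two aggregation schemes over the same cells: zones are rows, or zones are columns.
by_row = [i for i, j in cells]
by_col = [j for i, j in cells]

r_individual = pearson(x, y)
r_rows = pearson(aggregate(x, by_row), aggregate(y, by_row))
r_cols = pearson(aggregate(x, by_col), aggregate(y, by_col))

print(f"individual: {r_individual:.2f}, rows: {r_rows:.2f}, columns: {r_cols:.2f}")
```

Real census data will rarely behave this cleanly, but the sketch shows that opposite conclusions under two aggregation schemes are at least possible in principle.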
If the spatial pattern of a particular variable shows a strong positive spatial autocorrelation (i.e., units with high values of this variable are spatially close to each other, and so are the low values), how much of such autocorrelation is caused by aggregation? For the same variable, is there an aggregation scheme that does not have a positive spatial autocorrelation, or even yields a negative autocorrelation?
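To make positive and negative spatial autocorrelation concrete, the sketch below computes Moran's I, one common measure of spatial autocorrelation, for two synthetic patterns on a 4 x 4 grid. Several choices here are assumptions for illustration only: the use of Moran's I itself, binary rook-contiguity weights (cells are neighbors when they share an edge), and the helper names `rook_neighbors` and `morans_i`. A pattern with like values clustered together gives a positive value, while a checkerboard pattern gives a negative one.

```python
def rook_neighbors(rows, cols):
    """Directed pairs (a, b) of cell indices that share an edge on a rows x cols grid."""
    pairs = []
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    pairs.append((r * cols + c, rr * cols + cc))
    return pairs

def morans_i(values, pairs):
    """Moran's I with binary weights: w_ij = 1 for each listed pair, 0 otherwise."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = sum(dev[a] * dev[b] for a, b in pairs)  # sum of w_ij * dev_i * dev_j
    total = sum(d * d for d in dev)                 # sum of squared deviations
    return (n / len(pairs)) * (cross / total)       # len(pairs) equals the sum of weights

pairs = rook_neighbors(4, 4)

# Clustered pattern: high values in the left two columns, low values in the right two.
clustered = [1 if c < 2 else 0 for r in range(4) for c in range(4)]
# Checkerboard pattern: every cell differs from all of its edge neighbors.
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]

i_clustered = morans_i(clustered, pairs)
i_checkerboard = morans_i(checkerboard, pairs)

print(f"clustered: {i_clustered:.3f}, checkerboard: {i_checkerboard:.3f}")
```

Under these assumptions the clustered pattern yields I = 2/3 and the checkerboard yields I = -1. Applying the same function to the zone-level values produced by different aggregation schemes is one way the questions above could be explored.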
The above issues are related to the complexity of spatial aggregation. In the scope of this proposed research, the complexity of spatial aggregation refers to the multiplicity of spatial aggregation schemes that exhibit the same measurable quality. These aggregations, however, have different spatial configurations, and applying them will lead to significantly different conclusions.
Researchers have been sensitive to some of the above issues caused by spatial aggregation, and a variety of methods have been developed to address them. Early attempts were oriented toward identifying the impact of aggregation or scale (Openshaw, 1983; Lam and Quattrochi, 1992). For example, Moellering and Tobler (1972) proposed the concept of geographical variance, which can be used to determine the contribution of spatial scale at different levels of spatial analysis. Researchers have also made much effort to develop new methods for reducing the impact of aggregation (Subramanian et al., 2001; Lan and Wang, 2008). Among the many methods, Openshaw and Rao (1995) developed an optimization-based method that can be used to re-engineer the census geography so that a set of new output spatial units can be created according to various criteria. Martin (1998), for example, used this method to create output census units that exhibit population and social homogeneity. All these methods are designed for specific purposes, and they generally do not aim to explore the complexity of aggregation. For example, while the clustering method developed by Lan and Wang (2008) can be used to reduce spatial autocorrelation among aggregated units, it does not provide further information about whether the reduced autocorrelation itself is an artifact, meaning that it occurs only under a specific aggregation scheme. The same problem can be observed in the method by Openshaw and Rao (1995), which is mainly designed to find a single aggregation scheme.