
Technical Overview

PART 1 - CONTEXT AND BACKGROUND TO GAP

 

1. Methodological foundations

 

GAP is strongly based on the CSIR’s mesoframe methodology, which takes its name from the irregular, meso-scale geoframe that was developed as the primary component of what has become the Geospatial Analysis Platform (GAP). The other two components are: a) a workbench for multi-scale spatial data mining; and b) network datasets and analysis tools, used to link local areas to larger regions and to calculate a variety of accessibility and regional concentration indicators.

Figure 1: Main components of the GAP platform

 

2. General functional specification

Seen from a functional perspective, GAP can be described as a common, meso-scale geospatial platform for the assembly, analysis and sharing of strategic geospatial information. Stated in more detail, GAP can be used for:

  • Assembling, aggregating and exchanging mesoscale information on key human need, demand, potential, economic accessibility and sustainability variables.
  • Profiling and comparing local development magnitudes (indicating “how much is where”) from a strategic, district / regional perspective.
  • Undertaking strategic pattern analysis and hot-spotting.
  • Identifying adjoining zones and/or wider functional regions, and developing a range of accessibility and reach indicators (i.e. indicating “how much can be reached from where”).
  • Constructing a range of composite territorial indicators (e.g. composite indicators of service delivery need or economic potential) from a combination of: a) local area or intra-locational variables; b) surrounding area, proximity, accessibility, and other inter-locational variables; and c) regional variables.
  • Developing an enhanced understanding of South Africa’s human/economic geography and how economic activity interacts with the built and natural environment.
  • Providing a basis for contextualisation and customisation of government policies and general service delivery strategies in accordance with “structurally different” types of contexts and/or relevant territorial indicators.
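
As a rough illustration of the composite territorial indicators mentioned in the list above, the sketch below combines one local, one inter-locational (accessibility) and one regional variable into a single score per zone. The zone names, variable names, values, min-max rescaling and equal weights are all assumptions made for illustration; the source does not prescribe a particular formula.

```python
# Hypothetical zone records: all names and values are illustrative only.
zones = {
    "A": {"local_need": 1200, "reachable_jobs": 5000, "regional_unemployment": 0.30},
    "B": {"local_need": 300, "reachable_jobs": 40000, "regional_unemployment": 0.20},
    "C": {"local_need": 800, "reachable_jobs": 15000, "regional_unemployment": 0.25},
}

def min_max(values):
    """Rescale a dict of values to the 0-1 range."""
    lo, hi = min(values.values()), max(values.values())
    return {k: (v - lo) / (hi - lo) if hi > lo else 0.0 for k, v in values.items()}

# Assumed equal weights for the local, accessibility and regional components.
weights = {"local_need": 1 / 3, "reachable_jobs": 1 / 3, "regional_unemployment": 1 / 3}

scaled = {var: min_max({z: rec[var] for z, rec in zones.items()}) for var in weights}

# Poor access to jobs is taken to raise need, so the accessibility score is inverted.
scaled["reachable_jobs"] = {z: 1.0 - v for z, v in scaled["reachable_jobs"].items()}

# Weighted sum of the three rescaled components per zone.
composite = {z: sum(weights[var] * scaled[var][z] for var in weights) for z in zones}
ranking = sorted(composite, key=composite.get, reverse=True)
print(ranking)
```

Any real indicator would need its component variables, direction of scoring and weights to be justified for the policy question at hand.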

 

3. Meso-scale definition and relevance

The term meso-scale should be seen to refer both to a specific proposed convention (for a specific commonly used reference scale) and to a possible extended range of “meso-scales” (which could be more or less than the agreed convention for a commonly used scale, and vary according to the specific purpose or geographic extent of the relevant study area). The proposed convention for a commonly used meso-scale reference – in the South African context at least – is an intermediate, 5-10 km scale of spatial resolution and associated data accuracy. This is deemed to be a relevant as well as computationally feasible scale for undertaking: a) the assembly and rescaling of a variety of macroscopic and microscopic data layers; b) the construction of computationally manageable inter-zonal distance or interaction matrices; c) regional analysis and planning; d) robust, strategic-level comparisons of local development and related magnitudes; and e) bandwidth-efficient electronic transmission and sharing of strategic geospatial information.

Figure 2: Focus for GAP platform.

 

4. The key focus

The key focus of GAP, particularly with regard to the “how much is where?” type of question, is on human/economic activity and population variables (such as volumes of economic activity or persons below the minimum living level) and on derived indicators such as demands on infrastructure and ecosystem services. In order to also address relevant explanatory questions (e.g. why does this area seem to have the highest concentration of employment opportunities?) or derived demand questions (e.g. what are the cumulative human activity related demands for infrastructure and ecosystem services in this region?), the range of estimated mesozone indicators is extended via linkages with other domain-specific information systems (such as AGIS, the SA Agricultural Geo-referenced Information System), models and/or web-linked analysis services (planned to be a key feature of future GAP deployments).

 

5. The problems addressed through GAP

GAP aims to address the following problems or constraints often associated with spatial analysis:

5.1    the Gordonia problem

One of the problems often encountered when using thematic maps is the so-called Gordonia problem, named after the Gordonia magisterial district in the Northern Cape (see Figure 3).

Figure 3. The Gordonia problem – Example of a map from the first release of the NSDP

Besides its extreme internal heterogeneity or variability (viz. the stark difference between the irrigation agriculture areas along the Gariep River and the semi-desert areas in most of the remainder of the district), Gordonia was the largest magisterial district in the country (the same principle applies to other administrative demarcations). Accordingly, many statistical comparisons based on magisterial district magnitudes tend to show it as an “area of significance” (for example, compared to the Wynberg district, which is almost too small to see on a countrywide map of magisterial districts). Should the magisterial district totals (volume of economic activity, population or any other relevant magnitude) instead be divided by the surface area (i.e. to provide measures of magnitude per square kilometre), Gordonia would fade out of significance. But such a “solution” would in turn overlook the high-intensity irrigation agriculture areas along the Gariep.
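
The effect described above can be reproduced with a few lines of arithmetic. The sketch below uses invented figures (not real statistics for Gordonia or Wynberg) to show how the same two districts swap places depending on whether totals or densities are compared, and how the district-wide density hides a high-intensity strip.

```python
# Hypothetical figures chosen only to illustrate the effect; not real statistics.
# name: (economic output in R million, surface area in km2)
districts = {
    "Large sparse district": (900.0, 60000.0),
    "Small urban district": (600.0, 50.0),
}

# Hypothetical split of the large district into its two contrasting parts.
sub_areas = {
    "Irrigated river strip": (800.0, 1000.0),
    "Semi-desert remainder": (100.0, 59000.0),
}

def density(output, area_km2):
    """Output per square kilometre."""
    return output / area_km2

# Ranking by totals favours the large district ...
top_by_total = max(districts, key=lambda d: districts[d][0])
# ... while ranking by density makes it fade out of significance.
top_by_density = max(districts, key=lambda d: density(*districts[d]))

district_density = density(*districts["Large sparse district"])
strip_density = density(*sub_areas["Irrigated river strip"])

print(top_by_total, "|", top_by_density)
print(f"strip is {strip_density / district_density:.0f}x the district average")
```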

 

The Gordonia problem can be seen to also exist at more detailed spatial scales, such as with the spatial analysis and representation of information based on census enumerator areas. Since most classifications of census enumerator areas are, by implication, based on the spatial distribution of population, the sizes of these areas also vary in relation to the density of the population. Besides the distorting effect that this could have on population related “quantity maps”, this makes it difficult to relate population and related statistics to statistics about the natural environment (which are mostly at a very fine scale, such as the 30 m grid used for South Africa’s Land Cover dataset).

 

5.2  Differing analysis units and scales

A related type of problem is illustrated by Figure 4. This problem, described extensively in an article by Vermaat et al. (2005), is that differing analysis units and scales are typically used for economic and ecosystem modelling.

Figure 4. Illustration of Limited data and model inter-operability

 

5.3 The Modifiable Areal Unit Problem

The Gordonia problem is a typical manifestation of what is known in the relevant technical literature as the modifiable areal unit problem (MAUP). The MAUP is a common occurrence (and potential cause of disputes) in situations where the quantities in one area (such as persons in need, or economic output) have to be mapped and compared with those in another area, and decisions have to be made about the areas of greatest significance, or the hot spots to be targeted. Stated very simply, MAUP refers to distortions or wholly different pictures (e.g. different pictures of the apparent hot spots, or high magnitude areas) caused by: a) varying sizes and demarcations of statistical analysis units; and/or b) changes in the scale of analysis.
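
A minimal sketch of the zoning effect, using invented counts along a single row of equal-sized cells: the underlying data stay the same, but the apparent hot spot moves when the zone boundaries are redrawn. The cell values and both zonings are hypothetical.

```python
# Hypothetical counts of persons in need along a row of 12 equal-sized cells.
cells = [0, 0, 9, 9, 0, 0, 5, 5, 5, 0, 0, 0]

def zone_totals(values, boundaries):
    """Sum cell values within zones given as (start, end) index pairs."""
    return [sum(values[start:end]) for start, end in boundaries]

# Two alternative demarcations of the same 12 cells into three zones.
zoning_a = [(0, 4), (4, 8), (8, 12)]   # equal-sized zones
zoning_b = [(0, 3), (3, 9), (9, 12)]   # unequal zones, shifted boundaries

totals_a = zone_totals(cells, zoning_a)
totals_b = zone_totals(cells, zoning_b)

# Index of the zone with the highest total under each demarcation.
hotspot_a = totals_a.index(max(totals_a))
hotspot_b = totals_b.index(max(totals_b))

print(totals_a, "hot spot covers cells", zoning_a[hotspot_a])
print(totals_b, "hot spot covers cells", zoning_b[hotspot_b])
```

Under the first zoning the hot spot is the westernmost zone; under the second it is the larger middle zone, even though no cell value has changed.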

The effect of MAUP and related issues is often ignored when territorial indicators are developed, and tends to be aggravated by the apparent ease and lack of caution with which GIS tools are used to construct indicators and produce thematic maps. In a critical, outspoken article on the subject, Openshaw (1996) has drawn attention to the inadequate understanding and consideration of the geography of the data, resulting in the development of biased, oversimplified or wrong area profiles. Stated in his words, the problem is that:

“large amounts of public funds are often allocated on the basis of simple minded indices used to rank areas. Simple minded technology is clearly attractive to end users, because the results are easy to understand, but they can also be wrong!” (Openshaw, 1996, pp 63–64).

He goes on to raise the following questions and concerns:

“If you are ranking or comparing area A with area B, what makes you think that these areal entities are comparable? It is true that they may well be of the same class of areal objects (eg. Census district or wards) and that people seldom compare districts with wards because they are perceived to be different. However, are all districts comparable? Are some district-wise comparisons in fact equivalent to comparing a district with a ward or a district with a country?” (Openshaw, 1996, p. 64).

Most of this relates to the basic dilemma of having to differentiate on the basis of between-area differences, but then sometimes disregarding within-area differences (heterogeneity). This typically leads to the inadequate targeting of pockets of deprivation (or other types of minority need) in small towns or areas which, in total, may only constitute one ward or district. In practice, the heterogeneity of the micro-data patterns interacts with the location of zonal boundaries (which is sometimes distorted by gerrymandering) and zone size to generate all manner of complexity (Openshaw, 1996). It should thus not be surprising that even the best science and spatial analysis technologies have still been unable to “solve” these and other “MAUP issues”.

 

PART 2 - THE MESOFRAME METHODOLOGY (2007 and 2011 versions)

 

This section describes the CSIR’s mesoframe methodology, which takes its name from the irregular, meso-scale geoframe that was developed and became the primary component of the Geospatial Analysis Platform.

 

6.  Development of the mesoframe (2006-2007)

As noted, the primary component of GAP is the meso-scale “geoframe” for South Africa (SA Mesoframe): a demarcation of South Africa into a “grid” of just under 25 000 “mesozones”, each approximately 50 km² in size. These mesozones have been defined in such a way that:

  • they are nested within municipal boundaries and other significant geo-economic and historical area demarcations (such as the former homeland boundaries); and
  • the zone boundaries correspond with major travel barriers (such as rivers) and with the “break lines” between sparsely populated areas (such as mountains) and areas with medium to high levels of human activity (such as fertile valleys or built-up areas).
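
The zone count quoted above, and the claim that the meso-scale keeps inter-zonal matrices computationally manageable, can be checked with simple arithmetic. The sketch below assumes an approximate surface area for South Africa of 1 221 037 km² (a figure not given in the source) and compares a 50 km² zone size with a hypothetical 1 km² grid.

```python
# Approximate surface area of South Africa in km2 (an assumption of this sketch).
SA_AREA_KM2 = 1_221_037

def zones_and_matrix(zone_km2):
    """Return (number of zones, number of cells in a full zone-to-zone matrix)."""
    n_zones = round(SA_AREA_KM2 / zone_km2)
    return n_zones, n_zones ** 2

meso_zones, meso_matrix = zones_and_matrix(50)   # meso-scale zones
fine_zones, fine_matrix = zones_and_matrix(1)    # hypothetical 1 km2 grid

print(f"50 km2 zones: {meso_zones:,} zones, {meso_matrix:,} matrix cells")
print(f" 1 km2 zones: {fine_zones:,} zones, {fine_matrix:,} matrix cells")
```

The first figure is consistent with "just under 25 000" mesozones, while the 1 km² alternative would need a matrix more than two thousand times larger.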

 

Figure 5 provides an overview of the process used to demarcate the mesozones. First, key boundaries and break lines are used to define a so-called macroframe, with internal subdivisions such as functional urban areas (FUAs) and low (human) activity and protected areas (LAPAs), e.g. uninhabited mountains and game reserves. The second step involves the application of a combination of conventional GIS tools (such as tools for the demarcation of Thiessen polygons) as well as a number of rarely used tools (including a tool for defining “contiguous cartograms”) to undertake a semi-automated demarcation of each macroframe into the required number of zones, and resize them so that each approximates the chosen size norm.
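
To make the Thiessen polygon step concrete: a Thiessen polygon contains every location that is closer to its own seed point than to any other seed. The sketch below is a simplified, raster-style version of that idea, assigning each cell of a small grid to its nearest seed. The seed names and coordinates are hypothetical, and the sketch does not attempt the cartogram-based resizing step or reproduce the GIS tools actually used.

```python
import math

# Hypothetical seed points (e.g. settlement centres) in arbitrary grid units.
seeds = {"town_1": (1.0, 1.0), "town_2": (6.0, 2.0), "town_3": (3.0, 6.0)}

def nearest_seed(x, y):
    """Name of the seed closest to (x, y) in straight-line distance."""
    return min(seeds, key=lambda name: math.dist((x, y), seeds[name]))

# Assign every cell of an 8 x 8 grid to a seed, using the cell centre.
grid = {
    (col, row): nearest_seed(col + 0.5, row + 0.5)
    for col in range(8)
    for row in range(8)
}

# Number of cells in each discrete Thiessen "polygon".
cell_counts = {name: sum(1 for owner in grid.values() if owner == name) for name in seeds}
print(cell_counts)
```

The resulting zones differ in size, which is why a further resizing step is needed before each zone approximates a chosen size norm.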

Figure 5. Outline of methodology to demarcate the mesozones

7.  Refinement of mesozones (2007)

Based on feedback from end-users, an extensive, partly manual re-demarcation was undertaken in the period December 2006 to February 2007 to, inter alia:

  • Ensure that the boundaries of the mesozones (drawn with semi-automated tools) do not cut across or divide settlements.
  • Reduce the size variation of mesozones.
  • Improve the homogeneity of the mesozones – especially in areas where there are sharp differences in the density and type of human activity.
  • Ensure greater alignment with old homeland boundaries – especially where these coincide with sharp differences in population densities and land cover.
  • Adjust the shape and number of functional urban zones around small, medium sized and major towns.

 

Some of the specific feedback / user requirements received included the following:

  • The need to improve the homogeneity of analysis zones – especially in areas where there are sharp differences in the density and type of human activity.
  • Improve the capability to assess spatial linkages, interactions and the boundaries/attributes of functional regions (i.e. regions that might be formed as a result of spatial linkages and interactions).
  • Ensure that the boundaries of analysis zones (drawn with semi-automated tools) do not cut across settlements.
  • Produce better geographic position and accessibility measures, e.g. functional urbanisation measures, and service accessibility measures.

 

8.  Multi-scale spatial data mining (GAP version 2)

The second main GAP component is a multi-scale spatial data mining workbench, which was used extensively for GAP version 2 and was changed during the 2011 GAP update (see section 10). It is used principally to disaggregate large-area data, assemble these together with small-area, field and point data (e.g. town data) and, in this way, ‘populate’ the mesozones with so-called framework data (i.e. data about levels of economic activity, numbers of households in different income brackets, land cover composition, etc.).

Figure 6. Allocation of macro control totals

A variety of spatial data mining methods were used. Principal among these is the use of a variety of “spatial weight fields” (derived from point, polygon or surface data per mesozone) to disaggregate so-called control totals (see Figure 6). The choice of relevant weight field data was informed both by the availability of data, and the type of activity (see first column of Table 1).

Table 1. Examples of chosen weight fields per type of activity

| Main activity types | Examples of sub-types | Principal weight fields |
|---|---|---|
| Resource based activities | Agriculture and forestry | Weighted hectares of agriculture & forestry land cover |
| | Mining and fishing | Weighted points (mines & harbours) |
| Residential and population-serving activities | “Basic” retail and other commercial services | Villages & towns; population and household income |
| | Health, education and other social services | Education and health facilities |
| High-threshold service and manufacturing activities | Transport terminal activities | Airports and harbours |
| | Finance, real estate & high threshold retail activities | Town size, commercial land use & shopping centre GLA |
| | Manufacturing | Industrial land use/land cover; sawmills & other non-urban manufacturing sites; town size (measured in terms of the Urban Functional Index (UFI)) |
| Infrastructural activities | Bulk water & energy supply | Power stations & dams |
| | Public transport, local roads, water & energy distribution | Population & household income; non-infrastructural GVA |
| Tourism activities | | Weighted tourism points |
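
The weight-field approach described above amounts to a proportional allocation: each mesozone receives a share of the control total equal to its share of the total weight. The sketch below shows this under simple assumptions; the zone identifiers, the weights (standing in for weighted hectares of agricultural land cover) and the control total are all hypothetical, and the actual GAP workbench may apply further adjustments.

```python
def disaggregate(control_total, weights):
    """Split a control total over zones in proportion to their weights."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("weight field is zero everywhere; cannot allocate")
    return {zone: control_total * w / total_weight for zone, w in weights.items()}

# Hypothetical weight field: weighted hectares of agricultural land per mesozone.
agri_weights = {"mz_001": 1200.0, "mz_002": 300.0, "mz_003": 0.0, "mz_004": 1500.0}

# Hypothetical control total: agricultural output for the enclosing large area.
agri_output = disaggregate(450.0, agri_weights)

for zone, value in agri_output.items():
    print(zone, round(value, 1))
```

Because the shares sum to one, the zone values always add back up to the control total, and zones with no relevant land cover receive nothing.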

Figure 7 provides an example of the resolution and type of information extracted from the South African National Land Cover Dataset, in this case showing the spatial distribution of cultivated and irrigated agricultural land, as well as land used for sugarcane farming.

Figure 7. Example of land cover data (Agriculture)

 

9.  Network datasets and analysis tools

As discussed earlier, the mesozones are also linked to a strategic national road network and associated analysis tools, forming the third main component of GAP. The use of this component makes it possible to:

  • construct a variety of inter-zonal distance and travel time matrices;
  • estimate quantities of economic and other human activities within specified distance or travel time ranges (e.g. undertake proximity counting); and
  • calculate a range of accessibility and related measures (including “functional urban accessibility measures” based on measured distances or travel times to the nearest town of a specified hierarchical order).

The strategic national road network (now referred to as the GAP digital road network) consists of an interconnected network of digital road links (classified in terms of type of link and maximum attainable speed), supplemented with “Delaunay-type[1]” connections between adjacent zones (linking most of the zones except mountainous, inaccessible areas). Using a GIS-based network analysis routine, it is then possible to construct a variety of inter-zonal distance and travel time matrices.
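
The following sketch illustrates how a travel time matrix and a simple proximity count could be derived from speed-classified links. It uses Dijkstra's shortest-path algorithm on a tiny hypothetical network; the zone names, link lengths, speeds and job counts are invented, and the GIS-based routine used for GAP is not specified in the source.

```python
import heapq

# Hypothetical links: (zone_a, zone_b, length in km, maximum attainable speed in km/h).
links = [
    ("A", "B", 60.0, 120.0),
    ("B", "C", 40.0, 80.0),
    ("A", "C", 90.0, 60.0),
    ("C", "D", 30.0, 60.0),
]

# Build an undirected graph with travel time in hours as the link cost.
graph = {}
for a, b, km, kmh in links:
    hours = km / kmh
    graph.setdefault(a, []).append((b, hours))
    graph.setdefault(b, []).append((a, hours))

def travel_times(origin):
    """Shortest travel time from origin to every reachable zone (Dijkstra)."""
    best = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        elapsed, node = heapq.heappop(queue)
        if elapsed > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, hours in graph[node]:
            candidate = elapsed + hours
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return best

# Inter-zonal travel time matrix as a dict of dicts.
matrix = {zone: travel_times(zone) for zone in graph}

# Proximity counting: hypothetical jobs reachable within a travel time limit.
jobs = {"A": 1000, "B": 5000, "C": 200, "D": 800}

def reachable(origin, max_hours):
    return sum(jobs[z] for z, t in matrix[origin].items() if t <= max_hours)

print(matrix["A"])
print(reachable("A", 1.0))
```

Note that the fastest route from A to C runs via B rather than along the direct but slower link, which is the kind of difference that distance-only measures would miss.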

 

10.  Updating the GAP mesozones and related platform (GAP Update 2011)

The 2007 version of the Geospatial Analysis Platform contained economic information supplied by Global Insight, reflecting the situation in 2004. Subsequently, only minor updates (mostly for a specific province) were undertaken on request. In 2010 the decision was made to update the economic and demographic information to reflect a more up-to-date spatial reality. In addition, the boundaries of some local municipalities had changed, which also affected the related district municipality and provincial boundaries. The mesozone dataset therefore had to be adjusted to align (nest) with the new administrative boundaries. The multi-scale data mining approach was also changed to make future updates easier and more automated.

Dasymetric mapping was applied to populate the latest mesozones with attribute data. Dasymetric mapping is defined here generally as the use of an ancillary dataset to disaggregate coarse-resolution population data to a finer resolution (Eicher and Brewer, 2001). Research suggests that dasymetric mapping can provide more accurate small-area population estimates than many areal interpolation techniques that do not use ancillary data. Thus, the process of dasymetric mapping involves transforming data from the arbitrary zones of the aggregate dataset to recover (or try to recover) and depict the underlying statistical surface. This transformation incorporates one or more ancillary datasets that are separate from, but related to, the variation in the statistical surface (Eicher and Brewer, 2001).

The underlying ancillary dataset was not a polygon dataset but a point dataset, notably the Spot Building Count (SBC). The argument is that the SBC points are an accurate ancillary source for human activity and therefore for all socio-economic activities. Conversely, one would not expect to find any socio-economic activity where no building of any type, formal or informal, is present.

A novel approach based on dasymetric mapping principles was used to classify a point dataset of all buildings in South Africa (the Spot Building Count) according to socio-economic characteristics. The reasons for using a socio-economically classified point dataset to develop a flexible data integration frame were that:

  • ecological / environmental data is bound by hard physiographic data (inflexible boundaries like water catchments, plant biomes, or mountain ranges);
  • point data can be assigned easily to any demarcation;
  • SBC-points are an accurate ancillary source for human activity.

 

As can be deduced from the principles of dasymetric mapping as a method of areal interpolation, the accuracy of the depiction of the data is heavily dependent on the quality of the ancillary data used to predict the variation in the spatial distribution of the variable in question. A further consideration is that the ancillary data must be updated regularly (at least yearly) to ensure consistency across future updates.
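
A minimal sketch of point-based dasymetric allocation, under strong simplifying assumptions: zones are axis-aligned rectangles rather than real mesozone polygons, every building point carries equal weight (the actual approach classifies points by socio-economic characteristics), and all coordinates and totals are invented. A coarse-area population total is shared among zones in proportion to the number of building points falling in each.

```python
# Hypothetical zones as boxes (xmin, ymin, xmax, ymax) that tile a 10 x 10 area.
zones = {"mz_1": (0, 0, 5, 5), "mz_2": (5, 0, 10, 5), "mz_3": (0, 5, 10, 10)}

# Hypothetical building points (the ancillary dataset).
buildings = [(1, 1), (2, 3), (4, 4), (6, 1), (7, 2), (8, 8), (9, 9), (1, 2), (3, 1), (2, 2)]

def zone_of(point):
    """Return the zone whose box contains the point, or None if outside all zones."""
    x, y = point
    for name, (xmin, ymin, xmax, ymax) in zones.items():
        if xmin <= x < xmax and ymin <= y < ymax:
            return name
    return None

# Count building points per zone.
counts = {name: 0 for name in zones}
for point in buildings:
    name = zone_of(point)
    if name is not None:
        counts[name] += 1

# Hypothetical population total for the coarse area covering all three zones.
coarse_population = 5000
total_points = sum(counts.values())
population = {name: coarse_population * n / total_points for name, n in counts.items()}

print(counts)
print(population)
```

Because the points are independent of any zoning, the same allocation could be repeated for a different demarcation simply by replacing the `zones` dictionary.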

 



[1] A triangulation technique connecting the centroids of neighbouring zones to create a supplemental irregular road network

References:

Eicher, C.L. and Brewer, C.A., 2001. Dasymetric mapping and areal interpolation: implementation and evaluation. Cartography and Geographic Information Science, 28(2): 125-138.

Openshaw, S., 1996. Developing GIS-relevant zone-based spatial analysis methods. In: Longley, P. and Batty, M., Spatial Analysis: Modelling in a GIS Environment. Geoinformation International, New York.

Vermaat, J., Eppink, F. and Van den Bergh, J.C.J.M., 2005. Aggregation and the matching of scales in spatial economics and landscape ecology: Empirical evidence and prospects for integration. Ecological Economics, 52: 229-237.

This document is an extract from a CSIR publication entitled: Naude et al. 2007. Technical Overview of the mesoframe methodology and South African Geospatial Analysis Platform. Report number CSIR/BE/PSS/IR/2007/0104/B.