Tuesday, October 27, 2015

Transformation in GIS

A geographic transformation is a mathematical operation that converts the coordinates of a point in one geographic coordinate system to the coordinates of the same point in another geographic coordinate system.
Since geographic coordinate systems contain datums that are based on spheroids, a geographic transformation also changes the underlying spheroid. There are several methods, which have different levels of accuracy and ranges, for transforming between datums.
A geographic transformation always converts geographic (latitude–longitude) coordinates. Some methods convert the geographic coordinates to geocentric (X,Y,Z) coordinates, transform the X,Y,Z coordinates, and convert the new values back to geographic coordinates.

Affine transformation is a geometric transformation that scales, rotates, skews, and/or translates images or coordinates between any two Euclidean spaces. It is commonly used in GIS to transform maps between coordinate systems.

In an affine transformation, parallel lines remain parallel, the mid-point of a line segment remains a mid-point and all points on a straight line remain on a straight line.
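
The properties above can be checked numerically. The following is a minimal sketch, assuming the usual six-coefficient form x' = a*x + b*y + c, y' = d*x + e*y + f; the function name and the coefficient values are made up for illustration and do not come from any particular GIS package.

```python
def affine(point, a, b, c, d, e, f):
    """Apply x' = a*x + b*y + c and y' = d*x + e*y + f to one (x, y) point."""
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


def direction(start, end):
    """Direction vector of the segment from start to end."""
    return (end[0] - start[0], end[1] - start[1])


def cross(v, w):
    """2D cross product; zero when the two vectors are parallel."""
    return v[0] * w[1] - v[1] * w[0]


# Hypothetical coefficients (scale, skew and shift), chosen only for illustration.
coeffs = dict(a=2.0, b=0.5, c=10.0, d=-0.5, e=1.5, f=20.0)

# Two parallel segments: p-q and r-s.
p, q = (0.0, 0.0), (4.0, 2.0)
r, s = (1.0, 3.0), (3.0, 4.0)
midpoint = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

p2, q2 = affine(p, **coeffs), affine(q, **coeffs)
r2, s2 = affine(r, **coeffs), affine(s, **coeffs)
mid2 = affine(midpoint, **coeffs)

# Parallel lines stay parallel after the transformation.
still_parallel = cross(direction(p2, q2), direction(r2, s2)) == 0.0

# The image of the midpoint is the midpoint of the images.
still_midpoint = mid2 == ((p2[0] + q2[0]) / 2, (p2[1] + q2[1]) / 2)
```

In practice the six coefficients are estimated from control points rather than chosen by hand.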

Geometric transformation is the process of using a set of control points and transformation equations to register a digitised map, satellite image, or an air photo to a projected coordinate system.

Geometric transformation converts a newly digitised map into projected coordinates by a process called map-to-map transformation. A remotely sensed image is converted to projected coordinates using image-to-map transformation. This is also called georeferencing.

Different methods have been proposed for transformation from one coordinate system to another. Each method is differentiated based on the geometric property it preserves and the changes it allows. Transformation results in either:
  • Changes in position and direction
  • Uniform change of scale or
  • Changes in size and shape
Below are listed the various transformations and their effect on a rectangular object:
  1. Equiarea transformation permits rotation of rectangle and preserves its shape and size.
  2. Similarity transformation permits rotation of rectangle and preserves its shape but not the size.
  3. Affine transformation allows angular distortion but preserves parallelism of lines.
  4. Projective transformation allows both angular and length distortions and thus allows the rectangle to be transformed into an irregular quadrilateral.
Generally, affine transformations are used for map-to-map or image-to-map transformations, and projective transformations are used for aerial photographs with relief displacement.

Spatial Analysis in GIS


GIS is differentiated from other information systems due to its spatial analysis functions.
Spatial analysis functions are used to answer questions about the real world using  GIS databases as a model of the real world.
Spatial analysis techniques are used to create an image of reality that can be easily understood.
Basic spatial analysis can be performed at various levels:

  1. Sorting data in attribute tables for presentation
  2. Performing arithmetic, boolean and statistical operations on attribute tables
  3. Compiling new data based on original and derived attributes or based on geographic relationships
Within each level, operations may be logical, arithmetic, geometric, statistical or a combination of any of these four types.

The two fundamental functions of GIS are:
  1. Generation of maps and
  2. Generation of tabular reports 
Spatial analysis requires logical connections between attribute data and map features. Spatial analysis builds operational procedures on the spatial relationships between map features.

Attribute query involves selecting information by the use of logical questions. When no spatial information is required to ask a question, the query is considered an attribute query. 

A spatial query involves selecting features based on spatial relationships. The answer to such queries can be obtained from a hard-copy map or by using a GIS.

Basic GIS analysis involves attribute queries and spatial queries. Complicated analyses require a series of GIS operations involving multiple attribute and spatial queries, alteration of original data and generation of new data sets.

An effective spatial analysis uses the best available methods appropriately for different types of attribute queries, spatial queries and data alteration. 

The design of the analysis depends on the purpose of the study.

The use of GIS to query geographic features and retrieve associated attribute information is called identification. This process generates a new set of maps by query and analysis. Spatial analysis helps make the new information clearer. GIS operational procedures and analytical tasks that are suited for spatial analysis are discussed below:
  1. Single layer operations are procedures that correspond to queries and alterations of data that operate on a single data layer. For example, creating a buffer zone (silence zone) around all schools in a city is a single layer operation.
  2. Multi layer operations are used for manipulation of spatial data on multiple data layers. For example, the overlay of two input data layers produces a map of combined polygons.
  3. Topological overlays: These are multi layer operations that allow combining features from different layers to form a new map and give new information and features that were not present in the individual maps.
  4. Point pattern analysis deals with examination and evaluation of spatial patterns and processes of point features.
  5. Network analysis is designed specifically for line features organized in connected networks and typically applies to transportation problems, for example school bus routing, walking distance and bus stop optimization.
  6. Surface analysis deals with the spatial distribution of surface information in a three dimensional structure.
  7. Grid analysis involves processing of spatial data in a regularly spaced form. 
  8. Fuzzy spatial analysis is based on fuzzy set theory. Fuzzy set theory is a generalization of classical (boolean) set theory in which zones of gradual transition are used to divide classes instead of crisp boundaries. Fuzzy algebra offers various methods to combine different sets of data, for example in landslide zonation map preparation. Fuzzy logic can also be used to handle mapping errors or uncertainty.
Geostatistical tools for spatial analysis
Geostatistics studies the spatial variability of regionalized values. Tools to characterise spatial variability are:
  1. Spatial auto-correlation function and
  2. Variogram
Spatial auto-correlation examines the correlation of a random process with itself in space. Examples of such phenomena are: 
  • Total amount of rainfall
  • Toxic element concentration
  • Elevation at triangulated points, etc.
The spatial auto-correlation function depicted as a graph is called a spatial auto-correlogram and this gives an insight into the spatial behaviour of the phenomena under study.

A variogram is calculated from the variance of the differences between values at pairs of points at different separation distances.
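
As a rough illustration of that calculation, the sketch below computes an empirical semivariogram: for each distance bin it takes half the average squared difference between the values of all point pairs whose separation falls in that bin. The function name, the binning scheme and the sample values are assumptions made for this example only.

```python
import math
from itertools import combinations


def empirical_semivariogram(samples, lag_width, n_lags):
    """Return [(lag_centre, gamma, pair_count), ...] for non-empty distance bins.

    samples is a list of (x, y, z) tuples; bin k covers separations in
    [k * lag_width, (k + 1) * lag_width).
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for (x1, y1, z1), (x2, y2, z2) in combinations(samples, 2):
        separation = math.hypot(x2 - x1, y2 - y1)
        k = int(separation // lag_width)
        if k < n_lags:
            sums[k] += (z1 - z2) ** 2
            counts[k] += 1
    return [
        ((k + 0.5) * lag_width, sums[k] / (2 * counts[k]), counts[k])
        for k in range(n_lags)
        if counts[k]
    ]


# Hypothetical measurements along a transect: (x, y, value).
samples = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 4.0), (3.0, 0.0, 7.0)]
variogram = empirical_semivariogram(samples, lag_width=1.0, n_lags=3)
```

Here the semivariance grows with separation, which is what spatial dependence would lead one to expect: nearby values differ less than distant ones.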

Spatial analysis is a vital part of GIS and can be used for many applications like:
  1. Site suitability
  2. Natural resource management
  3. Environmental Disaster Management
Spatial Analysis is the heart or core of GIS because it includes transformations, manipulations and methods that can be applied to geographic data to support decisions, reveal patterns and anomalies not immediately obvious and add value.

Spatial analysis is a set of methods whose results change when the locations of the objects being analysed, or the frame used to analyse them, change.

Spatial analysis can be:
  1. Inductive: Examining empirical evidence and searching for patterns that might support new theories or general principles
  2. Deductive: Focussing on testing of known theories against data
  3. Normative: Using spatial analysis to develop new or better designs
Analysis can also be carried out on attribute tables of a GIS by plotting one variable against the other as a scatterplot and examining the dependence of one variable on one or more independent variables.

Regression analysis can be used to find the simplest relationship and multiple regression can be used to understand the effects of multiple independent variables.
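
For the single-variable case, a least-squares line can be fitted directly from two attribute columns. This is a minimal sketch with made-up column values; the function name is hypothetical, and a real analysis would normally use a statistics package and inspect the residuals as well.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical attribute columns taken from a GIS attribute table.
independent = [1.0, 2.0, 3.0, 4.0]
dependent = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(independent, dependent)
```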

A relationship between variables that changes across space is called spatial heterogeneity.

One of the most powerful features of a GIS is the ability to join tables based on common geographic location.

The point-in-polygon operation is used to determine if a point lies inside or outside a polygon.
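
One common way to implement this test is ray casting: count how many polygon edges a horizontal ray from the point crosses, and treat an odd count as inside. The sketch below assumes a simple polygon given as a list of (x, y) vertices; points lying exactly on the boundary are not handled specially.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point lies inside the polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical square parcel and two test locations.
parcel = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
well_inside = point_in_polygon((2.0, 2.0), parcel)
well_outside = point_in_polygon((5.0, 2.0), parcel)
```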

The polygon overlay is similar to the point-in-polygon operation.

Overlay in raster is very simple. The attributes of each cell are combined according to a set of rules. 

The ability to calculate and manipulate distances forms the basis of spatial analysis.

Distance along a route (represented by a poly-line) is calculated by adding the lengths of each segment of the poly-line.

Since poly-lines short-cut corners, the length of a poly-line is shorter than the length of the object it represents, leading to a slight discrepancy.
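
Both points can be seen in a few lines of code. The sketch below sums segment lengths for planar coordinates and then approximates a quarter circle of radius 1 with four straight segments; the function name and the example route are assumptions for illustration.

```python
import math


def polyline_length(vertices):
    """Sum of straight-line segment lengths between consecutive vertices."""
    return sum(math.dist(p, q) for p, q in zip(vertices, vertices[1:]))


# A hypothetical route with two segments of length 5 and 6.
route_length = polyline_length([(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)])

# A quarter circle of radius 1 digitised with four segments.
arc = [(math.cos(t), math.sin(t)) for t in (i * math.pi / 8 for i in range(5))]
digitised_length = polyline_length(arc)
true_length = math.pi / 2  # the poly-line comes out slightly shorter than this
```

Adding more vertices narrows the gap but never removes it for a curved object.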

Buffering builds new objects by identifying all areas that are within certain specified distance of the original object.

Point patterns can be identified as clustered, dispersed or random.
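
One simple way to tell these patterns apart is the nearest neighbour index (the Clark-Evans ratio): the observed mean nearest-neighbour distance divided by the value expected for a random pattern of the same density, 0.5 / sqrt(n / area). Values well below 1 suggest clustering and values above 1 suggest dispersion. The sketch below ignores edge effects, and the point sets and study area are made up.

```python
import math


def nearest_neighbour_index(points, area):
    """Observed mean nearest-neighbour distance divided by the random expectation."""
    n = len(points)
    observed = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected


# Hypothetical 30 x 30 study area with nine points in each pattern.
dispersed = [(x, y) for x in (5.0, 15.0, 25.0) for y in (5.0, 15.0, 25.0)]
clustered = [(15.0 + 0.1 * i, 15.0 + 0.1 * j) for i in range(3) for j in range(3)]

r_dispersed = nearest_neighbour_index(dispersed, area=900.0)
r_clustered = nearest_neighbour_index(clustered, area=900.0)
```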

There are SIX CATEGORIES of spatial analysis:
  1. Queries and reasoning
  2. Measurements
  3. Transformations
  4. Descriptive summaries
  5. Optimization and
  6. Hypothesis testing
Queries and reasoning are the most basic analysis operations where GIS is used to answer simple questions. No changes occur in the database and no new data are produced.

Measurements involve measuring simple properties of objects such as length, area or shape, and relationships between pairs of objects such as distance or direction.

Transformations are simple methods of spatial analysis that change data-sets by combining them or comparing them to obtain new data-sets and finally new ideas. Transformations use simple arithmetic, geometric or logical rules. They include operations that convert raster data to vector data and vice versa. They may create fields from collections of objects or detect collection of objects in fields.

Descriptive summaries attempt to capture the essence of a data-set in one or two numbers. 

Optimization techniques are normative in nature and are designed to select ideal locations for objects given specific constraints. They are widely used in market research, package delivery industry, etc.

Hypothesis testing focusses on reasoning from the results of a limited sample to make generalizations about an entire population. Hypothesis testing is the basis for inferential statistics and forms the core of statistical analysis.

Overlay analysis is one example: overlaying land use and flood zone layers identifies the residential parcels that lie inside a flood zone. Insurance companies can use this information to target their insurance sales.

Farmers can use interpolation to examine soil samples from a farm area.

Shop owners can establish their stores based on location (distance and density) analysis.

Data types in spatial analysis
The three types of data used to characterize problems of spatial analysis are:
  1. Events or point patterns: Examples of this type are: crime spots, disease occurrences, localization of vegetal species, etc.
  2. Continuous surfaces: Examples of this type are: geological maps, topographical maps, ecological maps, etc.
  3. Areas with counts: Examples of this type are: population surveys, health statistics, etc that are demarcated by closed polygons (postal zones, municipalities, etc)

Problems of spatial analysis deal with environmental and socioeconomic data. 

Basic concepts of spatial analysis
  1. Spatial dependency is an important concept for understanding and analysing spatial phenomena. "Everything is related to everything else, but near things are more related than distant things" -Waldo Tobler (First law of Geography)
  2. The computational expression of the concept of spatial dependence is spatial autocorrelation.
  3. Statistical inference for spatial data: An important consequence of spatial dependence is that statistical inferences on this type of data will not be as efficient as in the case of independent samples of the same size.
Spatial interpolation is the process of using values measured at sample locations to estimate values at unsampled locations, thereby extracting new information from the original data. GIS provides spatial analysis tools for calculating feature statistics and carrying out geoprocessing activities such as data interpolation.

The two widely used interpolation methods are:
  1. Inverse Distance Weighting (IDW) and
  2. Triangulated Irregular Networks (TIN)
Other interpolation methods are:
  • Regularized Splines with Tension (RST)
  • Kriging
  • Trend Surface Interpolation
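
To make the idea of Inverse Distance Weighting concrete, the sketch below estimates a value at an unsampled location as a weighted average of the sample values, with weights proportional to 1 / distance^power. The function name, the default power of 2 and the soil sample values are assumptions for illustration; GIS packages add options such as search radius and neighbour limits.

```python
import math


def idw(samples, target, power=2.0):
    """Inverse-distance-weighted estimate at target from (x, y, z) samples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, z in samples:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return z  # the target coincides with a sample point
        weight = 1.0 / distance ** power
        weighted_sum += weight * z
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical soil samples: (x, y, measured value).
soil_samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
halfway = idw(soil_samples, (1.0, 0.0))
at_sample = idw(soil_samples, (0.0, 0.0))
near_first = idw(soil_samples, (0.5, 0.0))
```

A point midway between two samples receives their plain average, while a point closer to one sample is pulled toward that sample's value.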

Friday, October 23, 2015

Connectivity functions

Connectivity functions examine relationships between objects in terms of adjacency and relative-effective-distance. Operators associated with connectivity include:
-Network analysis
-Diffusion models
-Cellular automata and
-Agent based models

The two connectivity functions widely used in GIS are:
-Contiguity and
-Proximity

Contiguity analysis exploits the topological relationships between objects. It helps determine if two objects are adjacent or if they share a node. Contiguity analysis also helps determine a pattern of spread.
Grouping vector data is used to reclassify a map and dissolve polygon boundaries. Grouping is also done on raster data but is less elegant than grouping operation on vector data.

Contiguity is used to measure shortest and longest straight line distances across an area and to identify areas of terrain with specified size and shape characteristics.

Proximity Functions
Proximity is the simple distance between features. The four parameters used to measure proximity are listed below:
1. target locations.
2. unit of measurement.
3. a function to calculate proximity and
4. the area to be analyzed.

A common type of proximity analysis is the buffer zone.

Network Functions
A network is a set of interconnected linear features that form a pattern or framework. City streets, power transmission lines and airline service routes are examples.
There are three principal types of GIS analysis performed on networks:
1. Prediction of loading on the network itself (prediction of flood crests),
2. Route optimization (emergency routing of ambulances), and
3. Resource allocation (zones for servicing rescue areas)

Network analysis entails four components:
1. a set of resources (goods to be delivered)
2. one or more locations where the resources are located (several warehouses where the goods are located)
3. an objective to deliver the resources to a set of destinations (customer location database) and
4. a set of constraints that places limits on how the objective can be met (is it economically feasible to deliver goods from one point to another)

Spread Functions
Spread functions help determine the "best" way to go from point A to point B.
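
If "best" is taken to mean lowest accumulated cost over a raster cost surface, the idea can be sketched with Dijkstra's algorithm. The sketch assumes moves to the four edge-sharing neighbours only and charges the cost of each cell entered; the function name and the cost grid are made up for illustration.

```python
import heapq


def least_cost_path(cost, start, goal):
    """Return (total_cost, path) from start to goal over a cost raster."""
    rows, cols = len(cost), len(cost[0])
    best = {start: 0}
    previous = {}
    queue = [(0, start)]
    while queue:
        accumulated, cell = heapq.heappop(queue)
        if cell == goal:
            path = [cell]
            while cell in previous:
                cell = previous[cell]
                path.append(cell)
            return accumulated, path[::-1]
        if accumulated > best[cell]:
            continue  # a cheaper route to this cell was already processed
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols:
                new_cost = accumulated + cost[nxt[0]][nxt[1]]
                if new_cost < best.get(nxt, float("inf")):
                    best[nxt] = new_cost
                    previous[nxt] = cell
                    heapq.heappush(queue, (new_cost, nxt))
    return None


# Hypothetical cost surface: the 9s represent terrain that is expensive to cross.
cost_surface = [
    [1, 1, 1],
    [9, 9, 1],
    [1, 1, 1],
]
total, path = least_cost_path(cost_surface, start=(0, 0), goal=(2, 0))
```

The cheapest route goes around the expensive cells even though the straight route is shorter in distance.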

Seek or Stream Functions
Seek or Stream Functions refer to a function that is directed outward in a step-by-step manner using a specified decision rule. This function can be used to evaluate erosion potential.

Intervisibility functions
This function is a graphic depiction of the area that can be seen from the specified target areas. Intervisibility functions rely on digital elevation data to define the surrounding topography.

Wednesday, October 14, 2015

Classification of GIS

A GIS application can be classified into the following types:

  1. Four dimensional GIS
  2. Multimedia or hypermedia GIS
  3. Web GIS and
  4. Virtual Reality GIS
The above types of GIS are briefly discussed below:

  1. Four dimensional GIS are designed to handle three dimensions of space and one dimension of time. The spatio-temporal representations can handle only two dimensions of space and one dimension of time. 
  2. Multimedia/hypermedia GIS allow the user to access a range of georeferenced multimedia data by selecting resources from a georeferenced image map base. A map serving as the primary index to multimedia data in a multimedia geo-representation is called a hypermap. Multimedia and virtual geo-representations can be stored either in extended relational databases, object databases or in application specific data stores.
  3. Web GIS: Widespread access to the internet coupled with the use of web browsers and the explosion of geographic information has made it possible to develop new forms of multimedia geo-representations on the web. Many geomatics solutions are now web-based and are rapidly overtaking desktop GIS, and future trends point in the same direction.
  4. Virtual Reality GIS: Virtual Reality GIS have been developed to allow the creation, manipulation and exploration of geo-referenced virtual environments. For example, Virtual Reality Modeling Language (VRML) can be used to experiment with different scenarios. Virtual Reality GIS can also be web-based. An example of application of Virtual Reality GIS is 3D simulation for planning in various scenarios.

Tuesday, October 13, 2015

Analysis functions in GIS

GIS analysis functions fall into four categories:
  1. Retrieval/Classification/Measurement functions
  2. Overlay functions
  3. Neighbourhood functions and
  4. Connectivity functions
Retrieval functions basically involve a selective search.
Classification/Reclassification functions involve two operations:
  1. Identifying a set of features as belonging to a group and
  2. Defining patterns
Measurement functions measure distances, lengths, perimeters and areas.

A selective search is an example of a retrieval function. It involves selection of attributes based on graphic selection tools used to select areas in the map displayed.

Overlay functions could be:
  1. Arithmetic 
    1. Addition
    2. Subtraction
    3. Multiplication
    4. Division
  2. Logical
    1. Used to find where specific conditions occur (and, or, >, <, etc.)
Vector methods are good for sparse data sets while raster methods are easier for grid calculations.

Neighbourhood functions
The basic functions that fall under this domain are:
  1. Average
  2. Diversity
  3. Minimum/Maximum and
  4. Total
The parameters that need to be defined to operate these functions are:
  1. Target locations
  2. Specification of neighbourhood
  3. Function to be performed on neighbourhood elements
The search operation is one of the most common neighbourhood functions. A neighbourhood function on a vector model is a specialised search function, while on a raster model polygons are on one layer and points and lines are on a separate layer. The Thiessen polygon operation is also a neighbourhood operation.
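
On a raster, the three parameters map naturally onto code: every cell is a target location, the neighbourhood is a window around it, and the function is applied to the window's values. The sketch below assumes a square window that is simply truncated at the raster edge; the function names and the sample raster are made up.

```python
def focal(raster, func, radius=1):
    """Apply func to the square window around every cell of the raster."""
    rows, cols = len(raster), len(raster[0])
    output = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [
                raster[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            out_row.append(func(window))
        output.append(out_row)
    return output


def average(values):
    return sum(values) / len(values)


def diversity(values):
    """Number of distinct values in the neighbourhood."""
    return len(set(values))


# Hypothetical land-cover codes on a 3 x 3 raster.
land_cover = [
    [1, 1, 2],
    [1, 3, 2],
    [4, 4, 2],
]
focal_total = focal(land_cover, sum)
focal_average = focal(land_cover, average)
focal_diversity = focal(land_cover, diversity)
focal_maximum = focal(land_cover, max)
```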

Cartographic modeling by GIS analysis - procedure with an example

Cartographic modeling involves the use of basic GIS functions in a logical sequence to solve complex spatial problems.

It was developed to model:
  1. Land-use planning alternatives and
  2. Applications that require integrated analysis of multiple geographically distributed factors
  • The term was coined by Dana Tomlin in 1983.
  • Cartographic modeling lies completely under the raster domain. 
  • The nature of analysis is purely additive or subtractive and this complements the values assigned to the raster format of the data
  • The digitised data is layered and these layers are combined to construct constraint maps that can be analysed with reference to any specific geographic problem to arrive at the best alternative.
A cartographic model has the ability to form a logical sequence. The process of cartographic modeling is characterised by working backward to ensure that all data that will be needed are identified. This helps to avoid collecting data that will not be needed. The process ensures that any judgements to be made are explicitly identified. Hence, subjective judgements are an integral part of cartographic modeling.

Cartographic modeling is a common way of expressing and organising the methods by which spatial variables and spatial operations are selected and used to develop an analytical solution within a GIS.

Cartographic modeling is based on the concept of data layers, operations and procedures. Cartographic modeling capabilities are found in most GIS software. 

Modeling is a logical or mathematical formulation that attempts to simulate some aspect of the real world.

The five steps involved in cartographic modeling are listed below and elaborated subsequently:
  1. Statement of problem or objectives
  2. Statement of conditions or assumptions
  3. Methodology
  4. Implementation and
  5. Evaluation
Statement of problem involves dividing the problem into sub-problems. The objective provides a direction and a clear end to the activity. It helps identify the possible routes to achieving the objectives.

Statement of conditions or assumptions includes the conditions of the problem. For example: current state, background or case history of the problem. Assumptions in the model define the limitations of the analysis. An assumption of most models is that the processes of the past will continue in the future.

Methodology involves:
  1. Assembly of sub-models into a model, which can be sub-divided into:
    1. Identification of sub-problems (analogous to the concept of divide and conquer)
    2. Development of sub-models that address the sub-problems
    3. Development of a strategy for integrating the sub-models
    4. Developing a flowchart that shows the parts in the context of the whole
  2. Identification of:
    1. Data sets needed
    2. Spatial operations
    3. Non-spatial operations and
    4. Interaction of spatial and non-spatial data
Implementation involves implementing the model using the analytic tools available in GIS. It also involves implementation of techniques to circumvent the limitations of the GIS.

Evaluation involves testing the effectiveness of the model. If the model does not conform to expectations, its assumptions and components should be re-examined and adjusted where necessary. The above procedures should be performed in an iterative fashion until the objectives are achieved.


Problem: The municipal corporation of a city would like to assess environmental equity with respect to the siting of waste transfer stations.
Restating the problem: Is one particular income class bearing the burden of waste transfer stations?

Conditions and assumptions
  1. Impact of waste transfer sites on surrounding communities is negligible beyond 500m
  2. Within 500m the effect is uniform
  3. Income classes are distributed evenly throughout a census tract
  • Data sets needed
    • income by census tract
    • location of waste transfer sites
  • Spatial operations
    • points in polygon
    • buffer
    • overlay
  • Operations on attributes
    • select
    • reclass
    • calculate area estimates and
    • generate statistics
  • Outline for flow of implementation
    • select only waste transfer sites in the city
    • generate a 500m buffer around these sites
    • select income data from census data and reclass into three income classes: low, medium and high
    • add a field to hold the original area of each census tract prior to polygon intersection
    • recalculate the income classes based on percentage of census tract left in the intersected polygons. Use the original area field that was brought along in the intersection
    • calculate the totals for each of three generated income classes for the entire city
    • generate pie charts for the number in each income class for:
      •  the entire city and 
      • affected areas
    • create a map showing output
  • The results obtained should be evaluated against the methodology used to test the validity of the model.
  • The model should accurately represent the process being modeled.
  • A statistical analysis that includes both qualitative and quantitative observations should be performed.
  • Based on the above listed criteria, changes to improve the model should be documented and the modeling should be repeated.
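
The area-based recalculation step in the implementation outline can be sketched as follows. It relies on the stated assumption that income classes are distributed evenly throughout a census tract, so the share of a tract's area inside the 500m buffer is used as the share of its population that is affected. The field names and all numbers are hypothetical, and the buffer and intersection themselves are assumed to have been done already in the GIS.

```python
from collections import defaultdict

# Hypothetical attribute records after intersecting tracts with the 500m buffers.
tracts = [
    {"income_class": "low", "population": 1000, "area": 4.0, "area_in_buffer": 1.0},
    {"income_class": "medium", "population": 2000, "area": 8.0, "area_in_buffer": 2.0},
    {"income_class": "high", "population": 1500, "area": 6.0, "area_in_buffer": 0.0},
    {"income_class": "low", "population": 800, "area": 2.0, "area_in_buffer": 1.0},
]


def totals_by_class(records):
    """Return (city totals, affected totals) of population per income class."""
    city = defaultdict(float)
    affected = defaultdict(float)
    for record in records:
        # Share of the tract left inside the buffer, using the original area field.
        share = record["area_in_buffer"] / record["area"]
        city[record["income_class"]] += record["population"]
        affected[record["income_class"]] += record["population"] * share
    return dict(city), dict(affected)


def percentages(totals):
    """Convert class totals into percentage shares, as a pie chart would show."""
    grand_total = sum(totals.values())
    return {name: 100.0 * value / grand_total for name, value in totals.items()}


city_totals, affected_totals = totals_by_class(tracts)
city_share = percentages(city_totals)
affected_share = percentages(affected_totals)
```

Comparing the two sets of percentages shows whether any income class is over-represented near the sites relative to the city as a whole.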

Monday, October 12, 2015

Raster data analysis

  • Raster data analysis is based on cells and rasters
  • Raster data analysis can be performed at the level of individual cells, or a group of cells, or cells within an entire raster
  • The type of cell value is an important aspect of raster data analysis
  • Various types of data are stored in raster format
  • Raster data analysis is software-specific. Hence, in order to use Digital Elevation Models (DEMs), satellite images and other raster data in data analysis, they must first be processed and imported into the software-specific raster format
  • The core of raster data analysis consists of local operations, which are cell-by-cell operations.
  • A local operation can create a new raster from a single input raster or from multiple input rasters.
  • Converting a floating point raster to an integer raster is a simple local operation
  • Converting a slope raster measured in percent to one measured in degrees is also a local operation
  • Reclassification is a local operation that creates a new raster by classification. Reclassification is also called recoding or transforming
  • Local operations with multiple rasters are also known as compositing, overlaying or superimposing maps
  • Map algebra is defined as local operations with multiple input rasters.
  • Local operations compute an output raster dataset where the output value at each location is a function of the value associated with that location on one or more raster datasets
  • Focal or Neighbourhood functions produce an output raster dataset in which output value at each location is a function of the value at a location and the value of cells in a specified neighbourhood around that location
  • Zonal functions compute an output raster dataset where the output value for each location depends on the value of the cell at the location and the association that the location has within a cartographic zone.
  • Global functions compute an output raster dataset in which the output value at each cell location is potentially a function of all the cells in the input raster datasets.
    • The two types of global functions are:
      • Euclidean distance and
      • Weighted distance
  • The raster calculator provides powerful tools in algebra syntax to perform mathematical calculations using operators and functions. It can also be used to set-up selection queries or type spatial analyst function syntax. Inputs can be grid datasets or raster layers, shape files, coverages, tables, constants and numbers.
  • The raster calculator provides four groups of mathematical functions: Logarithmic, Arithmetic, Trigonometric and Exponential.
  • The raster calculator allows boolean, relational and arithmetic operators
  • Raster data analysis also involves:
    • Terrain analysis
      • Contour
      • Slope
      • Aspect
      • Hillshade
      • Viewshed
      • Cut/Fill
    • Hydrologic analysis
      • Basin
      • Fill
      • Flow accumulation
      • Flow direction
      • Flow length
      • Sink
      • Snap pour point
      • Snap pour
      • Stream link
      • Stream order
      • Stream to feature
      • Stream shape
      • Watershed
  • Reclassification is replacing input cell values with new output cell values. The reasons to reclassify data are listed below:
    • To replace values based on new information
    • To group certain values together
    • To reclassify values to a common scale
    • To set specific cells to "NoData" or set "NoData" cells to a value
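
Several of the operations listed above fit in a short sketch: a local operation that converts slope from percent to degrees, a reclassification into classes, and a zonal mean. Rasters are represented as plain nested lists, and the function names, class breaks and cell values are assumptions made for this example rather than any package's API.

```python
import math

NODATA = None  # stand-in for a raster's NoData value in this sketch


def local(func, *rasters):
    """Apply func cell by cell across one or more equally sized rasters."""
    return [[func(*cells) for cells in zip(*rows)] for rows in zip(*rasters)]


def percent_to_degrees(slope_percent):
    """Convert a slope expressed as percent rise into degrees."""
    return math.degrees(math.atan(slope_percent / 100.0))


def make_reclassifier(classes):
    """classes is an ascending list of (upper_bound, new_value) pairs."""
    def reclassify(value):
        if value is NODATA:
            return NODATA
        for upper_bound, new_value in classes:
            if value <= upper_bound:
                return new_value
        return NODATA  # values above the last bound become NoData
    return reclassify


def zonal_mean(values, zones):
    """Mean of the value raster within each zone of the zone raster."""
    totals, counts = {}, {}
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            totals[zone] = totals.get(zone, 0.0) + value
            counts[zone] = counts.get(zone, 0) + 1
    return {zone: totals[zone] / counts[zone] for zone in totals}


# Hypothetical 2 x 2 rasters.
slope_percent = [[0.0, 100.0], [50.0, 10.0]]
zones = [[1, 1], [2, 2]]

slope_degrees = local(percent_to_degrees, slope_percent)
slope_class = local(make_reclassifier([(5.0, 1), (30.0, 2), (90.0, 3)]), slope_degrees)
mean_by_zone = zonal_mean(slope_percent, zones)
```

Passing two or more rasters to `local` with a two-argument function gives the multi-raster case described above as compositing or overlaying.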

GIS and Knowledge based systems

A knowledge based system is a computer program that reasons and uses a knowledge base to solve complex problems. The common theme that unites all knowledge based systems is an attempt to represent knowledge explicitly via tools and rules as opposed to the methodology commonly adopted by conventional computer programs. A knowledge based system has two sub-systems:

  1. A knowledge base and
  2. An Inference engine
Knowledge Based Systems (KBS) were first developed by Artificial Intelligence (AI) researchers.
Expert systems or Knowledge Based Systems try to simulate the human decision making process manipulating data and knowledge through programmed logic. KBS functions can be integrated within a GIS based spatial decision support system as a knowledge and model base.

Coupling a KBS enables a GIS to be knowledgeable about its constraints and potential as well as become aware of the data source, context and use. Treating all data structures as component of an object provides a facility for flexible analysis of data by KBS and GIS.

The coupling takes the form of exchanging files between GIS and a model where input, display and output stages are the only uses of the GIS capability. The knowledge and mapping techniques are encoded in the schematic inquiry of the KBS to follow through the decision making process involved in mapping and prediction. 

The content and priorities of expert routines and their rules represent knowledge. Their procedures and rules form the basis of their action to achieve the goal. The priorities represent when to initiate a goal satisfying routine.

The effectiveness of communication to the user of GIS coupled with KBS for describing a resource would be greatly enhanced by the use of multimedia facilities. 

Finally, the representation of a specialized field of knowledge using Knowledge Based (KBS) approach provides a means of building linkages between skills of the subject specialist, spatial analyst and the needs of the user. 

Thus, this linkage helps provide answers to questions not easily answered by one discipline on its own.

Classification of GIS models


GIS models have been classified by purpose, methodology and logic although the boundary of their classification criteria has not been clear.
  1. A GIS model may be descriptive or prescriptive: A descriptive model describes existing conditions of spatial data whereas a prescriptive model predicts what the conditions could be or should be. As an example, a vegetation map represents a descriptive model as it shows existing vegetation, while a potential natural vegetation map represents a prescriptive model as it predicts the vegetation that could occupy a site without disturbance.
  2. A GIS model may be deterministic or stochastic: Both deterministic and stochastic models are mathematical equations represented with parameters and variables. While a stochastic model considers presence of some randomness in one or more of its parameters, a deterministic model does not. Hence the predictions of a stochastic model can have a certain amount of error.
  3. A GIS model can be static or dynamic: A dynamic model emphasizes the changes of spatial data and interaction between variables whereas a static model deals with the state of spatial data at a given time. Time is important to show the process of changes in a dynamic model. Simulation is a technique that can generate different states of spatial data over time. Many environmental models such as water distribution have been effectively understood using dynamic models
  4. A GIS model may be deductive or inductive: A deductive model represents the conclusion derived from a set of premises. These premises are often based on scientific theories or physical laws. An inductive model represents the conclusion derived from empirical data or observations. For example, to assess the damage potential of a flood, a deductive model based on the laws of hydrology, geology, etc. may be used, or an inductive model based on recorded data from past floods can be relied upon.
  5. A binary model is a GIS model that uses logical expressions to select features from a composite feature layer or multiple rasters
  6. An index model is a GIS model that uses the index value calculated from a composite feature layer or multiple rasters to produce a layer with ranked data
  7. A process model is a GIS model that integrates existing knowledge into a set of relationships and equations for quantifying the physical processes
  8. A Regression model is a GIS model that uses a dependent variable and a number of independent variables in a regression equation for prediction or estimation
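
The difference between a binary model and an index model can be illustrated with a minimal Python sketch. The two rasters, the thresholds and the weights below are made-up values used only for illustration; a real analysis would take them from actual layers and from expert judgement.

```python
# Two tiny hypothetical rasters covering the same 2 x 3 grid of cells.
slope = [[5, 12, 30],
         [8, 20, 3]]          # slope in percent (made-up values)
soil_score = [[3, 2, 1],
              [3, 1, 2]]      # soil suitability, 1 (poor) to 3 (good)


def binary_model(slope, soil_score, max_slope=15, min_soil=2):
    """Binary model: a logical expression selects cells (1) or rejects them (0)."""
    return [[1 if s <= max_slope and q >= min_soil else 0
             for s, q in zip(slope_row, soil_row)]
            for slope_row, soil_row in zip(slope, soil_score)]


def index_model(slope, soil_score, w_slope=0.4, w_soil=0.6):
    """Index model: a weighted sum of standardized scores gives ranked values."""
    result = []
    for slope_row, soil_row in zip(slope, soil_score):
        row = []
        for s, q in zip(slope_row, soil_row):
            slope_std = 1 - min(s, 30) / 30      # flatter is better, scaled to 0..1
            soil_std = (q - 1) / 2               # scaled to 0..1
            row.append(round(w_slope * slope_std + w_soil * soil_std, 2))
        result.append(row)
    return result


selected = binary_model(slope, soil_score)   # cells passing both conditions
ranked = index_model(slope, soil_score)      # higher value = more suitable
```

The binary model only answers yes or no for each cell, while the index model keeps a ranking, which is why the two are listed as separate model types.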

Digital Elevation Data

Digital Elevation Data may refer to a digital elevation surface or elevation type as discussed below:

A digital elevation surface is one of the following five types:

  1. Digital Surface Model (DSM) is the highest reflective surface of ground features captured by the sensor. It includes tree tops, roof tops, tops of towers, telephone poles and other natural or man-made features. Photogrammetry, IFSAR, LIDAR and sonar can all provide this type of surface; however, accuracy and resolution may vary with the technology.
    1. With sonar, the DSM may include sunken vessels and other artifacts, whereas the bathymetric surface reflects the underwater terrain.
    2. With photogrammetry, LIDAR and IFSAR, the reflective surface includes passing cars and trucks that are not normally considered part of a digital terrain model.
  2. Digital Terrain Model (DTM) or Bare Earth Surface represents the "bare-earth" surface after removal of all vegetation and man-made features. Photogrammetry has traditionally generated DTMs when elevations are produced by manual compilation techniques. Unless otherwise specified, the bare-earth surface includes the top surface of water bodies.
  3. Bathymetric surface represents the submerged surface of underwater terrain.
  4. Mixed surface is a hybrid of the above surface types. For example, coastal studies may require a DTM of the bare-earth surface merged with the bathymetric surface. Multiple surface representations are also useful for applications such as forest inventory studies, which require both a vegetation surface and a bare-earth surface for the same site.
  5. Point cloud elevation file is a raw data file of three-dimensional point samples in which a single horizontal position may carry multiple elevations. An example is a LIDAR multi-return dataset, where there may be multiple z-values for each (x, y) coordinate pair.
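
To make the multi-return idea concrete, here is a minimal Python sketch that groups hypothetical LIDAR returns by their (x, y) position and keeps the highest and lowest z-value for each. Treating the highest return as a DSM-like surface and the lowest as a rough bare-earth estimate is a simplifying assumption; real workflows rely on dedicated ground-classification filters.

```python
from collections import defaultdict

# Hypothetical multi-return point cloud: (x, y, z) samples.
points = [
    (0, 0, 102.5), (0, 0, 95.1), (0, 0, 90.0),   # three returns through a tree
    (1, 0, 90.4),                                 # single return from open ground
    (1, 1, 98.7), (1, 1, 91.2),                   # two returns
]


def surfaces_from_returns(points):
    """Return (highest, lowest) z-value per (x, y) position."""
    by_xy = defaultdict(list)
    for x, y, z in points:
        by_xy[(x, y)].append(z)
    highest = {xy: max(zs) for xy, zs in by_xy.items()}
    lowest = {xy: min(zs) for xy, zs in by_xy.items()}
    return highest, lowest


highest, lowest = surfaces_from_returns(points)
```

Where only one return exists, as at position (1, 0), the two surfaces coincide.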
Elevation types are discussed below:
  1. Orthometric height is the height above the geoid as measured along the plumbline between the geoid and a point on the Earth's surface, taken positive upward from the geoid. It is obtained from conventional differential levelling where the survey instruments are leveled to the local direction of gravity.
  2. Ellipsoid height is the height obtained from GPS surveys, prior to corrections for the undulation of the geoid. Ellipsoid heights are independent of the local direction of gravity.
  3. Other types of elevation
    1. Bathymetric depth is measured in terms of water depths expressed as positive numbers downward, below the tidal datum.
    2. Geoid height is the difference between the ellipsoid height and the orthometric height at a specified location. It equals the undulation of the geoid.
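
The relationship between the three heights can be written as N = h - H, where h is the ellipsoid height, H the orthometric height and N the geoid height (undulation). A minimal Python sketch of this arithmetic follows; the function names and the numbers are illustrative only and are not survey data.

```python
def geoid_height(ellipsoid_height, orthometric_height):
    """Geoid height N = h - H (all values in the same unit, e.g. metres)."""
    return ellipsoid_height - orthometric_height


def orthometric_from_gps(ellipsoid_height, geoid_undulation):
    """Correct a GPS-derived ellipsoid height for geoid undulation: H = h - N."""
    return ellipsoid_height - geoid_undulation


# Illustrative values only.
h = 250.0   # ellipsoid height from GPS
N = -30.0   # geoid undulation (negative: geoid lies below the ellipsoid)
H = orthometric_from_gps(h, N)
```

With a negative undulation the orthometric height comes out larger than the ellipsoid height, which is why GPS heights need this correction before they are compared with levelled heights.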

Errors in GIS

Editing is an important operation during the process of database creation. GIS provides tools for examining coverages for mistakes. Errors may be introduced either from the original data source or during encoding. The process of detecting and removing errors by editing is called cleaning. Errors may be classified into:

  1. Entity errors
  2. Attribute errors and
  3. Entity-Attribute agreement errors
Entity errors may be of the following three types:
  1. Missing entities
  2. Incorrectly placed entities and
  3. Disordered entities
A GIS should perform an operation to build topology after the digitization process. Topology provides explicit information about the relationships between entities in the database. This allows the types of errors in digital maps to be identified. Some of these errors are indicated by 'text-based error flags', while others have to be interpreted by visually inspecting the database statistics. Six common types of errors are listed below:
  1. Not all entities that should be present are entered
  2. Extra entities have been digitised
  3. Not all entities are in the right place or are of the correct shape or size
  4. Some entities that are not supposed to be connected to each other are connected
  5. Not all polygons have a single label point to identify them
  6. Some entities lie outside the boundary identified by registration marks
One procedure for comparing the digitized entities with the original map document is to produce a monitor display or a hard-copy plot of the pre-edited database. GIS software sometimes provides several symbols that indicate errors.

Below is a detailed description of specific types of errors that are related to the six general types.
  • Nodes are often described as 'to nodes' and 'from nodes' and indicate the overall extent of a line feature. They are also used to identify an intersection between two streets or a connection between two features (e.g. a stream and a lake). Nodes should not occur at every line segment along a line or a polygon, and should not be used to mark directional changes between line segments. False nodes, or pseudo nodes, are the first form of potential error; an island pseudo node occurs where a line connects with itself. The GIS should flag the existence of pseudo nodes using a graphic symbol. Pseudo nodes are not errors in themselves but flags indicating the presence of potential problems. They often result from pushing the wrong button on the digitizing puck. Incorrect nodes can be corrected either by selecting and deleting them individually or by adding nodes where needed to convert an island into a polygon that is attached to other polygons.
  • A dangling node is a single node that is connected to a single entity. Dangling nodes can result from:
      • failure to close a polygon
      • failure to connect the node to the object (undershoot) or
      • going beyond the entity (overshoot)
    • Sometimes, the problem is due to incorrect placement of the digitizing puck or to setting the fuzzy tolerance distance too small.
  • In the case of dangling nodes, it is better to overshoot than to undershoot, as overshoots are easier to find than undershoots.
  • The two types of errors that can occur in case of label points in polygons are:
      • Missing labels and
      • Too many labels
    • Both of the above errors are caused by a failure to keep track of the digitizing process. In most cases, this is due to:
      • Confusion
      • Disruption in digitizing process or
      • Fatigue
    • Such errors are easy to find and are indicated by a graphic device distinguishing them from other error types.
    • Editing is simply adding label points where necessary and deleting label points wherever they are not required.
  • Another common form of digitising error when using the vector data model is treating each polygon as a separate entity, so that the adjacent lines between polygons are digitised more than once. Failure to place the digitizing puck at exactly the same location for each point along the line then results in a series of tiny graphic polygons called sliver polygons. Sliver polygons can also result from overlay operations.
    • The easy way to avoid sliver polygons during input is to use a GIS that does not require digitising the same line twice. If a line is digitized twice, the presence of a dangling node confirms this and the redundant line can be removed eliminating the problem.
    • Finding slivers in the absence of dangling nodes is very difficult. One way is to compare the number of polygons in the digital coverage with that of the original input map.
  • Another problem related to polygons is the production of weird polygons. Weird polygons are polygons with missing nodes. Frequently, this error is caused by digitizing a point in the wrong place or in the wrong order.
    • A simple way to avoid this problem is to number the input points or to establish a set pattern for digitising polygons.
After editing the coverage to correct the errors the GIS software will have to rebuild the topological structure on the basis of new entities. Finally, the new coverage must be saved.
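
The node checks described above can be sketched in Python by counting how many line ends meet at each node. This is an illustrative sketch, not the algorithm of any particular GIS package: it assumes end points have already been snapped so that coincident nodes have identical coordinates, and it uses the common convention that a node touched by one line end is a dangle and a node touched by exactly two line ends is a pseudo node.

```python
from collections import Counter


def classify_nodes(lines):
    """Classify line end points by how many line ends meet there.

    `lines` is a list of coordinate sequences; only the first and last
    vertex of each line are treated as nodes.
    """
    degree = Counter()
    for line in lines:
        degree[line[0]] += 1
        degree[line[-1]] += 1
    dangles = sorted(n for n, d in degree.items() if d == 1)
    pseudo = sorted(n for n, d in degree.items() if d == 2)
    return dangles, pseudo


# Hypothetical digitized lines: three lines meeting at (5, 0), a line split
# needlessly at (8, 0), and a closed island starting and ending at (20, 20).
lines = [
    [(0, 0), (5, 0)],
    [(5, 0), (5, 5)],
    [(5, 0), (8, 0)],
    [(8, 0), (12, 0)],
    [(20, 20), (22, 20), (22, 22), (20, 20)],
]
dangles, pseudo = classify_nodes(lines)
```

The flagged nodes are only candidates for editing: a dangle at the end of a cul-de-sac, or the single node of an island polygon, may be perfectly valid.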

Diagram of a Networked Database Structure

A Network Database Structure based on a Simple Map (Bernhardsen, 1992)

A hierarchical database structure based on a simple map (Bernhardsen, 1992)

Sunday, October 11, 2015

Diagram showing vector topology

Diagram depicting vector and raster representations of map features and their linkage to the attribute database

Diagram differentiating between vector and raster representation of map features and their linkage to the 'attribute database'

Diagram depicting linkage between Spatial data and Attribute data in a coverage

Diagram showing the conceptual data model

Map annotations


Annotation is a note added to comment or explain. In case of maps, it would be hard to understand the information conveyed by them without proper annotations. 
Annotation stored in the map document is referred to as map document annotation or graphic annotation or map annotation.
There are two types of map document annotation.
-Data frame annotation and
-Page text or graphic text

Data frame annotation is stored in annotation groups
  • Text is stored in annotation groups, which are part of the data frame.
  • Each piece of text stores its own symbol. If the map contains too many pieces of text, the files become very large and performance suffers. In this case, geodatabase annotation is a good alternative.
  • Each annotation group has its own reference scale which, together with the symbol size, determines the size of the text. The annotation group reference scales are editable.
  • Data frame annotation is never feature-linked. Only geodatabase annotation can be feature-linked.
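
As a rough illustration of how the reference scale and the symbol size together determine the text size, the Python sketch below assumes the usual behaviour that text is drawn at its symbol size when the map is at the reference scale and grows or shrinks in proportion as the map scale changes. The function name and the numbers are hypothetical.

```python
def displayed_text_size(symbol_size_pt, reference_scale, map_scale):
    """Text size on screen, given scale denominators (e.g. 10000 for 1:10,000).

    Assumes text appears at symbol_size_pt when map_scale == reference_scale.
    """
    return symbol_size_pt * reference_scale / map_scale


at_reference = displayed_text_size(10, 10000, 10000)   # drawn at symbol size
zoomed_in = displayed_text_size(10, 10000, 5000)       # drawn twice as large
zoomed_out = displayed_text_size(10, 10000, 20000)     # drawn half as large
```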
Page annotation
Page annotation is the same as data frame annotation except for the following points:
  • Text is stored in page space instead of geographic space (page coordinates are used instead of geographic coordinates).
  • Page annotation is visible only in the 'layout' view
  • Newly placed text is placed on the page when in 'layout' view and no data frame has focus.

Saturday, October 10, 2015

Conflation in GIS

Conflation is the action of unifying two distinct datasets into a new dataset. This may range from relatively easy to extremely difficult, depending upon the complexity of representation and the size and quality of the datasets involved.

Conflation terminology:

'Matching' is the activity of identifying features or data elements that represent the same real-world entity.

'Alignment' describes the degree to which two features or data elements have coincident geometry.

'Adjustment' is the alteration of geometry or attributes of matched features to align them.

The 'reference dataset' is the one which is to be conflated to. It is of greater spatial accuracy than the 'subject dataset'.

The 'subject dataset' is the dataset to be matched or adjusted.

Conflation problems can be classified into:
-Horizontal,
-Vertical and
-Internal conflation

Horizontal conflation is the process of eliminating discrepancies along the common boundary of datasets that are adjacent to one another. These include datasets containing data from the same feature classes. Examples are aligning the boundaries of adjacent coverages and edge-matching neighbouring networks.

Vertical conflation involves matching or eliminating discrepancies between datasets that occupy the same area in space. For example, road network matching between two representations of roads in the same region.
Two important types of vertical conflation are:
-Version matching and
-Feature Alignment

In version matching, the input datasets consist of different versions of the same features. The conflation process helps identify matching features. Attributes are transferred between matched features and unmatched features are transferred completely. For example, matching different versions of road networks for the same geographical area.

In feature alignment, the input data consists of features from two or more different feature classes that have some defined relationship to each other. The conflation process is aimed at removing discrepancies between datasets that falsify this relationship, for example through geometric alignment. A specific example is aligning the boundaries of different kinds of feature classes, such as municipal districts.

Internal conflation involves resolving features or elements within a single dataset. For example, coverage cleaning may require removal of overlaps between polygons within a coverage.

The process of conflation can be broken down into the following sub tasks:
i. Data pre-processing: This step normalizes the datasets and ensures that they are compatible. This may involve format translation and other basic preparation of the datasets. An example of data pre-processing is ensuring that the datasets have the same coordinate system.
ii. Data quality assurance: In this step, the internal consistency of the datasets is verified and improved if necessary. Some conflation tasks require that datasets have a certain level of internal consistency. For example, coverage alignment algorithms require that the input datasets be clean coverages.
iii. Dataset alignment: If the datasets are misaligned, an initial alignment process is required before precise conflation can be carried out. This alignment is coarse-grained in nature and does not align individual features.
iv. Feature matching: This step involves matching common features between datasets. After this phase, the discrepancies between datasets have been identified and can be visualised. The matching results can also be used to provide statistical summaries of data quality.
v. Geometry alignment: This step removes discrepancies between the geometries of matched features.
vi. Information transfer: This step involves updating one dataset with information from the other. The information can be attributes or geometry to be added to an existing feature, or entire features to be added to the dataset.
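
The feature matching and geometry alignment steps above can be sketched for simple point features as follows. This is a minimal Python illustration under stated assumptions, not a description of any particular conflation tool: the datasets are hypothetical, matching is a greedy nearest-neighbour search within a distance tolerance, and adjustment simply moves each matched subject point onto its reference point.

```python
import math


def match_features(subject, reference, tolerance):
    """Greedy nearest-neighbour matching of point features within a tolerance.

    Returns (matches, unmatched) where matches maps subject id -> reference id.
    """
    matches, unmatched = {}, []
    for sid, (sx, sy) in subject.items():
        best_id, best_dist = None, tolerance
        for rid, (rx, ry) in reference.items():
            dist = math.hypot(sx - rx, sy - ry)
            if dist <= best_dist and rid not in matches.values():
                best_id, best_dist = rid, dist
        if best_id is None:
            unmatched.append(sid)
        else:
            matches[sid] = best_id
    return matches, unmatched


def adjust(subject, reference, matches):
    """Geometry alignment: move matched subject points onto the reference geometry."""
    return {sid: reference[matches[sid]] if sid in matches else xy
            for sid, xy in subject.items()}


# Hypothetical datasets in the same coordinate system (pre-processing assumed done).
reference = {"r1": (100.0, 200.0), "r2": (150.0, 250.0)}
subject = {"s1": (100.8, 199.5), "s2": (149.0, 251.0), "s3": (400.0, 400.0)}

matches, unmatched = match_features(subject, reference, tolerance=2.0)
aligned = adjust(subject, reference, matches)
```

The list of unmatched features is what the information transfer step would then consider adding to the other dataset as entire features.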


Monday, October 5, 2015

Object Structural Model in GIS

Database Management Systems (DBMS) play an important role in GIS by forming a link between spatial and aspatial data. A DBMS gives multiple users access to the data at the same time, prevents loss of data and provides access security. Currently available DBMS are generally based on the hierarchical model, the network model, the relational model or their derivatives. In the relational model, data are stored in tables: the columns of a table are called attributes, and the set of all possible values of an attribute is its domain. Rows are also called records, tuples or relation elements.
However, the data model on which most relational DBMS are based does not meet the requirements of modern GIS. GIS integrate data from a variety of sources into a single homogeneous system and hence need powerful and flexible data models to meet the requirements of multiple tasks. For example:

  1. Sophisticated treatment of real world geometry
  2. Representation of data at different conceptual levels of resolution and detail
  3. Management of history and versions of objects
  4. Combinations of measurements of different resolutions and accuracy

Remotely sensed data

GIS and Remote Sensing are linked by remotely sensed data: aerial or satellite images of a specified piece of land are used as input to a GIS for manipulation and analysis. The results are in turn presented to policy makers for the formulation of policies on the development or conservation of resources.

For many applications, remote sensing can be used effectively and efficiently to update GIS data layers. These updated layers in GIS can be used to improve the interpretability and information extraction potential of remotely sensed data.

GIS users require timely input data to optimise their systems for analysis and decision making. Usually, raster data is preferred. GIS users require multiple images from different regions of the electromagnetic spectrum and/or different dates, at scales ranging from local to global. Moreover, the data might have a variety of spatial, spectral and temporal resolutions.

Maps are compiled using photogrammetric techniques to process remotely-sensed data and these maps form the base upon which GIS applications are achieved. Remotely sensed data are also used to measure several environmental parameters like: surface and cloud top reflectances, albedo, soil and snow water content, fraction of photosynthetically active radiation, areas and potential yield of crop types, height and density of forest stands, etc. Such data mapped and/or monitored over time form the basis for monitoring.

Environmental planners, resource managers and public-policy decision makers are employing remotely-sensed data within the context of GIS to improve the management of resources.

Remote sensing data are acquired by aerial camera systems and a variety of active and passive remote  sensor systems operating at wavelengths throughout the electromagnetic spectrum. Data acquired by aerial camera systems can be scanned, converted into digital format and input into GIS.

Most satellite scanners are typically electro-mechanical scanners, linear devices or imaging spectrometers that operate in either a 'sweep' (LANDSAT) or 'pushbroom' (SPOT) mode. These are passive systems that record solar radiation reflected from Earth's surface.

Data derived from multi-spectral scanners can provide information on vegetation types, their distribution and condition, geomorphology, soils, surface waters and river networks.

Short Wave Infra-Red (SWIR) sensors record emitted energy from surfaces and have been particularly useful for monitoring fires and studying areas of geothermal and volcanic activity. Thermal or Long Wave Infra Red (LWIR) sensors are used for mapping ocean temperatures and study of the dynamics of ocean waters and currents. Thermal maps are used to monitor urban areas, industrial sites, manufacturing centers and agricultural fields.

Active systems operating in the visible spectrum use laser technologies (e.g. LIght Detection And Ranging (LIDAR) systems), mainly for oceanographic and forestry applications. Regardless of the wavelengths they use, active systems do not depend on the sun for image capture.

The following is a list of satellite based sensors currently providing operational raster remotely sensed data for GIS developers and users:

  1. LANDSAT programme:
      1. It has provided coverage of Earth for almost 25 years.
      2. It is the result of the NASA Earth Resources Survey Program and several other U.S. government agencies.
      3. In 1972, it was originally known as the Earth Resources Technology Satellite.
      4. Four additional satellites have been placed in orbit since 1972 to provide continuous data for use in a wide range of environmental applications.
      5. The first three LANDSATs had a Multi Spectral Scanner (MSS) as the primary sensor, while the next two had a high resolution scanner called the Thematic Mapper (TM).
      6. The MSS had a spatial resolution of 80m and imaged in the visible and near-infrared regions, while the TM had a resolution of 30m and imaged in the visible and thermal infrared bands.
      7. The LANDSAT programme established the operational viability of space-based remotely sensed data.
  2. Satellite Pour l'Observation de la Terre (SPOT):
      1. SPOT is an operational, commercial remote sensing programme that operates on an international scale.
      2. SPOT satellites are owned and operated by the French space agency, Centre National d'Etudes Spatiales (CNES).
      3. Three SPOT satellites have been placed into orbit since 1986.
      4. The mission objective for SPOT is to provide remotely sensed data suited for land cover, agriculture, forestry, geology, regional planning and cartography applications.
      5. Data from the High Resolution Visible sensor (HRV) provide both multispectral coverage with 20m spatial resolution and panchromatic imagery with 10m resolution. These data are particularly well suited for urban and cartographic applications.
  3. Advanced Very High Resolution Radiometer (AVHRR):
      1. The AVHRR sensor is carried aboard the Polar Orbiting Environmental Satellites (POES) of the United States National Oceanic and Atmospheric Administration (NOAA).
      2. This programme was established to provide data for use in meteorological applications.
      3. The daily coverage provided by AVHRR has resulted in the data being used for many operational land mapping and monitoring programmes.
      4. AVHRR data are multispectral, with a resolution of 1.1km at nadir and an orbital swath of 2600 km.
  4. Marine Observation Satellite
  5. Japanese Earth Resources Satellite
  6. India Remote Sensing Satellite
  7. European Resource Satellite
  8. High Spatial Resolution Satellites

    Conversion of existing digital data

    An important technique that is becoming increasingly popular for data input is the conversion of existing digital data. A variety of spatial data, including digital maps, are openly available from a wide range of government and private sources. The most common digital data used in a GIS are data from CAD systems. A number of data conversion programs exist, mostly from GIS software vendors, to transform data from CAD formats to a raster or topological GIS data format. Several ad hoc standards for data exchange have been established in the marketplace. These are supplemented by a number of government distribution formats. Because of the wide variety of data formats, most GIS vendors have developed and provide data exchange/conversion software to go from their own format to those considered common in the marketplace.

    Most GIS software vendors also provide an ASCII (American Standard Code for Information Interchange) data exchange format specific to their product, and a programming subroutine library that allows users to write their own data conversion routines to fulfil their specific needs. As digital data becomes more readily available, this capability becomes a necessity for any GIS. Data conversion from existing digital data is not a problem for most technical persons in the GIS field. However, for smaller GIS installations that have limited access to a GIS analyst, it can be a major obstacle to getting a GIS operational. Government agencies are usually a good source of technical information on data conversion requirements.

    Some of the data formats common to the GIS marketplace are listed below.

    IGDS - Interactive Graphics Design Software (Intergraph / Microstation)

    This binary format is a standard in the turnkey CAD market and has become a de facto standard in the mapping industry. It is a proprietary format; however, most GIS software vendors provide DGN translators.

    DLG - Digital Line Graph (US Geological Survey)

    This ASCII format is used by the USGS as a distribution standard and consequently is well utilized in the United States. It is not extensively used elsewhere, even though most software vendors provide two-way conversion to DLG.

    DXF - Drawing Exchange Format (Autocad)

    This ASCII format is used primarily to convert to/from the Autocad drawing format and is a standard in the engineering discipline. Most GIS software vendors provide a DXF translator.

    GENERATE - ARC/INFO Graphic Exchange Format

    A generic ASCII format for spatial data used by the ARC/INFO software to accommodate generic spatial data.

    EXPORT - ARC/INFO Export Format

    An exchange format that includes both graphic and attribute data. This format is intended for transferring ARC/INFO data from one hardware platform, or site, to another. It is also often used for archiving ARC/INFO data. This is not a published data format; however, some GIS and desktop mapping vendors provide translators. EXPORT format can come in either uncompressed, partially compressed, or fully compressed format.

    A wide variety of other vendor-specific data formats exist within the mapping and GIS industry. In particular, most GIS software vendors have their own proprietary formats. However, almost all provide data conversion to and from the above formats. In addition, most GIS software vendors will develop data conversion programs in response to specific customer requests. Potential purchasers of commercial GIS packages should determine their data conversion needs and clearly identify them to the software vendor prior to purchase.

    Cartographic database

    Cartographic database:
    A database containing x-y coordinates defining a geographical area. When combined with other data (any of a wide variety of variables, such as income distribution or age), a cartographic database can be used to map the distribution of that variable within a geographical region.

    Digital Elevation Data

    Digital Elevation Data (DED) consists of an ordered array of ground elevations at regularly spaced intervals. The digital data for DED is extracted from the hypsographic and hydrographic elements of the various scaled positioned data acquired from the region in question.

    Data compression

    Data compression:
    Data compression in GIS refers to the compression of geospatial data so that the volume of data transmitted across networks can be reduced. A properly chosen compression algorithm can reduce data size to 5 - 10% of the original for images and to 10 - 20% for vector and text data. Such compression ratios can result in a significant performance improvement.
    Data compression algorithms can be classified into lossless and lossy. Lossless compression algorithms are those from which the bit streams can be recovered exactly as the original data. These algorithms should be used if the loss of even a single bit may cause serious and unpredictable consequences.
    Lossy compression algorithms should be used where a certain level of distortion can be tolerated. They are used to achieve a higher level of compression.
    Examples of lossless compression algorithms are:
    -Huffman coding
    -Arithmetic coding
    -Lempel-Ziv Coding (LZC) and
    -Burrows-Wheeler Transform (BWT)
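
A lossless round trip can be demonstrated with Python's standard `zlib` module, which implements DEFLATE (a combination of Lempel-Ziv style matching and Huffman coding). The raster row below is made up; the point is only that decompression returns exactly the original bytes.

```python
import zlib

# Hypothetical raster row with long runs of identical land-cover codes.
row = bytes([3] * 400 + [7] * 350 + [3] * 250)

compressed = zlib.compress(row, 9)       # 9 = highest compression level
restored = zlib.decompress(compressed)   # identical to the input, bit for bit

ratio = len(compressed) / len(row)       # fraction of the original size
```

How small `ratio` gets depends entirely on the data; highly repetitive rows like this one compress far better than noisy imagery.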

    Examples of lossy compression algorithms are:
    -Differential Pulse Coded Modulation (DPCM)
    -Transform Coding
    -Subband Coding and
    -Vector Quantization
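
To show what a tolerated level of distortion means in practice, here is a minimal Python sketch of DPCM applied to a made-up elevation profile. Each sample is predicted from the previously reconstructed value and only the quantized difference is stored. The step size is an assumption chosen for illustration, and the simple previous-value predictor is the most basic choice; the reconstruction error stays within half a step.

```python
def dpcm_encode(samples, step):
    """Encode each sample as the quantized difference from the previous reconstruction."""
    codes, prediction = [], 0
    for s in samples:
        q = round((s - prediction) / step)
        codes.append(q)
        prediction += q * step      # track what the decoder will reconstruct
    return codes


def dpcm_decode(codes, step):
    """Rebuild the samples by accumulating the de-quantized differences."""
    values, value = [], 0
    for q in codes:
        value += q * step
        values.append(value)
    return values


elevations = [120, 123, 127, 126, 131, 140, 138]   # hypothetical profile, metres
step = 4
codes = dpcm_encode(elevations, step)
restored = dpcm_decode(codes, step)
max_error = max(abs(a - b) for a, b in zip(elevations, restored))
```

After the first sample the stored codes are small integers, which is what makes them cheaper to store than the raw values; the price is that `restored` is no longer identical to `elevations`.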

    Data compression refers to the process of reducing the size of a file or database. Compression improves data handling, storage, and database performance.
    Examples of compression methods include quadtrees, run-length encoding, and wavelets.
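
Of the methods just listed, run-length encoding is the simplest to illustrate. The Python sketch below encodes a hypothetical raster row as pairs of cell value and run length, then expands it again; the data are made up for illustration.

```python
def rle_encode(row):
    """Run-length encode a raster row as [value, run_length] pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


def rle_decode(runs):
    """Expand [value, run_length] pairs back into the original row."""
    row = []
    for value, count in runs:
        row.extend([value] * count)
    return row


row = [0, 0, 0, 0, 5, 5, 2, 2, 2, 0]      # hypothetical raster row
runs = rle_encode(row)                     # [[0, 4], [5, 2], [2, 3], [0, 1]]
```

Run-length encoding pays off on thematic rasters with large uniform areas and can even enlarge data in which neighbouring cells rarely share a value.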

    Typically, GIS software uses the term data compression for a process that removes unreferenced rows from geodatabase system tables and user delta tables. Compression helps maintain versioned geodatabase performance.