Monday, October 12, 2015

Raster data analysis


RASTER DATA ANALYSIS
  • Raster data analysis is based on cells and rasters
  • Raster data analysis can be performed at the level of individual cells, or a group of cells, or cells within an entire raster
  • The type of cell value is an important aspect of raster data analysis
  • Various types of data are stored in raster format
  • Raster data analysis operates on software-specific raster data. Hence, before Digital Elevation Models (DEMs), satellite images and other raster data can be used in analysis, they must first be processed and imported into the software-specific raster format
  • The core of raster data analysis comprises local operations, which are cell-by-cell operations.
  • A local operation can create a new raster from a single input raster or from multiple input rasters.
  • Converting a floating point raster to an integer raster is a simple local operation
  • Converting a slope raster measured in percent to one measured in degrees is also a local operation
  • Reclassification is a local operation that creates a new raster by classification. Reclassification is also called recoding or transforming
  • Local operations with multiple rasters are also known as compositing, overlaying or superimposing maps
  • Map algebra is defined as local operations with multiple input rasters.
  • Local operations compute an output raster dataset where the output value at each location is a function of the value associated with that location on one or more raster datasets
  • Focal or neighbourhood functions produce an output raster dataset in which the output value at each location is a function of the value at that location and the values of cells in a specified neighbourhood around it
  • Zonal functions compute an output raster dataset where the output value for each location depends on the value of the cell at the location and the association that the location has within a cartographic zone.
  • Global functions compute an output raster dataset in which the output value at each cell location is potentially a function of all the cells in the input raster datasets.
    • The two types of global functions are:
      • Euclidean distance and
      • Weighted distance
  • The raster calculator provides powerful tools in algebra syntax to perform mathematical calculations using operators and functions. It can also be used to set up selection queries or to type Spatial Analyst function syntax. Inputs can be grid datasets or raster layers, shapefiles, coverages, tables, constants and numbers.
  • The raster calculator provides four groups of mathematical functions: Logarithmic, Arithmetic, Trigonometric and Exponential.
  • The raster calculator allows boolean, relational and arithmetic operators
  • Raster data analysis also involves:
    • Terrain analysis
      • Contour
      • Slope
      • Aspect
      • Hillshade
      • Viewshed
      • Cut/Fill
    • Hydrologic analysis
      • Basin
      • Fill
      • Flow accumulation
      • Flow direction
      • Flow length
      • Sink
      • Snap pour point
      • Snap pour
      • Stream link
      • Stream order
      • Stream to feature
      • Stream shape
      • Watershed
  • Reclassification is replacing input cell values with new output cell values. The reasons to reclassify data are listed below:
    • To replace values based on new information
    • To group certain values together
    • To reclassify values to a common scale
    • To set specific cells to "NoData" or set "NoData" cells to a value
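
The local, focal and zonal operations described above can be sketched in plain Python, with a raster held as a list of rows. This is only an illustration of the ideas, not how any particular GIS package implements them: the function names, the 3x3 window, the class breaks and the sample values are all assumptions made for the example.

```python
import math

def local_op(raster, fn):
    """Local operation: apply fn to every cell independently."""
    return [[fn(v) for v in row] for row in raster]

def percent_to_degrees(slope_percent):
    """Convert a slope from percent rise to degrees."""
    return math.degrees(math.atan(slope_percent / 100.0))

def reclassify(raster, classes, nodata=None):
    """Reclassification: replace each value with the class whose
    [low, high) range contains it; unmatched cells become nodata."""
    def recode(v):
        for low, high, new_value in classes:
            if low <= v < high:
                return new_value
        return nodata
    return local_op(raster, recode)

def focal_mean(raster):
    """Focal operation: mean of each cell and its neighbours in a
    3x3 window (the window is clipped at the raster edges)."""
    rows, cols = len(raster), len(raster[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [raster[i][j]
                      for i in range(max(0, r - 1), min(rows, r + 2))
                      for j in range(max(0, c - 1), min(cols, c + 2))]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out

def zonal_mean(values, zones):
    """Zonal operation: each cell receives the mean of all value
    cells that fall in the same zone."""
    totals, counts = {}, {}
    for value_row, zone_row in zip(values, zones):
        for v, z in zip(value_row, zone_row):
            totals[z] = totals.get(z, 0) + v
            counts[z] = counts.get(z, 0) + 1
    return [[totals[z] / counts[z] for z in zone_row] for zone_row in zones]

# Hypothetical slope raster in percent, converted and then grouped
# into two slope classes (below 30 degrees, and 30 degrees or more).
slope_percent = [[0, 100], [50, 100]]
slope_degrees = local_op(slope_percent, percent_to_degrees)
slope_class = reclassify(slope_degrees, [(0, 30, 1), (30, 91, 2)])
```

Here a 100 percent slope becomes 45 degrees, and the reclassified raster holds only the class codes 1 and 2, which is the "group certain values together" use of reclassification listed above.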

GIS and Knowledge based systems

GIS - KNOWLEDGE BASED SYSTEMS
A knowledge based system is a computer program that reasons and uses a knowledge base to solve complex problems. The common theme that unites all knowledge based systems is an attempt to represent knowledge explicitly via tools and rules as opposed to the methodology commonly adopted by conventional computer programs. A knowledge based system has two sub-systems:

  1. A knowledge base and
  2. An Inference engine
Knowledge Based Systems (KBS) were first developed by Artificial Intelligence (AI) researchers.
Expert systems or Knowledge Based Systems try to simulate the human decision making process manipulating data and knowledge through programmed logic. KBS functions can be integrated within a GIS based spatial decision support system as a knowledge and model base.
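
The split between a knowledge base and an inference engine can be illustrated with a very small forward-chaining sketch. The rules and fact names below are hypothetical and chosen only for illustration; a real KBS coupled to a GIS would hold far richer rules and would draw its facts from spatial data.

```python
def infer(facts, rules):
    """Inference engine: fire rules repeatedly (forward chaining)
    until no rule can add a new fact."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in known and set(conditions) <= known:
                known.add(conclusion)
                changed = True
    return known

# Knowledge base: each rule is (conditions, conclusion).
# These rules are invented for the example.
rules = [
    (("slope_steep", "soil_loose"), "erosion_risk_high"),
    (("erosion_risk_high", "near_stream"), "restrict_development"),
]

facts = {"slope_steep", "soil_loose", "near_stream"}
derived = infer(facts, rules)
```

The knowledge (the rules) is stated explicitly and kept apart from the procedure that applies it, so rules can be added or changed without touching the engine.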

Coupling a KBS enables a GIS to be knowledgeable about its constraints and potential as well as become aware of the data source, context and use. Treating all data structures as components of an object provides a facility for flexible analysis of data by KBS and GIS.

The coupling takes the form of exchanging files between GIS and a model where input, display and output stages are the only uses of the GIS capability. The knowledge and mapping techniques are encoded in the schematic inquiry of the KBS to follow through the decision making process involved in mapping and prediction. 

The content and priorities of expert routines and their rules represent knowledge. Their procedures and rules form the basis of their action to achieve the goal. The priorities represent when to initiate a goal satisfying routine.

The effectiveness of communication to the user of GIS coupled with KBS for describing a resource would be greatly enhanced by the use of multimedia facilities. 

Finally, the representation of a specialized field of knowledge using Knowledge Based (KBS) approach provides a means of building linkages between skills of the subject specialist, spatial analyst and the needs of the user. 

Thus, this linkage helps provide answers to questions not easily answered by one discipline on its own.

Classification of GIS models

CLASSIFICATION OF GIS MODELS

GIS models have been classified by purpose, methodology and logic although the boundary of their classification criteria has not been clear.
  1. A GIS model may be descriptive or prescriptive: A descriptive model describes existing conditions of spatial data whereas a prescriptive model predicts what the conditions could be or should be. As an example, a vegetation map represents a descriptive model as it shows existing vegetation, while a potential natural vegetation map represents a prescriptive model as it predicts the vegetation that a site could support without disturbance.
  2. A GIS model may be deterministic or stochastic: Both deterministic and stochastic models are mathematical equations represented with parameters and variables. While a stochastic model considers presence of some randomness in one or more of its parameters, a deterministic model does not. Hence the predictions of a stochastic model can have a certain amount of error.
  3. A GIS model can be static or dynamic: A dynamic model emphasizes the changes of spatial data and interaction between variables whereas a static model deals with the state of spatial data at a given time. Time is important to show the process of changes in a dynamic model. Simulation is a technique that can generate different states of spatial data over time. Many environmental models such as water distribution have been effectively understood using dynamic models.
  4. A GIS model may be deductive or inductive: A deductive model represents the conclusion derived from a set of premises. These premises are often based on scientific theories or physical laws. An inductive model represents the conclusion derived from empirical data or observations. For example, to assess the damage potential of a flood, a deductive model based on the laws of hydrology, geology, etc. may be used, or an inductive model based on recorded data from past floods can be relied upon.
  5. A binary model is a GIS model that uses logical expressions to select features from a composite feature layer or multiple rasters
  6. An index model is a GIS model that uses the index value calculated from a composite feature layer or multiple rasters to produce a layer with ranked data
  7. A process model is a GIS model that integrates existing knowledge into a set of relationships and equations for quantifying the physical processes
  8. A Regression model is a GIS model that uses a dependent variable and a number of independent variables in a regression equation for prediction or estimation
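
The difference between a binary model and an index model can be sketched on small rasters stored as lists of rows. The layer names, threshold, scores and weights below are assumptions made for the example; in practice they come from the study at hand.

```python
def binary_model(layers, predicate):
    """Binary model: 1 where the logical expression holds for the
    co-located cells of all layers, 0 elsewhere."""
    return [[1 if predicate(*cells) else 0 for cells in zip(*rows)]
            for rows in zip(*layers)]

def index_model(layers, weights):
    """Index model: weighted sum of the standardised scores of the
    co-located cells, giving a ranked (not yes/no) result."""
    return [[sum(w * v for w, v in zip(weights, cells))
             for cells in zip(*rows)]
            for rows in zip(*layers)]

# Hypothetical inputs: slope in degrees and a land-use code.
slope = [[5, 20], [10, 35]]
landuse = [[1, 1], [2, 1]]
suitable = binary_model([slope, landuse],
                        lambda s, lu: s < 15 and lu == 1)

# Hypothetical scores already standardised to a common 1-3 scale.
slope_score = [[3, 1], [2, 1]]
soil_score = [[1, 2], [3, 3]]
index = index_model([slope_score, soil_score], [0.5, 0.5])
```

With these inputs only the top-left cell passes the binary test, whereas the index model ranks every cell, with the bottom-left cell scoring highest.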

Digital Elevation Data

Digital Elevation Data may refer to a digital elevation surface or elevation type as discussed below:

Digital elevation surface is one among the following five types:

  1. Digital Surface Model (DSM) is the highest reflective surface of ground features captured by the sensor. It includes tree tops, roof tops, tops of towers, telephone poles and other natural or man-made features. Photogrammetry, IFSAR, LIDAR and sonar can provide this type of surface; however, accuracy and resolution may vary with technology.
    1. With sonar, the DSM may include sunken vessels and other artifacts whereas the bathymetric surface reflects the underwater terrain
    2. With photogrammetry, LIDAR and IFSAR, the reflective surface includes passing cars and trucks, which are not normally considered part of a digital terrain model.
  2. Digital Terrain Model (DTM) or Bare Earth Surface represents the surface of "bare-earth" after removal of ALL vegetation and man-made features. Such surfaces are generally called Digital Terrain Models (DTMs). Photogrammetry has traditionally generated DTMs when elevations are generated by manual compilation techniques. Unless specified, the bare-earth surface includes the top surface of water bodies.
  3. Bathymetric surface represents the submerged surface of underwater terrain
  4. Mixed surface is a hybrid of the above surface types. For example, coastal studies may require a DTM of the bare-earth surface merged with the bathymetric surface. Multiple surface representations are useful for a number of applications, such as forest inventory studies that require a vegetation surface and a bare-earth surface for the same site.
  5. Point cloud elevation file is a raw data file containing three-dimensional point samples, where a single horizontal position may have multiple elevations. An example of a point cloud file is a LIDAR multi-return dataset where there may be multiple z-values for each (x, y) coordinate.
Elevation types are discussed below:
  1. Orthometric height is the height above the geoid as measured along the plumbline between the geoid and a point on the Earth's surface, taken positive upward from the geoid. It is obtained from conventional differential levelling where the survey instruments are leveled to the local direction of gravity.
  2. Ellipsoid height is the height obtained from GPS surveys, prior to corrections for the undulation of the geoid. Ellipsoid heights are independent of the local direction of gravity.
  3. Other types of elevation
    1. Bathymetric depth is measured in terms of water depths expressed as positive numbers downward, below the tidal datum.
    2. Geoid height is the difference between the ellipsoid height and the orthometric height at a specified location. It equals the undulation of the geoid.
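
The relationship between the three heights can be written as a short calculation. The function names and the sample values are illustrative assumptions; the sign convention follows the definition above, where geoid height is ellipsoid height minus orthometric height.

```python
def geoid_height(ellipsoid_height, orthometric_height):
    """Geoid height (undulation) = ellipsoid height - orthometric height."""
    return ellipsoid_height - orthometric_height

def orthometric_height(ellipsoid_height, geoid_undulation):
    """Correct a GPS (ellipsoid) height for the undulation of the geoid."""
    return ellipsoid_height - geoid_undulation

# Hypothetical point: the geoid lies 30 m below the ellipsoid here,
# so the geoid height is negative.
h_gps = 100.0
undulation = -30.0
height_above_geoid = orthometric_height(h_gps, undulation)
```

In this made-up case a GPS height of 100 m corresponds to an orthometric height of 130 m, because the geoid sits below the ellipsoid at that location.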

Errors in GIS

Editing is an important operation during the process of database creation. GIS provides tools for examining coverages for mistakes. Errors may be introduced either from the original data source or during encoding. The process of detecting and removing errors by editing is called cleaning. Errors may be classified into:

  1. Entity errors
  2. Attribute errors and
  3. Entity-Attribute agreement errors
Entity error may be of the following three types:
  1. Missing entities
  2. Incorrectly placed entities and
  3. Disordered entities
A GIS should build topology after the digitization process. Topology provides explicit information about relationships between entities in the database. This allows identification of types of errors in digital maps. Some of these errors are indicated by 'text-based error flags' while others might have to be interpreted by visually inspecting the database statistics. Six common types of errors are listed below:
  1. Not all entities that should be present are entered
  2. Extra entities have been digitised
  3. Not all entities are in the right place or are of the correct shape or size
  4. Some entities that are not supposed to be connected to each other are connected
  5. Not all polygons have a single label point to identify them
  6. Some entities lie outside the boundary identified by registration marks
A procedure for comparing the digitized entities and original map document is to produce a monitor display or a hard-copy plot of the pre-edited database. GIS software sometimes provides several symbols that indicate errors.

Below is a detailed description of specific types of errors that are related to the six general types.
  • Nodes are often described as 'to nodes' and 'from nodes' and mark the overall extent of a line feature. They also identify an intersection between two streets or a connection between two features (e.g. a stream and a lake). Nodes should not occur at every line segment along a line or polygon, nor should they be used to mark changes of direction between line segments. The first form of error is therefore the false node or pseudo node, which occurs when a line connects with itself (island pseudo node). The GIS should flag the existence of pseudo nodes using a graphic symbol. Pseudo nodes are not necessarily errors; they are flags indicating the presence of potential problems. They often result from pushing the wrong button on the digitizing puck. Incorrect nodes can be corrected either by selecting and deleting them individually or by adding nodes where needed to convert an island to a polygon that is attached to other polygons.
  • A dangling node is a single node that is connected to a single entity. Dangling nodes can result from:
      • failure to close a polygon
      • failure to connect the node to the object (undershoot) or
      • going beyond the entity (overshoot)
    • Sometimes, the problem is due to incorrect placement of the digitizing puck or to setting the fuzzy tolerance distance too small.
  • In case of dangling nodes, a good practice is to overshoot rather than undershoot, as it is easier to find overshoots than undershoots.
  • The two types of errors that can occur in case of label points in polygons are:
      • Missing labels and
      • Too many labels
    • Both the above errors are caused by failure to keep track of the digitizing process. In most cases, the error is due to:
      • Confusion
      • Disruption in digitizing process or
      • Fatigue
    • Such errors are easy to find and are indicated by a graphic device distinguishing them from other error types.
    • Editing is simply adding label points where necessary and deleting label points wherever they are not required.
  • Another common form of digitising error while using the vector data model is treating each polygon as a separate entity, which results in digitising adjacent lines between polygons more than once. Failure to place the digitizing puck exactly at the correct location for each point along the line results in a series of tiny graphic polygons called SLIVER POLYGONS. Sliver polygons can also occur due to overlay operations.
    • The easy way to avoid sliver polygons during input is to use a GIS that does not require digitising the same line twice. If a line is digitized twice, the presence of a dangling node confirms this and the redundant line can be removed eliminating the problem.
    • Finding slivers in the absence of dangling nodes is very difficult. One way is to compare the number of polygons in the digital coverage with that of the original input map.
  • Another problem related to polygons is the production of weird polygons. Weird polygons are polygons with missing nodes. Frequently, this error is caused by digitizing a point in the wrong place or in the wrong order.
    • A simple way to avoid this problem is to number the input points or to establish a set pattern for digitising polygons.
After editing the coverage to correct the errors, the GIS software will have to rebuild the topological structure on the basis of the new entities. Finally, the new coverage must be saved.
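
The node checks described above can be sketched by counting how many line ends meet at each node. This is a simplified illustration, not the procedure of any specific GIS: it assumes that each line is stored as a (from node, to node) pair, that coordinates have already been snapped so shared nodes match exactly, and that a node touched by exactly two line ends is treated as a pseudo node.

```python
from collections import Counter

def classify_nodes(arcs):
    """Return (dangling, pseudo) node lists for arcs given as
    (from_node, to_node) coordinate pairs."""
    ends = Counter()
    for from_node, to_node in arcs:
        ends[from_node] += 1
        ends[to_node] += 1
    # One line end at a node: the line is not connected to anything there.
    dangling = sorted(n for n, k in ends.items() if k == 1)
    # Two line ends at a node: either two lines meet without a real
    # intersection, or one line closes on itself (an island).
    pseudo = sorted(n for n, k in ends.items() if k == 2)
    return dangling, pseudo

# Hypothetical arcs: a T-junction at (1, 0), a chain through (2, 0),
# and an island that starts and ends at (5, 5).
arcs = [
    ((0, 0), (1, 0)),
    ((1, 0), (1, 1)),
    ((1, 0), (2, 0)),
    ((2, 0), (3, 0)),
    ((5, 5), (5, 5)),
]
dangling, pseudo = classify_nodes(arcs)
```

The flagged nodes still need visual inspection, since a dangling node may be a legitimate dead end and a pseudo node may be a valid island.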

Diagram of a Networked Database Structure

A Network Database Structure based on a Simple Map (Bernhardsen, 1992)

A hierarchical database structure based on a simple map (Bernhardsen, 1992)