IAT: Methodology

Methodology Overview

Since its inception in 1997, the Mountain Legacy Project (then called the Bridgland Repeat Photography Project) has used historic images and matching modern repeat photos to examine landscape-level change in the Canadian cordillera. Analysis has been both qualitative and quantitative and has played a significant role in a diverse array of research projects and papers. See, for example, an overview of the resource by Trant et al. [1]; use of historic/modern images in estimating landscape composition by Fortin et al. [2]; and use of repeat photography in assessing land cover change by Rhemtulla et al. [3].

In 2015 a group of computer scientists attached to MLP began to look through this diversity of research to see if it was possible to develop a suite of image analysis tools that would help MLP team members better pursue their work. The research question the computer scientists asked was as follows: “What kind of visualization tools will help researchers explore, interpret, and discover meaning in the images?” The first step in answering this question was the Image Analysis Toolkit (IAT), which underpins much of the methodology discussed here. An overview of its features was published by Sanseverino et al. in Mountain Research and Development in 2016 [4]. IAT development context as of 2016 can be seen in Figure 3; the area highlighted in light green represents IAT development pre-Landscapes in Motion (LiM).

Resources and collaboration facilitated by LiM afforded MLP the opportunity to work on building out IAT into an end-to-end process that would allow the following:

  • Apply user-initiated landscape classification categories to the images;
  • Generate landscape classification masks from the images;
  • Visualize change between classification masks and their associated images;
  • Quantify change within classification masks between images;
  • Quantify the actual “on-the-ground” extent of each class in a given image (e.g., square metres, hectares);
  • Produce a georectified viewshed of the landscape classification mask for a given image.

Other guiding concepts for IAT development encompass the following:

  • Support for an end-to-end process for MLP’s unique workflow;
  • Be entirely accessible to researchers from all mountain study areas – hence a commitment to web-based development;
  • Work with the strengths of MLP’s oblique image collection rather than converting immediately to rectified images;
  • Be adapted to the finest resolution MLP’s image collection could provide.

Figure 3 is a process diagram showing how IAT can be used to take an initial historic/modern image pair from segmentation into landscape classification masks through to the creation of georeferenced viewsheds of those masks. Such viewsheds are suitable for incorporation into a GIS for further spatial analysis.

Figure 3: Process diagram showing the end-to-end Image Analysis Toolkit (IAT) methodology.

Three processes are key to using the IAT research tool: Image Alignment, Build Virtual Photo, and Create Georeferenced Masked Viewshed. Explanations and examples of all three are available from the tabs at the top of the page.


Align Image Pairs Process

In order to compare an historic photo with its modern repeat, the images have to be aligned as accurately as possible. The process of image alignment is therefore at the heart of almost all Mountain Legacy analysis. The key assumption behind any alignment is that matching control points selected in each of the images do indeed describe the same location in both photos. If this is true, then a mathematical model can be used to transform the images into an aligned pair. In IAT this foundational process is realized using one of two models: an affine transformation or a perspective transformation.

Both affine and perspective transforms keep straight lines straight and planes planar. An affine transform preserves the parallelism and length ratios of parallel lines; a perspective transform allows parallel lines to converge. The result of an affine transform on a rectangular image will always be a parallelogram, while a perspective transform yields the more general quadrilateral. Figure 4 gives an overview of the difference between the two.

Figure 4: Illustrating the generalized differences between affine and perspective transformations. Adapted from graphicsmill.com.
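The geometric difference can be sketched in a few lines of NumPy. The matrices below are illustrative values chosen for the sketch, not numbers used in IAT:

```python
import numpy as np

# A unit square, standing in for the corners of a rectangular image.
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)

# Affine transform: a 2x2 matrix plus a translation. Parallel edges of
# the square stay parallel, so the output is always a parallelogram.
A = np.array([[1.0, 0.5],
              [0.2, 1.0]])
t = np.array([3.0, 1.0])
affine_out = square @ A.T + t

# Perspective transform: a 3x3 homography acting on homogeneous
# coordinates. The non-trivial bottom row lets parallel edges converge,
# so the output is a general quadrilateral.
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.3, 0.0, 1.0]])
p = np.c_[square, np.ones(4)] @ H.T
persp_out = p[:, :2] / p[:, 2:3]
```

Checking opposite edges of the two outputs confirms the parallelogram-versus-quadrilateral distinction illustrated in Figure 4.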

As can be seen in Figure 4, three points are required for an affine transform. In IAT, three control points in an historic/modern pair will automatically generate an affine transformation; four or more control points result in a perspective transform. Video 1 below illustrates a four-point perspective transform.

If more than four points are selected, IAT computes the perspective transformation for each combination of four points and keeps the combination that minimizes the root-mean-square error (RMSE) across all control points. The transformation is then performed using these “best” four points.
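This selection step can be sketched with a standard Direct Linear Transform (DLT) homography fit in NumPy. Function names and details here are illustrative assumptions, not IAT's actual code:

```python
import numpy as np
from itertools import combinations

def fit_homography(src, dst):
    # DLT: build the 2n x 9 system and take the null-space vector
    # (last right singular vector) as the 3x3 perspective matrix.
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return Vt[-1].reshape(3, 3)

def apply_h(H, pts):
    # Apply the homography in homogeneous coordinates, then dehomogenize.
    p = np.c_[pts, np.ones(len(pts))] @ H.T
    return p[:, :2] / p[:, 2:3]

def best_four(src, dst):
    # Try every combination of four control points; keep the one whose
    # transform gives the lowest RMSE over *all* control points.
    src, dst = np.asarray(src, dtype=float), np.asarray(dst, dtype=float)
    best_rmse, best_idx = np.inf, None
    for idx in combinations(range(len(src)), 4):
        H = fit_homography(src[list(idx)], dst[list(idx)])
        rmse = np.sqrt(np.mean((apply_h(H, src) - dst) ** 2))
        if rmse < best_rmse:
            best_rmse, best_idx = rmse, idx
    return best_idx, best_rmse
```

The brute-force search is practical here because users typically select only a handful of control points.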

The same process and assumptions are used when aligning a selected image to a Virtual Photo (VP) of the same scene. If the alignment is poor, however, the camera location of the VP can be adjusted to better match the position from which the actual photo was taken.

Video 1: Perspective alignment transformation at work in the Image Analysis Toolkit – an example from the
upper Elbow watershed, SW Alberta.

At this point landscape classification and land cover mask creation can take place on the two aligned images. Aligned images allow land cover masks to be effectively compared and analyzed; examples of this type of comparison in use can be seen in Fortin et al. [2] and Taggart-Hodge [5]. Gallery 1 below illustrates this with the historic/modern image pair used in Video 1: South Quirk Station – historic photo by Arthur Wheeler, 1897; modern photo by the Mountain Legacy Project, 2014.

Gallery 1: Land cover masks built and applied to an aligned South Quirk historic/modern image pair. Visit
explore.mountainlegacy.ca to see this gallery in geographic context.
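Once a pair of masks is aligned, quantifying change reduces to counting pixels per class in each mask. A minimal sketch, assuming masks stored as integer class rasters (the label scheme below is illustrative, not MLP's):

```python
import numpy as np

def class_change(mask_a, mask_b, labels):
    """Per-class pixel counts for two aligned land cover masks of the
    same shape, plus the pixel difference between them."""
    out = {}
    for name, cid in labels.items():
        a = int(np.count_nonzero(mask_a == cid))
        b = int(np.count_nonzero(mask_b == cid))
        out[name] = {"a_px": a, "b_px": b, "delta_px": b - a}
    return out

labels = {"conifer": 1, "grassland": 2}      # illustrative scheme
historic = np.array([[1, 1], [2, 0]])        # tiny example rasters
modern = np.array([[1, 1], [1, 2]])
change = class_change(historic, modern, labels)
```

Because the masks are aligned, the per-class deltas can be read directly as change between the two photo dates.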

The next step in the process is building a Virtual Photograph (VP) for a selected image. Once a VP is built and correctly aligned with the actual photo, land cover masks from the selected photo can be overlaid on the VP and used to create georeferenced viewsheds. The tabs at the top of the page – Virtual Photo and Viewshed – describe these processes and continue with the South Quirk example given here.


Build Virtual Photo Process

The Virtual Photo (VP) is a shaded 3D representation of the terrain in a given photograph – a “virtual” representation of the photograph. The more closely the location metadata agrees with the true camera location, the more accurately the VP represents the actual image.

The Build Virtual Photo Process is one of the georeferencing components in IAT. Building a VP depends upon accurate location metadata: the camera location in UTM coordinates, the azimuth (direction in degrees) the camera is pointed, and the Field of View (FoV) encompassed by the camera. Fine-tuning the VP is often required, as standard GPS field units are accurate only to within +/- 5 metres. Such adjustments include moving the camera position, tilting (inclining) the camera up or down, and rotating (rolling) the camera side to side. When a final VP is produced, the height and width (in pixels) and aspect ratio of the actual photo are also required.

Any given Virtual Photo is based upon a Digital Elevation Model (DEM) of the landscape the actual photo looks out into. The DEMs used in the LiM project are created from 2 metre resolution lidar-derived bare earth Digital Terrain Models (DTM), courtesy of Alberta Environment and Parks. The appropriate DTMs are mosaicked into a DEM using a GDAL (Geospatial Data Abstraction Library) component from the Open Source Geospatial Foundation. A hillshade model, along with an algorithm designed to look “forward” (the Forward Algorithm – FA) along rays of light emanating from the camera, is applied to the DEM. This assists in making a more easily recognizable Virtual Photo of the landscape represented in the actual image.
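Hillshading itself is a standard raster computation. The sketch below uses a common Lambertian formulation built from NumPy gradients; the exact shading model used in IAT is not specified in the text, so treat the formulation and parameter defaults as assumptions:

```python
import numpy as np

def hillshade(dem, res=2.0, azimuth=315.0, altitude=45.0):
    """Lambertian hillshade of a DEM (2D elevation array, metres).
    res is the cell size; azimuth/altitude describe the light source."""
    az, alt = np.radians(azimuth), np.radians(altitude)
    dzdx = np.gradient(dem, res, axis=1)        # east-west slope
    dzdy = np.gradient(dem, res, axis=0)        # north-south slope
    slope = np.arctan(np.hypot(dzdx, dzdy))
    aspect = np.arctan2(-dzdx, dzdy)
    shade = (np.sin(alt) * np.cos(slope) +
             np.cos(alt) * np.sin(slope) * np.cos(az - aspect))
    return np.clip(255 * shade, 0, 255)         # 0-255 grey values
```

A flat DEM shades uniformly to the sine of the light altitude; ridges facing the light brighten and opposite slopes darken, which is what makes the resulting VP visually recognizable.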

Figure 5 gives an example of a VP build, including the historic image it was built to represent.

Figure 5: The VP is a perspective view of a Digital Elevation Model with hillshading applied via the Forward Algorithm. The view depends on accurate camera location metadata including UTM coordinates, FoV, and azimuth. Elevation is also required, but given a camera location in conjunction with the DEM, IAT can derive elevation.

Forward Algorithm
The more detailed and realistic the VP, the more accurate its alignment with the actual image. The georeferencing component in IAT implements the Forward Algorithm (FA) to achieve detail that is accurate to the resolution of the underlying DEM. The FA is notable in that it makes a viewshed and an image simultaneously – there is no need to preprocess a DEM first to find visible pixels.

In the following example let us assume, without loss of generality:

  • The y axis is the view axis (the direction the camera is looking);
  • There is a hill shading model of the same resolution as the DEM available (e.g. the Hillshade Model in Figure 5);
  • The camera inclination and rotation are zero degrees.

Let us further assume that the desired VP is going to be 10,000 pixels wide. The FA considers 10,000 rays emanating forward from the camera viewpoint. Taken together the rays will trace out the FoV angle as they move across the DEM. In building the Virtual Photo the FA considers each ray separately with each ray corresponding to one vertical column in the VP.

For a given ray moving forward through the DEM, we are clearly moving away from the camera viewpoint at an angle within the FoV. We move one y unit at a time, stop at each x intercept (the y-axis is the major axis, and the x-axis the minor axis), get the elevation at that point with linear interpolation along x, and then calculate the height v of the corresponding 2D pixel in the Virtual Photo.

If and only if v is higher than any previously calculated height, it is visible, and is drawn using a corresponding linear interpolation applied to our shading model. We proceed through the DEM until we are off the DEM, or have reached a predefined maximum depth away from the viewpoint (for the LiM project this might be 30 km, or 15,000 DEM pixels). Finally, all higher image pixels in the current column of the VP (aka the vertical scan line) are coloured “sky”.
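The per-ray loop just described can be sketched as follows. This is a simplified illustration only (y as major axis, zero inclination, a fixed horizon row, and a made-up perspective projection), not MLP's implementation:

```python
import numpy as np

def render_column(dem, shade, cam_xy, cam_z, dx_per_y, col_height,
                  focal, max_depth):
    """Trace one ray forward through the DEM (major axis y) and build one
    vertical scan line of the Virtual Photo. Row 0 is the column bottom;
    undrawn pixels stay -1 ('sky')."""
    x0, y0 = cam_xy
    column = np.full(col_height, -1)
    horizon = col_height // 2            # row where v = 0 projects
    max_v = -np.inf                      # highest projection seen so far
    last_row, last_shade = None, None
    for step in range(1, max_depth):
        y = y0 + step                    # one y unit at a time
        x = x0 + step * dx_per_y         # x intercept for this y
        if not (0 <= y < dem.shape[0] and 0 <= x < dem.shape[1] - 1):
            break                        # walked off the DEM
        xi = int(np.floor(x))
        f = x - xi                       # linear interpolation along x
        z = (1 - f) * dem[y, xi] + f * dem[y, xi + 1]
        s = (1 - f) * shade[y, xi] + f * shade[y, xi + 1]
        v = focal * (z - cam_z) / step   # projected pixel height
        if v > max_v:                    # visible iff higher than before
            row = min(max(int(round(horizon + v)), 0), col_height - 1)
            if last_row is not None and row > last_row:
                # fill the vertical gap by interpolating shade values
                fill = np.linspace(last_shade, s, row - last_row + 1)
                column[last_row:row + 1] = np.round(fill)
            else:
                column[row] = round(s)
            last_row, last_shade = row, s
            max_v = v
    return column                        # pixels above last_row stay sky
```

Running this over one ray of a toy DEM with a single ridge produces a column of shaded pixels up to the ridge's projection, with sky above it, mirroring the description above.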

Note that when we have found a new visible pixel, we have also identified a location in the viewshed. Since there is an interpolation along x (with an integral y), we define two adjacent DEM pixels as viewshed pixels. For example, if a newly visible 3D pixel is (5.7, 38, 1456 metres), then pixels (5,38) and (6,38) are categorized as within the viewshed. This is accurate to the resolution of the DEM itself.

More work is needed when a newly drawn pixel in the VP leaves undrawn pixels beneath it. Generally speaking, this can happen in the foreground, and at very steep inclines (e.g. cliffs). This situation is easily and deftly handled by interpolating along the y direction, between the last shading value calculated for this scan line, and the current one, using these calculations to fill in the undrawn image pixels.

Alignment in IAT
Once created, the VP and the actual image are aligned in IAT. The VP is the digital artifact that ties the aligned actual photo back to the DEM. Exploiting this relationship allows IAT to generate a georeferenced viewshed of the overlay. For the LiM project the goal was to design and build a web-based process to create georeferenced viewsheds of the land cover masks. MLP researchers were able to generate these viewsheds successfully: land cover viewshed masks were created based on images from 1895, 1897, 1916, and 1940. They can be found under both the Locations and Resources links on this site.

Other photo georeferencing systems (e.g., the WSL Monoplotting Tool or Pic2Map) require both an oblique image (the “human-eye” photo) and an orthogonal (“bird’s-eye”) view from above of the same landscape – see for example Bozzini et al. [6], Stockdale et al. [7] and [8], and Marco et al. [9]. In preferring to stay with the oblique photographs as much as possible, IAT takes a different approach, one that reimagines early work in the field – for example the ideas of Aschenwald et al. [10] around georectification of high oblique images, Guth and Craven’s [11] use of Digital Elevation Models (DEM) to register oblique images, and Corripio’s [12] “virtual photograph” development. As well, we look to current work – like that of Guillet et al. [13] on camera calibration – to move forward. Not only does this match MLP’s philosophy of “letting the photographs speak”, but the user doesn’t have to make the context switch from oblique to orthogonal and back again for every pair of control points. Variations in terrain, ridge lines, and horizons are more readily apparent. And, while it is not there yet, IAT shows promise for automatically detecting control points between the VP and actual photo, perhaps using a form of edge and vertex detection. This would allow for automation in what is a somewhat time-intensive process – no matter the system used.

Once the accuracy of the VP to actual image has been established, georeferencing the viewshed can proceed. This process is described under the Viewshed tab at the top of this page.

Appendix A: Visualizing the Forward Algorithm

Although we are all immersed in the three-dimensional world around us, describing that world so it makes sense – especially inside a computer program – is not an easy task. Consider the following problem: we wish to create an accurate, computer generated shaded relief “virtual” photo of a landscape scene depicted in a photograph. The camera information (e.g. location, FoV, azimuth, elevation) is known. A DEM and corresponding hillshade model of the area encompassed by the photo is available.

A straightforward enough problem description, and, for IAT the resulting “Virtual” Photo looks to be easily produced. But, as with many things, the devil is in the details, and so we present here a deeper explanation of the way this problem is approached in MLP’s Image Analysis Toolkit. We call it the Forward Algorithm (FA) and, as displayed in Figure 6 below, it produces an accurate Virtual Photograph (VP) of the scene displayed in the actual photograph.

Figure 6: An example of a Virtual Photo match with the actual photo it is meant to represent

This VP is conceptually based on rays emanating forward from the camera viewpoint. Taken together the rays will trace out the camera’s FoV angle as they move across the DEM. In building the Virtual Photo the FA considers each ray separately with each ray corresponding to one vertical column in the VP.

Figures 7 and 8 presented below, along with accompanying discussion, illustrate some of the finer points of the Forward Algorithm.

Working With the DEM: An Overhead View

Figure 7 is an overhead view of a view triangle superimposed on a DEM. This particular triangle has the interesting property of crossing one of the four 45 degree offset lines that separate major axes. In this case, the azimuth is 124 degrees, and the field-of-view is 60 degrees. Two view rays are shown. Ray 1 is an example where the major axis is x. The forward algorithm visits DEM points a through i (and on), each one pixel apart along x. All rays in the upper shaded portion of the view triangle are treated similarly. Ray 2 has a major axis of y, so DEM points A through H (and on) are each one pixel apart along y. All rays in the lower shaded portion of the view triangle are treated similarly.

Note that if we treated Ray 1 similarly to Ray 2 – that is, with major axis y – we would be looking at actual DEM values only one third as often. For example, instead of visiting points c, d, e and f, we would instead visit only c and f. This shows the importance of choosing the correct major axis for each ray, rather than globally for an entire view triangle. Perspective projections only allow for images less than 180 degrees field-of-view, so a changeover from one major axis to another occurs at most twice.

Figure 7: Overhead view of the DEM with the view triangle overlaid.
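The per-ray choice of major axis comes down to comparing the magnitudes of the ray's direction components. A small sketch (assuming the usual convention of azimuth in degrees clockwise from north, i.e. from +y):

```python
import math

def major_axis(azimuth_deg):
    """Return the DEM axis along which this ray should step one pixel at
    a time: whichever direction component is larger in magnitude."""
    dx = math.sin(math.radians(azimuth_deg))   # east component
    dy = math.cos(math.radians(azimuth_deg))   # north component
    return 'x' if abs(dx) >= abs(dy) else 'y'

# For the Figure 7 view triangle (azimuth 124 degrees, FoV 60 degrees),
# rays span 94..154 degrees; rays on either side of the 135 degree
# offset line get different major axes, as in the figure.
```

Applied per ray, this keeps DEM samples one pixel apart along the dominant direction, avoiding the undersampling described above.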

Building the Virtual Photo: a side-on view of Ray 1

Figure 8 shows a side view corresponding to Ray 1 of Figure 7. The main triangle here is 60 degrees, but because the drawing is in a plane and not orthogonal to 3D space as defined by the DEM and its elevations, the actual vertical field-of-view of the Virtual Photo is somewhat less than 60 degrees. Points a through i are shown on a cutaway mountainous landscape. Some of these points get projected onto the corresponding vertical column in the picture plane of the Virtual Photo. In particular, points (c,d,e,f,h) all get drawn on the picture plane. But these pixels are often separated by large gaps. This is where the vertical interpolation of the Forward Algorithm comes in. The shade value of a pixel (e.g., shading 90 for point h) comes from a hillshade model corresponding to the DEM.

Figure 8: Drawing Ray 1 on the picture plane of the Virtual Photo.

Let’s see what happens as we move forward from the camera. In Figure 8, all 25 vertical column points on the picture plane, all of which must be filled in, have a number corresponding to the order in which they are drawn. Point a gets projected below the picture plane, so it is ignored. Point b is also too low. Note, however, that these two points are in the overall 360 degree viewshed of the viewpoint, although not in the viewshed as limited by the camera view.

Point c now projects onto the picture plane, so its pixel gets drawn. But there are two pixels under c that did not get drawn, so they get shade values defined by an interpolation function based on pixel values from b and c. Note that only 2 additional pixels are drawn, even though the function is defined over a delta of 4. Next, point d is projected higher than the last drawn pixel c, so its pixel is drawn, and 6 more pixels filling the gap are drawn too, based on a new interpolation function.

The next pixel, corresponding to point e, is by far the most common case. It is projected to position 11, right above position 4 (i.e., the highest previous pixel drawn). In this case, no extra interpolation work is done. With point f, a big jump up a cliff, many intervening pixels are filled in. Point g marks a new situation. Because its position is lower, it is not in the viewshed, and therefore not drawn on the picture plane. However, when we reach point h, its projection is higher than that of f, so it is drawn as pixel 21. Again, we have a steep cliff’s worth of pixels to fill in, but note that their shade values are based on an interpolation between where g would have projected and h. If the delta is 5.5 picture plane pixels, and the shade values of g and h are 112 and 90 respectively, pixels 22 and 23 get values 94 and 98 as shown ((112-90)/5.5 = 4).

Note that if the FA were not quite so clever and interpolated from the last visible point f, we would not see the effect of the steep cliff being somehow behind f. Instead it would look like a continuous slope connecting f and h.
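The arithmetic in the g-to-h example can be checked with a small helper (a sketch of just this interpolation step, using the point names from Figure 8):

```python
def cliff_fill(shade_hidden, shade_visible, delta_px, n_fill):
    """Shade values for the pixels filled in above a newly visible point,
    interpolating toward where the hidden point (here g) would have
    projected. shade step per pixel = (112 - 90) / 5.5 = 4 in the text."""
    step = (shade_hidden - shade_visible) / delta_px
    return [round(shade_visible + step * k) for k in range(1, n_fill + 1)]

cliff_fill(112, 90, 5.5, 2)   # pixels 22 and 23
```

Starting from h's shade of 90 and stepping by 4 per pixel reproduces the values 94 and 98 shown in the figure.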

Finally, we reach point i. In this example we’ll assume all subsequent projected pixels (not “sky” pixels) are lower than pixel 21. Point i itself projects lower than pixel 21, so it is not in the viewshed. We include point i here to remind hillwalkers of their nemesis – the False Peak. (Note that the elevation of point i is higher than that of h, but remains invisible until one reaches the vicinity of h.)

Once we have processed all DEM points along Ray 1 and verified that there are no further visible points, we complete the vertical column by filling in pixels 24 and 25 with colour/shade “sky”. It is time to move on to the next ray.





Create Georeferenced Viewshed Process

Creating a georeferenced viewshed builds on the relationship formed when an image is aligned with its corresponding Virtual Photo (VP). An aligned image/VP pair has a one-to-one correlation at the pixel level between the image and the VP. And, since the VP has a mathematical correlation to the DEM that was used in its creation, that same correlation can now be thought of as applying to the image.

One of the objectives of the Landscapes in Motion project was to see if MLP researchers could generate a georeferenced viewshed based on the land cover masks from historic images. The georeferencing component of IAT was able to demonstrate this successfully. Figure 9 below illustrates the process using a South Quirk 1897/2014 image pair. Although the process is still at a beta development stage, MLP scientists have already begun applying it to work beyond the LiM project.

The viewshed generation process turns on the one-to-one pixel relationship between the aligned land cover mask of a given image and its corresponding VP. The algorithm used in IAT combines two established information layers:

  • The land cover mask aligned with the VP;
  • The DEM used to create the VP;

and produces a third: the resulting viewshed layer of the land cover mask.

All of the camera and location information used to create the VP is also included when building the viewshed. This allows the algorithm in IAT to look “forward” along rays of light emanating from the camera location. These rays “light up” DEM locations that are visible in the VP. Leveraging the relationship between the VP and aligned mask allows the “forward” ray to look at the corresponding RGB pixel colour for a given location on the mask. The ray then “paints” that colour onto the viewshed layer.

Because the VP and its corresponding land cover mask are finer in resolution than the DEM, every pixel in the DEM that would be visible in the VP is covered by at least one pixel in the land cover mask. The rays in the “forward” algorithm will therefore paint every pixel in the viewshed layer at least once, and many of the pixels – especially in the foreground – will be painted multiple times. In this way the viewshed fully captures the continuity of the mask and its underlying VP: if a given pixel is visible in the VP and classified in the land cover mask, it will be visible on the viewshed at the resolution of the DEM.

Figure 9: The viewshed layer is generated in the georeferencing component of IAT. It is built using a forward ray tracing algorithm that takes the colour of a pixel in the land cover mask, its corresponding location on the DEM, and “paints” it onto the viewshed in the correct location. The result is a georeferenced viewshed of a land cover that, in this example, dates back over 120 years to 1897.

Since the resolution of the DEM is known – in the case of the LiM project it is 2 metres – the actual area in square metres / hectares of each land cover class represented on the viewshed can be calculated in IAT. Opening the viewshed in IAT and running an analysis on pixels per class gives the total number of pixels in each class. Multiply the class totals by 4 (in this case) and the area in square metres is returned. Figure 10 demonstrates this by comparing South Quirk viewsheds from 1897 and 2014.

Figure 10: IAT generates a viewshed at the same resolution as the underlying DEM. In this example from South Quirk in SW Alberta the DEM is at 2 metre resolution, thus 4 square metres per pixel. Since IAT can quantify the number of pixels in a given land cover category, calculating the area is straightforward. For example, there are 3,510,198 pixels in the 1897 Conifer Class. Expressed as hectares this is approximately 1404.1 hectares (3,510,198 pixels x 4 square metres per pixel / 10,000).
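The pixels-to-area arithmetic above can be sketched directly. The class ids are illustrative; the 2 metre resolution and the Conifer pixel count follow the LiM example in Figure 10:

```python
import numpy as np

def class_areas_ha(viewshed, resolution_m=2.0):
    """Hectares per land cover class from a viewshed raster of integer
    class ids (0 = outside the viewshed / unclassified)."""
    px_area_m2 = resolution_m ** 2           # 4 sq. m at 2 m resolution
    ids, counts = np.unique(viewshed[viewshed > 0], return_counts=True)
    return {int(i): n * px_area_m2 / 10_000 for i, n in zip(ids, counts)}

# The 1897 Conifer example: 3,510,198 pixels at 4 square metres each.
conifer = np.full(3_510_198, 1, dtype=np.uint8)
areas = class_areas_ha(conifer)              # about 1,404.1 hectares
```

Dividing by 10,000 converts square metres to hectares, reproducing the approximately 1404.1 hectares quoted for the 1897 Conifer class.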