GIS course @ yDiv
1 Introduction
This document is a living tutorial that mirrors the lectures of the yDiv course “An Introduction to GIS for Ecology”, held at iDiv in Leipzig. In this two day course we will cover the very basics of GIS, focusing especially on applications to ecology. During the first day, we will get to know the main data types of GIS, that is geometries and rasters. During the second day, we will learn some of the most used GIS operations. You will also start your own individual/group project. The aims of this course are to
- Familiarize yourself with basic GIS data and operations
- Start a project of your interest
- Getting some simple figures/analyses started
Do not expect to become a world-leading GIS expert by the end of the course; this is a basic introductory course (and GIS is quite hard to master).
At the end of each section, you will find multiple exercises; they are not mandatory, but they will help you. Before starting, be sure to have your machine set up properly. Also, you can download all data used in this tutorial by clicking here.
1.1 Set up
- R installed.
terra
installed.codetools
installed (optional, used byterra
to perform code checks).- Download the data archive and decompress it.
Try run library(terra)
. If it works, good news. If it does not, contact me some days before the course starts.
1.2 Data Sets
In this tutorial, we will use data sets for:
- NDVI for the year 2024 in Leipzig, obtained from Sentinel2 (
ndvi-2024.tif
). - Landcover of Leipzig in 2015, obtained from the German Aerospace Center (DLR;
landcover-2015.tif
). - Average annual surface temperature from WorldClim (
wc2-bio1.tif
) - Bioclimatic variables from WorldClim (
bios.tif
) - Digital elevation model from WorldClim (
wc2-elevation.tif
)
1.3 Lecturers
1.3.1 Emilio Berti (TiBS)
- Theory in Biodiversity Science
- Macroecologist / biogeographer
- Theoretical ecologists
1.3.2 Guilherme Pinto (BioEcon)
2 Data Types
There are two main data types used in GIS:
- Geometries (also called vectors or shapes). This usually has extension
.shp
. - Rasters. This usually has extension
.tif
.
They fill two different needs, namely to represent structures that can be well approximated using geometric objects, such as lines, circles, etc., and to represent grid data over an area.
If you have produced figures for a manuscript, you are actually already familiar with these type of data. Images can be saved as vector graphics, where each element is represented by a vector/geometry, or as a matrix/raster of pixels. This is an old topic in computer graphics, and a nice summary can be found on Adobe website: https://www.adobe.com/creativecloud/file-types/image/comparison/raster-vs-vector.html. This analogy is a useful one; remember it.
2.1 Geometries
Structures such as trees, streets, or buildings can be represented by geometric objects. For instance, a tree can be represented by a point, a street by straight line, and a building by a polygon. It makes sense to store the data of these types of structures as their geometric representation. For example, the building we are in and the adjacent street:
# This code in incomprehensible to you FOR NOW
# Just an example of geometries to show you how they look on a map
idiv <- rast("data/idiv-building.tif")
plotRGB(idiv)
l <- vect(
rbind(
c(12.39585, 51.31866),
c(12.39846, 51.31722)
),
crs = "EPSG:4326"
)
lines(l, col = "green", lw = 5)
idiv <- vect(
rbind(
c(12.39573, 51.31794),
c(12.39645, 51.31765),
c(12.39664, 51.31812),
c(12.39602, 51.31849),
c(12.39573, 51.31794)
),
crs = "EPSG:4326"
) |>
as.lines() |>
as.polygons()
idiv$name <- "iDiv building"
idiv$perimeter <- perim(idiv)
idiv$area <- expanse(idiv)
writeVector(idiv, "data/idiv.shp", overwrite = TRUE)
plot(idiv, add = TRUE, col = adjustcolor("blue", alpha.f = .5))
Geometries are saved in vector- or _shape-_files (usually with extension .shp
). In addition to the geometric representation of the structure, shapefiles also contain metadata for the geometric representation, such as its extent and coordinate reference system (we will talk about this later), and for the structure, such as their name, length, etc. For instance, the geometry of the iDiv building above is:
## class : SpatVector
## geometry : polygons
## dimensions : 1, 3 (geometries, attributes)
## extent : 12.39573, 12.39664, 51.31765, 51.31849 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## names : name perimeter area
## type : <chr> <num> <num>
## values : iDiv building 237.7 3265
Note that this structure has one geometry and three attributes/variables (dimensions: 1, 3 (geometries, attributes)
): name
, perimeter
(\(m\)), and area
(\(m^2\)).
2.1.1 Geometry Types
In terra
, there are three main geometry types:
- Points: they are defined by a vector with two values for coordinates (e.g., longitude and latitude)
- Lines: they are a series of points connected by straight lines
- Polygons: they are a series of lines inscribing a closed area
In terra
, geometries are created with the function vect()
. To create a point geometry, you need to pass a matrix with the coordinates:
## class : SpatVector
## geometry : points
## dimensions : 1, 0 (geometries, attributes)
## extent : 12.39585, 12.39585, 51.31866, 51.31866 (xmin, xmax, ymin, ymax)
## coord. ref. :
To create a geometry of multiple points, simply pass a matrix that has multiple rows:
In order to create lines, first create a multi-point geometry, then convert it using as.lines()
:
In order to create polygons, first create a mutli-point geometry, then covert it to lines, and finally to polygons using as.polygons()
. The last point should be the same as the first to close the geometry. A geometry that is not closed does not inscribe an area and is not a polygon.
xy <- rbind(
cbind(12.39585, 51.31866),
cbind(12.39584, 51.31865),
cbind(12.39584, 51.31866),
cbind(12.39585, 51.31866)
)
pol <- vect(xy) |> as.lines() |> as.polygons()
2.1.2 Reading & Writing Geometries
In most cases, you will not create your own geometry by hands, but load geometries from files. To read geometries from a file, use vect()
.
## class : SpatVector
## geometry : polygons
## dimensions : 1, 5 (geometries, attributes)
## extent : 5.866755, 15.04179, 47.27012, 55.05866 (xmin, xmax, ymin, ymax)
## source : germany.shp
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## names : shapeName shapeISO shapeID shapeGroup
## type : <chr> <chr> <chr> <chr>
## values : the Federal Republi~ DEU 19620994B6459175825~ DEU
## shapeType
## <chr>
## ADM0
To write geometries to a file, use writeVector()
.
2.2 Rasters
Rasters are gridded areas where each grid pixel takes a value. Rasters are useful because:
- they compress the information of geometries to a manageable size. Imagine to have to work with building geometries globally.
- the data they represent is obtained in a gridded format. This is common for satellite data, for example.
When the data we are interested in are values on a gridded area, rasters are a better options than to using geometries, as they save a lot of memory and space on disk. For example, if we want to represent landcover type over a large area(forest, agriculture, build-up, etc.), it is easier to grid the area into pixels and save the landcover type of each pixel rather than to create a geometry for each different structure.
Rasters are saved in raster files (usually with extension
.tif
). In addition to the values of each grid, rasterfiles also contain metadata for the grid, such as the its extent and coordinate reference system (we will talk about this later). For instance, some of the metadata of the landcover raster:
## class : SpatRaster
## dimensions : 2042, 2032, 1 (nrow, ncol, nlyr)
## resolution : 10, 10 (x, y)
## extent : 4477870, 4498190, 3126660, 3147080 (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (EPSG:3035)
## source : landcover-2015.tif
## color table : 1
## categories : category
## name : category
## min value : Artificial Land
## max value : Water
Note the metadata dimensions
(grid size), resolution
(spatial resolution, in \(m\) for this raster), and extent
(spatial extent).
2.2.1 Creating Rasters from Scratch
Simply put, rasters are matrices/arrays with associated spatial metadata. In terra
, use rast()
to create a raster from a matrix.
Usually, you will not create rasters from scratch. However, if you are familiar with R arrays, you will notice a nice properties that is inherited by rasters: rasters can have multiple layers, with each layer being a matrix.
When an area has multiple rasters, it is possible to stack them to create a single raster object. The rasters in the stack need to have all the same extent and coordinate reference system. Use c()
to stack rasters using terra
.
r1 <- rast(matrix(rnorm(100), nrow = 10, ncol = 10))
r2 <- rast(matrix(rnorm(100), nrow = 10, ncol = 10))
r <- c(r1, r2) # a stack
plot(r)
2.2.2 Reading & Writing Rasters
To read rasters into memory, use rast()
.
## class : SpatRaster
## dimensions : 2042, 2032, 1 (nrow, ncol, nlyr)
## resolution : 10, 10 (x, y)
## extent : 4477870, 4498190, 3126660, 3147080 (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (EPSG:3035)
## source : landcover-2015.tif
## color table : 1
## categories : category
## name : category
## min value : Artificial Land
## max value : Water
To write rasters to disk, use writeRaster()
.
Here, the argument datatype = "INT1U"
(specifying the values in the grid are 1-byte unsigned integers) is needed because the rasters contains classes (levels). In most cases, this argument is best left unspecified, as terra
picks the optimal value by default. Once you have used GIS for some time, you will notice that some of the rasters you downloaded have been scaled up to weird numbers, e.g. 65535. This is because the creator of those rasters decided to save them as, e.g., 2-bytes unsigned integers, to make them smaller and, thus, easier to move around the internet. If you need to deal with extremely big rasters, this trick may come in handy.
2.3 Spatial metadata
Spatial objects always need certain metadata to be useful. For example, a raster is of little help if there is not information of the area it covers. The most important metadata are:
- Spatial extent: the four corners of the quadrilateral polygon that inscribe the object represented.
- Coordinate reference system: how the earth surface is represented.
- Resolution (for rasters only): the size of a pixel.
2.3.1 Spatial extent
The spatial extent is the quadrilateral that inscribe the area of the spatial data. The extent is usually represented by the coordinates of the four vertices of the quadrilateral, i.e. xmin
, xmax
, ymin
, and ymax
. Use ext()
to get the spatial extent of an object.
## SpatExtent : 4477870, 4498190, 3126660, 3147080 (xmin, xmax, ymin, ymax)
2.3.2 Coordinate reference system
GIS try to represent the surface of the Earth, a 3D spheroid, onto a plane. The coordinate reference system (CRS), also known as spatial reference system (SRS), defines how this projection of a 3D object to a 2D one is achieved. It is not possible to achieve this projection accurately and some distortions will always be present. In particular, at least one of distance, angular conformity, and area will be distorted. Projections can be grouped into types, depending on which property of the Earth surface they do not distort:
- Conformal: they correctly represent the angles between points and, thus, shapes (e.g. ESRI:54004).
- Equidistant: they correctly represent distances (e.g. ESRI:54002).
- Equal-area: they correctly represent areas (e.g. ESRI:54034).
An overview of ESRI and EPSG1 projections can be found at https://spatialreference.org/. Wikipedia also has a nice list with the property of some projection: https://en.wikipedia.org/wiki/List_of_map_projections.
Use crs()
to know the CRS of an object. By default, crs()
displays the CRS in Well Known Text (WKT) format.
## [1] "GEOGCRS[\"WGS 84\","
## [2] " DATUM[\"World Geodetic System 1984\","
## [3] " ELLIPSOID[\"WGS 84\",6378137,298.257223563,"
## [4] " LENGTHUNIT[\"metre\",1]]],"
## [5] " PRIMEM[\"Greenwich\",0,"
## [6] " ANGLEUNIT[\"degree\",0.0174532925199433]],"
## [7] " CS[ellipsoidal,2],"
## [8] " AXIS[\"geodetic latitude (Lat)\",north,"
## [9] " ORDER[1],"
## [10] " ANGLEUNIT[\"degree\",0.0174532925199433]],"
## [11] " AXIS[\"geodetic longitude (Lon)\",east,"
## [12] " ORDER[2],"
## [13] " ANGLEUNIT[\"degree\",0.0174532925199433]],"
## [14] " USAGE["
## [15] " SCOPE[\"unknown\"],"
## [16] " AREA[\"World\"],"
## [17] " BBOX[-90,-180,90,180]],"
## [18] " ID[\"EPSG\",4326]]"
Notice, among the others, the attributes AREA
(the area of usage for the CRS) and ID
(in this case, the EPSG code). WKT is not very nice for humans; use the extra argument proj = TRUE
to see the PROJ4 format of the CRS.
## [1] "+proj=longlat +datum=WGS84 +no_defs"
Use project()
to project a spatial object from one CRS to another.
2.3.3 Resolution
Resolution applies only to rasters, as geometries can be scaled at any level. Use res()
to get the resolution of a raster.
## [1] 0.0001178543 0.0001178543
The first value is the “horizontal” and the second is the “vertical” resolution. The units of the output of res()
is the same as the unit of the CRS, in this case \(m\):
## [1] "+proj=longlat +datum=WGS84 +no_defs"
3 Reprojecting
It is quite common that data are obtained from a source that uses a CRS that is not ideal for analyses. It is even more common that data are gathered from multiple sources that do not use the same CRS. In such cases, the spatial data should be reprojected to one common CRS that is ideal for analyses.
Use project()
to project a spatial object from one CRS to another. Three types of arguments can be passed to project()
:
- A spatial object with known CRS.
- A string with the WKT or PROJ4 CRS.
- A code of a known CRS, e.g. from the EPSG standard.
r <- rast("data/landcover-2015.tif")
germany <- vect("data/germany.shp")
germany <- project(germany, crs(r)) # a spatial object
germany_mollweide <- project(germany, "+proj=moll") # a proj4
germany_4326 <- project(germany, "EPSG:4326") # a CRS code
When using an existing spatial object (<obj>
) as CRS target, remember to useproject(<x>, crs(<obj>))
and notproject(<x>, <obj>)
if you are interested in reprojecting only. See section 5.3 about resampling for details.
CRS have a big influence on analyses and visualization.
plot(buffer(vect(cbind(2e6, 3e6)), 4e6), col = "white", lw = 1e-6)
polys(germany, col = "grey90")
polys(germany_mollweide, col = "grey90")
polys(germany_4326, col = "grey90")
l <- vect(
rbind(
cbind(5e5, 5e5),
cbind(0, 0)
)
)
lines(l[1], l[2], arrows = TRUE, length = .1)
l <- vect(
rbind(
cbind(2e6, 2e6),
cbind(0, 0)
)
)
text(l[1], labels = "Cannot be seen,\nbut EPSG:4326 is here")
Be sure to use the CRS that is best for your analyses or visualization.
4 Data Conversion
Sometimes is needed to create a raster from a geometry or vice versa.
4.1 Geometries to Rasters
Use rasterize()
to convert geometries to raster. In addition to the geometry to rasterize, you need to pass a raster to function as template, i.e. from which extent, CRS, and resolution are extracted from.
landcover <- rast("data/landcover-2015.tif")
idiv <- vect("data/idiv.shp")
idiv <- project(idiv, crs(landcover)) # CRS needs to be the same
idiv_r <- rasterize(idiv, landcover, touches = TRUE)
If specified, touches = TRUE
assigns a value of 1 to all cells that are touched by the polygon. The extent of this new raster is the same as the landcover one, which is too big to actually see the raterized polygon. Use trim()
to trim the raster to the smallest raster containing all values that are not NA
.
4.2 Rasters to Geometries
Use as.points()
or as.polygons()
to convert rasters to geometries.
landcover <- crop(landcover, buffer(idiv, 1e2)) # restrict the area to something that can be plotted
poi <- as.points(landcover)
pol <- as.polygons(landcover)
op <- par(no.readonly = TRUE)
par(mfrow = c(1, 2))
plot(poi, "category")
lines(idiv)
plot(pol, "category")
lines(idiv)
5 Geometry Operations
Commonly operations on geometries are to calculate their length and area, find their centroids, calculate distance, and buffering them.
5.1 Length
Use perim()
to obtain the length of lines or the perimeter of polygons.
## [1] 237.7479
## [1] 237.7479
The units are defined by the CRS.
It is best to project the geometry to a longitude/latitude CRS to get more accurate results.
## [1] 237.7449
5.2 Area
Use expanse()
to obtain the area of polygons.
## [1] 3264.621
The units are defined by the CRS.
It is best to project the geometry to a longitude/latitude CRS to get more accurate results.
## [1] 3264.621
5.3 Centroid
The centroid of a geometry is the point defined as the arithmetic mean position of all the points on the surface of the geometry. Use centroids()
to get the centroids of the geometries.
For concave polygons, the centroid may lay outside of the polygon itself
p <- vect(matrix(c(0, 0, 1, 0, 1, 1, 0.9, 0.1, 0, 0), byrow = TRUE, ncol = 2)) |>
as.lines() |>
as.polygons()
centr <- centroids(p)
plot(p, col = "blue")
points(centr, bg = "gold", cex = 2, pch = 21)
Specify centroids(inside = TRUE)
to force the “centroid” to be inside the polygon.
p <- vect(matrix(c(0, 0, 1, 0, 1, 1, 0.9, 0.1, 0, 0), byrow = TRUE, ncol = 2)) |>
as.lines() |>
as.polygons()
centr <- centroids(p, inside = TRUE)
plot(p, col = "blue")
points(centr, bg = "gold", cex = 2, pch = 21)
5.4 Distance
Use distance()
to get the distances between geometries When passing only an object will multiple geometries, distances will be calculated among each geometry and a (symmetric) matrix is returned.
poi <- vect(matrix(rnorm(10), ncol = 2), crs = "EPSG:4326") # 5 random points
poi |> distance() |> as.matrix()
## 1 2 3 4 5
## 1 0.00 190406.8 17477.25 260076.3 50142.74
## 2 190406.79 0.0 179483.33 409508.5 230864.84
## 3 17477.25 179483.3 0.00 257884.1 67467.95
## 4 260076.25 409508.5 257884.09 0.0 265154.45
## 5 50142.74 230864.8 67467.95 265154.5 0.00
When passing also a second geometry, distances will be calculated among each geometry of the first object and each geometry of the second object.
## [,1]
## [1,] 5876848
## [2,] 5894491
## [3,] 5864594
## [4,] 5666371
## [5,] 5906264
5.5 Buffer
The buffer of a geometry is obtained by extending the geometry perpendicular to the tangent line of its side. It is easier to see it than to explain it. Use buffer()
to buffer geometries.
5.6 Exercise
Find the centroid of Germany, how far is from the centroid of the main iDiv bulding, and plot the circle centered at the centroid with the smallest area that is touching the centroid of the iDiv building. The geometry of Germany is saved in data/germany.shp
. The geometry of the iDiv building is saved in data/idiv.shp
. Click on the “Code” button below to see a solution to this problem.
ger <- vect("data/germany.shp")
idiv <- vect("data/idiv.shp")
centr <- centroids(ger) # centroid of Germany
idiv_centr <- idiv |> project(crs(ger)) |> centroids() # centroid of iDiv
d <- distance(centr, idiv_centr) # distance between centroids
b <- buffer(centr, d) # circle with the smallest area touching iDiv
l <- as.lines(rbind(centr, idiv_centr)) # line connecting centroids
plot(ger)
polys(b, col = adjustcolor("blue", alpha.f = .5))
lines(l)
points(centr, pch = 21, cex = 2, bg = "gold")
points(idiv_centr, pch = 21, cex = 2, bg = "green")
text(centr, "Germany\ncentroid", pos = 2)
text(idiv_centr, "iDiv\ncentroid", pos = 4)
text(l, paste(round(d) / 1e3, "km"), pos = 3)
text(b, paste("Area =", round(expanse(b) / 1e6), "km^2"), pos = 3, offset = 5)
6 Raster Operations
Commonly operations on rasters are to calculate summary statistics, aggregate or disaggregate to different resolutions, resampling, and interpolation.
6.1 Summary Statistics
6.1.1 Local Summary
A local summary is calculated across pixels with the same coordinates i.e. locally. Raster stacks can be summarized using common function, such as mean
, sd
, sum
, etc. The output is a single raster layer with the summary. Mean, sum, minimum, and maximum, can be calculated simply using mean()
, sum()
, etc. For other function, use app()
.
ndvi <- rast("data/ndvi-2024.tif")
avg_ndvi <- mean(ndvi, na.rm = TRUE)
std_ndvi <- app(ndvi, "sd", na.rm = TRUE)
month_max_ndvi <- app(ndvi, "which.max", na.rm = TRUE) # month with the maximum NDVI
plot(avg_ndvi, col = colorRampPalette(c("dodgerblue3", "orange", "darkgreen"))(100))
6.1.2 Global Summary
Global summaries are calculated across all pixels of a layer. Use global()
to get global summaries. When the input is a raster stack, the global summary is calculated for each layer separately.
6.1.3 Zonal Summary
Zonal statistics summarize the values of a raster using categories from another raster of geometry. Use zonal()
to obtain zonal statistics. Two arguments (<x>
, <y>
) are always required for zonal(<x>, <y>)
, with <x>
being the raster with the values to summarize and <y>
the raster/geometry specifying the categories for the summary.
landcover <- rast("data/landcover-2015.tif")
avg_ndvi <- project(avg_ndvi, landcover) # this will also resample avg_ndvi (see the section Resampling)
zonal(avg_ndvi, landcover, fun = "mean", na.rm = TRUE)
6.2 Aggregate and Disaggregate
6.2.1 Aggregate
Aggregating creates a raster that is at coarser resolution compared to the original raster. Use aggregate()
to aggregate rasters. The arguments fact
, number of cells in each direction to be aggregated, and fun
, the function used for aggregatation, need to be specified.
r <- rast(matrix(rnorm(16^2), 16, 16))
r_aggr <- aggregate(r, 4, "max") # maximum values in the 4x4 grid
par(mfrow = c(1, 2)) # cannot stack rasters with different resolution
plot(r)
plot(r_aggr)
6.2.2 Disaggregate
Disaggregating creates a raster that is at finer resolution compared to the original raster. Use disagg()
to disaggregate rasters. The arguments fact
, number of cells in each direction to create for each cell of the original raster, and method
, the method used for disaggregation, need to be specified. Available methods are near
, for nearest neighbor, or bilinear
, for bilinear interpolation.
r_disagg <- disagg(r, 4, method = "bilinear")
par(mfrow = c(1, 2)) # cannot stack rasters with different resolution
plot(r)
plot(r_disagg)
Aggregating and disaggregating a raster does not give, in general, the same raster back. Some of the original information is lost during aggregation.
6.2.3 Exercise
Assess how global average of NDVI is affected by the resolution of the NDVI layers. Aggregate the NDVI layer by a factor seq(2, 20, by = 1)
and see if the global average change. Click on the “Code” button below to see a solution to this problem.
ndvi <- rast("data/ndvi-2024.tif") |> mean()
global_avg_ndvi <- rep(NA, 20)
global_avg_ndvi[1] <- (ndvi |> global("mean", na.rm = TRUE))[1, 1]
for (f in 2:20) {
global_avg_ndvi[f] <- (ndvi |> aggregate(fact = f) |> global("mean", na.rm = TRUE))[1, 1]
}
global_avg_ndvi <- data.frame(
fact = 1:20,
avg_ndvi = global_avg_ndvi
)
scatter.smooth(global_avg_ndvi, ylim = c(0, .5))
The average NDVI is not changing. This is a property of the average. Try instead to see if the standard deviation changes.
ndvi <- rast("data/ndvi-2024.tif") |> mean()
global_std_ndvi <- rep(NA, 20)
global_std_ndvi[1] <- (ndvi |> global("sd", na.rm = TRUE))[1, 1]
for (f in 2:20) {
global_std_ndvi[f] <- (ndvi |> aggregate(fact = f) |> global("sd", na.rm = TRUE))[1, 1]
}
global_std_ndvi <- data.frame(
fact = 1:20,
std_ndvi = global_std_ndvi
)
scatter.smooth(global_std_ndvi, ylim = c(0.1, .3))
The standard deviation decreases for increasing aggregation factor. This is because aggregating is done here (implicitly) using
mean
as the aggregating function. This reduces spatial heterogeneity and, therefore, the global standard deviation. Scale is important for some processes, but not others.
6.3 Resampling
When two rasters do not align, i.e. they have different origin or resolution, to make them comparable one of them must be resampled. Resampling transform one of the two rasters to a raster that align with the other. Use resample()
to resample rasters. The argument method
is used to specify the algorithm used for resampling. Among the available algorithms are:
near
, for nearest neighbor.bilinear
, for bilinear interpolation.cubic
, for cubic interpolation.cubicspline
, for cubic-spline interpolation.lanczos
, for Lanczos resampling.
landcover <- rast("data/landcover-2015.tif")
ndvi <- rast("data/ndvi-2024.tif") |>
mean(na.rm = TRUE) |>
project(crs(landcover))
ndvi_resampled <- resample(ndvi, landcover, method = "lanczos")
plot(ndvi_resampled, col = colorRampPalette(c("dodgerblue3", "orange", "darkgreen"))(100))
The difference betweenproject(x, crs(y))
andproject(x, y)
is that the first only project the CRS, whereas the second actually resamplex
toy
. In many casesresample(x, y, method = <method>)
can be avoided by usingproject(x, y, method = <method>)
6.4 Interpolation
Interpolation uses a statistical model to make geographic predictions. A simple statistical model can include, for example, the coordinates of the grid. This works relatively well with processes that change gradually with longitude or latitude, such as temperature.
tas <- rast("data/wc2-bio1.tif") # average annual temperature, Celsius
elev <- rast("data/wc2-elevation.tif") # elevation, meters
par(mfrow = c(1, 2))
plot(tas, col = hcl.colors(100, "Zissou 1"))
plot(elev, col = map.pal("haxby", 5))
plot(elev, col = colorRampPalette(c("forestgreen", "#CDFFA2", "#FFAF4D", "#FFFFFF"))(100))
par(mfrow = c(1, 1))
r <- c(tas, elev)
# construct table for model fitting
xy <- xyFromCell(r, 1:ncell(r))
xy <- cbind(xy, extract(r, xy, cells = TRUE))
# fit a simple linear model with coordinates
model <- lm(tas ~ x * y + elevation, data = xy)
# interpolate
interpolated <- interpolate(r, model)
interpolated <- mask(interpolated, tas) # remove sea and large water bodies
names(interpolated) <- "tas_interpolated"
plot(c(tas, interpolated), col = hcl.colors(100, "Zissou 1"))
Considering the simplicity of the underlying model, the interpolation works quite well.
# Prediction errors
# In red, over-predictions
# In blue, under-predictions
plot(interpolated - tas, col = hcl.colors(100, "Blue-Red 3"))
7 Examples
I provide three examples below of some of the most common applications of GIS in Ecology:
- Species distribution modelling (SDM).
- Analysis of telemetry (GPS) data.
- Count points inside polygons, e.g. number of trees in Leipzig districts.
These examples are not reserach-ready, i.e. the analysis can be largely improved. However, they provide a good starting point to start similar projects using GIS.
7.1 Species Distribution Model (SDM)
I will show how to perform a simple species distribution model (SDM) for the species Erica arborea.
I load the libraries.
I load the file downloaded from GBIF and convert it to a geometry (points).
gbif <- read_tsv("data/erica-arborea-gbif.tsv") |>
select("decimalLongitude", "decimalLatitude") |>
vect(
geom = c("decimalLongitude", "decimalLatitude"),
crs = "+proj=longlat +datum=WGS84" # I know this is GBIF CRS
)
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
## Rows: 13228 Columns: 50
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (31): datasetKey, occurrenceID, kingdom, phylum, class, order, family, ...
## dbl (13): gbifID, individualCount, decimalLatitude, decimalLongitude, coord...
## lgl (4): infraspecificEpithet, depth, depthAccuracy, typeStatus
## dttm (2): dateIdentified, lastInterpreted
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
I load the bioclimatic variables from WorldClim.
bios <- rast("data/bios.tif")
# I also create a template of the land
land <- bios[[1]]
land[!is.na(land)] <- 1
names(land) <- "land"
I thin the GBIF data retaining only one point per grid cell.
gbif <- extract(land, gbif, cells = TRUE, xy = TRUE) |>
as_tibble() |>
filter(!is.na(land)) |>
group_by(cell) |>
slice_sample(n = 1) |>
ungroup() |>
select("x", "y") |>
vect(geom = c("x", "y"), crs = "+proj=longlat +datum=WGS84")
I sample pseudo-absences randomly in the polygon that contains all GBIF observation, buffered to 100 km.
range <- gbif |> convHull()
abs <- spatSample(buffer(range, 1e5), length(gbif) * 2)
abs <- extract(land, abs, cells = TRUE, xy = TRUE) |>
as_tibble() |>
filter(!is.na(land)) |>
group_by(cell) |>
slice_sample(n = 1) |>
ungroup() |>
select("x", "y") |>
slice_sample(n = length(gbif)) |>
vect(geom = c("x", "y"), crs = "+proj=longlat +datum=WGS84")
I prepare the data frame to be used for fitting the SDM.
plot(land, col = "grey90", legend = FALSE)
points(p, col = ifelse(p$occ == 0, "darkblue", "gold2"))
lines(buffer(range, 1e5))
I run the SDM, in this case a simple GLM. I use only some of the bioclimatic variables2.
d <- d[, c("BIO01", "BIO04", "BIO12", "BIO15")]
d$occ <- p$occ
sdm <- glm(occ ~ ., data = d, family = "binomial")
summary(sdm)
##
## Call:
## glm(formula = occ ~ ., family = "binomial", data = d)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0260 -1.0576 0.0838 1.1065 2.0504
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.0426731 0.4942614 -2.110 0.0349 *
## BIO01 0.1719717 0.0232111 7.409 1.27e-13 ***
## BIO04 -0.0035294 0.0004838 -7.294 3.00e-13 ***
## BIO12 0.0012559 0.0001916 6.556 5.52e-11 ***
## BIO15 -0.0009162 0.0035120 -0.261 0.7942
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3410.3 on 2459 degrees of freedom
## Residual deviance: 3162.4 on 2455 degrees of freedom
## AIC: 3172.4
##
## Number of Fisher Scoring iterations: 4
I use the model to predict the suitability of the species in geographic space.
suitability <- predict(bios, sdm, type = "response")
plot(
suitability,
col = hcl.colors(100, "Geyser", rev = TRUE),
breaks = seq(0, 1, by = .02),
type = "continuous",
main = "Predicted suitability"
)
Zooming on the Mediterranean regions.
plot(
crop(suitability, ext(-9, 24, 35, 50)),
col = hcl.colors(100, "Geyser", rev = TRUE),
breaks = seq(0, 1, by = .02),
type = "continuous",
main = "Predicted suitability"
)
points(p[p$occ == 1, ], cex = .3)
7.3 Density of Planted Trees in Leipzig
I will show how to find the density of planted trees 3 for the districts of the city of Leipzig. The original data was obtained from the Leipzig Bureau for green and water in the city: https://opendata.leipzig.de/dataset/baumkataster-stadt-leipzig1.
I load the shapefile of the districts of Leipzig and the planted tree census (baumkataster). The census is a csv file with the column geom
representing the geometries in well-known text (WKT) format.
districts <- vect("data/ot.shp") # Leipzig districts
baumkataster <- read_csv(
"data/Baumkataster_Stadt_Leipzig.csv",
show_col_types = FALSE
) |>
mutate( # rename the columns
species = ga_lang_wiss,
geom,
.keep = "none" # keep only species and geom columns
)
baumkataster
I create the geometry object, specifying the CRS to be the same of the Leipzig districts.
## class : SpatVector
## geometry : points
## dimensions : 185049, 1 (geometries, attributes)
## extent : 308130.4, 328398.8, 5680725, 5701485 (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89 / UTM zone 33N (EPSG:25833)
## names : species
## type : <chr>
## values : Quercus robur
## Quercus robur
## Quercus robur
Next, I want to know in which district each tree is located. There are several ways of achieving this, but the fastest is probably using terra::extract(<x>, <y>)
, which returns a data frame with the features of <x>
for the geometries of <y>
. I extract the features of district
for the locations specified by tree
.
tab <- extract(districts, tree) |>
as_tibble() |>
bind_cols(tree |> as_tibble()) |> # add the columns from tree
filter(!is.na(Name)) # remove trees outside Leipzig proper
tab
Then, I count the number of rows (one row = one tree) per district.
##
## Althen-Kleinpösna Altlindenau
## 985 2491
## Anger-Crottendorf Baalsdorf
## 3266 567
## Böhlitz-Ehrenberg Burghausen-Rückmarsdorf
## 2230 1783
## Connewitz Dölitz-Dösen
## 3666 2405
## Engelsdorf Eutritzsch
## 3238 5950
## Gohlis-Mitte Gohlis-Nord
## 1734 2090
## Gohlis-Süd Großzschocher
## 1324 3715
## Grünau-Mitte Grünau-Nord
## 4329 3606
## Grünau-Ost Grünau-Siedlung
## 5089 862
## Hartmannsdorf-Knautnaundorf Heiterblick
## 803 2330
## Holzhausen Kleinzschocher
## 2386 4922
## Knautkleeberg-Knauthain Lausen-Grünau
## 1616 4122
## Leutzsch Liebertwolkwitz
## 2630 1965
## Lindenau Lindenthal
## 2777 2433
## Lößnig Lützschena-Stahmeln
## 3413 3673
## Marienbrunn Meusdorf
## 1676 1280
## Miltitz Mockau-Nord
## 1288 2193
## Mockau-Süd Möckern
## 815 2299
## Mölkau Neulindenau
## 1383 1758
## Neustadt-Neuschönefeld Paunsdorf
## 2188 4730
## Plagwitz Plaußig-Portitz
## 1962 2092
## Probstheida Reudnitz-Thonberg
## 14567 4085
## Schleußig Schönau
## 2608 3879
## Schönefeld-Abtnaundorf Schönefeld-Ost
## 6055 3529
## Seehausen Sellerhausen-Stünz
## 4538 1633
## Stötteritz Südvorstadt
## 4438 2470
## Thekla Volkmarsdorf
## 2704 2180
## Wahren Wiederitzsch
## 2592 3348
## Zentrum Zentrum-Nord
## 1363 1501
## Zentrum-Nordwest Zentrum-Ost
## 3357 933
## Zentrum-Süd Zentrum-Südost
## 4031 4975
## Zentrum-West
## 3864
I add the value to district
so I have everything in one object.
Finally, I calculate the area of each polygon using expanse()
and the tree density.
districts$area <- expanse(districts, "km")
districts$tree_density <- log10(districts$n_trees / districts$area)
tree_density
is \(log_{10}-\)scaled to make visualization easier. I plot everything, adding also the Leipzig riparian forest (auwald
) to the map.
auwald <- vect("data/auwald.shp") |> project(crs(districts))
plot(districts, "tree_density", col = hcl.colors(100, "Fall", rev = TRUE), type = "continuous")
lines(auwald, col = "grey20", lw = 5)
# I show the names of the districts with highest tree density
text(
districts[districts$tree_density > quantile(districts$tree_density, .9)],
districts$Name[districts$tree_density > quantile(districts$tree_density, .9)],
col = "grey20",
halo = TRUE,
hw = .15,
cex = 0.7
)
text(auwald, "Auwald", adj = c(0, -3.7), halo = TRUE, hw = .3, cex = 1.8)
ESRI stands for Environmental Systems Research Institute, Inc., which is the company that developed ArcGIS and created a code standard for projections. The other commonly used standard is maintained by the European Petroleum Survey Group (EPSG).↩
This is just an example, and you should do better than this.↩
Or better, of planted trees that have been recorded.↩