GIS course @ yDiv
1 Introduction
This document is a living tutorial that mirrors the lectures of the yDiv course “An Introduction to GIS for Ecology”, held at iDiv in Leipzig. In this two day course we will cover the very basics of GIS, focusing especially on applications to ecology. During the first day, we will get to know the main data types of GIS, that is geometries and rasters. During the second day, we will learn some of the most used GIS operations. You will also start your own individual/group project. The aims of this course are to
- Familiarize yourself with basic GIS data and operations
- Start a project of your interest
- Getting some simple figures/analyses started
Do not expect to become a world-leading GIS expert by the end of the course; this is a basic introductory course (and GIS is quite hard to master).
At the end of each section, you will find multiple exercises; they are not mandatory, but they will help you. Before starting, be sure to have your machine set up properly. Also, you can download all data used in this tutorial by clicking here.
1.1 Set up
You are strongly encouraged to run the examples as we progress through the lectures. For this you will need to following set up:
- R installed.
terra
installed.codetools
installed (optional, used byterra
to perform code checks).- Download the data archive and decompress it.
Try run library(terra)
. If it works, good news. If it does not, contact me some days before the course starts.
For the final exercise of the course, you will perform a simple species’ distribution modeling. For this, you will need to:
- Download WorldClim bioclimatic variables at 5 arc-minute resolution.
- Download the presence records for a species of your choosing from GBIF.
1.2 Data Sets
In this tutorial, we will use data sets for:
- NDVI for the year 2024 in Leipzig, obtained from Sentinel2 (
ndvi-2024.tif
). - Landcover of Leipzig in 2015, obtained from the German Aerospace Center (DLR;
landcover_4326-2015.tif
). - Average annual surface temperature from WorldClim (
wc2-bio1.tif
) - Bioclimatic variables from WorldClim (
bios.tif
) - Digital elevation model from WorldClim (
wc2-elevation.tif
)
All these files can be downloaded from https://emilio-berti.github.io/teaching/gis-ydiv/data.zip.
1.3 Lecturers
Emilio Berti (TiBS)
- Theory in Biodiversity Science
- Macroecologist / biogeographer
- Theoretical ecologist
Guilherme Pinto (BioEcon)
- Geoinformatics
- Data scientist
- Marine heatwaves / fisheries
2 Data Types
There are two main data types used in GIS:
- Geometries (also called vectors or shapes). This usually has extension
.shp
. - Rasters. This usually has extension
.tif
.
They fill two different needs, namely to represent structures that can be well approximated using geometric objects, such as lines, circles, etc., and to represent grid data over an area.
If you have produced figures for a manuscript, you are actually already familiar with these type of data. Images can be saved as vector graphics, where each element is represented by a vector/geometry, or as a matrix/raster of pixels. This is an old topic in computer graphics, and a nice summary can be found on Adobe website: https://www.adobe.com/creativecloud/file-types/image/comparison/raster-vs-vector.html. This analogy is a useful one; remember it.
2.1 Geometries
Structures such as trees, streets, or buildings can be represented by geometric objects. For instance, a tree can be represented by a point, a street by straight line, and a building by a polygon. It makes sense to store the data of these types of structures as their geometric representation. For example, the building we are in and the adjacent street:
# This code in incomprehensible to you FOR NOW
# Just an example of geometries to show you how they look on a map
<- rast("data/idiv-building.tif")
idiv plotRGB(idiv)
<- vect(
l rbind(
c(12.39585, 51.31866),
c(12.39846, 51.31722)
),crs = "EPSG:4326"
)lines(l, col = "green", lw = 5)
<- vect(
idiv rbind(
c(12.39573, 51.31794),
c(12.39645, 51.31765),
c(12.39664, 51.31812),
c(12.39602, 51.31849),
c(12.39573, 51.31794)
),crs = "EPSG:4326"
|>
) as.lines() |>
as.polygons()
## Warning: [as.polygons] dropped attributes
$name <- "iDiv building"
idiv$perimeter <- perim(idiv)
idiv$area <- expanse(idiv)
idivwriteVector(idiv, "data/idiv.shp", overwrite = TRUE)
plot(idiv, add = TRUE, col = adjustcolor("blue", alpha.f = .5))
Geometries are saved in vector- or shape-files (usually with extension .shp
). In addition to the geometric representation of the structure, shapefiles also contain metadata for the geometric representation, such as its extent and coordinate reference system (we will talk about this later), and for the structure, such as their name, length, etc. For instance, the geometry of the iDiv building above is:
idiv
## class : SpatVector
## geometry : polygons
## dimensions : 1, 3 (geometries, attributes)
## extent : 12.39573, 12.39664, 51.31765, 51.31849 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## names : name perimeter area
## type : <chr> <num> <num>
## values : iDiv building 237.7 3265
Note that this structure has one geometry and three attributes/variables (dimensions: 1, 3 (geometries, attributes)
): name
, perimeter
(\(m\)), and area
(\(m^2\)).
2.1.1 Geometry Types
In terra
, there are three main geometry types:
- Points: they are defined by a vector with two values for coordinates (e.g., longitude and latitude)
- Lines: they are a series of points connected by straight lines
- Polygons: they are a series of lines inscribing a closed area
In terra
, geometries are created with the function vect()
. To create a point geometry, you need to pass a matrix with the coordinates:
<- cbind(12.39585, 51.31866) # cbind force it to be a matrix
xy vect(xy)
## class : SpatVector
## geometry : points
## dimensions : 1, 0 (geometries, attributes)
## extent : 12.39585, 12.39585, 51.31866, 51.31866 (xmin, xmax, ymin, ymax)
## coord. ref. :
To create a geometry of multiple points, simply pass a matrix that has multiple rows:
<- rbind(
xy cbind(12.39585, 51.31866),
cbind(12.39584, 51.31865)
)<- vect(xy) poi
In order to create lines, first create a multi-point geometry, then convert it using as.lines()
:
<- rbind(
xy cbind(12.39585, 51.31866),
cbind(12.39584, 51.31865)
)<- vect(xy) |> as.lines() lin
In order to create polygons, first create a mutli-point geometry, then covert it to lines, and finally to polygons using as.polygons()
. The last point should be the same as the first to close the geometry. A geometry that is not closed does not inscribe an area and is not a polygon.
<- rbind(
xy cbind(12.39585, 51.31866),
cbind(12.39584, 51.31865),
cbind(12.39584, 51.31866),
cbind(12.39585, 51.31866)
)<- vect(xy) |> as.lines() |> as.polygons() pol
## Warning: [as.polygons] dropped attributes
plot(pol, col = "gold")
lines(lin, col = "red", lw = 5)
points(poi, col = "blue", cex = 1)
2.1.2 Reading, Subsetting & Writing Geometries
In most cases, you will not create your own geometry by hands, but load geometries from files. To read geometries from a file, use vect()
.
<- vect("data/germany_bundes.shp") # shapefile with the country boundary of Germany
ger ger
## class : SpatVector
## geometry : polygons
## dimensions : 16, 11 (geometries, attributes)
## extent : 5.866251, 15.04181, 47.2707, 55.05653 (xmin, xmax, ymin, ymax)
## source : germany_bundes.shp
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## names : GID_1 GID_0 COUNTRY NAME_1 VARNAME_1 NL_NAME_1
## type : <chr> <chr> <chr> <chr> <chr> <chr>
## values : DEU.1_1 DEU Germany Baden-Württemberg NA NA
## DEU.2_1 DEU Germany Bayern Bavaria NA
## DEU.3_1 DEU Germany Berlin NA NA
## TYPE_1 ENGTYPE_1 CC_1 HASC_1 ISO_1
## <chr> <chr> <chr> <chr> <chr>
## Land State 08 DE.BW NA
## Freistaat Free State 09 DE.BY DE-BY
## Land State 11 DE.BE DE-BE
plot(ger)
To subset a geometry, use subset()
<- subset(ger, ger$NAME_1 == "Sachsen")
sn
snplot(sn)
To load a geometry from a .csv file (or any data xy coordinates), use vect()
with the geom
parameter
<- read.csv2("data/points.csv")
coo <- vect(coo, geom=c("x", "y"))
poi poi
## class : SpatVector
## geometry : points
## dimensions : 5, 1 (geometries, attributes)
## extent : 12.31867, 12.54047, 51.26761, 51.38168 (xmin, xmax, ymin, ymax)
## coord. ref. :
## names : geom
## type : <int>
## values : 1
## 2
## 3
plot(poi)
To write geometries to a file, use writeVector()
.
writeVector(sn, "data/sachsen.shp", overwrite = TRUE) # if already exists you need overwrite = TRUE
2.2 Rasters
Rasters are gridded areas where each grid pixel takes a value. Rasters are useful because:
- they compress the information of geometries to a manageable size. Imagine to have to work with building geometries globally.
- the data they represent is obtained in a gridded format. This is common for satellite data, for example.
When the data we are interested in are values on a gridded area, rasters are a better options than to using geometries, as they save a lot of memory and space on disk. For example, if we want to represent landcover type over a large area(forest, agriculture, build-up, etc.), it is easier to grid the area into pixels and save the landcover type of each pixel rather than to create a geometry for each different structure.
<- rast("data/landcover_4326-2015.tif")
landcover plot(landcover) # from EOC of DLR
Rasters are saved in raster files (usually with extension
.tif
). In addition to the values of each grid, rasterfiles also contain metadata for the grid, such as the its extent and coordinate reference system (we will talk about this later). For instance, some of the metadata of the landcover raster:
landcover
## class : SpatRaster
## dimensions : 1607, 2553, 1 (nrow, ncol, nlyr)
## resolution : 0.0001178543, 0.0001178543 (x, y)
## extent : 12.24617, 12.54705, 51.22333, 51.41272 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## source : landcover_4326-2015.tif
## color table : 1
## categories : category
## name : category
## min value : Artificial Land
## max value : Water
Note the metadata dimensions
(grid size), resolution
(spatial resolution, in \(m\) for this raster), and extent
(spatial extent).
2.2.1 Creating Rasters from Scratch
Simply put, rasters are matrices/arrays with associated spatial metadata. In terra
, use rast()
to create a raster from a matrix.
<- matrix(rnorm(100), nrow = 10, ncol = 10) # 10x10 matrix
m <- rast(m)
r plot(r)
Usually, you will not create rasters from scratch. However, if you are familiar with R arrays, you will notice a nice properties that is inherited by rasters: rasters can have multiple layers, with each layer being a matrix.
<- rast(array(rnorm(200), dim = c(10, 10, 2))) # 10x10x2 array
r plot(r)
When an area has multiple rasters (e.g.: multiple observation dates), it is possible to stack them to create a single raster object. The rasters in the stack need to have all the same extent and coordinate reference system. Use c()
to stack rasters using terra
.
<- rast(matrix(rnorm(100), nrow = 10, ncol = 10))
r1 <- rast(matrix(rnorm(100), nrow = 10, ncol = 10))
r2 <- c(r1, r2) # a stack
r plot(r)
2.2.2 Reading & Writing Rasters
To read rasters into memory, use rast()
.
<- rast("data/landcover_4326-2015.tif")
landcover landcover
## class : SpatRaster
## dimensions : 1607, 2553, 1 (nrow, ncol, nlyr)
## resolution : 0.0001178543, 0.0001178543 (x, y)
## extent : 12.24617, 12.54705, 51.22333, 51.41272 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## source : landcover_4326-2015.tif
## color table : 1
## categories : category
## name : category
## min value : Artificial Land
## max value : Water
To write rasters to disk, use writeRaster()
.
writeRaster(landcover, "data/landcover_4326-2015.tif", overwrite = TRUE, datatype = "INT1U")
Here, the argument datatype = "INT1U"
(specifying the values in the grid are 1-byte unsigned integers) is needed because the rasters contains classes (levels). In most cases, this argument is best left unspecified, as terra
picks the optimal value by default. Once you have used GIS for some time, you will notice that some of the rasters you downloaded have been scaled up to weird numbers, e.g. 65535. This is because the creator of those rasters decided to save them as, e.g., 2-bytes unsigned integers, to make them smaller and, thus, easier to move around the internet. If you need to deal with extremely big rasters, this trick may come in handy.
2.3 Spatial metadata
Spatial objects always need certain metadata to be useful. For example, a raster is of little help if there is not information of the area it covers. The most important metadata are:
- Spatial extent: the four corners of the quadrilateral polygon that inscribe the object represented.
- Coordinate reference system: how the earth surface is represented.
- Resolution (for rasters only): the size of a pixel.
2.3.1 Spatial extent
The spatial extent is the quadrilateral that inscribe the area of the spatial data. The extent is usually represented by the coordinates of the four vertices of the quadrilateral, i.e. xmin
, xmax
, ymin
, and ymax
. Use ext()
to get the spatial extent of an object.
<- rast("data/landcover_4326-2015.tif")
landcover ext(landcover)
## SpatExtent : 12.2461656680878, 12.547047594374, 51.2233274363786, 51.4127192325172 (xmin, xmax, ymin, ymax)
2.3.2 Coordinate reference system
GIS try to represent the surface of the Earth, a 3D spheroid, onto a plane. The coordinate reference system (CRS), also known as spatial reference system (SRS), defines how this projection of a 3D object to a 2D one is achieved. It is not possible to achieve this projection accurately and some distortions will always be present. In particular, at least one of distance, angular conformity, and area will be distorted. Projections can be grouped into types, depending on which property of the Earth surface they do not distort:
- Conformal: they correctly represent the angles between points and, thus, shapes (e.g. ESRI:54004).
- Equidistant: they correctly represent distances (e.g. ESRI:54002).
- Equal-area: they correctly represent areas (e.g. ESRI:54034).
An overview of ESRI and EPSG1 projections can be found at https://spatialreference.org/. Wikipedia also has a nice list with the property of some projection: https://en.wikipedia.org/wiki/List_of_map_projections.
Use crs()
to know the CRS of an object. By default, crs()
displays the CRS in Well Known Text (WKT) format.
crs(idiv, parse = TRUE)
## [1] "GEOGCRS[\"WGS 84\","
## [2] " ENSEMBLE[\"World Geodetic System 1984 ensemble\","
## [3] " MEMBER[\"World Geodetic System 1984 (Transit)\"],"
## [4] " MEMBER[\"World Geodetic System 1984 (G730)\"],"
## [5] " MEMBER[\"World Geodetic System 1984 (G873)\"],"
## [6] " MEMBER[\"World Geodetic System 1984 (G1150)\"],"
## [7] " MEMBER[\"World Geodetic System 1984 (G1674)\"],"
## [8] " MEMBER[\"World Geodetic System 1984 (G1762)\"],"
## [9] " MEMBER[\"World Geodetic System 1984 (G2139)\"],"
## [10] " ELLIPSOID[\"WGS 84\",6378137,298.257223563,"
## [11] " LENGTHUNIT[\"metre\",1]],"
## [12] " ENSEMBLEACCURACY[2.0]],"
## [13] " PRIMEM[\"Greenwich\",0,"
## [14] " ANGLEUNIT[\"degree\",0.0174532925199433]],"
## [15] " CS[ellipsoidal,2],"
## [16] " AXIS[\"geodetic latitude (Lat)\",north,"
## [17] " ORDER[1],"
## [18] " ANGLEUNIT[\"degree\",0.0174532925199433]],"
## [19] " AXIS[\"geodetic longitude (Lon)\",east,"
## [20] " ORDER[2],"
## [21] " ANGLEUNIT[\"degree\",0.0174532925199433]],"
## [22] " USAGE["
## [23] " SCOPE[\"Horizontal component of 3D system.\"],"
## [24] " AREA[\"World.\"],"
## [25] " BBOX[-90,-180,90,180]],"
## [26] " ID[\"EPSG\",4326]]"
Notice, among the others, the attributes AREA
(the area of usage for the CRS) and ID
(in this case, the EPSG code). WKT is not very nice for humans; use the extra argument proj = TRUE
to see the PROJ4 format of the CRS.
crs(idiv, proj = TRUE)
## [1] "+proj=longlat +datum=WGS84 +no_defs"
Use project()
to project a spatial object from one CRS to another.
<- project(landcover, crs("EPSG:3035"))
landcover crs(landcover)
## [1] "PROJCRS[\"ETRS89-extended / LAEA Europe\",\n BASEGEOGCRS[\"ETRS89\",\n ENSEMBLE[\"European Terrestrial Reference System 1989 ensemble\",\n MEMBER[\"European Terrestrial Reference Frame 1989\"],\n MEMBER[\"European Terrestrial Reference Frame 1990\"],\n MEMBER[\"European Terrestrial Reference Frame 1991\"],\n MEMBER[\"European Terrestrial Reference Frame 1992\"],\n MEMBER[\"European Terrestrial Reference Frame 1993\"],\n MEMBER[\"European Terrestrial Reference Frame 1994\"],\n MEMBER[\"European Terrestrial Reference Frame 1996\"],\n MEMBER[\"European Terrestrial Reference Frame 1997\"],\n MEMBER[\"European Terrestrial Reference Frame 2000\"],\n MEMBER[\"European Terrestrial Reference Frame 2005\"],\n MEMBER[\"European Terrestrial Reference Frame 2014\"],\n ELLIPSOID[\"GRS 1980\",6378137,298.257222101,\n LENGTHUNIT[\"metre\",1]],\n ENSEMBLEACCURACY[0.1]],\n PRIMEM[\"Greenwich\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n ID[\"EPSG\",4258]],\n CONVERSION[\"Europe Equal Area 2001\",\n METHOD[\"Lambert Azimuthal Equal Area\",\n ID[\"EPSG\",9820]],\n PARAMETER[\"Latitude of natural origin\",52,\n ANGLEUNIT[\"degree\",0.0174532925199433],\n ID[\"EPSG\",8801]],\n PARAMETER[\"Longitude of natural origin\",10,\n ANGLEUNIT[\"degree\",0.0174532925199433],\n ID[\"EPSG\",8802]],\n PARAMETER[\"False easting\",4321000,\n LENGTHUNIT[\"metre\",1],\n ID[\"EPSG\",8806]],\n PARAMETER[\"False northing\",3210000,\n LENGTHUNIT[\"metre\",1],\n ID[\"EPSG\",8807]]],\n CS[Cartesian,2],\n AXIS[\"northing (Y)\",north,\n ORDER[1],\n LENGTHUNIT[\"metre\",1]],\n AXIS[\"easting (X)\",east,\n ORDER[2],\n LENGTHUNIT[\"metre\",1]],\n USAGE[\n SCOPE[\"Statistical analysis.\"],\n AREA[\"Europe - European Union (EU) countries and candidates. Europe - onshore and offshore: Albania; Andorra; Austria; Belgium; Bosnia and Herzegovina; Bulgaria; Croatia; Cyprus; Czechia; Denmark; Estonia; Faroe Islands; Finland; France; Germany; Gibraltar; Greece; Hungary; Iceland; Ireland; Italy; Kosovo; Latvia; Liechtenstein; Lithuania; Luxembourg; Malta; Monaco; Montenegro; Netherlands; North Macedonia; Norway including Svalbard and Jan Mayen; Poland; Portugal including Madeira and Azores; Romania; San Marino; Serbia; Slovakia; Slovenia; Spain including Canary Islands; Sweden; Switzerland; Türkiye (Turkey); United Kingdom (UK) including Channel Islands and Isle of Man; Vatican City State.\"],\n BBOX[24.6,-35.58,84.73,44.83]],\n ID[\"EPSG\",3035]]"
2.3.3 Resolution
Resolution applies only to rasters, as geometries can be scaled at any level. Use res()
to get the resolution of a raster.
res(landcover)
## [1] 9.855366 9.855366
The first value is the “horizontal” and the second is the “vertical” resolution.Basically, it is the size of the pixel. The units of the output of res()
is the same as the unit of the CRS, in this case \(m\):
crs(landcover, proj = TRUE)
## [1] "+proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +units=m +no_defs"
3 Reprojecting
It is quite common that data are obtained from a source that uses a CRS that is not ideal for analyses. It is even more common that data are gathered from multiple sources that do not use the same CRS. In such cases, the spatial data should be reprojected to one common CRS that is ideal for analyses.
Use project()
to project a spatial object from one CRS to another. Three types of arguments can be passed to project()
:
- A spatial object with known CRS.
- A string with the WKT or PROJ4 CRS.
- A code of a known CRS, e.g. from the EPSG standard.
<- rast("data/landcover_4326-2015.tif")
r <- vect("data/germany_bundes.shp")
germany
<- project(germany, crs(r)) # a spatial object
germany <- project(germany, "+proj=moll") # a proj4
germany_mollweide <- project(germany, "EPSG:3035") # a CRS code germany_3035
When using an existing spatial object (<obj>
) as CRS target, remember to useproject(<x>, crs(<obj>))
and notproject(<x>, <obj>)
if you are interested in reprojecting only. See section 5.3 about resampling for details.
CRS have a big influence on analyses and visualization.
plot(buffer(vect(cbind(2e6, 3e6)), 4e6), col = "white", lw = 1e-6)
polys(germany, col = "grey90")
polys(germany_mollweide, col = "grey90")
polys(germany_3035, col = "grey90")
<- vect(
l rbind(
cbind(5e5, 5e5),
cbind(0, 0)
)
)lines(l[1], l[2], arrows = TRUE, length = .1)
<- vect(
l rbind(
cbind(2e6, 2e6),
cbind(0, 0)
)
)text(l[1], labels = "Cannot be seen,\nbut EPSG:4326 is here")
Be sure to use the CRS that is best for your analyses or visualization.
4 Data Conversion
Sometimes is needed to create a raster from a geometry or vice versa.
4.1 Geometries to Rasters
Use rasterize()
to convert geometries to raster. In addition to the geometry to rasterize, you need to pass a raster to function as template, i.e. from which extent, CRS, and resolution are extracted from.
<- rast("data/landcover_4326-2015.tif")
landcover <- vect("data/idiv.shp")
idiv <- project(idiv, crs(landcover)) # CRS needs to be the same
idiv <- rasterize(idiv, landcover, touches = TRUE) idiv_r
If specified, touches = TRUE
assigns a value of 1 to all cells that are touched by the polygon. The extent of this new raster is the same as the landcover one, which is too big to actually see the raterized polygon. Use trim()
to trim the raster to the smallest raster containing all values that are not NA
.
plot(trim(idiv_r))
lines(idiv)
4.2 Rasters to Geometries
Use as.points()
or as.polygons()
to convert rasters to geometries.
<- crop(landcover, buffer(idiv, 1e2)) # restrict the area to something that can be plotted
landcover <- as.points(landcover)
poi <- as.polygons(landcover) pol
<- par(no.readonly = TRUE)
op par(mfrow = c(1, 2))
plot(poi, "category")
lines(idiv)
plot(pol, "category")
lines(idiv)
par(op)
5 Geometry Operations
Commonly operations on geometries are to calculate their length and area, find their centroids, calculate distance, and buffering them.
5.1 Length
Use perim()
to obtain the length of lines or the perimeter of polygons.
perim(idiv)
## [1] 237.7449
|> disagg(segments = TRUE) |> perim() |> sum() # disagg(segments = TRUE) split the polygon into lines idiv
## [1] 237.7449
The output units are in “m”, though you may get more accurate results by first transforming the data to longitude/latitude.
It is best to project the geometry to a longitude/latitude CRS to get more accurate results.
|> project("EPSG:4326") |> perim() # difference of 3 mm idiv
## [1] 237.7449
5.2 Area
Use expanse()
to obtain the area of polygons and rasters (please read the function’s help to see tips on how the calculations work for both).
expanse(idiv)
## [1] 3264.621
The output units can be in “m”, “km”, or “ha”.
It is best to project the geometry to a longitude/latitude CRS to get more accurate results.
|> project("EPSG:4326") |> expanse() # same result, as CRS was equal-area idiv
## [1] 3264.621
5.3 Centroid
The centroid of a geometry is the point defined as the arithmetic mean position of all the points on the surface of the geometry. Use centroids()
to get the centroids of the geometries.
<- centroids(idiv)
centr plot(idiv, col = "blue")
points(centr, bg = "gold", cex = 2, pch = 21)
For concave polygons, the centroid may lay outside of the polygon itself
<- vect(matrix(c(0, 0, 1, 0, 1, 1, 0.9, 0.1, 0, 0), byrow = TRUE, ncol = 2)) |>
p as.lines() |>
as.polygons()
## Warning: [as.polygons] dropped attributes
<- centroids(p)
centr plot(p, col = "blue")
points(centr, bg = "gold", cex = 2, pch = 21)
Specify centroids(inside = TRUE)
to force the “centroid” to be inside the polygon.
<- vect(matrix(c(0, 0, 1, 0, 1, 1, 0.9, 0.1, 0, 0), byrow = TRUE, ncol = 2)) |>
p as.lines() |>
as.polygons()
## Warning: [as.polygons] dropped attributes
<- centroids(p, inside = TRUE)
centr plot(p, col = "blue")
points(centr, bg = "gold", cex = 2, pch = 21)
5.4 Distance
Use distance()
to get the distances between geometries. When passing only one object with multiple geometries, distances will be calculated among each geometry and a (symmetric) matrix is returned.
<- vect(matrix(rnorm(10), ncol = 2), crs = "EPSG:4326") # 5 random points
poi |> distance(unit = "km") |> as.matrix() poi
## 1 2 3 4 5
## 1 0.0000 249.3231 219.8547 144.15574 141.23091
## 2 249.3231 0.0000 333.1207 225.05815 215.19348
## 3 219.8547 333.1207 0.0000 112.29506 122.23561
## 4 144.1557 225.0582 112.2951 0.00000 10.04541
## 5 141.2309 215.1935 122.2356 10.04541 0.00000
When passing a second geometry, distances will be calculated among each geometry of the first object and each geometry of the second object.
distance(
poi,|> project("EPSG:4326") |> centroids(),
idiv unit = "km"
)
## [,1]
## [1,] 5893.045
## [2,] 5649.156
## [3,] 5941.702
## [4,] 5857.176
## [5,] 5848.732
5.5 Buffer
The buffer of a geometry is obtained by extending the geometry perpendicular to the tangent line of its side, i.e., a buffer around all cells/geometries. It is easier to see it than to explain it. Use buffer()
to buffer geometries. The distance unit of the buffer width
parameter generally is meters.
<- buffer(idiv, width = 10) # 10 meters
b plot(b, col = "gold")
polys(idiv, col = "blue")
5.6 Exercise
- Find the centroid of each German state and how far is from the centroid of the main iDiv bulding.
- Find the state which centroid is closest to iDiv.
- Plot the circle centered at the centroid with the smallest area that is touching the centroid of the iDiv building.
The geometry of Germany is saved in data/germany_bundes.shp
. The geometry of the iDiv building is saved in data/idiv.shp
. Click on the “Code” button below to see a solution to this problem.
<- vect("data/germany_bundes.shp")
ger <- vect("data/idiv.shp")
idiv
<- centroids(ger) # centroids of German states
centr <- idiv |> project(crs(ger)) |> centroids() # centroid of iDiv
idiv_centr <- distance(idiv_centr, centr)
d <- which.min(d)
closest_state_id <- centr[closest_state_id] # closest centroid state
closest <- buffer(centr[closest_state_id], d[closest_state_id]) # circle with the smallest area touching iDiv
b <- as.lines(rbind(centr[closest_state_id], idiv_centr)) # line connecting centroids
l
plot(ger)
polys(b, col = adjustcolor("blue", alpha.f = .5))
lines(l)
points(centr, pch = 21, cex = 2, bg = "gold")
points(idiv_centr, pch = 21, cex = 2, bg = "green")
text(centr, centr$NAME_1, pos = 3)
text(idiv_centr, "iDiv", pos = 4)
text(l, paste(round(d[closest_state_id]) / 1e3, "km"), pos = 3)
text(b, paste("Area =", round(expanse(b) / 1e6), "km^2"), pos = 1, offset = 1)
6 Raster Operations
Commonly operations on rasters are to calculate summary statistics, aggregate or disaggregate to different resolutions, resampling, and interpolation.
6.1 Summary Statistics
6.1.1 Local Summary
A local summary is calculated across pixels with the same coordinates i.e. locally. Raster stacks can be summarized using common function, such as mean
, sd
, sum
, etc. The output is a single raster layer with the summary. Mean, sum, minimum, and maximum, can be calculated simply using mean()
, sum()
, etc. For other function, use app()
.
<- rast("data/ndvi-2024.tif")
ndvi <- mean(ndvi, na.rm = TRUE)
avg_ndvi <- app(ndvi, "sd", na.rm = TRUE)
std_ndvi <- app(ndvi, "which.max", na.rm = TRUE) # month with the maximum NDVI
month_max_ndvi plot(avg_ndvi, col = colorRampPalette(c("dodgerblue3", "orange", "darkgreen"))(100))
plot(std_ndvi, col = hcl.colors(100, "Reds", rev = TRUE))
6.1.2 Global Summary
Global summaries are calculated across all pixels of a layer. Use global()
to get global summaries. When the input is a raster stack, the global summary is calculated for each layer separately.
global(ndvi, "mean", na.rm = TRUE)
6.1.3 Zonal Summary
Zonal statistics summarize the values of a raster using categories from another raster of geometry. Use zonal()
to obtain zonal statistics. Two arguments (<x>
, <y>
) are always required for zonal(<x>, <y>)
, with <x>
being the raster with the values to summarize and <y>
the raster/geometry specifying the categories for the summary.
<- rast("data/landcover_4326-2015.tif")
landcover <- project(avg_ndvi, landcover) # this will also resample avg_ndvi (see the section Resampling)
avg_ndvi zonal(avg_ndvi, landcover, fun = "mean", na.rm = TRUE)
6.2 Aggregate and Disaggregate
6.2.1 Aggregate
Aggregating creates a raster that is at coarser resolution compared to the original raster. Use aggregate()
to aggregate rasters. The arguments fact
, number of cells in each direction to be aggregated, and fun
, the function used for aggregatation, need to be specified.
<- rast(matrix(rnorm(16^2), 16, 16))
r <- aggregate(r, 4, "max") # maximum values in the 4x4 grid
r_aggr
par(mfrow = c(1, 2)) # cannot stack rasters with different resolution
plot(r)
plot(r_aggr)
par(mfrow = c(1, 1))
6.2.2 Disaggregate
Disaggregating creates a raster that is at finer resolution compared to the original raster. Use disagg()
to disaggregate rasters. The arguments fact
, number of cells in each direction to create for each cell of the original raster, and method
, the method used for disaggregation, need to be specified. Available methods are near
, for nearest neighbor, or bilinear
, for bilinear interpolation.
<- disagg(r, 4, method = "bilinear")
r_disagg
par(mfrow = c(1, 2)) # cannot stack rasters with different resolution
plot(r)
plot(r_disagg)
par(mfrow = c(1, 1))
Aggregating and disaggregating a raster does not give, in general, the same raster back. Some of the original information is lost during aggregation.
plot(c(r, r |> aggregate(4, "mean") |> disagg(4, "bilinear")))
6.2.3 Exercise
Assess how global average of NDVI is affected by the resolution of the NDVI layers. Aggregate the NDVI layer by a factor seq(2, 20, by = 1)
and see if the global average change. Click on the “Code” button below to see a solution to this problem.
<- rast("data/ndvi-2024.tif") |> mean()
ndvi <- rep(NA, 20)
global_avg_ndvi 1] <- (ndvi |> global("mean", na.rm = TRUE))[1, 1]
global_avg_ndvi[for (f in 2:20) {
<- (ndvi |> aggregate(fact = f) |> global("mean", na.rm = TRUE))[1, 1]
global_avg_ndvi[f]
}<- data.frame(
global_avg_ndvi fact = 1:20,
avg_ndvi = global_avg_ndvi
)scatter.smooth(global_avg_ndvi, ylim = c(0, .5))
The average NDVI is not changing. This is a property of the average. Try instead to see if the standard deviation changes.
<- rast("data/ndvi-2024.tif") |> mean()
ndvi <- rep(NA, 20)
global_std_ndvi 1] <- (ndvi |> global("sd", na.rm = TRUE))[1, 1]
global_std_ndvi[for (f in 2:20) {
<- (ndvi |> aggregate(fact = f) |> global("sd", na.rm = TRUE))[1, 1]
global_std_ndvi[f]
}<- data.frame(
global_std_ndvi fact = 1:20,
std_ndvi = global_std_ndvi
)scatter.smooth(global_std_ndvi, ylim = c(0.1, .3))
The standard deviation decreases for increasing aggregation factor. This is because aggregating is done here (implicitly) using
mean
as the aggregating function. This reduces spatial heterogeneity and, therefore, the global standard deviation. Scale is important for some processes, but not others.
6.3 Resampling
When two rasters do not align, i.e. they have different origin or resolution, to make them comparable one of them must be resampled. Resampling transform one of the two rasters to a raster that align with the other. Use resample()
to resample rasters. The argument method
is used to specify the algorithm used for resampling. Among the available algorithms are:
near
, for nearest neighbor.bilinear
, for bilinear interpolation.cubic
, for cubic interpolation.cubicspline
, for cubic-spline interpolation.lanczos
, for Lanczos resampling.
<- rast("data/landcover_4326-2015.tif")
landcover <- rast("data/ndvi-2024.tif") |>
ndvi mean(na.rm = TRUE) |>
project(crs(landcover))
<- resample(ndvi, landcover, method = "lanczos")
ndvi_resampled
plot(ndvi_resampled, col = colorRampPalette(c("dodgerblue3", "orange", "darkgreen"))(100))
The difference betweenproject(x, crs(y))
andproject(x, y)
is that the first only project the CRS, whereas the second actually resamplex
toy
. In many casesresample(x, y, method = <method>)
can be avoided by usingproject(x, y, method = <method>)
6.4 Raster and vector techniques
6.4.1 Mask
Create a new raster (or vector) where values outside a specified mask are set to NA (or another specified value). Essentially, it “masks out” or hides portions of a raster based on a separate “mask” object, which can be another raster, a polygon, or an extent.
<- rast("data/landcover_4326-2015.tif")
landcover <- vect("data/idiv.shp")
idiv <- mask(landcover, idiv)
landcover_idiv <- trim(landcover_idiv) # remove rows and columns with all NAs
landcover_idiv plot(landcover_idiv)
lines(idiv)
6.4.2 point in raster
Extract the values from a raster or a polygon based on a set of locations (ex.: points or polygons).
<- rast("data/landcover_4326-2015.tif")
landcover <- read.csv2("data/points.csv")
coo <- vect(coo, geom=c("x", "y"))
poi <- extract(landcover, poi)
landcover_ex landcover_ex
6.4.3 exercise
Get the mean, min and max ndvi values for the month of August, using a 100 meters buffer of around the points given.
<- read.csv2("data/points.csv")
coo <- vect(coo, geom=c("x", "y"))
poi <- rast("data/ndvi-2024.tif")
ndvi <- buffer(poi, 100)
poi_b <- extract(ndvi[["august"]], poi_b, fun = mean, na.rm = T)
ndvi.mean ndvi.mean
<- extract(ndvi[["august"]], poi_b, fun = min, na.rm = T)
ndvi.min ndvi.min
<- extract(ndvi[["august"]], poi_b, fun = max, na.rm = T)
ndvi.max ndvi.max
7 Examples
I provide three examples below of some of the most common applications of GIS in Ecology:
- Species distribution modelling (SDM).
- Analysis of telemetry (GPS) data.
- Count points inside polygons, e.g. number of trees in Leipzig districts.
These examples are not reserach-ready, i.e. the analysis can be largely improved. However, they provide a good starting point to start similar projects using GIS.
7.1 Species Distribution Model (SDM)
I will show how to perform a simple species distribution model (SDM) for the species Erica arborea.
I load the libraries.
library(tibble)
library(dplyr)
library(readr)
library(terra)
I load the file downloaded from GBIF and convert it to a geometry (points).
<- read_tsv("data/erica-arborea-gbif.tsv") |>
gbif select("decimalLongitude", "decimalLatitude") |>
vect(
geom = c("decimalLongitude", "decimalLatitude"),
crs = "+proj=longlat +datum=WGS84" # I know this is GBIF CRS
)
## Warning: One or more parsing issues, call `problems()` on your data frame for details, e.g.:
## dat <- vroom(...)
## problems(dat)
## Rows: 13228 Columns: 50
## ── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (31): datasetKey, occurrenceID, kingdom, phylum, class, order, family, ...
## dbl (13): gbifID, individualCount, decimalLatitude, decimalLongitude, coord...
## lgl (4): infraspecificEpithet, depth, depthAccuracy, typeStatus
## dttm (2): dateIdentified, lastInterpreted
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
I load the bioclimatic variables from WorldClim.
<- rast("data/bios.tif")
bios
# I also create a template of the land
<- bios[[1]]
land !is.na(land)] <- 1
land[names(land) <- "land"
I thin the GBIF data retaining only one point per grid cell.
<- extract(land, gbif, cells = TRUE, xy = TRUE) |>
gbif as_tibble() |>
filter(!is.na(land)) |>
group_by(cell) |>
slice_sample(n = 1) |>
ungroup() |>
select("x", "y") |>
vect(geom = c("x", "y"), crs = "+proj=longlat +datum=WGS84")
I sample pseudo-absences randomly in the polygon that contains all GBIF observation, buffered to 100 km.
<- gbif |> convHull()
range <- spatSample(buffer(range, 1e5), length(gbif) * 2)
abs <- extract(land, abs, cells = TRUE, xy = TRUE) |>
abs as_tibble() |>
filter(!is.na(land)) |>
group_by(cell) |>
slice_sample(n = 1) |>
ungroup() |>
select("x", "y") |>
slice_sample(n = length(gbif)) |>
vect(geom = c("x", "y"), crs = "+proj=longlat +datum=WGS84")
I prepare the data frame to be used for fitting the SDM.
$occ <- 1
gbif$occ <- 0
abs<- rbind(gbif, abs)
p <- extract(bios, p, ID = FALSE) |> as_tibble()
d d
plot(land, col = "grey90", legend = FALSE)
points(p, col = ifelse(p$occ == 0, "darkblue", "gold2"))
lines(buffer(range, 1e5))
I run the SDM, in this case a simple GLM. I use only some of the bioclimatic variables2.
<- d[, c("BIO01", "BIO04", "BIO12", "BIO15")]
d $occ <- p$occ
d<- glm(occ ~ ., data = d, family = "binomial")
sdm summary(sdm)
##
## Call:
## glm(formula = occ ~ ., family = "binomial", data = d)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.987248 0.497046 -1.986 0.047 *
## BIO01 0.193248 0.023100 8.366 < 2e-16 ***
## BIO04 -0.004121 0.000493 -8.359 < 2e-16 ***
## BIO12 0.001432 0.000194 7.380 1.58e-13 ***
## BIO15 -0.003442 0.003508 -0.981 0.326
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3410.3 on 2459 degrees of freedom
## Residual deviance: 3113.3 on 2455 degrees of freedom
## AIC: 3123.3
##
## Number of Fisher Scoring iterations: 4
I use the model to predict the suitability of the species in geographic space.
<- predict(bios, sdm, type = "response")
suitability plot(
suitability,col = hcl.colors(100, "Geyser", rev = TRUE),
breaks = seq(0, 1, by = .02),
type = "continuous",
main = "Predicted suitability"
)
Zooming on the Mediterranean regions.
plot(
crop(suitability, ext(-9, 24, 35, 50)),
col = hcl.colors(100, "Geyser", rev = TRUE),
breaks = seq(0, 1, by = .02),
type = "continuous",
main = "Predicted suitability"
)points(p[p$occ == 1, ], cex = .3)
7.3 Density of Planted Trees in Leipzig
I will show how to find the density of planted trees 3 for the districts of the city of Leipzig. The original data was obtained from the Leipzig Bureau for green and water in the city: https://opendata.leipzig.de/dataset/baumkataster-stadt-leipzig1.
library(readr)
library(dplyr)
library(terra)
I load the shapefile of the districts of Leipzig and the planted tree census (baumkataster). The census is a csv file with the column geom
representing the geometries in well-known text (WKT) format.
<- vect("data/ot.shp") # Leipzig districts
districts
<- read_csv(
baumkataster "data/Baumkataster_Stadt_Leipzig.csv",
show_col_types = FALSE
|>
) mutate( # rename the columns
species = ga_lang_wiss,
geom,.keep = "none" # keep only species and geom columns
)
baumkataster
I create the geometry object, specifying the CRS to be the same of the Leipzig districts.
<- vect(baumkataster, geom = "geom", crs = crs(districts))
tree tree
## class : SpatVector
## geometry : points
## dimensions : 185049, 1 (geometries, attributes)
## extent : 308130.4, 328398.8, 5680725, 5701485 (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89 / UTM zone 33N (EPSG:25833)
## names : species
## type : <chr>
## values : Quercus robur
## Quercus robur
## Quercus robur
Next, I want to know in which district each tree is located. There are several ways of achieving this, but the fastest is probably using terra::extract(<x>, <y>)
, which returns a data frame with the features of <x>
for the geometries of <y>
. I extract the features of district
for the locations specified by tree
.
<- extract(districts, tree) |>
tab as_tibble() |>
bind_cols(tree |> as_tibble()) |> # add the columns from tree
filter(!is.na(Name)) # remove trees outside Leipzig proper
tab
Then, I count the number of rows (one row = one tree) per district.
<- table(tab$Name)
tally tally
##
## Althen-Kleinpösna Altlindenau
## 985 2491
## Anger-Crottendorf Baalsdorf
## 3266 567
## Böhlitz-Ehrenberg Burghausen-Rückmarsdorf
## 2230 1783
## Connewitz Dölitz-Dösen
## 3666 2405
## Engelsdorf Eutritzsch
## 3238 5950
## Gohlis-Mitte Gohlis-Nord
## 1734 2090
## Gohlis-Süd Großzschocher
## 1324 3715
## Grünau-Mitte Grünau-Nord
## 4329 3606
## Grünau-Ost Grünau-Siedlung
## 5089 862
## Hartmannsdorf-Knautnaundorf Heiterblick
## 803 2330
## Holzhausen Kleinzschocher
## 2386 4922
## Knautkleeberg-Knauthain Lausen-Grünau
## 1616 4122
## Leutzsch Liebertwolkwitz
## 2630 1965
## Lindenau Lindenthal
## 2777 2433
## Lößnig Lützschena-Stahmeln
## 3413 3673
## Marienbrunn Meusdorf
## 1676 1280
## Miltitz Mockau-Nord
## 1288 2193
## Mockau-Süd Möckern
## 815 2299
## Mölkau Neulindenau
## 1383 1758
## Neustadt-Neuschönefeld Paunsdorf
## 2188 4730
## Plagwitz Plaußig-Portitz
## 1962 2092
## Probstheida Reudnitz-Thonberg
## 14567 4085
## Schleußig Schönau
## 2608 3879
## Schönefeld-Abtnaundorf Schönefeld-Ost
## 6055 3529
## Seehausen Sellerhausen-Stünz
## 4538 1633
## Stötteritz Südvorstadt
## 4438 2470
## Thekla Volkmarsdorf
## 2704 2180
## Wahren Wiederitzsch
## 2592 3348
## Zentrum Zentrum-Nord
## 1363 1501
## Zentrum-Nordwest Zentrum-Ost
## 3357 933
## Zentrum-Süd Zentrum-Südost
## 4031 4975
## Zentrum-West
## 3864
I add the value to district
so I have everything in one object.
$n_trees <- tally[districts$Name] districts
Finally, I calculate the area of each polygon using expanse()
and the tree density.
$area <- expanse(districts, "km")
districts$tree_density <- log10(districts$n_trees / districts$area) districts
tree_density
is \(log_{10}-\)scaled to make visualization easier. I plot everything, adding also the Leipzig riparian forest (auwald
) to the map.
<- vect("data/auwald.shp") |> project(crs(districts))
auwald
plot(districts, "tree_density", col = hcl.colors(100, "Fall", rev = TRUE), type = "continuous")
lines(auwald, col = "grey20", lw = 5)
# I show the names of the districts with highest tree density
text(
$tree_density > quantile(districts$tree_density, .9)],
districts[districts$Name[districts$tree_density > quantile(districts$tree_density, .9)],
districtscol = "grey20",
halo = TRUE,
hw = .15,
cex = 0.7
)text(auwald, "Auwald", adj = c(0, -3.7), halo = TRUE, hw = .3, cex = 1.8)
ESRI stands for Environmental Systems Research Institute, Inc., which is the company that developed ArcGIS and created a code standard for projections. The other commonly used standard is maintained by the European Petroleum Survey Group (EPSG).↩︎
This is just an example, and you should do better than this.↩︎
Or better, of planted trees that have been recorded.↩︎