Data visualization - independent exercises
1. Loading libraries and the saved data
Just as we did in the extra exercises for Data Manipulation, load
the necessary libraries by copy-pasting the following lines into an R
script and running them in RStudio. If the package finbif is not
installed, install it first (note that the Notebooks course environment
handles package installations unusually: packages installed by users
are not permanent).
# installing the ´finbif´ package (if already installed, skip this line)
install.packages("finbif")
# loading the packages ´finbif´ and ´tidyverse´
library(finbif)
library(tidyverse)
We will use the same data of barn swallow observations from the FinBIF database that we worked with in the Data Manipulation extra exercises and saved as an .rds file at the end.
Getting the data was covered in the Data Manipulation extra
exercises. If you don’t have the data file (swallows_data.rds),
please check there how to obtain the data.
Now we use the command readRDS to read in the saved data. Because
the data are saved as an R object in an .rds file, we don’t need an
additional import command here.
swallows_data <- readRDS("/home/rstudio/my-work/swallows_data.rds")
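As a minimal sketch of how saveRDS and readRDS work together, the example below saves a small made-up data frame to a temporary file and reads it back. The object names and values are hypothetical and only stand in for the real swallow data. Note that readRDS returns the object without a name, so the result has to be assigned to a variable.

```r
# Toy object standing in for the real data (hypothetical values)
toy <- data.frame(id = 1:3, province = c("Uusimaa", "Kainuu", "Uusimaa"))

# Save the object to a temporary .rds file
tmp <- tempfile(fileext = ".rds")
saveRDS(toy, tmp)

# Read it back; the result must be assigned to a name of our choosing
toy_restored <- readRDS(tmp)

# Check that the restored object matches the original
all.equal(toy, toy_restored)
```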
2. Data wrangling
First, we will do some data wrangling to format the data for ggplot2.
Go through each step carefully to form an understanding of what is
happening as you run the code. Make sure to copy and run each line!
2.1 Let’s make a copy of the data and format them so that they are
suitable for plotting with ggplot2. First, we will repeat the steps
from yesterday to simplify the date format, remove double entries in
bio_province and remove unlikely observations in winter. Copy-paste
these lines into your R script and run them as they are.
# making a copy of the data
swallowplot <- swallows_data
# setting format of the column ´date_time´ to year-month-day
swallowplot$date_time <- as.Date(format(swallowplot$date_time, "%Y-%m-%d"))
# removing double entries in the column ´bio_province´ by replacing text with the function gsub
swallowplot$bio_province <- gsub(",.*", "", swallowplot$bio_province)
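To see what the gsub call does, here is a small sketch on a made-up vector. The values are hypothetical and assume that double entries are separated by a comma, as the pattern ",.*" suggests. The pattern matches the first comma and everything after it, so only the first province is kept.

```r
# Hypothetical values; the second entry lists two provinces
provinces <- c("Uusimaa", "Varsinais-Suomi, Satakunta", NA)

# ",.*" matches the first comma and everything after it;
# replacing the match with "" keeps only the first province
first_province <- gsub(",.*", "", provinces)
first_province
```

Missing values stay missing, which is why the NA entries in bio_province still need to be filtered out later.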
2.2 Because we want to use the month in the column date_time for
plotting, let’s do some additional formatting of this column. What we
do here is not particularly elegant, and there are specialized packages
for dealing with dates in R, but, well, it works. In general, there are
always many different ways to do any data wrangling step in R.
Copy-paste these lines into your R script and run them as they are.
# removing the year and the date from the column ´date_time´ with the function ´gsub´ to leave only the month
swallowplot$date_time <- gsub("2023-", "", swallowplot$date_time)
swallowplot$date_time <- gsub("-.*", "", swallowplot$date_time)
# renaming the column ´date_time´ to ´month´, because the month is all we have left
swallowplot <- rename(swallowplot, month = date_time)
# converting the column ´month´ from characters to numbers
swallowplot$month <- as.numeric(swallowplot$month)
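As one possible alternative, the month can also be extracted directly from a Date with base R's format function. The sketch below uses made-up dates (not the course data) and compares this with the gsub steps above, which rely on every date being from 2023. Packages such as lubridate offer similar helpers, but they are not needed here.

```r
# Hypothetical dates; the last one is deliberately from a different year
dates <- as.Date(c("2023-05-14", "2023-09-02", "2022-10-30"))

# "%m" extracts the two-digit month, whatever the year
month <- as.numeric(format(dates, "%m"))
month

# The gsub approach only strips the year when it is 2023,
# so the 2022 date ends up with the year instead of the month
gsub_month <- as.numeric(gsub("-.*", "", gsub("2023-", "", as.character(dates))))
gsub_month
```

For the course data both approaches should agree, as long as all observations are from 2023.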
2.3 Check the structure of the data. The beginning of the output should look something like the example below (although the exact appearance might differ, depending on when you downloaded the data).
str(swallowplot)
Classes ‘finbif_occ’ and 'data.frame': 5000 obs. of 13 variables:
$ record_id : chr "http://tun.fi/JX.1644117#15" "http://tun.fi/HR.3211/186711511-U" "http://tun.fi/JX.1642538#25" "http://tun.fi/JX.1642818#3" ...
$ scientific_name : chr "Hirundo rustica Linnaeus, 1758" "Hirundo rustica Linnaeus, 1758" "Hirundo rustica Linnaeus, 1758" "Hirundo rustica Linnaeus, 1758" ...
$ abundance : int NA NA 5 NA 6 1 NA NA 1 2 ...
$ lat_wgs84 : num 60.4 60.2 60.4 61 60.4 ...
$ lon_wgs84 : num 23.1 24.9 22.2 21.4 22.2 ...
$ month : num 10 10 10 9 9 9 9 9 9 9 ...
$ coordinates_uncertainty: num 10000 1845 1 1000 1 ...
$ any_issues : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ requires_verification : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ requires_identification: logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ record_reliability : chr "UNDEFINED" "RELIABLE" "UNDEFINED" "UNDEFINED" ...
$ record_quality : chr "Unassessed" "Community verified" "Unassessed" "Unassessed" ...
$ bio_province : chr "Varsinais-Suomi" "Uusimaa" "Varsinais-Suomi" "Varsinais-Suomi" ...
2.4 Next, let’s remove observations with missing values for region
(NA in the column bio_province) and count observations by region and
by month.
swallowplot_counts <- swallowplot |>
  filter(!is.na(bio_province)) |>
  summarise(count = n(),
            .by = c(month, bio_province))
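To make the counting step concrete, here is a minimal sketch on a made-up data frame. The object names toy and toy_counts and all values are hypothetical. It drops the row with a missing region and then counts the remaining rows for each combination of month and region. The .by argument requires dplyr 1.1.0 or newer.

```r
library(dplyr)

# Hypothetical observations; one row has a missing region
toy <- data.frame(
  month = c(5, 5, 6, 6, 6, 7),
  bio_province = c("Uusimaa", "Uusimaa", "Uusimaa", NA,
                   "Varsinais-Suomi", "Uusimaa")
)

# Drop rows with a missing region, then count rows per month and region
toy_counts <- toy |>
  filter(!is.na(bio_province)) |>
  summarise(count = n(),
            .by = c(month, bio_province))

toy_counts
```

The result has one row per month and region combination, which is the long format that ggplot2 expects.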
Ok! Now we’re ready to create some plots.
3. Exercises
3.1 Make a bar plot of the observation counts by region using
swallowplot_counts. Hints: use geom_col(). Check swallowplot_counts
to see what the column with the observation counts is called.
# Write the answer in your R script.
3.2 What about a similar bar plot of observations but by month?
# Write the answer in your R script.
3.3 Next, let’s combine the above and make a line plot that shows
month on the x axis, counts on the y axis, and lines coloured by
region (bio_province). Hint: use geom_line().
# Write the answer in your R script.
3.4 Not looking too bad, but let’s make the plot look nicer by labelling the y axis ‘observations’ and giving the plot the title ‘Barn swallows by region’.
# Write the answer in your R script.
4. Saving and exporting your script
If you don’t remember how to save and export your script from RStudio, follow the instructions in the Starting with Data exercise sheet (steps 7.3 and 7.4).
That’s it in terms of exercises! Depending on the course schedule, there may be a chance to go through the solutions at the beginning of Day 3. However, if you’re attending a two-day course and feel unsure about how to solve some of the problems above, you can have a look at the completed exercise sheets on eLearn.
Thanks for joining the course (so far!) and we hope you’ve found the contents useful!