Data visualization - independent exercises

1. Data wrangling for plotting

We will use the same data of barn swallow observations from the FinBIF database that we worked with in the Data Manipulation independent exercises. We have to do some additional data wrangling to make the data suitable for plotting.

If you are working to get the additional 0.5 ECTS, start from step 2 where you can load the data set ready for plotting.

If you have downloaded your own data or want to start from the original data, you can use these steps to produce the same modified data set that is loaded below.

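library(tidyverse)  # provides filter() and summarise(); without it, filter() resolves to stats::filter()

swallows_data <- readRDS("/home/rstudio/shared/swallows_data.rds")

swallows <- swallows_data

# Abbreviated month name, e.g. "Apr" (format() uses the session locale for '%b')
swallows$month <- format(as.Date(swallows$date_time), '%b')

# Drop the January observations
swallows <- swallows |> filter(month != "Jan")

# Count the observations per month and province, skipping rows with no province
counts_for_plotting <- swallows |> 
  filter(!is.na(bio_province)) |> 
  summarise(count = n(), 
            .by = c(month, bio_province))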

2. Exercises

2.1. Start here if you are working on the exercises for the extra 0.5 credits. First, let’s make sure tidyverse is loaded and load the data formatted for plotting into R. The dataset is stored in the shared folder of our RStudio workspace in Noppe.

If you get stuck, check the lecture notes and exercises for data visualization.

library(tidyverse)

counts_for_plotting <- readRDS("/home/rstudio/shared/counts_for_plotting.rds")

Check the structure of the data. The beginning of the output should look something like the example below (although the exact appearance might differ, depending on when the data was downloaded).

# Write the answer in your R script.

'data.frame':   120 obs. of  3 variables:
 $ month       : chr  "Apr" "Apr" "Apr" "May" ...
 $ bio_province: chr  "Satakunta" "South Häme" "Varsinais-Suomi" "Uusimaa" ...
 $ count       : int  3 10 7 140 65 59 94 112 12 96 ...

2.2. Make a bar plot of the observation counts by month using counts_for_plotting. Hints: use geom_col(). Inspect counts_for_plotting, for example by clicking it in the Environment panel, to see what the column with the counts is called. Month goes on the x axis and counts on the y axis.

# Write the answer in your R script.

OK, that looks weird because the months are in alphabetical order. Let’s re-order them by turning month into a factor and defining the order of its levels:

counts_for_plotting$month <- factor(counts_for_plotting$month, levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))

Run the previous plot again. The months should now be in the correct order.
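
To see why the factor conversion changes the order, here is a minimal sketch on a made-up vector (months_chr is hypothetical toy data, not the swallow counts). It uses month.abb, base R's built-in vector of English month abbreviations, as a shorter way to write the same twelve levels.

```r
# Toy example with made-up months (not the swallow data)
months_chr <- c("May", "Apr", "Jun", "Apr")

# A character vector sorts alphabetically
sort(unique(months_chr))
# "Apr" "Jun" "May"

# A factor sorts by the order of its levels;
# month.abb is the built-in vector "Jan", "Feb", ..., "Dec"
months_fct <- factor(months_chr, levels = month.abb)
sorted_months <- sort(unique(months_fct))
as.character(sorted_months)
# "Apr" "May" "Jun"
```

ggplot2 follows the same rule: a character column is placed on the axis alphabetically, while a factor is placed in the order of its levels.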

2.3. Next, let’s look at the monthly observations by region. Make a line plot that shows month on the x axis and counts on the y axis, with the lines coloured by region (bio_province). Important: set both colour = bio_province and group = bio_province inside the aes() function:
aes(x = ... , y = ... , colour = bio_province, group = bio_province). Additional hint: use geom_line().
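
Why the group aesthetic matters: when the x axis is a factor, ggplot2 by default treats every combination of the discrete variables as its own group, so each "line" would contain a single point and nothing is drawn. The sketch below illustrates this on a small made-up data frame (toy, day, site and n are hypothetical names, not part of the swallow data).

```r
library(ggplot2)

# Hypothetical toy data: counts at two sites over three days
toy <- data.frame(
  day  = factor(rep(c("Mon", "Tue", "Wed"), times = 2),
                levels = c("Mon", "Tue", "Wed")),
  site = rep(c("A", "B"), each = 3),
  n    = c(2, 5, 3, 4, 1, 6)
)

# group = site tells ggplot2 which points belong to the same line,
# so we get one line per site instead of six single-point groups
p <- ggplot(toy, aes(x = day, y = n, colour = site, group = site)) +
  geom_line()
p
```

If you leave out group = site here, ggplot2 warns that each group consists of only one observation and draws no lines.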

# Write the answer in your R script.

2.4. Not looking too bad, but let’s make the plot look nicer by labelling the y axis ‘observations’ and giving the plot the title ‘Barn swallows by region’.

# Write the answer in your R script.

Second task for the extra 0.5 ECTS: save the plot created in section 2.4 and return it as your answer for the plotting section.

For help on saving plots, see the data visualization lecture notes and exercises.
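
As a reminder of the general pattern, here is a minimal sketch that saves a toy plot with ggsave(). The data, the file name toy_plot.png and the plot size are all made up for illustration; use your own plot object and file name for the actual task.

```r
library(ggplot2)

# Hypothetical toy plot, only to demonstrate saving
toy_plot <- ggplot(data.frame(x = 1:3, y = c(2, 4, 3)), aes(x = x, y = y)) +
  geom_line()

# Write the plot to a PNG file in the current working directory;
# width and height are given in centimetres here
ggsave("toy_plot.png", plot = toy_plot, width = 16, height = 10, units = "cm")
```

The saved file then appears in the Files tab, provided the tab is showing your working directory.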

To export the plot file from RStudio: in the lower-right panel of RStudio, choose the Files tab. Tick the checkbox next to the file you want to export. Click on More (next to the blue gear icon) and choose Export. This will save the file to your own computer.

If you have problems with these steps, take a screenshot of the plot.

3. Saving and exporting your script

If you don’t remember how to save and export your script from RStudio, follow the instructions in the Starting with Data exercise sheet (steps 7.3 and 7.4).