Data visualization - independent exercises
1. Data wrangling for plotting
We will use the same barn swallow observation data from the FinBIF database that we worked with in the Data Manipulation independent exercises. Some additional data wrangling is needed to make the data suitable for plotting.
If you are working towards the additional 0.5 ECTS, start from section 2, where you load a data set that is already ready for plotting.
If you have downloaded your own data or want to start from the original data, you can use these steps to produce the same modified data set that is loaded below.
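If you want to see what the two main steps (extracting the month and counting observations) do before running them on the full data, the sketch below applies them to a small made-up table. The names toy_obs and toy_month_counts are hypothetical and not part of the course data. It assumes dplyr 1.1.0 or newer, which is needed for the .by argument of summarise(), and R 4.1 or newer for the native pipe |>.

```r
library(dplyr)

# Made-up toy observations (not FinBIF data); one row has no province
toy_obs <- data.frame(
  date_time = c("2023-04-20", "2023-05-02", "2023-05-15", "2023-05-15"),
  bio_province = c("Uusimaa", "Uusimaa", "Uusimaa", NA)
)

toy_month_counts <- toy_obs |>
  # %b gives the abbreviated month name in the session locale, e.g. "Apr"
  mutate(month = format(as.Date(date_time), "%b")) |>
  # Rows with a missing province are dropped before counting
  filter(!is.na(bio_province)) |>
  # n() counts the rows in each month x province combination
  summarise(count = n(), .by = c(month, bio_province))

toy_month_counts
```

The result has one row for April (count 1) and one for May (count 2); the May observation without a province is not counted. With .by, the groups appear in the order they first occur in the data.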
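library(tidyverse)
# Load the full barn swallow data set from the shared folder
swallows_data <- readRDS("/home/rstudio/shared/swallows_data.rds")
swallows <- swallows_data
# Abbreviated month name ("Jan", "Feb", ...); %b follows the session locale
swallows$month <- format(as.Date(swallows$date_time), '%b')
# Drop the January observations
swallows <- swallows |> filter(month != "Jan")
# Count observations per month and province, skipping missing provinces
counts_for_plotting <- swallows |>
  filter(!is.na(bio_province)) |>
  summarise(count = n(),
            .by = c(month, bio_province))
2. Exercises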
2.1 Start here if you are working on the exercises for the extra 0.5 credits. First, make sure tidyverse is loaded, then load the data set formatted for plotting into R. The data set is stored in the shared folder of our RStudio workspace in Noppe.
If you get stuck, check the lecture notes and exercises for data visualization.
Check the structure of the data, for example with str(counts_for_plotting). The beginning of the output should look something like the example below, although the exact values may differ depending on when the data was downloaded.
'data.frame': 120 obs. of 3 variables:
$ month : chr "Apr" "Apr" "Apr" "May" ...
$ bio_province: chr "Satakunta" "South Häme" "Varsinais-Suomi" "Uusimaa" ...
$ count : int 3 10 7 140 65 59 94 112 12 96 ...
2.2 Make a bar plot of the observation counts by month using counts_for_plotting. Hint: use geom_col(). To see what the column with the counts is called, check counts_for_plotting, for example by clicking it in the Environment panel. Put month on the x axis and the counts on the y axis.
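A minimal sketch of how geom_col() behaves, using a small made-up table (toy_counts, p_bar, bar_data and monthly_top are hypothetical names). geom_col() uses the y values directly as bar heights, and when several rows share the same month (one per province, as in counts_for_plotting) they are stacked, so each bar shows the monthly total.

```r
library(ggplot2)

# Made-up toy counts in the same shape as counts_for_plotting
toy_counts <- data.frame(
  month = c("Apr", "Apr", "May"),
  bio_province = c("Uusimaa", "Satakunta", "Uusimaa"),
  count = c(5, 2, 40)
)

# Bar heights come from the count column; rows sharing a month are stacked
p_bar <- ggplot(toy_counts, aes(x = month, y = count)) +
  geom_col()

# layer_data() returns the data ggplot2 computed for the bars;
# the highest ymax per bar is the stacked monthly total
bar_data <- layer_data(p_bar)
monthly_top <- tapply(bar_data$ymax, bar_data$group, max)
monthly_top  # 7 for Apr (5 + 2) and 40 for May
```

Printing p_bar draws the plot itself; layer_data() is only used here to check the stacked heights.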
Ok, that looks odd because the months are in alphabetical order. Let’s reorder them by turning month into a factor and defining the order of its levels:
# Turn month into a factor whose levels follow the calendar order
counts_for_plotting$month <- factor(counts_for_plotting$month,
  levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
             "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))
Run the previous plot again. The months should now be in the correct order.
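To see why the factor fixes the order, the sketch below compares a plain character vector with a factor on a few made-up month values (months_seen and months_fct are hypothetical names). Character values are sorted alphabetically, which is also the order ggplot2 uses for a character axis, whereas a factor follows its levels. month.abb is a constant built into base R that holds the same twelve English abbreviations as the levels above. The last line shows a possible pitfall: any value that does not match a level, such as a hypothetical non-English abbreviation produced by %b in another locale, becomes NA.

```r
# Made-up month abbreviations in the order they might appear in the data
months_seen <- c("May", "Apr", "Jun", "Apr")

# Character vector: alphabetical order
sort(unique(months_seen))       # "Apr" "Jun" "May"

# Factor with calendar-ordered levels: the levels define the order
months_fct <- factor(months_seen, levels = month.abb)
levels(droplevels(months_fct))  # "Apr" "May" "Jun"

# Values that match no level become NA
factor(c("Apr", "huhti"), levels = month.abb)
```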
2.3 Next, let’s look at the monthly observations by region. Make a line plot with month on the x axis and counts on the y axis, with the lines coloured by region (bio_province). Important: set both colour = bio_province and group = bio_province inside the aes() function:
aes(x = ..., y = ..., colour = bio_province, group = bio_province)
Without an explicit group, ggplot2 groups the data by every discrete variable, including month on the x axis, so each group would contain a single point and no lines would be drawn.
Additional hint: use geom_line().
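The effect of the group aesthetic can be checked on a small made-up table. In this sketch, toy_counts, p_grouped, p_ungrouped and n_groups are hypothetical names; layer_data() returns the data ggplot2 prepared for the layer, including the group each row was assigned to.

```r
library(ggplot2)

# Made-up toy data in the same shape as counts_for_plotting
toy_counts <- data.frame(
  month = factor(c("Apr", "Apr", "May", "May"), levels = month.abb),
  bio_province = c("Uusimaa", "Satakunta", "Uusimaa", "Satakunta"),
  count = c(5, 2, 40, 25)
)

# Explicit group: one line per province
p_grouped <- ggplot(toy_counts,
                    aes(x = month, y = count,
                        colour = bio_province, group = bio_province)) +
  geom_line()

# No explicit group: ggplot2 groups by month and bio_province together,
# so every group holds a single point and no line can be drawn
p_ungrouped <- ggplot(toy_counts,
                      aes(x = month, y = count, colour = bio_province)) +
  geom_line()

# Number of groups ggplot2 uses for the line layer
n_groups <- function(p) length(unique(layer_data(p)$group))
n_groups(p_grouped)    # 2
n_groups(p_ungrouped)  # 4
```

Printing p_ungrouped shows an empty plot area together with a warning that each group consists of only one observation.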
2.4 Not looking too bad, but let’s make the plot nicer by labelling the y axis ‘observations’ and giving the plot the title ‘Barn swallows by region’.
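One way to set the labels is labs(), which is added to a plot with + like a geom; ylab() and ggtitle() are alternatives that do the same job for a single label. A minimal sketch on made-up data, with placeholder label texts (toy_counts and p_titled are hypothetical names):

```r
library(ggplot2)

# Made-up toy data for a single region
toy_counts <- data.frame(
  month = factor(c("Apr", "May"), levels = month.abb),
  count = c(5, 40)
)

p_titled <- ggplot(toy_counts, aes(x = month, y = count, group = 1)) +
  geom_line() +
  # labs() sets the y axis label and the plot title in one call
  labs(y = "y axis label", title = "Plot title")
```

Here group = 1 puts both points into a single line because there is no region column to group by.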
Second task for the extra 0.5 ECTS: save the plot created in section 2.4 and return it as your answer for the plotting section.
For help on saving plots, see the data visualization lecture notes and exercises.
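A minimal sketch of saving a plot with ggsave(), assuming ggplot2 is loaded. ggsave() chooses the file type from the extension (.png, .pdf, ...), and width and height are in inches unless units is changed. The plot, the object names and the file name below are hypothetical; the file is written to tempdir() only so the example does not clutter your project folder. In your own script you would give just a file name, such as "barn_swallows_by_region.png", which saves the file in the working directory, where it appears in the Files tab.

```r
library(ggplot2)

# Made-up toy plot standing in for the plot from section 2.4
toy_counts <- data.frame(month = c("Apr", "May"), count = c(5, 40))
p <- ggplot(toy_counts, aes(x = month, y = count)) + geom_col()

# Save as PNG, 7 x 5 inches at 300 dpi
out_file <- file.path(tempdir(), "barn_swallows_by_region.png")
ggsave(out_file, plot = p, width = 7, height = 5, dpi = 300)
file.exists(out_file)
```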
To export the plot file from RStudio: in the lower right panel of RStudio, open the Files tab and tick the box next to the file you want to export. Click More (next to the blue gear icon) and choose Export. This saves the file to your own computer.
If you have problems with these steps, take a screenshot of the plot.
3. Saving and exporting your script
If you don’t remember how to save your script and export it from RStudio, follow the instructions in the Starting with Data exercise sheet (steps 7.3 and 7.4).