Navigating RStudio
This lesson is partially derived from Software Carpentry teaching materials available under the CC BY 4.0 license:
http://swcarpentry.github.io/r-novice-gapminder/
1. Introduction to RStudio
R is a programming language for handling, analysing and visualizing data. It is freely downloadable and works on all major platforms, including Windows, macOS and Linux. At its core, R code is made of text-based commands that can be run from a command prompt, without a graphical user interface. R can also be run via an open-source interface such as RStudio (see rstudio.com). RStudio provides a built-in editor and offers many advantages, including ease of use, integration with version control, and project management.
During this workshop we will be using a separate RStudio teaching environment, which lets us access a remote instance of R via a web browser.
2. Accessing the RStudio teaching environment on noppe.csc.fi
To launch your own instance of R:
- Go to noppe.csc.fi and log in using the Haka or Virtu authentication service or your CSC account.
- In the top right corner, click “Join workspace” and enter the code provided by the instructors.
- You should see a workspace with an RStudio environment called “RStudio / Data analysis with R, Oct 2024”. Later you can find it under “My workspaces”.
- Click on the “Start session” button on the right to start RStudio.
Note: each RStudio session in this workspace lasts for 12 hours, after which it is automatically erased. We will cover ways to save your code and data for later work during this workshop!
3. Basic layout
When you first open RStudio, you will be greeted by three panels:
- The interactive R console (entire left)
- Environment / History / Connections / Tutorial (tabbed in upper right)
- Files / Plots / Packages / Help / Viewer (tabbed in lower right)
Your screen will look similar to this:
Once you open files, such as R scripts, an editor panel will also open in the top left.
4. Executing code in RStudio
There are two main ways one can work within RStudio:
- Test and play within the interactive R console
  - This works well when doing small tests
  - Quickly becomes laborious
- Write using the script editor
  - Makes it possible to save your code for later (as an .R file)
  - Far easier to keep track of long segments of code
RStudio offers you great flexibility in running code from within the editor window. There are buttons, menu choices, and keyboard shortcuts. To run the current line, you can:
- Click on the “Run” button above the editor panel, or
- Select “Run Selected Lines” from the “Code” menu, or
- Hit Ctrl + Return in Windows or Linux or ⌘ + Return on macOS.
If you have modified a block of code and wish to re-run it, you can either highlight it and run the code again (using one of the options above) or use the “Re-run the previous code region” button. This will run the previous code block including the changes you have made.
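As a small practice sketch, the lines below could be typed into a new script in the editor panel and run one at a time using any of the options above. The variable names and values are made up for illustration and are not part of the course data.

```r
# A few lines to try in the script editor (names and values are illustrative)
heights_cm <- c(150, 162, 171, 180)   # create a numeric vector
mean_height <- mean(heights_cm)       # compute the average
mean_height                           # print the result to the console
heights_m <- heights_cm / 100         # convert centimetres to metres
summary(heights_m)                    # show summary statistics
```

Running the script line by line makes it easy to see the effect of each command in the console and in the Environment tab.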
5. Projects in RStudio
Many data analysis projects might be organised like this:
There are many reasons why we should always avoid this:
- It is hard to tell which version of your data is the original
- It gets really messy because it mixes files with various extensions together
- It probably takes a lot of time to actually find things, and to relate the correct figures to the exact code that was used to generate them
A good project layout will ultimately make your life easier:
- It will help ensure the integrity of your data
- It makes it simpler to share your code with someone else (a lab mate, collaborator, or supervisor)
- It allows you to easily upload your code with your manuscript submission
- It makes it easier to pick the project back up after a break
6. Hands-on: creating a self-contained RStudio project
We’re going to create a new project in RStudio:
- Click the “File” menu button, then “New Project”
- Click “New Directory”
- Click “New Project”
- Type in the name of the directory to store your project, e.g. “RIntro”. Select the location to store the project directory with “Browse” → “Choose”. Choose the directory “my-work”. Note that R is case-sensitive.
- Click the “Create Project” button
Now when we start R in this project directory, or open this project with RStudio, all of our work on this project will be entirely self-contained in this directory.
Switching between projects can be done from the “File” menu (“Open Project”).
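With a project open, the working directory is normally the project directory, so files can be referred to with paths relative to it. The following is a minimal sketch using base R functions only; the path `data/example.csv` is a hypothetical file name chosen for illustration and is not a file provided by the course.

```r
# Show the current working directory; with a project open in RStudio
# this is normally the project directory
getwd()

# List the files and folders in the working directory
list.files()

# Build a path relative to the project directory
# ("data/example.csv" is a hypothetical example)
example_path <- file.path("data", "example.csv")
example_path

# Check whether the file exists before trying to read it
file.exists(example_path)
```

Relative paths like this keep working if the whole project directory is moved or shared, which is part of what makes a project self-contained.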
7. Best practices for project organisation
Although there is no single best way to lay out a project, there are general principles to adhere to that will make project management easier:
- Treat data as read-only
  - This is probably the most important goal of setting up a project. Data are typically time-consuming and / or expensive to collect. Working with them interactively where they can be modified means you are never sure of where the data came from, or how they have been modified.
- Separate raw and tidied data
  - In many cases your data will be “messy”: it will need significant preprocessing to get into a format R (or any other programming language) will find useful. Storing these files in a separate folder, and creating a second “read-only” data folder to hold the “cleaned” data sets can prevent confusion between the two sets.
- Treat the generated output as disposable
  - Anything generated by your scripts should be treated as disposable: it should all be possible to regenerate from your scripts.
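These principles can be sketched in a few lines of R. The example below is an illustration only: it works inside a temporary directory so that nothing in a real project is touched, and the folder names (`data_raw`, `data_clean`), the file names and the data values are all hypothetical.

```r
# Hypothetical sketch: use a temporary directory as a stand-in project
project_dir <- file.path(tempdir(), "demo-project")
raw_dir     <- file.path(project_dir, "data_raw")     # hypothetical folder names
clean_dir   <- file.path(project_dir, "data_clean")
dir.create(raw_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(clean_dir, recursive = TRUE, showWarnings = FALSE)

# Pretend this file is the original data: written once, never edited afterwards
raw_file <- file.path(raw_dir, "measurements.csv")
write.csv(data.frame(id = 1:4, value = c(2.5, NA, 3.1, 4.0)),
          raw_file, row.names = FALSE)

# Read the raw data, clean it in R, and save the result in a separate folder
raw   <- read.csv(raw_file)
clean <- raw[!is.na(raw$value), ]           # drop rows with missing values
clean_file <- file.path(clean_dir, "measurements_clean.csv")
write.csv(clean, clean_file, row.names = FALSE)

# The cleaned file is disposable: deleting it and re-running the lines
# above recreates it from the untouched raw file
nrow(raw)
nrow(clean)
```

Because the cleaning happens in a script and the raw file is only ever read, the cleaned data set can be regenerated at any time.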
Some general recommendations for organising a project:
- Keep all project files under the directory that was automatically created when you named the project.
- Put text documents associated with the project in a directory called “doc”.
- Put raw data and metadata in a folder called “data”, and files generated during cleanup and analysis in a directory called “results”.
- Name all files to reflect their content or function.
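The recommended folders can also be created from R itself. The sketch below uses a temporary directory named “RIntro” as a stand-in for the project directory (an assumption for illustration); in a real project the folders would be created in the project directory instead.

```r
# Create the suggested folders inside a project directory.
# A temporary directory stands in for the project here (assumption for
# illustration only).
project_dir <- file.path(tempdir(), "RIntro")
for (folder in c("doc", "data", "results")) {
  dir.create(file.path(project_dir, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# Check which folders now exist
list.files(project_dir)
```

Setting `showWarnings = FALSE` means the lines can be re-run safely even if the folders already exist.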
We will cover topics including data import and export later during this course!