3 Projects
3.1 Project structure
Some example structures:
project/
├── data/
│   ├── derived/
│   └── raw/
├── R/
├── scripts/
├── graphics/
└── README.md
or
project/
├── README.md
├── input/
├── output/
├── R/
└── graphics/
Directories briefly explained:
- data/ holds derived and raw data
- R/ holds functions (reusable chunks of code used throughout the project)
- scripts/ holds scripts (series of analysis steps, usually including reading input and writing output/figures)
- graphics/ or figures/ holds output figures and plots
- manuscript/ or paper/ for storing the manuscript
- man/ holds the documentation of your functions, usually automatically generated with {roxygen2}
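A minimal sketch of setting up one of these layouts from R, using only base functions. The helper name `create_project()` and the placeholder README title are hypothetical, and the sketch assumes `raw/` and `derived/` sit inside `data/`.

```r
# Hypothetical helper: create a project skeleton under `root`.
create_project <- function(root) {
  dirs <- file.path(
    root,
    c("data/raw", "data/derived", "R", "scripts", "graphics")
  )
  for (d in dirs) {
    # recursive = TRUE also creates `root` and `data/` when they are missing
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }

  # Add a placeholder README without overwriting an existing one
  readme <- file.path(root, "README.md")
  if (!file.exists(readme)) {
    writeLines("# Project title", readme)
  }
  invisible(root)
}

# Try it in a temporary directory so nothing in your working directory changes
project_dir <- file.path(tempdir(), "example-project")
create_project(project_dir)
list.files(project_dir, recursive = TRUE, include.dirs = TRUE)
```

Adjust the vector of directory names to match whichever layout you prefer.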
We’ll use the SocCaribou project as an example throughout the following sections.
3.1.1 README
A README.md file should always be included and stores plenty of information about the project. For example:
- installation instructions
- TODOs
- description or purpose of the project
- authors
- news
- known bugs or limitations
- instructions for contributing
- links to license, Zenodo, etc.
READMEs can change throughout a project’s lifecycle. While a project is under development, its README might have more TODOs and notes about the project, and be generally messier. As the project becomes more stable or mature, is meant to be more public-facing, or accompanies a publication, the README should be organized more as a landing page for new users/readers.
For example, the SocCaribou README:
## Space-use and social organization in a gregarious ungulate: testing the conspecific attraction and resource dispersion
[DOI](https://zenodo.org/badge/latestdoi/173167283)
- Authors:
  - Mélissa Peignier
  - [Quinn M.R. Webber](https://qwebber.weebly.com/)
  - [Erin Koen](https://sites.google.com/site/erinlkoen/)
  - [Michel P. Laforge](https://mammalspatialecology.weebly.com/)
  - [Alec L. Robitaille](http://robitalec.ca)
  - [Eric Vander Wal](http://weel.gitlab.io)
This repository contains the code accompanying the paper “Space-use and
social organization in a gregarious ungulate: testing the conspecific
attraction and resource dispersion”. Scripts are under `scripts/` and
reused functions are in `R/`. This project uses standard R package
structure and can therefore be installed with `devtools`. This also
helps declare external package dependencies required for the analysis.
Please note that while functions are included here, they are not tested
for use in other projects and may not be suitable (at least not in their
current version).
## Abstract
Animals use a variety of proximate cues to assess habitat quality when resources
vary spatiotemporally. Two non-mutually exclusive strategies to assess habitat
quality involve either direct assessment of landscape features or observation of
social cues from conspecifics as a form of information transfer about forage
resources. [...]
Other examples:
More on READMEs:
3.1.2 RStudio projects
Working in RStudio Projects helps us avoid setwd() hell. If you share an
RStudio Project (.Rproj) with someone else, they can immediately use it
without changing working directories or paths to files.
Details: Efficient R / Project Management and Project oriented workflow
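As a small illustration of why this works: when a project is opened through its .Rproj file, the working directory is the project root, so paths can be written relative to it. The file names below are hypothetical.

```r
# Paths are built relative to the project root, so they resolve the same
# way on any computer where the project is opened through its .Rproj file.
raw_path <- file.path("data", "raw", "locs.csv")      # hypothetical file name
out_path <- file.path("data", "derived", "locs.Rds")  # hypothetical file name

# Avoid this: an absolute path that only exists on one computer
# setwd("C:/Users/me/Documents/project")

raw_path
out_path
```

file.path() joins the pieces with a forward slash, which R accepts on Windows, macOS and Linux.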
3.1.3 Code
3.1.3.1 Functions
Functional programming in R:
- https://adv-r.hadley.nz/fp.html
- https://r4ds.had.co.nz/functions.html
- https://books.ropensci.org/targets/functions.html
- https://robitalec.github.io/reproducible-workflows-workshop/exercises/functions/01-functions-introduction.html
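For a sense of what a small, documented function in R/ might look like, here is a minimal sketch. The function `calc_step_length()` is a hypothetical example written for this page, not the SocCaribou implementation. The comments starting with `#'` are {roxygen2} tags, which can be used to generate the documentation in man/.

```r
#' Step lengths between consecutive locations
#'
#' Hypothetical example, not the SocCaribou implementation.
#'
#' @param x numeric vector of x coordinates (projected, e.g. in metres)
#' @param y numeric vector of y coordinates, same length as x
#' @return numeric vector of length(x) - 1 Euclidean distances
calc_step_length <- function(x, y) {
  # Fail early on bad input instead of returning a confusing result
  stopifnot(is.numeric(x), is.numeric(y), length(x) == length(y))
  sqrt(diff(x)^2 + diff(y)^2)
}

calc_step_length(x = c(0, 3, 3), y = c(0, 4, 10))
# [1] 5 6
```

Saving this as a file in R/ keeps the logic in one place, so every script that needs step lengths calls the same code.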
Function naming:
3.1.4 Workflows
This section of the {targets} manual has a great overview of two approaches to managing interconnected analysis steps and data:
- script-based workflows
- function-oriented workflows
3.1.4.1 Function-oriented workflows
Read the link above, then see the section on {targets}.
3.1.4.2 Script-based workflows
R scripts can be organized into scripts/ and R/. The scripts/ folder
holds analysis scripts, ideally numbered in the order they should
be run. For example, in the SocCaribou project there’s 1-DataCleaning.R
then 2-HomeRangeAnalysis.R. See Jenny Bryan’s
project oriented workflow. The R/ folder is used for functions. Functions help us break
a larger project into manageable pieces of code.
It doesn’t have to be one or the other: if you have a discrete chunk of code
that needs to be applied multiple times, or might be useful in another project,
just write it as a function and drop it into your R/ folder. In addition,
R/ is the standard location for functions if you later turn your project into a package or a
research compendium.
See the SocCaribou project
that has both an R/ folder and a scripts/ folder.
SocCaribou
├── R
│ ├── dynamic_network.R
│ ├── hr_network.R
│ └── step_length.R
├── scripts
│ ├── 1-DataCleaning.R
│ ├── 2-HomeRangeAnalysis.R
│ ├── 3-SiteFidelityAnalysis.R
│ ├── 4-SocialNetworkAnalysis.R
│ ├── 5-Randomizations.R
│ ├── 6-MergeTidyFiles.R
│ └── 7-DataAnalysis.R
...
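With a layout like this, one possible way to run everything is a small driver that first loads the functions in R/ and then runs the numbered scripts in order. This is a sketch under assumptions: `run_project()` is a hypothetical helper, not part of SocCaribou, and it expects script names that start with a number followed by a dash.

```r
# Hypothetical helper: load functions from R/, then run scripts/ in order.
run_project <- function(root = ".") {
  # 1. Source every function file in R/ (defined in the global environment)
  function_files <- list.files(
    file.path(root, "R"), pattern = "\\.R$", full.names = TRUE
  )
  for (f in function_files) source(f)

  # 2. Find the numbered analysis scripts
  script_files <- list.files(
    file.path(root, "scripts"), pattern = "^[0-9]+-.*\\.R$", full.names = TRUE
  )

  # Sort by the leading number so that 10-*.R runs after 9-*.R,
  # which plain alphabetical sorting would get wrong
  script_number <- as.integer(sub("-.*$", "", basename(script_files)))
  script_files <- script_files[order(script_number)]

  for (s in script_files) {
    message("Running ", basename(s))
    source(s)
  }
  invisible(basename(script_files))
}
```

Calling `run_project()` from the project root runs the whole analysis. Scripts that use relative paths still assume the working directory is the project root.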
3.1.5 Data
Always start with raw data. Keep raw data raw in a raw-data/ or input/ folder.
Then use scripts to prepare and organize your data - never modify it by hand. This way, you can always go back to the raw data in case anything changes in your preparation steps.
Output data can be sorted into output/ folders and
saved as .Rds files using the base R functions
saveRDS() and readRDS(). More details in
Efficient R / Binary file formats. Or if you use data.table, use fread() and fwrite() to read/write CSV files. If your dates are stored in the ISO standard format (e.g. 2020-01-31), they will be automatically converted to IDate by fread().
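A minimal sketch of the round trip, using only base R. The data frame `locs` is made up for illustration, and temporary files stand in for an output/ folder.

```r
# Made-up example data with a Date column
locs <- data.frame(
  id = c("a", "b"),
  date = as.Date(c("2020-01-01", "2020-01-02"))
)

# .Rds keeps the object exactly as it was, including column classes
rds_path <- tempfile(fileext = ".Rds")
saveRDS(locs, rds_path)
locs_rds <- readRDS(rds_path)
identical(locs, locs_rds)
# [1] TRUE

# A CSV round trip with base R does not restore the Date class
csv_path <- tempfile(fileext = ".csv")
write.csv(locs, csv_path, row.names = FALSE)
locs_csv <- read.csv(csv_path)
class(locs_csv$date)
```

In a real project, the path would point into the output folder instead, for example `saveRDS(locs, file.path("output", "locs.Rds"))`.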
3.1.6 Metadata
Keep track of where the raw data comes from, how it was generated, by whom, when,
etc., using metadata. See the metadata
project and the metadata {targets} workflow. Use this project to write metadata for your data. Talk to Alec about
using it and contributing your own metadata to it!
3.1.7 License
For help choosing an open source license for code, see https://choosealicense.com/

Note that you can change the open source license of an existing project, but you typically need permission from everyone else who has contributed. This is not legal advice, but see more here and here to start.
For data, different license types apply. See https://choosealicense.com/non-software/.
If you are publishing both code and data, include both licenses alongside the relevant objects. For example:
Project
├── LICENSE.md (e.g. GPL-3)
├── R
│ ├── dynamic_network.R
│ ├── hr_network.R
│ └── step_length.R
├── data
│ ├── LICENSE.md (e.g. CC0)
│ ├── movement.csv
│ ├── habitat.csv
│ └── insect.csv
...
3.1.8 News
Keeping track of changes in plain English can be helpful for your current and future self, as well as collaborators. Try to use dates, describe your progress and reasoning, and link to any issues or pull requests.
Some recommendations and advice:
- for R packages: https://r-pkgs.org/other-markdown.html#sec-news
- Keep a changelog: https://keepachangelog.com/en/1.1.0/
3.3 Resources
- Efficient Setup and Efficient Input/Output in Efficient R
- {targets} manual
- Jenny Bryan’s What They Forgot