Learning Objectives
After this lesson, you should be able to:
- Describe tools and approaches to creating research objects
- Describe best practices for reproducible research
- Understand the benefits of establishing version control
Research Objects
Broadly, a Research Object (RO) is a method for identifying, aggregating, and exchanging scholarly information.
See the Turing Way Guide to Reproducible Research.
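One concrete way to aggregate the parts of a study is a machine-readable manifest that lists each component. The Python sketch below is loosely modeled on the RO-Crate convention (a `ro-crate-metadata.json` file written in JSON-LD). It is a simplified illustration, not a complete or validated RO-Crate, and the helper names `build_manifest` and `write_manifest` are hypothetical. Local files are listed by relative path. Large datasets are referenced by URL instead of being copied in.

```python
import json
import tempfile
from pathlib import Path

MANIFEST = "ro-crate-metadata.json"
CONTEXT = "https://w3id.org/ro/crate/1.1/context"
SPEC = "https://w3id.org/ro/crate/1.1"


def build_manifest(root, name, remote_data=()):
    """Aggregate local files under `root` and linked remote data into one manifest.

    Simplified sketch loosely modeled on RO-Crate; not a validated RO-Crate.
    """
    root = Path(root)
    entities = []
    # Local components, such as code, docs, and governance files, listed by relative path.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        if rel != MANIFEST:  # the manifest does not list itself as a part
            entities.append({"@id": rel, "@type": "File"})
    # Large datasets are linked by URL instead of being copied into the RO.
    for url in remote_data:
        entities.append({"@id": url, "@type": "File"})
    root_entity = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "hasPart": [{"@id": e["@id"]} for e in entities],
    }
    descriptor = {
        "@id": MANIFEST,
        "@type": "CreativeWork",
        "conformsTo": {"@id": SPEC},
        "about": {"@id": "./"},
    }
    return {"@context": CONTEXT, "@graph": [descriptor, root_entity, *entities]}


def write_manifest(root, name, remote_data=()):
    """Build the manifest and save it as JSON at the top of `root`."""
    manifest = build_manifest(root, name, remote_data)
    (Path(root) / MANIFEST).write_text(json.dumps(manifest, indent=2))
    return manifest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "analysis.py").write_text("print('hello')\n")
        (Path(tmp) / "README.md").write_text("# My study\n")
        # Hypothetical URL standing in for a large dataset hosted elsewhere
        manifest = write_manifest(tmp, "My study", ["https://example.org/data/large.csv"])
        print(json.dumps(manifest["@graph"][1], indent=2))
```

A full RO-Crate also expects more metadata on the root entity, such as a description, license, and publication date. Consult the RO-Crate specification before relying on this format.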
Governance Documents
Definition
Project Governance is the set of rules, procedures and policies that determine how projects are managed and overseen.
"The set of policies, regulations, functions, processes, and procedures and responsibilities that define the establishment, management and control of projects, programmes or portfolios." - APM (2012), open.edu
No matter how small the project, even one run by a single person, a good project governance structure can help keep work on track and headed toward a timely finish.
Establishing a project governance document at the outset of a project is a good way to set boundaries, define roles and responsibilities, pre-register the expected deliverables, and state the consequences of breaking trust.
Example Governance Documents
Documentation
This website is built with MkDocs and the Material theme for MkDocs, and is hosted on GitHub Pages.
Other popular choices for GitHub Pages sites are Jekyll (with its many themes) and the Bootstrap front-end framework.
ReadTheDocs.org has become a popular platform for building and hosting web-based documentation. Think of RTD as "Continuous Documentation".
Bookdown is an open-source R package that facilitates writing books and long-form articles/reports with R Markdown.
Quarto is an open-source scientific and technical publishing system built on Pandoc.
Confluence wikis (CyVerse) are another tool for documenting your workflow.
Things to remember about Documentation
- Documentation should be written so that people who did not write it can read it and then use the material, or read it and then teach others how to apply it.
- Documentation is best treated as a living document, but it needs version control to be maintained.
- Technology changes over time; expect to refresh documentation every 3 to 5 years as your projects age and progress.
GitHub Pages
- You can reuse templates from other GitHub users for your website, e.g., Jekyll themes.
- GitHub Pages sites are free, fast, and easy to build, but URL options are limited: sites are served from a github.io subdomain unless you configure a custom domain.
ReadTheDocs
- Publishing through ReadTheDocs.com (the paid business service) costs money; ReadTheDocs.org hosting is free for open-source projects.
- You can work offline: develop the materials and build them with Sphinx to preview on localhost.
- You can keep the website source in a GitHub repository; ReadTheDocs rebuilds the site in near real time after each push.
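For the offline workflow, Sphinx first builds the HTML. For example, you might run `sphinx-build -b html docs docs/_build/html`, where the `docs` paths are assumptions about your project layout. You can then preview the output in a browser. The minimal sketch below uses only Python's standard library to serve a built HTML directory on localhost. `serve_docs` is a hypothetical helper name. For a quick preview, `python -m http.server --directory docs/_build/html` does the same from the command line.

```python
import functools
import http.server
import threading


def serve_docs(build_dir, port=8000):
    """Serve a directory of built HTML (e.g., Sphinx output) on localhost.

    The server runs in a background daemon thread; call .shutdown() and
    .server_close() on the returned server to stop it. Pass port=0 to let
    the operating system pick a free port.
    """
    # Bind the request handler to the build directory instead of the current directory.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(build_dir)
    )
    # Listen only on the loopback interface so the preview is not exposed to the network.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example, calling serve_docs("docs/_build/html") serves the site at http://127.0.0.1:8000/ until you call shutdown() on the returned server. The server runs in a daemon thread, so keep your Python session open while previewing.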
Material for MkDocs
- Publish via GitHub Actions
- Uses the open-source Material or ReadTheDocs themes
Bookdown
- Bookdown websites can be hosted by RStudio Connect
- You can publish a Bookdown website using GitHub Pages
Quarto
- Build a website using Quarto's template builder
- Publish with GitHub Pages
Jupyter Book
- Based on Project Jupyter; uses ipynb notebooks and Markdown
- Uses conda package management
GitBook
- GitBook websites use Markdown syntax
- Free for open source projects, paid plans are available
GitHub
Containers
Orchestration
Digital Object Identifier
Hands-On
- Log into GitHub
- Import
Self-Assessment
True or false: Research Objects must include all components of research: governance documents, manuals, documentation, research papers, analysis code, data, and software containers.
Answer
False. While a Research Object may include the entire kitchen sink from a research project, it does NOT always contain all of these things.
Fundamentally, an RO should contain enough information and detail to reproduce a scientific study from its linked or self-contained parts.
Components such as large datasets may not be part of the RO, but the code or analysis scripts should be able to connect to or stream those data.
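Here is a minimal sketch of the "connect to or stream" idea, assuming the dataset is reachable by URL. The analysis reads the remote file in fixed-size chunks instead of bundling it into the RO. Along the way it computes a SHA-256 checksum, so others can confirm they obtained the same data. The helper names, the 1 MiB chunk size, and the practice of recording a checksum next to the link are illustrative assumptions, not requirements of any particular RO standard.

```python
import hashlib
import tempfile
import urllib.request
from pathlib import Path


def stream_dataset(url, chunk_size=1 << 20):
    """Yield the bytes at `url` in chunks (default 1 MiB) instead of loading it all at once."""
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:  # empty bytes means the end of the stream
                break
            yield chunk


def sha256_of(url, chunk_size=1 << 20):
    """Compute a SHA-256 checksum while streaming, so readers can confirm they have the same data."""
    digest = hashlib.sha256()
    for chunk in stream_dataset(url, chunk_size):
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # A small local file stands in for a large remote dataset; in practice the
    # URL would be an https:// link recorded in the RO (hypothetical here).
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.csv"
        sample.write_text("id,value\n1,0.5\n")
        print(sha256_of(sample.as_uri()))
```

Streaming keeps memory use bounded by chunk_size regardless of the dataset's size. Publishing the expected checksum alongside the code lets a reader detect whether the linked data have changed since the study was run.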