# Learning R and Bioconductor {#learning-r-and-more} In this chapter, we outline various resources for learning R and Bioconductor. We provide a brief set of instructions for installing R on your own machine, and then cover how to get help for functions, packages, and Bioconductor-specific resources for learning more. ## The Benefits of R and Bioconductor [R](http://www.r-project.org/) is a high-level programming language that was initially designed for statistical applications. While there is much to be said [about R](https://www.r-project.org/about.html) as a programming language, one of the key advantages of using R is that it is highly extensible through _packages_. Packages are collections of functions, data, and documentation that extend the capabilities of base R. The ease of development and distribution of packages for R has made it a rich environment for many fields of study and application. One of the primary ways in which packages are distributed is through centralized repositories. The first R repository a user typically runs into is the [Comprehensive R Archive Network](https://cran.r-project.org/mirrors.html) (CRAN), which hosts over 13,000 packages to date, and is home to many of the most popular R packages. Similar to CRAN, [Bioconductor](https://bioconductor.org) is a repository of R packages as well. However, whereas CRAN is a general purpose repository, Bioconductor focuses on software tailored for genomic analysis. Furthermore, Bioconductor has stricter requirements for a package to be accepted into the repository. Of particular interest to us is the inclusion of [high quality documentation](#learning-more) and the use of common [data infrastructure](#data-infrastructure) to promote package interoperability. In order to use these packages from CRAN and Bioconductor, and start programming with R to follow along in these workflows, some knowledge of R is helpful. Here we outline resources to guide you through learning the basics. ## Learning R Online {#getting-started-with-r} To learn more about programming with R, we highly recommend checking out online courses offered by groups such as [Codecademy](https://www.codecademy.com), specifically the [Learn R series](https://www.codecademy.com/learn/learn-r). Codecademy is completely web-based, with a code editor/console that promotes an interactive learning experience. This makes it easy to get started without worrying about installing any software. Beyond just Codecademy, a foundational textbook resource for learning R is the [R for Data Science](https://r4ds.had.co.nz/) book. This book illustrates R programming through the exploration of various data science concepts - transformation, visualization, exploration, and more. While it primarily focuses on the [tidyverse](https://www.tidyverse.org/) ecosystem of packages, the concepts are translatable to any programming style. ## Running R Locally While learning R through online resources is a great way to start with R, as it requires minimal knowledge to start up, at some point, it will be desirable to have a local installation - on your own hardware - of R. This will allow you to install and maintain your own software and code, and furthermore allow you to create a personalized workspace. ### Installing R Prior to getting started with this book, some prior programming experience with R is helpful. Check out the [_Learning R and More_](#learning-r-and-more) chapter for a list of resources to get started with R and other useful tools for bioinformatic analysis. To follow along with the analysis workflows in this book on your personal computer, it is first necessary to install the [R](http://www.r-project.org/) programming language. Additionally, we recommend a graphical user interface such as [RStudio](http://www.rstudio.com/download) for programming in R and visualization. RStudio features many helpful tools, such as code completion and an interactive data viewer to name but two. For more details, please see the online book [_R for Data Science_ prerequisites](https://r4ds.had.co.nz/introduction.html#prerequisites) section for more information about installing R and using RStudio. #### For MacOS/Linux Users A special note for MacOS/Linux users: we highly recommend using a package manager to manage your R installation. This differs across different Linux distributions, but for MacOS we highly recommend the [Homebrew](https://brew.sh/) package manager. Follow the website directions to install homebrew, and install R via the commandline with `brew install R`, and it will automatically configure your installation for you. Upgrading to new R versions can be done by running `brew upgrade`. ### Installing R & Bioconductor Packages After installing R, the next step is to install R packages. In the R console, you can install packages from CRAN via the `install.packages()` function. In order to [install Bioconductor packages](https://www.bioconductor.org/install/), we will first need the _BiocManager_ package which is hosted on CRAN. This can be done by running: ```r install.packages("BiocManager") ``` The _BiocManager_ package makes it easy to install packages from the Bioconductor repository. For example, to install the [_SingleCellExperiment_](https://bioconductor.org/packages/release/bioc/html/SingleCellExperiment.html) package, we run: ```r ## the command below is a one-line shortcut for: ## library(BiocManager) ## install("SingleCellExperiment") BiocManager::install("SingleCellExperiment") ``` Throughout the book, we can load packages via the `library()` function, which by convention usually comes at the top of scripts to alert readers as to what packages are required. For example, to load the _SingleCellExperiment_ package, we run: ```r library(SingleCellExperiment) ``` Many packages will be referenced throughout the book within the workflows, and similar to the above, can be installed using the `BiocManager::install()` function. ## Getting Help In (and Out) of R One of the most helpful parts of R is being able to get help _inside_ of R. For example, to get the manual associated with a function, class, dataset, or package, you can prepend the code of interest with a `?` to retrieve the relevant help page. For example, to get information about the `data.frame()` function, the _SingleCellExperiment_ class, the in-built _iris_ dataset, or for the _BiocManager_ package, you can type: ```r ?data.frame ?SingleCellExperiment ?iris ?BiocManager ``` Beyond the R console, there are myriad online resources to get help. The R for Data Science book has a great section dedicated to looking for help [outside of R](https://r4ds.had.co.nz/introduction.html#getting-help-and-learning-more). In particular, [Stackoverflow's R tag](https://stackoverflow.com/questions/tagged/r) is a helpful resource for asking and exploring general R programming questions. ## Bioconductor Help {#bioconductor-documentation} One of the key tenets of Bioconductor software that makes it stand out from CRAN is the required documentation of packages and workflows. In addition, Bioconductor hosts a Bioconductor-specific support site that has grown into a valuable resource of its own, thanks to the work of dedicated volunteers. ### Bioconductor Packages Each package hosted on Bioconductor has a dedicated page with various resources. For an example, looking at the [`scater`](https://bioconductor.org/packages/release/bioc/html/scater.html) package page on Bioconductor, we see that it contains: * a brief description of the package at the top, in addition to the authors, maintainer, and an associated citation * installation instructions that can be cut and paste into your R console * documentation - vignettes, reference manual, news Here, the most important information comes from the documentation section. Every package in Bioconductor is _required_ to be submitted with a _vignette_ - a document showcasing basic functionality of the package. Typically, these vignettes have a descriptive title that summarizes the main objective of the vignette. These vignettes are a great resource for learning how to operate the essential functionality of the package. The _reference manual_ contains a comprehensive listing of all the functions available in the package. This is a compilation of each function's _manual_, aka help pages, which can be accessed programmatically in the R console via `?`. Finally, the _NEWS_ file contains notes from the authors which highlight changes across different versions of the package. This is a great way of tracking changes, especially functions that are added, removed, or deprecated, in order to keep your scripts current with new versions of dependent packages. Below this, the _Details_ section covers finer nuances of the package, mostly relating to its relationship to other packages: * upstream dependencies (_Depends_, _Imports_, _Suggests_ fields): packages that are imported upon loading the given package * downstream dependencies (_Depends On Me_, _Imports Me_, _Suggests Me_): packages that import the given package when loaded For example, we can see that an entry called [_simpleSingle_](https://bioconductor.org/packages/release/workflows/html/simpleSingleCell.html) in the _Depends On Me_ field on the `scater` page takes us to a step-by-step workflow for low-level analysis of single-cell RNA-seq data. One additional _Details_ entry, the _biocViews_, is helpful for looking at how the authors annotate their package. For example, for the `scater` package, we see that it is associated with `DataImport`, `DimensionReduction`, `GeneExpression`, `RNASeq`, and `SingleCell`, to name but some of its many annotations. We cover _biocViews_ in more detail. ### biocViews To find packages via the Bioconductor website, one useful resource is the [BiocViews](https://bioconductor.org/packages/release/BiocViews.html) page, which provides a hierarchically organized view of annotations associated with Bioconductor packages. Under the ["Software"](https://bioconductor.org/packages/release/BiocViews.html#___Software) label for example (which is comprised of most of the Bioconductor packages), there exist many different views to explore packages. For example, we can inspect based on the associated ["Technology"](https://bioconductor.org/packages/release/BiocViews.html#___Technology), and explore ["Sequencing"](https://bioconductor.org/packages/release/BiocViews.html#___Sequencing) associated packages, and furthermore subset based on ["RNASeq"](https://bioconductor.org/packages/release/BiocViews.html#___RNASeq). Another area of particular interest is the ["Workflow"](https://bioconductor.org/packages/release/BiocViews.html#___Workflow) view, which provides Bioconductor packages that illustrate an analytical workflow. For example, the ["SingleCellWorkflow"](https://bioconductor.org/packages/release/BiocViews.html#___SingleCellWorkflow) contains the aforementioned tutorial, encapsulated in the _simpleSingleCell_ package. ### Bioconductor Forums The [Bioconductor support site](https://support.bioconductor.org/) contains a Stackoverflow-style question and answer support site that is actively contributed to from both users and package developers. Thanks to the work of dedicated volunteers, there are ample questions to explore to learn more about Bioconductor specific workflows. Another way to connect with the Bioconductor community is through [Slack](https://bioc-community.herokuapp.com), which hosts various channels dedicated to packages and workflows. The Bioc-community Slack is a great way to stay in the loop on the latest developments happening across Bioconductor, and we recommend exploring the "Channels" section to find topics of interest.