%\VignetteEngine{knitr::knitr}
%\VignetteIndexEntry{1. Introduction to Bioconductor}

# Introduction to Bioconductor

useR! 2014
Author: Martin Morgan (mtmorgan@fhcrc.org), Sonali Arora
Date: 30 June, 2014

```{r setup, echo=FALSE}
knitr::opts_chunk$set(cache=TRUE)
```

## R

Language and environment for statistical computing and graphics

- Full-featured programming language
- Interactive and *interpreted* -- convenient and forgiving
- Coherent, extensive documentation
- Statistical, e.g. `factor()`, `NA`
- Extensible -- CRAN, Bioconductor, github, ...

Vector, class, object

- Efficient _vectorized_ calculations on 'atomic' vectors `logical`,
  `integer`, `numeric`, `complex`, `character`, `raw`
- Atomic vectors are building blocks for more complicated _objects_
    - `matrix` -- atomic vector with 'dim' attribute
    - `data.frame` -- list of equal length atomic vectors
- Formal _classes_ represent complicated combinations of vectors,
  e.g., the return value of `lm()`, below

Function, generic, method

- Functions transform inputs to outputs, perhaps with side effects,
  e.g., `rnorm(1000)`
    - Argument matching first by name, then by position
    - Functions may define (some) arguments to have default values
- _Generic_ functions dispatch to specific _methods_ based on class of
  argument(s), e.g., `print()`.
- Methods are functions that implement specific generics, e.g.,
  `print.factor`; methods are invoked _indirectly_, via the generic.

Introspection

- General properties, e.g., `class()`, `str()`
- Class-specific properties, e.g., `dim()`

Help

- `?print`: help on the generic print
- `?print.data.frame`: help on print method for objects of class
  data.frame.
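The points above about attributes, argument matching, and generic dispatch can be tried in a few lines of base R. This is only a sketch: the names `v`, `f`, and `describe()` (with its methods) are hypothetical, made up for illustration, and are not part of R or Bioconductor.

```{r dispatch-sketch}
## An atomic vector with a 'dim' attribute is a matrix
v <- 1:6
dim(v) <- c(2, 3)
is.matrix(v)

## Argument matching: first by name, then by position
f <- function(x, y = 2) x - y      # 'y' has a default value
f(10)                              # uses the default, 10 - 2
f(y = 1, 10)                       # 'y' by name, 10 goes to 'x' by position

## A hypothetical S3 generic with two methods
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...)
    paste("object of class", class(x)[1])
describe.data.frame <- function(x, ...)
    paste("data.frame with", nrow(x), "rows and", ncol(x), "columns")

describe(1:3)                      # no 'integer' method, so describe.default
describe(data.frame(a = 1:3))      # dispatches to describe.data.frame
methods("describe")                # introspection: methods for the generic
```

The methods are never called by name; `describe()` selects one based on the class of its first argument, in the same way that `print()` reaches `print.factor`.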
Example

```{r}
x <- rnorm(1000)                   # atomic vectors
y <- x + rnorm(1000, sd=.5)
df <- data.frame(x=x, y=y)         # object of class 'data.frame'
plot(y ~ x, df)                    # generic plot, method plot.formula
fit <- lm(y ~ x, df)               # object of class 'lm'
methods(class=class(fit))          # introspection
```

## Bioconductor

Analysis and comprehension of high-throughput genomic data

- Statistical analysis: large data, technological artifacts, designed
  experiments; rigorous
- Comprehension: biological context, visualization, reproducibility
- High-throughput
    - Sequencing: RNASeq, ChIPSeq, variants, copy number, ...
    - Microarrays: expression, SNP, ...
    - Flow cytometry, proteomics, images, ...

Packages, vignettes, workflows

![Alt Sequencing Ecosystem](SequencingEcosystem.png)

- 824 packages
- Discover and navigate via [biocViews][]
- Package 'landing page'
    - Title, author / maintainer, short description, citation,
      installation instructions, ..., download statistics
- All user-visible functions have help pages, most with runnable
  examples
- 'Vignettes' are an important feature in Bioconductor -- narrative
  documents illustrating how to use the package, with integrated code
- 'Release' (every six months) and 'devel' branches

Objects

- Represent complicated data types
- Foster interoperability
- S4 object system
    - Introspection: `getClass()`, `showMethods(..., where=search())`,
      `selectMethod()`
    - 'accessors' and other documented functions / methods for
      manipulation, rather than direct access to the object structure
- Interactive help
    - `method?"substr,"` to select help on methods, `class?D` for help
      on classes

Example

```{r Biostrings, message=FALSE}
library(Biostrings)                       # Biological sequences
data(phiX174Phage)                        # sample data, see ?phiX174Phage
phiX174Phage
m <- consensusMatrix(phiX174Phage)[1:4,]  # nucl. x position counts
polymorphic <- which(colSums(m != 0) > 1) # positions with > 1 nucleotide
m[, polymorphic]
```

```{r showMethods, eval=FALSE}
showMethods(class=class(phiX174Phage), where=search())
```

Exercise

1. Load the Biostrings package and phiX174Phage data set. What class
   is phiX174Phage? Find the help page for the class, and identify
   interesting functions that apply to it.
2. Discover vignettes in the Biostrings package with
   `vignette(package="Biostrings")`. Add another argument to the
   `vignette` function to view the 'BiostringsQuickOverview' vignette.
3. Navigate to the Biostrings landing page on
   http://bioconductor.org. Do this by visiting the biocViews
   page. Can you find the BiostringsQuickOverview vignette on the web
   site?
4. The following code loads some sample data, 6 versions of the
   phiX174Phage genome as a DNAStringSet object.

    ```{r phiX}
    library(Biostrings)
    data(phiX174Phage)
    ```

    Explain what the following code does, and how it works.

    ```{r consensusMatrix}
    m <- consensusMatrix(phiX174Phage)[1:4,]
    polymorphic <- which(colSums(m != 0) > 1)
    mapply(substr, polymorphic, polymorphic, MoreArgs=list(x=phiX174Phage))
    ```

## Summary

Bioconductor is a large collection of R packages for the analysis and
comprehension of high-throughput genomic data. Bioconductor relies on
formal classes to represent genomic data, so it is important to
develop a rudimentary comfort with classes, including seeking help for
classes and methods. Bioconductor uses vignettes to augment
traditional help pages; these can be very valuable in illustrating
overall package use.

[biocViews]: http://bioconductor.org/packages/release/BiocViews.html
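One way to build the comfort with formal classes mentioned in the summary is to define a toy S4 class and explore it with the same introspection tools used for Bioconductor objects. The sketch below uses only the `methods` package that ships with R; the class `ToySeq` and the accessor `bases()` are hypothetical names invented for this illustration and do not exist in Biostrings.

```{r s4-sketch}
library(methods)

## Hypothetical class: a named sequence stored as character vectors
setClass("ToySeq", representation(name = "character", bases = "character"))

## Accessor generic and method, so users need not touch slots directly
setGeneric("bases", function(x, ...) standardGeneric("bases"))
setMethod("bases", "ToySeq", function(x, ...) x@bases)

s <- new("ToySeq", name = "toy", bases = "GATTACA")
bases(s)                           # accessor, rather than s@bases

## Introspection, as for Bioconductor classes
getClass("ToySeq")                 # slots of the class
showMethods("bases")               # methods defined on the generic
selectMethod("bases", "ToySeq")    # the method chosen for this class
```

Using the accessor instead of `s@bases` mirrors the recommendation above to manipulate objects through documented functions rather than their internal structure.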