useR! 2014
Author: Martin Morgan (mtmorgan@fhcrc.org), Sonali Arora
Date: 30 June, 2014
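R: a language and environment for statistical computing and graphics
Statistical features, e.g., factor(), NA

Vector, class, object
Atomic vectors: logical, integer, numeric, complex, character, raw
matrix: atomic vector with 'dim' attribute
data.frame: list of equal length atomic vectors
Formal classes represent more complicated objects, e.g., the return value of lm(), below

Function, generic, method
Functions transform inputs to outputs, e.g., rnorm(1000)
Generic functions dispatch to methods based on the class of their argument(s), e.g., print()
Methods implement a generic for a particular class, e.g., print.factor; methods are invoked indirectly, via the generic.

Introspection
General properties, e.g., class(), str()
Class-specific properties, e.g., dim()

Help
?print: help on the generic print
?print.data.frame: help on the print method for objects of class data.frame.

Example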
x <- rnorm(1000)                   # atomic vectors
y <- x + rnorm(1000, sd=.5)
df <- data.frame(x=x, y=y)         # object of class 'data.frame'
plot(y ~ x, df)                    # generic plot, method plot.formula
 
fit <- lm(y ~ x, df)               # object of class 'lm'
methods(class=class(fit))          # introspection
##  [1] add1.lm*           alias.lm*          anova.lm*         
##  [4] case.names.lm*     confint.lm         cooks.distance.lm*
##  [7] deviance.lm*       dfbeta.lm*         dfbetas.lm*       
## [10] drop1.lm*          dummy.coef.lm      effects.lm*       
## [13] extractAIC.lm*     family.lm*         formula.lm*       
## [16] hatvalues.lm*      influence.lm*      kappa.lm          
## [19] labels.lm*         logLik.lm*         model.frame.lm*   
## [22] model.matrix.lm    nobs.lm*           plot.lm*          
## [25] predict.lm         print.lm*          proj.lm*          
## [28] qr.lm*             residuals.lm       rstandard.lm*     
## [31] rstudent.lm*       simulate.lm*       summary.lm        
## [34] variable.names.lm* vcov.lm*          
## 
##    Non-visible functions are asterisked
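The relationship between a generic and its methods can be sketched with a small generic of our own. This is only an illustration: the generic describe() and its methods are hypothetical names, not part of R. The last line shows one way to look at a non-visible (asterisked) method such as print.lm.

```r
# 'describe' is a hypothetical generic, defined here only for illustration.
describe <- function(x, ...) UseMethod("describe")

# Fallback method, used when no class-specific method exists.
describe.default <- function(x, ...) {
    paste("object of class", class(x)[1])
}

# Method for objects of class 'data.frame'.
describe.data.frame <- function(x, ...) {
    paste("data.frame with", nrow(x), "rows and", ncol(x), "columns")
}

df <- data.frame(x = 1:3, y = c(2.5, 3.5, 4.5))
from_df  <- describe(df)      # dispatches to describe.data.frame
from_vec <- describe(df$y)    # no 'numeric' method, so describe.default is used

# Non-visible (asterisked) methods can still be retrieved for inspection.
print_lm <- getS3method("print", "lm")
```

The user only ever calls describe(); which method runs depends on the class of the argument, just as plot(y ~ x, df) above ends up in plot.formula.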
Bioconductor: analysis and comprehension of high-throughput genomic data
Packages, vignettes, work flows
Objects
Introspection of formal (S4) classes: getClass(), showMethods(..., where=search()), selectMethod()
Interactive help: method?"substr,<tab>" to select help on methods, class?D<tab> for help on classes

Example
library(Biostrings)                     # Biological sequences
data(phiX174Phage)                      # sample data, see ?phiX174Phage
phiX174Phage
##   A DNAStringSet instance of length 6
##     width seq                                          names               
## [1]  5386 GAGTTTTATCGCTTCCATGAC...ATTGGCGTATCCAACCTGCA Genbank
## [2]  5386 GAGTTTTATCGCTTCCATGAC...ATTGGCGTATCCAACCTGCA RF70s
## [3]  5386 GAGTTTTATCGCTTCCATGAC...ATTGGCGTATCCAACCTGCA SS78
## [4]  5386 GAGTTTTATCGCTTCCATGAC...ATTGGCGTATCCAACCTGCA Bull
## [5]  5386 GAGTTTTATCGCTTCCATGAC...ATTGGCGTATCCAACCTGCA G97
## [6]  5386 GAGTTTTATCGCTTCCATGAC...ATTGGCGTATCCAACCTGCA NEB03
m <- consensusMatrix(phiX174Phage)[1:4,] # nucl. x position counts
polymorphic <- which(colSums(m != 0) > 1)
m[, polymorphic]
##   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## A    4    5    4    3    0    0    5    2    0
## C    0    0    0    0    5    1    0    0    5
## G    2    1    2    3    0    0    1    4    0
## T    0    0    0    0    1    5    0    0    1
showMethods(class=class(phiX174Phage), where=search())
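The introspection functions listed above can be tried without Biostrings on a toy formal class. This is a minimal sketch: the class SeqSet and the generic seqWidths() are hypothetical names made up for this example and are not part of any Bioconductor package.

```r
library(methods)

# Hypothetical S4 class holding a character vector of sequences.
setClass("SeqSet", representation(seqs = "character"))

# Hypothetical generic, with a method for the class above.
setGeneric("seqWidths", function(x, ...) standardGeneric("seqWidths"))
setMethod("seqWidths", "SeqSet", function(x, ...) nchar(x@seqs))

s <- new("SeqSet", seqs = c(a = "GATTACA", b = "ACGT"))
widths <- seqWidths(s)                 # invoked via the generic

getClass("SeqSet")                     # class definition and slots
showMethods("seqWidths")               # methods defined on the generic
selectMethod("seqWidths", "SeqSet")    # the method chosen for this class
```

As with the Biostrings objects, the sequences are reached through a documented function (here seqWidths()) rather than by poking at the slots directly.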
Exercise
List the vignettes available in the Biostrings package with vignette(package="Biostrings"). Add another argument to the vignette function to view the 'BiostringsQuickOverview' vignette.
The following code loads some sample data, 6 versions of the phiX174Phage genome as a DNAStringSet object.
library(Biostrings)
data(phiX174Phage)
Explain what the following code does, and how it works.
m <- consensusMatrix(phiX174Phage)[1:4,]
polymorphic <- which(colSums(m != 0) > 1)
mapply(substr, polymorphic, polymorphic, MoreArgs=list(x=phiX174Phage))
##         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## Genbank "G"  "G"  "A"  "A"  "C"  "C"  "A"  "G"  "C" 
## RF70s   "A"  "A"  "A"  "G"  "C"  "T"  "A"  "G"  "C" 
## SS78    "A"  "A"  "A"  "G"  "C"  "T"  "A"  "G"  "C" 
## Bull    "G"  "A"  "G"  "A"  "C"  "T"  "A"  "A"  "T" 
## G97     "A"  "A"  "G"  "A"  "C"  "T"  "G"  "A"  "C" 
## NEB03   "A"  "A"  "A"  "G"  "T"  "T"  "A"  "G"  "C"
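As a hint for the last exercise, the same idiom can be explored on plain character strings in base R. This is a sketch under assumptions: the three short sequences are invented for illustration, and the count matrix is built by hand as a stand-in for consensusMatrix().

```r
# Three made-up sequences of equal length (not real data).
seqs <- c(s1 = "GATTACA", s2 = "GATCACA", s3 = "GTTTACA")

# One row per sequence, one column per position.
chars <- do.call(rbind, strsplit(seqs, ""))

# Nucleotide x position counts, a hand-built stand-in for consensusMatrix().
nucleotides <- c("A", "C", "G", "T")
m <- t(sapply(nucleotides, function(nt) colSums(chars == nt)))

# A position is polymorphic when more than one nucleotide has a non-zero count.
polymorphic <- which(colSums(m != 0) > 1)

# mapply() calls substr(x = seqs, start, stop) once per polymorphic position;
# MoreArgs supplies the argument that is the same in every call.
res <- mapply(substr, polymorphic, polymorphic, MoreArgs = list(x = seqs))
res
```

Each column of res is one polymorphic position and each row is one sequence, which is the same layout as the output shown above.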
Bioconductor is a large collection of R packages for the analysis and comprehension of high-throughput genomic data. Bioconductor relies on formal classes to represent genomic data, so it is important to develop a rudimentary comfort with classes, including seeking help for classes and methods. Bioconductor uses vignettes to augment traditional help pages; these can be very valuable in illustrating overall package use.