Lab4b.Rnw

%
% NOTE -- ONLY EDIT THE .Rnw FILE!!!  The .tex file is
% likely to be overwritten.
%
% \VignetteIndexEntry{Seattle Lab 4B}
%\VignetteDepends{Biobase,affy}
%\VignetteKeywords{Microarray}
\documentclass[12pt]{article}

\usepackage{amsmath,pstricks} \usepackage[authoryear,round]{natbib} \usepackage{hyperref}

\textwidth=6.2in
\textheight=8.5in
%\parskip=.3cm
\oddsidemargin=.1in
\evensidemargin=.1in
\headheight=-.3in

\newcommand{\scscst}{\scriptscriptstyle} \newcommand{\scst}{\scriptstyle}

\bibliographystyle{plainnat}

\title{Lab 4B: An Introduction to Bioconductor's {\tt affy} package}

\begin{document}

\maketitle

In this lab, we demonstrate the main functions in the \verb+affy+ package for pre-processing Affymetrix microarray data. To load the package:

<<>>=
library(affy)
@

For a more detailed introduction, consult the package vignettes, which can be listed with the command {\tt openVignette("affy")}. A demo can also be run with {\tt demo(affy)}. A number of sample datasets are available in the package; to list these, type {\tt data(package="affy")}.

We will work mainly with the \verb+Dilution+ dataset. For a description of \verb+Dilution+, type {\tt ? Dilution}. To load this dataset:

<<>>=
data(Dilution)
@

%%%%%%%%%%%%%%%%%%%%%%%%% %%% affy classes

One of the main classes in \verb+affy+ is the \verb+AffyBatch+ class. For details on this class consult the help file, {\tt ? AffyBatch}; methods for manipulating instances of this class are also described in the help file. Other classes include \verb+ProbeSet+ (PM and MM intensities for individual probe sets), \verb+Cdf+ (information contained in a CDF file), and \verb+Cel+ (single array cel intensity data). The object \verb+Dilution+ is an instance of the class \verb+AffyBatch+. Try the following commands to obtain information about this object:

<<>>=
class(Dilution)
slotNames(Dilution)
Dilution
annotation(Dilution)
@

For a description of the target samples hybridized to the arrays:

<<>>=
phenoData(Dilution)
pData(Dilution)
@

The \verb+exprs+ slot contains a matrix with columns corresponding to arrays and rows to individual probes on the array. To obtain the matrix of intensities for all four arrays:

<<>>=
e<-exprs(Dilution)
nrow(Dilution)*ncol(Dilution)
dim(e)
@

You can access probe-level PM and MM intensities using the \verb+pm+ and \verb+mm+ accessors. For example, for the PM intensities:

<<>>=
PM<-pm(Dilution)
dim(PM)
PM[1:5,]
@

To get the probe set names (Affy IDs):

<<>>=
gnames<-geneNames(Dilution)
length(gnames)
gnames[1:5]
nrow(e)/length(gnames)
@

As with other microarray objects in Bioconductor packages, you can use subsetting commands for {\tt AffyBatch} objects:

<<>>=
dil1<-Dilution[1]
class(dil1)
dil1
cel1<-Dilution[[1]]
class(cel1)
cel1
@

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Reading in data

One of the main functions for reading in Affymetrix data is \verb+ReadAffy+. It reads in data from \verb+CEL+ and \verb+CDF+ files and creates objects of class \verb+AffyBatch+. Using \verb+ReadAffy(widget=TRUE)+ provides widgets for interactive data input.
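
For non-interactive use, \verb+ReadAffy+ can be pointed at a directory of CEL files. The chunk below is a minimal sketch and is not evaluated, since it assumes CEL files of your own; the directory \verb+path/to/celfiles+ and the object names \verb+celDir+ and \verb+myData+ are hypothetical placeholders.

<<eval=FALSE>>=
# Hypothetical directory holding your .CEL files; replace with a real path
celDir <- "path/to/celfiles"
# With no file names given, ReadAffy reads the CEL files found in celfile.path
myData <- ReadAffy(celfile.path=celDir)
class(myData)
@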

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Diagnostic plots

To produce spatial images of probe log intensities and probe raw intensities:

<<>>=
# Log transformation
image(Dilution[1])
@

<<>>=
# No transformation
image(cel1)
@

To produce boxplots of probe log intensities:

<<>>=
boxplot(Dilution,col=c(2,2,3,3))
@

Note that scanner effects seem stronger than concentration effects.

To produce density plots of probe log intensities:

<<>>=
hist(Dilution, type="l", col=c(2,2,3,3), lty=rep(1:2,2), lwd=3)
@

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Normalization

The \verb+affy+ package provides implementations for a number of methods for background estimation, probe-level normalization (e.g., quantile, curve-fitting (Bolstad et al., 2002)), and computation of expression measures (e.g., MAS 4.0, MAS 5.0, MBEI (Li \& Wong, 2001), RMA (Irizarry et al., 2003)). To list available methods for \verb+AffyBatch+ objects:

<<>>=
bgcorrect.methods
normalize.AffyBatch.methods
pmcorrect.methods
express.summary.stat.methods
@
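
The idea behind quantile normalization, one of the methods mentioned above, can be illustrated on a toy matrix in base R. This is a sketch for intuition only and is not the implementation used by \verb+affy+; the function name \verb+quantileNormalizeSketch+ and the toy data are made up for illustration, and ties are broken by order of appearance rather than averaged.

<<>>=
quantileNormalizeSketch <- function(x) {
  # Rank of each value within its own column (array)
  ranks <- apply(x, 2, rank, ties.method="first")
  # Sort each column, then average across columns at each rank
  # to obtain a common reference distribution
  refDist <- rowMeans(apply(x, 2, sort))
  # Replace each value by the reference value at its rank
  out <- apply(ranks, 2, function(r) refDist[r])
  dimnames(out) <- dimnames(x)
  out
}
# Toy data: 4 probes (rows) on 3 arrays (columns)
toy <- cbind(a=c(5,2,3,4), b=c(4,1,4,2), c=c(3,4,6,8))
qn <- quantileNormalizeSketch(toy)
# After normalization every column has the same sorted values
apply(qn, 2, sort)
@

Each array keeps the rank order of its own probes, but all arrays end up sharing one empirical distribution.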

The main normalization function is \verb+expresso+. You can select pre-processing methods interactively using widgets by typing {\tt expresso(Dilution, widget=TRUE)}. The function operates on objects of class \verb+AffyBatch+ and returns objects of class \verb+exprSet+.
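
As a non-interactive alternative, the pre-processing methods can be passed to \verb+expresso+ as arguments. The chunk below is a minimal sketch and is not evaluated; it assumes that the method names used appear in the lists of available methods printed above for your installed version of \verb+affy+, and the object name \verb+exDil+ is illustrative only.

<<eval=FALSE>>=
# One method is chosen for each pre-processing step
exDil <- expresso(Dilution,
                  bgcorrect.method="rma",
                  normalize.method="quantiles",
                  pmcorrect.method="pmonly",
                  summary.method="medianpolish")
class(exDil)
@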

\verb+rma+ provides a more efficient implementation of Robust Multi-array Average (RMA):

<<>>=
rmaDil<-rma(Dilution)
class(rmaDil)
@

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % CDF data packages

Data packages for CDF information can be downloaded from \url{www.bioconductor.org}. These packages contain environment objects which provide mappings between AffyIDs and matrices of probe locations, with rows corresponding to probe-pairs and columns to PM and MM cels. CDF environments for HGU95Av2 and HGU133A chips are already in the package. For information on the environment object, type {\tt ? hgu95av2cdf}.

<<>>=
annotation(Dilution)
data(hgu95av2cdf)
# Probe set names stored in the CDF environment
pnames<-ls(envir=hgu95av2cdf)
length(pnames)
pnames[1:5]
# Matrix of PM and MM probe locations for the first probe set
get(pnames[1],envir=hgu95av2cdf)
@

You can also use the \verb+indexProbes+, \verb+pmindex+, and \verb+mmindex+ functions to get information on probe location:

<<>>=
plocs<-indexProbes(Dilution,which="both")
plocs[[1]]
pmindex(Dilution,genenames=gnames[1], xy=TRUE)
pmindex(Dilution,genenames=gnames[1])
@

\end{document}
