--- title: "A.4 -- Visualization" author: Martin Morgan date: "16 - 17 May, 2016" output: BiocStyle::html_document: toc: true toc_depth: 2 vignette: > % \VignetteIndexEntry{A.4 -- Visualization} % \VignetteEngine{knitr::rmarkdown} --- ```{r style, echo = FALSE, results = 'asis'} options(width=100) knitr::opts_chunk$set( eval=as.logical(Sys.getenv("KNITR_EVAL", "TRUE")), cache=as.logical(Sys.getenv("KNITR_CACHE", "TRUE"))) ``` ```{r echo=FALSE} suppressPackageStartupMessages({ library(RColorBrewer) library(ggplot2) }) ``` # Colors Colors should reflect the nature of the data and be carefully chosen to convey equivalent information to all viewers. The [RColorBrewer][] package provides an easy way to choose colors; see also the [colorbrewer2][] web site. ```{r colorbrewer} library(RColorBrewer) display.brewer.all() ``` We'll use a color scheme from the 'qualitative' series, to represent different levels of factors and for choice of colors. We'll get the first four colors. ```{r color-choice} palette <- brewer.pal(4, "Dark2") ``` [RColorBrewer]: https://cran.r-project.org/package=RColorBrewer [colorbrewer2]: http://colorbrewer2.org # 'Base' Graphics We'll illustrate 'base' graphics using the built-in `mtcars` data set ```{r mtcars} data(mtcars) # load the data set head(mtcars) ``` The basic model is to plot data, e.g., the relationshiop between miles per gallon and horsepower. ```{r plot-mtcars} plot(mpg ~ hp, mtcars) ``` The appearance can be influenced by arguments, see `?plot` then `?plot.default` and `par`. ```{r plot-mtcars-2} plot(mpg ~ hp, mtcars, pch=20, cex=2, col=palette[1]) ``` More complicated plots can be composed via a series of commands, e.g., to plot a linear regression, make the plot, and add the regression line using `abline()`. ```{r plot-mtcars-regression} plot(mpg ~ hp, mtcars) fit <- lm(mpg ~ hp, mtcars) abline(fit, col=palette[1], lwd=3) ``` # _ggplot2_ Graphics Start by loading the _ggplot2_ library ```{r ggplot2} library(ggplot2) ``` ## Basics Tell _ggplot2_ what to plot using `ggplot()` and `aes()`; we'll use the columns `hp` (horsepower) and `mpg` (miles per gallon). ```{r ggplot, eval=FALSE} ggplot(mtcars, aes(x=hp, y=mpg)) ``` Note the neutral gray background with white gridlines to provide unobtrusive orientation. Note the relatively small size of the axis and tick labels, to avoid distracting from the pattern provided by the data. _ggplot2_ uses different `geom_*` to add to the basic plot. Add points ```{r ggplot-point} ggplot(mtcars, aes(x=hp, y=mpg)) + geom_point() ``` Add a linear regression line and standard error... ```{r ggplot-lm, warning=FALSE} ggplot(mtcars, aes(x=hp, y=mpg)) + geom_point() + geom_smooth(method=lm, col=palette[1]) ``` ...and a locally smoothed regression ```{r ggplot-smooth, warning=FALSE} ggplot(mtcars, aes(x=hp, y=mpg)) + geom_point() + geom_smooth(method=lm, col=palette[1]) + geom_smooth(col=palette[2]) ``` ## Density plots To illustrate additional features, load the BRFSS data subset ```{r file.choose, eval=FALSE} path <- file.choose() ``` ```{r system.file, echo=FALSE} path <- system.file(package="BiocIntroRPCI", "extdata", "BRFSS-subset.csv") ``` ```{r brfss-input} brfss <- read.csv(path) ``` Plot the distribution of weights using `geom_density()` ```{r brfss-density, warning=FALSE} ggplot(brfss, aes(x=Weight)) + geom_density() ``` Plot the weights separately for each year, using `fill=factor(Year)` and `alpha=.5` arguments in the `aes()` argument ```{r brfss-density-by-year, warning=FALSE} ggplot(brfss, aes(x=Weight, fill=factor(Year))) + geom_density(alpha=0.5) ``` Americans are getting heavier, and the variation in weights is increasing. ## Facets Create separate panels for each sex using `facet_grid()`, with a formula describing the factor(s) to use for rows (left-hand side of the formula) and columns (right-hand side). ```{r brfss-facet, warning=FALSE} ggplot(brfss, aes(x=Weight, fill=factor(Year))) + geom_density() + facet_grid(Sex ~ .) ```