--- title: "Data Analysis" output: html_document --- Lab #5 Differential expression 1.) Get the GEO rat ketogenic brain data set (rat_KD.txt). [rat_KD.txt](http://tinyurl.com/data-uruguay) 2a.) Load into R, using read.table() function and header=T argument. ```{r} rat = read.table("rat_KD.txt", sep = "\t", header = T) dimnames(rat)[[1]] <- rat[,1] rat = rat[,-1] ``` 2b.) Set the gene names to row names and remove this first column. 3.) Use the t-test function to calculate the difference per gene between the control diet and ketogenic diet classes. (Hint: use the colnames() function to determine where one class ends and the other begins). ```{r} colnames(rat) t.test(rat[1,1:6], rat[1,7:11]) y <- t.test(rat[2,1:6], rat[2, 7:11]) dim(rat) ttestRat <- function(df, grp1, grp2) { x = df[grp1] y = df[grp2] x = as.numeric(x) y = as.numeric(y) results = t.test(x, y) results$p.value } rawpvalue = apply(rat, 1, ttestRat, grp1 = c(1:6), grp2 = c(7:11)) ``` 4.) Plot a histogram of the p-values. ```{r} hist(rawpvalue) ``` 5.) Log2 the data, calculate the mean for each gene per group. Then calculate the fold change between the groups (control vs. ketogenic diet). hint: log2(ratio) ```{r} ##transform our data into log2 base. rat = log2(rat) #calculate the mean of each gene per control group control = apply(rat[,1:6], 1, mean) #calcuate the mean of each gene per test group test = apply(rat[, 7:11], 1, mean) #confirming that we have a vector of numbers class(control) #confirming we have a vector of numbers class(test) #because our data is already log2 transformed, we can take the difference between the means. And this is our log2 Fold Change or log2 Ratio == log2(control / test) foldchange <- control - test ``` 6.) Plot a histogram of the fold change values. ```{r} hist(foldchange, xlab = "log2 Fold Change (Control vs Test)") ``` 7.) Transform the p-value (-1*log(p-value)) and create a volcano plot using ggplot2. ```{r} results = cbind(foldchange, rawpvalue) results = as.data.frame(results) results$probename <- rownames(results) library(ggplot2) volcano = ggplot(data = results, aes(x = foldchange, y = -1*log10(rawpvalue))) volcano + geom_point() ```