| Bioconductor limma for 2-color data |
|
Below is a brief guide, with an example, to analyzing two-color microarray data with the Bioconductor limma package.
Using the R/Bioconductor limma package for 2-color microarray data
SlideNumber  Name  FileName    Cy3        Cy5
1            A1    A014_5.gpr  ref        wild type
2            A2    A015_6.gpr  ref        wild type
3            A3    A016_7.gpr  ref        wild type
4            A4    A017_8.gpr  ref        wild type
5            B1    B014_0.gpr  wild type  ref
6            B2    B015_1.gpr  wild type  ref
7            B3    B016_2.gpr  wild type  ref
8            B4    B017_3.gpr  wild type  ref
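As a minimal sketch, the targets table above can be rebuilt in base R, saved as a tab-delimited file, and turned into a design vector for the dye-swap layout. The file name targets.txt and the variable names are assumptions made for illustration. In limma itself, readTargets() and modelMatrix(targets, ref="ref") are the usual helpers for these two steps.

```r
# Rebuild the targets table from the guide as a data frame.
targets <- data.frame(
  SlideNumber = 1:8,
  Name        = c(paste0("A", 1:4), paste0("B", 1:4)),
  FileName    = c("A014_5.gpr", "A015_6.gpr", "A016_7.gpr", "A017_8.gpr",
                  "B014_0.gpr", "B015_1.gpr", "B016_2.gpr", "B017_3.gpr"),
  Cy3         = rep(c("ref", "wild type"), each = 4),
  Cy5         = rep(c("wild type", "ref"), each = 4),
  stringsAsFactors = FALSE
)

# Save it as tab-delimited text (the file name is an assumption) and read it back.
targets.file <- file.path(tempdir(), "targets.txt")
write.table(targets, targets.file, sep = "\t", quote = FALSE, row.names = FALSE)
targets <- read.delim(targets.file, stringsAsFactors = FALSE)

# M-values are log2(Cy5/Cy3), so an array contributes +1 when wild type is in
# Cy5 and -1 when the dyes are swapped.
design <- ifelse(targets$Cy5 == "wild type", 1, -1)
print(cbind(targets[, c("Name", "Cy3", "Cy5")], design))
```

With this coding, the fitted coefficient estimates the wild type versus ref log2-ratio, and the four dye-swapped arrays (B1 to B4) enter with the opposite sign.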
RG$genes <- readGAL()
boxplot(data.frame(log2(RG$Gb)),main="Green background")
Also, per array ([,1] selects the first array):
imageplot(log2(RG$Gb[,1]),RG$printer)
Within Array: the default is print-tip loess, which is unreliable for small arrays with fewer than, say, 150 spots per print-tip group (use global loess or robustspline instead).
Between Array (separate channel): it is important to avoid missing values in log-ratios, which can arise from negative or zero background-corrected intensities.
To see the need for between-array normalization (after background correction), use:
plotDensities(RG.b)
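The normalization steps can be sketched on simulated data as follows. The intensities, array names, and the choices of method="normexp" with offset=50, global loess, and Aquantile are assumptions for illustration, not the settings used for the arrays in this guide. Global loess is used because the simulated RGList has no printer layout.

```r
library(limma)

set.seed(1)
n.spots  <- 500
n.arrays <- 4

# Hypothetical intensities: exponential draws with the given mean.
sim <- function(mu) {
  matrix(rexp(n.spots * n.arrays, rate = 1 / mu), n.spots, n.arrays,
         dimnames = list(NULL, paste0("array", 1:n.arrays)))
}
noise <- function() rnorm(n.spots * n.arrays, sd = 30)

Rb <- 100 + sim(20)                      # red background
Gb <- 100 + sim(20)                      # green background
RG <- new("RGList", list(R  = Rb + sim(1000) + noise(),
                         G  = Gb + sim(1000) + noise(),
                         Rb = Rb, Gb = Gb))

# Background correction that keeps intensities positive, so that no
# log-ratios are lost to negative or zero values.
RG.b <- backgroundCorrect(RG, method = "normexp", offset = 50)

# Per-channel densities after background correction.
plotDensities(RG.b)

# Within-array normalization with global loess (no print-tip groups needed).
MA.w <- normalizeWithinArrays(RG.b, method = "loess")

# Between-array normalization of the A-values.
MA.b <- normalizeBetweenArrays(MA.w, method = "Aquantile")
summary(MA.b$M)
```

For real data with a printer layout and enough spots per print-tip group, the default method="printtiploess" would be the alternative to method="loess".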
The moderated t-statistic (t) is computed for each probe and for each contrast. It has the same interpretation as an ordinary t-statistic except that the standard errors have been moderated across genes.

The M-value (M) is the value of the contrast. Usually this represents a log2-fold change between two or more experimental conditions, although sometimes it represents a log2-expression level.

The A-value (A) is the average log2-expression level for that gene across all the arrays and channels in the experiment.

The B-statistic (lods or B) is the log-odds that the gene is differentially expressed. Suppose, for example, that B = 1.5. The odds of differential expression are exp(1.5)=4.48, i.e., about four and a half to one. The probability that the gene is differentially expressed is 4.48/(1+4.48)=0.82, i.e., about 82%. A B-statistic of zero corresponds to a 50-50 chance that the gene is differentially expressed.

The eBayes() function computes one more useful statistic. The moderated F-statistic (F) combines the t-statistics for all the contrasts into an overall test of significance for that gene. It tests whether any of the contrasts are non-zero for that gene, i.e., whether that gene is differentially expressed on any contrast. The denominator degrees of freedom are the same as those of the moderated t. Its p-value is stored as fit$F.p.value. It is similar to the ordinary F-statistic from analysis of variance except that the denominator mean squares are moderated across genes.
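To show where these statistics live in a fit object, here is a small sketch on simulated log-ratios. The data, the gene names, and the column name WTvsRef are made up for illustration; only the layout of four arrays plus four dye swaps follows the targets table. The last lines repeat the B = 1.5 arithmetic from the text.

```r
library(limma)

set.seed(2)
n.genes <- 200

# One coefficient: +1 for arrays with wild type in Cy5, -1 for the dye swaps.
design <- matrix(c(1, 1, 1, 1, -1, -1, -1, -1), ncol = 1,
                 dimnames = list(NULL, "WTvsRef"))

# Simulated M-values: noise only, plus a true log2 fold change of 2
# in the first 10 genes.
M <- matrix(rnorm(n.genes * 8, sd = 0.3), n.genes, 8,
            dimnames = list(paste0("gene", 1:n.genes), NULL))
M[1:10, ] <- M[1:10, ] + rep(2 * design[, 1], each = 10)

fit <- lmFit(M, design)
fit <- eBayes(fit)

# logFC is the M-value, AveExpr the average level, t the moderated t, B the log-odds.
print(topTable(fit, coef = 1, number = 5))

head(fit$t)          # moderated t-statistics
head(fit$lods)       # B-statistics
head(fit$F.p.value)  # p-values of the moderated F-statistic

# Converting a B-statistic into odds and a probability.
B    <- 1.5
odds <- exp(B)             # about 4.48
prob <- odds / (1 + odds)  # about 0.82; same as plogis(B)
print(round(c(odds = odds, prob = prob), 2))
```

Because this design has a single coefficient, the moderated F-statistic here is just the square of the moderated t; the two differ once several contrasts are fitted.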
The limma package has a volcanoplot function that takes as parameters an eBayes fit object, the number of top genes to label, the names to use, etc. There is also a vennDiagram function that takes an object created by the decideTests function (which in turn takes a fit object). When labeling genes on a volcano plot, pass the names=fitE$genes$Name parameter (without names=, the fitE$genes$ID column is used, and these strings can be truncated to 8 characters). |