Home arrow Site Navigation arrow FAQs and HowTo's arrow Microarray Analysis arrow Bioconductor limma for 2-color data
Bioconductor limma for 2-color data | Print |
Below is a brief guide that shows an example of using the Bioconductor limma package to analyze two color microarray data.

Using the R/Bioconductor limma package for 2-color microarray data

  • Become familiar with R: variables, functions, help (?function). You will need to load the limma (and possibly limmaGUI) library.
  • Create “targets.txt”, a tab-delimited file that lists microarray data files and what is represented by each channel (SlideNumber, Name columns optional.  Name allows shorter labels in plots, etc.)  Example:

SlideNumber     Name   FileName                      Cy3      Cy5

1                      A1        A014_5.gpr                   ref        wild type

2                      A2        A015_6.gpr                   ref        wild type

3                      A3        A016_7.gpr                   ref        wild type

4                      A4        A017_8.gpr                   ref        wild type

5                      B1        B014_0.gpr                   wild type           ref

6                      B2        B015_1.gpr                   wild type           ref

7                      B3        B016_2.gpr                   wild type           ref

8                      B4        B017_3.gpr                   wild type           ref

 

  • Tell R where to look for your targets and data files.  In R, use File Menu ->‘change dir’
  • For your first trial, follow the steps in the limma usersguide (section 3.2, Sample limma session) to read the Targets file, datafiles, do background correction, normalization, create linear model, EBayes smoothing, and sort genes by B-statistic.  For a more in depth analysis, see steps 5 and beyond.  Note: With limma2.0/R2.5 the targets file “targets.txt” may need to be specified as “targets.txt.txt”; this can be seen by using the list.files(“path”) function.When performing successive steps of an analysis, it can be useful to name each object with a different name rather than overwriting objects (such as MA, fit, etc.)
  • Alternatively, use limmaGUI to do each of these steps via menu selections.  Be aware that sometimes limmaGUI windows get minimized and need to be clicked to come back into view.  Note: With limma2.0/R2.5 the Help files do not open from limmaGUI, but can be found online or in the installation directory. Also with the GUI it is not possible to backtrack and redo steps such as background subtraction once the data has been read in (i.e. steps are combined.)  However, plots are easier to generate in limmaGUI.
  • More about input data:  an optional spot types file can identify control spots. To use the name column, add an option to readTargets:  readTargets("targets.txt", row.names="Name") 
  • In read.maimages, source=”genepix” reads the Mean Foreground and Background values.  Alternatively, “genepix.median” can be used, or a custom column list can be specified.
  • Checking input (summaries of each array intensity range):  summary(RG$R), summary(RG$G)
  • Spot weights: 0 = ignore, 1 = normal wgt.  See wtflags() function and wt.fun parameter in read.maimages.  Spot quality weights are used in normalization by default.
  • Gene names: check  names(RG$genes)
            and if no “genes” column, use
                        RG$genes <- readGAL()
 
  • Quality Assessment: view MA plots of unnormalized data, especially with highlighting of Control Spots.  (An MA-plot of microarray data is a plot of log-ratio of two expression intensities versus the mean log-expression of the two. It is used to visualize intensity-dependent ratios of microarray data. The MA-plot is a graphical way to see log-ratios and fluorescence intensity at the same time.)  Also boxplots of Background intensities (before subtracting background):

boxplot(data.frame(log2(RG$Gb)),main="Green background")

             Also, per array [,1] is first array:

imageplot(log2(RG$Gb[,1]),RG$printer)

  • Normalization:

Within Array – default is print tip loess, unreliable for small arrays with less than, say, 150 spots per print-tip group (use global loess or robustspline)

Between Array (separate channel) – important to avoid missing values in log-ratios which might arise from negative or zero background-corrected intensities.  To see the need for between array normalization (after background correction), use:

                        plotDensities(RG.b)

  •  Technical Replicates: not independent, so use duplicateCorrelation
  • Paired Samples: use moderated paired t-test
  • Statistics: topTable

moderated t-statistic, computed for each probe and for each contrast. This has the same interpretation as an ordinary t-statistic except that the standard errors have been moderated across genes.

The M-value (M) is the value of the contrast. Usually this represents a log2-fold change between two or more experimental conditions although sometimes it represents a log2-expression level. The A-value (A) is the average log2-expression level for that gene across all the arrays and channels in the experiment.

The B-statistic (lods or B) is the log-odds that the gene is differentially expressed. Suppose for example that B = 1.5. The odds of differential expression is exp(1.5)=4.48, i.e, about four and a half to one. The probability that the gene is differentially expressed is 4.48/(1+4.48)=0.82, i.e., the probability is about 82% that this gene is differentially expressed. A B-statistic of zero corresponds to a 50-50 chance that the gene is differentially expressed.

The eBayes() function computes one more useful statistic. The moderated F-statistic (F) combines the t-statistics for all the contrasts into an overall test of significance for that gene. The F-statistic tests whether any of the contrasts are non-zero for that gene, i.e., whether that gene is differentially expressed on any contrast. The denominator degrees of freedom is the same as that of the moderated-t. Its p-value is stored as fit$F.p.value. It is similar to the ordinary F-statistic from analysis of variance except that the denominator mean squares are moderated across genes.

  • Visualization

The limma library has a volcanoplot function that takes as parameters an Ebayes fit object, number of top genes to label, names to use, etc..  There is also a venn function that takes an object created by the decideTests function (which takes a fit object).

When labeling genes on a volcanoplot, use names=fitE$genes$Name parameter (without names=, the fitE$genes$ID column is used and these strings can be truncated to 8 chars)