Saturday, September 1, 2012

Analysis - Tools - RUM-MultipleComparisons

Introduction

We routinely run the RUM-MultipleComparisons pipeline to assess RNA-Seq data.  Although the tool includes the word 'RUM' in its title, it can work with gene expression values from a variety of RNA-Seq tools.

We are still expanding the analyses that RUM-MultipleComparisons performs, but at the moment it includes these basic steps:
  1. Assembles a table of the raw data
  2. Filters the table to consider just transcripts
  3. Performs quantile normalization of the values
  4. Runs a series of k-means clusterings of the data and displays the results as heatmaps
  5. Generates MvA plots of averages for all conditions
  6. Generates MvA plots of replicates within a condition
  7. Tabulates fold-changes between average values for all conditions
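
To make step 3 concrete, here is a minimal sketch of quantile normalization in plain Python. It is not the pipeline's own code: the function name is ours, and ties are broken by position rather than averaged, which a production implementation may handle differently.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length sample columns.

    Each value is replaced by the mean, across samples, of the values
    that hold the same rank. Ties are broken by position (an assumption).
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all samples must have the same number of values")

    # Sort each sample independently, then average rank by rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(col[rank] for col in sorted_cols) / len(columns)
        for rank in range(n)
    ]

    normalized = []
    for col in columns:
        # order[rank] is the index of the value holding that rank.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Two toy samples with three transcripts each (made-up values).
samples = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(samples))
```

After normalization every sample contains the same set of values, so differences between samples reflect changes in rank rather than in overall scale.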

What Files Should I Look At?

First, take a look at the plot files, in particular Replicates-mva.pdf and Kmeans-heatmap.pdf, so that you can see whether the samples have good intra-condition consistency.  In addition, the heatmap file will help you see whether the changes between conditions are consistent across samples, and roughly how many distinct expression patterns there are in the data set.

Once you can see that the data is OK, turn to the Averages.tab file or the appropriate Kmeans-*-clusters.tab file to see gene IDs.  All of the tab files can be opened in Excel, which can be used to filter the genes further.  Gene lists can also be created for use in functional analysis.
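
If you would rather filter outside Excel, a short Python sketch is shown here. It assumes only that the file is tab-delimited with a header row. The column names `id` and `log2fc` are hypothetical placeholders, so check the actual header of your file before using it.

```python
import csv


def select_gene_ids(lines, id_column, value_column, min_abs_value):
    """Return IDs of rows whose |value| is at least min_abs_value.

    `lines` is any iterable of text lines, e.g. an open .tab file.
    Column names are passed in because the real headers are not assumed.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    selected = []
    for row in reader:
        if abs(float(row[value_column])) >= min_abs_value:
            selected.append(row[id_column])
    return selected


# Toy, made-up rows standing in for a real tab file.
toy_lines = [
    "id\tlog2fc\n",
    "geneA\t2.5\n",
    "geneB\t-0.2\n",
    "geneC\t-1.5\n",
]
print(select_gene_ids(toy_lines, "id", "log2fc", 1.0))
```

With a real file, pass an open file handle (opened with `newline=""`) in place of `toy_lines`.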

How Do We Usually Run It?

 We usually focus on well-characterized RefSeqs, i.e., those with IDs like NM_* or NR_*.
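
As an illustration of that filter, the sketch below keeps only IDs with an NM_ or NR_ prefix. The regular expression and function names are our own, not taken from the pipeline, and the optional `.version` suffix is an assumption about how IDs may appear in your data.

```python
import re

# NM_ (mRNA) and NR_ (non-coding RNA) accessions, with an optional
# ".version" suffix. The exact pattern is an assumption for illustration.
CURATED_REFSEQ = re.compile(r"^(NM|NR)_\d+(\.\d+)?$")


def is_curated_refseq(transcript_id):
    """True if the ID looks like an NM_* or NR_* RefSeq accession."""
    return CURATED_REFSEQ.match(transcript_id) is not None


def keep_curated(rows):
    """Keep (transcript_id, values) rows whose ID passes the filter."""
    return [row for row in rows if is_curated_refseq(row[0])]


# Made-up IDs, used only to exercise the filter.
rows = [("NM_000001", [10, 12]), ("XM_000002", [3, 4]), ("NR_000003.1", [7, 9])]
print([row[0] for row in keep_curated(rows)])
```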

What Does the Output Look Like?

Plots

  • AllPairs-mva.png - a comparison of all samples in the data set.
  • Kmeans-heatmap.pdf - series of heatmaps using different numbers of clusters.  Yellow/white is high expression, red is low.
  • Pairs.pdf - MvA plots of all condition comparisons
  • Replicates-mva.pdf - MvA plots of replicates within a condition
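
For readers new to MvA plots: each point is one transcript, with M (the log ratio of two samples) on the vertical axis and A (their mean log intensity) on the horizontal axis. The sketch below computes those coordinates from values that are already on a log2 scale. The function name is ours and the plotting itself is left out.

```python
def mva_points(log2_x, log2_y):
    """Return (M, A) pairs for two samples given as log2 values.

    M = log2(x) - log2(y)        the log ratio
    A = (log2(x) + log2(y)) / 2  the mean log intensity
    """
    if len(log2_x) != len(log2_y):
        raise ValueError("samples must have the same number of values")
    return [(x - y, (x + y) / 2) for x, y in zip(log2_x, log2_y)]


# Two toy replicates (made-up log2 values).
replicate_1 = [3.0, 5.0, 8.0]
replicate_2 = [1.0, 5.0, 7.0]
for m, a in mva_points(replicate_1, replicate_2):
    print(f"M={m:+.2f}  A={a:.2f}")
```

Good replicates give a cloud of points centred on M = 0 across the whole range of A.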

Tables of Data

  • AllTranscriptReadCounts-sql.tab - initial raw data
  • AllTranscriptReadCounts.tab - data filtered to just transcripts
  • Averages.tab - averages over conditions with fold-changes for all comparisons
  • Details-Lg2-Qn.tab - quantile normalized values for individual samples
  • Kmeans-04-clusters.tab - details of genes in each cluster.
  • Kmeans-05-clusters.tab
  • Kmeans-06-clusters.tab
  • ...
  • Kmeans-28-clusters.tab
  • Kmeans-29-clusters.tab
  • Kmeans-30-clusters.tab
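
As a rough illustration of how fold-changes "for all comparisons" can be tabulated from per-condition averages, the sketch below enumerates every pair of conditions. It assumes the averages are on a log2 scale, so a fold-change is a difference of logs. Whether Averages.tab reports log or linear ratios is not stated here, and the condition names are hypothetical.

```python
from itertools import combinations


def pairwise_log2_fold_changes(averages):
    """Map each (condition_a, condition_b) pair to log2(b / a).

    `averages` maps condition name -> mean log2 expression for one gene.
    Pairs are generated in sorted name order so the output is stable.
    """
    return {
        (a, b): averages[b] - averages[a]
        for a, b in combinations(sorted(averages), 2)
    }


# Made-up averages for a single gene across three conditions.
gene_averages = {"control": 4.0, "treated": 6.0, "washout": 4.5}
for (a, b), log2_fc in pairwise_log2_fold_changes(gene_averages).items():
    print(f"{b} vs {a}: log2 FC = {log2_fc:+.2f}, linear = {2 ** log2_fc:.2f}x")
```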
