Monday, December 17, 2012

Analysis - Tools - RUM

RUM

We use the RUM package from Grant et al. to do the basic processing of RNA-Seq data.  RUM generates a set of files, which we then process a bit further to make them visible in the TessLA browser and usable for other downstream analyses.

Files

Here is a typical set of files produced for a RUM analysis:

26642352304 RUM.sam - all alignments
 9094911272 RUM_NU - non-unique alignments
   48715025 RUM_NU.bedGraph.gz - bedGraph format for display
    1002296 RUM_NU.bedGraph.gz.tbi - index of bedGraph format for display
  213897294 RUM_NU.cov - coverage data for non-uniquely mapping reads
 3687201046 RUM_Unique - unique alignments
   52517756 RUM_Unique.bedGraph.gz  - bedGraph format for display
     830599 RUM_Unique.bedGraph.gz.tbi - index of bedGraph format for display
  228549781 RUM_Unique.cov - coverage data for uniquely mapping reads
  122977785 feature_quantifications-max.tab
  122977785 feature_quantifications-max.tab-sorted
  122977785 feature_quantifications-min.tab
  122977785 feature_quantifications-min.tab-sorted
  111522316 feature_quantifications_RLB-GENOME-TAG - expression levels of transcripts, exons, and introns
    5417514 inferred_internal_exons.bed
    3126115 inferred_internal_exons.txt
   30203897 junctions_all.bed
   30203808 junctions_all.bed-sorted
   18500890 junctions_all.rum
    9149089 junctions_high-quality.bed
    9148994 junctions_high-quality.bed-sorted
      16384 log
       4142 mapping_stats.txt - summary of how many reads mapped to genome or transcripts
     289688 novel_inferred_internal_exons_quantifications_RLB-GENOME-TAG
      16384 postproc
 5438152292 quals.fa - read qualities
 5438152292 reads.fa - read sequences
        449 rum_RLB-GENOME-TAG_preproc.sh
        848 rumRLB-GENOME-TAG_proc.sh
       1837 rum_job_config
       3275 rum_job_report.txt
       1384 rum_runner.log
        125 rum_sge_job_ids
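
As a quick check on the display files, the sketch below summarizes a bedGraph file.  It assumes the standard four-column bedGraph layout (chromosome, start, end, value, with zero-based half-open intervals) and that the .gz files are bgzip-compressed, as the .tbi index suggests.  The function names are illustrative and are not part of RUM.

```python
import gzip

def parse_bedgraph(lines):
    """Yield (chrom, start, end, value) tuples from bedGraph text lines."""
    for line in lines:
        line = line.strip()
        # Skip blank lines and optional track/browser/comment header lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        yield chrom, int(start), int(end), float(value)

def covered_bases(records):
    """Count the bases with non-zero coverage on each chromosome."""
    totals = {}
    for chrom, start, end, value in records:
        if value != 0:
            totals[chrom] = totals.get(chrom, 0) + (end - start)
    return totals

def summarize_file(path):
    # gzip.open in text mode also reads bgzip-compressed files.
    with gzip.open(path, "rt") as handle:
        return covered_bases(parse_bedgraph(handle))
```

For example, summarize_file("RUM_Unique.bedGraph.gz") would return a dictionary mapping each chromosome to the number of bases covered by uniquely mapping reads.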


Analysis - Tools - Comparison

We run this tool to do basic differential analysis.  It is best used for RNA-Seq data, but can be used for other data as well.

Files

By default, the files are created in Analysis/DiffExp.  In this directory, you may find multiple analyses which use different data and/or parameters.  Looking inside one of these directories, you will see three or four files called Compare.*, the most useful of which is Compare.tab.xls.

Compare.tab.xls

This file contains the comparison data.  The contents are somewhat flexible, but will follow this outline.

Each row is a transcript.  The first few columns contain the gene, transcript, and 'Best' (an indicator that guides you to the best transcript for each gene).

The next set of columns holds the various comparisons.  Which comparisons are done depends on the experiment.  For each comparison there are 6 columns:
  1. MVA:M:Test:Control - log2 test/control fold change 
  2. MVA:A:Test:Control - log2 average expression
  3. EDGE:A:Test:Control - log2 average expression
  4. EDGE:M:Test:Control - log2 test/control fold change
  5. EDGE:pv:Test:Control  - 0-1 p-value
  6. EDGE:FDR:Test:Control - 0-1 FDR from p-value using Benjamini-Hochberg correction
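
For reference, the Benjamini-Hochberg correction named in column 6 can be sketched as follows.  This is the textbook procedure written out in plain Python so the arithmetic is visible; it is not necessarily the code the pipeline runs, and the function name is illustrative.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that each adjusted value
    # never exceeds the adjusted value of any larger p-value.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

With the p-values 0.01, 0.04, 0.03 and 0.5, the adjusted values are 0.04, 0.0533, 0.0533 and 0.5: the two middle values are tied because the adjustment is forced to be monotone in the original p-value.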
The first word in each column title indicates the tool that is used to produce the data in the column.
  1. MVA is a simple MvA comparison with no statistical significance.
  2. EDGE is the edgeR package, which performs differential gene expression analysis on RNA-Seq data.
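
Because the column titles follow a fixed tool:statistic:test:control pattern, they are easy to take apart in a script.  The sketch below assumes the four fields are separated by colons and that sample names contain no colons themselves; the names ComparisonColumn and parse_comparison_title are illustrative.

```python
from collections import namedtuple

# tool is MVA or EDGE; stat is M, A, pv or FDR.
ComparisonColumn = namedtuple("ComparisonColumn", "tool stat test control")

def parse_comparison_title(title):
    """Split a title such as 'EDGE:FDR:Test:Control' into its four fields.

    Returns None for titles that do not have exactly four fields,
    for example the gene and transcript columns.
    """
    parts = title.split(":")
    if len(parts) != 4:
        return None
    return ComparisonColumn(*parts)
```

Calling parse_comparison_title on every header cell and keeping the non-None results gives a list of the comparisons present in a particular file.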
The data that is passed to the analysis programs has been quantile normalized.
M values are log2(Test/Control), so M=1 indicates a 2-fold increase in expression.
A values are log2 of the average expression across the two conditions.  MvA and edgeR use different units: MVA is usually in reads, whereas edgeR values have been normalized to counts per million.
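
A small worked example of the M and A definitions above.  The A value here follows the wording literally (log2 of the average of the two expression values); classic MvA plots often use the average of the two log2 values instead, and which one the pipeline uses is not stated here.  Both inputs must be positive, and how zero counts are handled is left open.

```python
import math

def m_value(test, control):
    """log2 fold change of test over control."""
    return math.log2(test / control)

def a_value(test, control):
    """log2 of the average expression of the two conditions."""
    return math.log2((test + control) / 2)

# 96 reads in the test sample and 32 in the control:
m = m_value(96, 32)   # log2(3), roughly a 1.58 log2-fold increase
a = a_value(96, 32)   # log2(64) = 6
```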

The next set of columns contains quantile-normalized log2 versions of the 'raw' data for the individual samples.
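
For readers unfamiliar with quantile normalization, here is a minimal sketch of the basic idea: every sample is forced onto the same distribution, namely the mean of the sorted values across samples.  It ignores refinements such as averaging tied values, and it is not necessarily how the pipeline implements the step.  Whether the log2 is taken before or after normalization is also not specified here.

```python
def quantile_normalize(samples):
    """Quantile normalize a list of equal-length lists (one per sample)."""
    n = len(samples[0])
    sorted_cols = [sorted(col) for col in samples]
    # Reference distribution: mean across samples at each rank.
    reference = [
        sum(col[rank] for col in sorted_cols) / len(samples)
        for rank in range(n)
    ]
    normalized = []
    for col in samples:
        # Indices of this sample's values, from smallest to largest.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            # Replace each value with the reference value of the same rank.
            out[i] = reference[rank]
        normalized.append(out)
    return normalized
```

For two samples [5, 2, 3] and [4, 1, 6], the reference distribution is [1.5, 3.5, 5.5] and the normalized samples are [5.5, 1.5, 3.5] and [3.5, 1.5, 5.5].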

The last set of columns contains the 'raw' data, which is usually read counts.
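
Given the column layout described above, a short script can pull out the transcripts that pass a significance cutoff.  The sketch below assumes Compare.tab.xls is tab-delimited text despite its .xls extension, and that the comparison columns are named exactly as in the pattern above.  The sample names KO and WT, the column name Gene, the thresholds, and the function name are all placeholders.

```python
import csv
import io

def significant_rows(handle, test, control, max_fdr=0.05, min_abs_m=1.0):
    """Return rows whose edgeR FDR and fold change pass the thresholds."""
    fdr_col = f"EDGE:FDR:{test}:{control}"
    m_col = f"EDGE:M:{test}:{control}"
    hits = []
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            fdr = float(row[fdr_col])
            m = float(row[m_col])
        except (TypeError, ValueError):
            # Skip rows with missing or non-numeric values such as 'NA'.
            continue
        if fdr <= max_fdr and abs(m) >= min_abs_m:
            hits.append(row)
    return hits

# Tiny in-memory stand-in for a real file.
example = io.StringIO(
    "Gene\tEDGE:M:KO:WT\tEDGE:FDR:KO:WT\n"
    "A\t2.0\t0.01\n"
    "B\t0.5\t0.01\n"
    "C\t-3.0\t0.2\n"
)
hits = significant_rows(example, "KO", "WT")
```

On a real file the handle would come from open("Compare.tab.xls", newline=""), with the test and control names taken from that file's column titles.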

Looking Deeper
Within each Comparison folder is another folder called 'Heatmap'.  See http://fgc-ngsc-cores.blogspot.com/2012/09/analysis-tools-rum-multiplecomparisons.html for details about the files in this folder.