Friday, September 7, 2012

FGC or NGSC - which core to use?

The Functional Genomics Core (FGC) and the Next-Generation Sequencing Core (NGSC) provide similar services, but with some important differences.
Here is a summary to help you decide which core is right for your project.

FGC

  • high-throughput sequencing for IDOM/DRC members
  • downstream data analysis for IDOM/DRC members as capacity allows
  • Agilent microarrays for IDOM, UPenn, and academic clients
  • limited RNA-Seq library prep for IDOM/DRC members

NGSC

  • high-throughput sequencing for UPenn and academic clients
  • standardized basic preliminary data analysis for UPenn and academic clients
  • limited RNA-Seq library prep for UPenn and academic clients

For the NGSC and FGC, prices are higher for external clients.

The good news is that you talk to the same people no matter which core you use.

Saturday, September 1, 2012

Analysis - Tools - RUM-MultipleComparisons

Introduction

We routinely run the RUM-MultipleComparisons pipeline to assess RNA-Seq data. Although the tool includes the word 'RUM' in its title, it can work with gene expression values from a variety of RNA-Seq tools.

We are still expanding the analyses that RUM-MultipleComparisons performs, but at the moment it includes these basic steps:
  1. Assembles a table of the raw data
  2. Filters the table to consider just transcripts
  3. Performs quantile normalization of the values
  4. Runs a series of k-means clusterings of the data and displays the results as heatmaps
  5. Generates MvA plots of averages for all conditions
  6. Generates MvA plots of replicates within a condition
  7. Tabulates fold-changes between average values for all conditions
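To make the quantile normalization step more concrete, here is a minimal sketch in plain Python. It is not the pipeline's own code; the function name and the tie handling (ties are broken by row order rather than averaged) are assumptions made for illustration. The idea is that each value is replaced by the mean of the values holding the same rank across all samples, so every sample ends up with the same distribution.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length sample columns.

    Hypothetical helper for illustration only. Ties are broken by row
    order, which is a simplification.
    """
    n_rows = len(columns[0])
    # Sort each sample independently.
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the k-th smallest value across samples, for every rank k.
    rank_means = [
        sum(col[k] for col in sorted_cols) / len(columns)
        for k in range(n_rows)
    ]
    normalized = []
    for col in columns:
        # Row indices ordered from the smallest to the largest value.
        order = sorted(range(n_rows), key=lambda i: col[i])
        out = [0.0] * n_rows
        for rank, row in enumerate(order):
            out[row] = rank_means[rank]
        normalized.append(out)
    return normalized


# Two made-up samples with three transcripts each.
sample_a = [5.0, 2.0, 3.0]
sample_b = [4.0, 1.0, 6.0]
print(quantile_normalize([sample_a, sample_b]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization both samples contain the same set of values (1.5, 3.5, 5.5); only the order, which reflects each transcript's rank within its sample, differs.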

What Files Should I Look At?

First, take a look at the plot files Replicates-mva.pdf and Kmeans-heatmap.pdf to check whether the samples show good intra-condition consistency. The heatmap file also helps you see whether the changes between conditions are consistent across samples, and roughly how many distinct expression patterns the data set contains.

Once you can see that the data are OK, turn to the Averages.tab file or the appropriate Kmeans-*-clusters.tab file to see gene IDs. All of the tab files can be opened in Excel, which can be used to further filter the genes. Gene lists can also be created for use with functional analysis.
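If you would rather filter a tab file with a script than in Excel, something like the sketch below may be a starting point. The column names (`id`, `log2_fc`) and the inline example rows are made up for illustration; check the header line of your own Averages.tab for the real column names, and whether the fold-changes are stored on a log2 scale.

```python
import csv
import io


def filter_by_fold_change(handle, column, min_abs_log2=1.0):
    """Return rows whose value in `column` is at least min_abs_log2 in magnitude.

    `column` is a hypothetical header name; use the one in your own file.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if abs(float(row[column])) >= min_abs_log2]


# Made-up stand-in for a tab-delimited file; with a real file, pass
# open("Averages.tab", newline="") instead.
example = io.StringIO(
    "id\tlog2_fc\n"
    "geneA\t2.3\n"
    "geneB\t-0.4\n"
    "geneC\t-1.5\n"
)
hits = filter_by_fold_change(example, "log2_fc")
print([row["id"] for row in hits])
# ['geneA', 'geneC']
```

The resulting list of IDs can be saved as a gene list for functional analysis.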

How Do We Usually Run It?

 We usually focus on well-characterized RefSeqs, i.e., those with IDs like NM_* or NR_*.
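As an illustration of that filter, the sketch below keeps only IDs with the NM_ (curated mRNA) or NR_ (curated non-coding RNA) prefix. The regular expression and function name are assumptions for this example, not the pipeline's actual implementation, and the IDs in the list are only there to show the pattern.

```python
import re

# NM_ and NR_ accessions, with an optional ".version" suffix.
CURATED_REFSEQ = re.compile(r"^(NM|NR)_\d+(\.\d+)?$")


def is_curated_refseq(transcript_id):
    """True for IDs that look like NM_* or NR_* accessions."""
    return CURATED_REFSEQ.match(transcript_id) is not None


# Example IDs: two curated-style accessions, one predicted model (XM_),
# and one non-RefSeq identifier.
ids = ["NM_000546", "NR_024540.1", "XM_011527", "ENST00000269305"]
kept = [i for i in ids if is_curated_refseq(i)]
print(kept)
# ['NM_000546', 'NR_024540.1']
```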

What Does the Output Look Like?

Plots

  • AllPairs-mva.png - a comparison of all samples in the data set.
  • Kmeans-heatmap.pdf - series of heatmaps using different numbers of clusters.  Yellow/white is high expression, red is low.
  • Pairs.pdf - MvA plots of all condition comparisons
  • Replicates-mva.pdf - MvA plots of replicates within a condition
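For readers new to MvA plots: each point is one transcript, with M (the log2 ratio of two samples) on the vertical axis and A (the average log2 expression) on the horizontal axis. The sketch below computes those two values using the standard definition; the function name and the pseudocount of 1, added to avoid taking the log of zero, are assumptions and may not match what the pipeline does.

```python
import math


def mva_values(x, y, pseudocount=1.0):
    """Return a list of (M, A) pairs for two samples.

    M = log2(x) - log2(y), A = (log2(x) + log2(y)) / 2.
    The pseudocount is an assumption made for this sketch.
    """
    points = []
    for a, b in zip(x, y):
        log_a = math.log2(a + pseudocount)
        log_b = math.log2(b + pseudocount)
        points.append((log_a - log_b, (log_a + log_b) / 2))
    return points


# One made-up transcript: 7 reads in the first sample, 1 in the second.
print(mva_values([7.0], [1.0]))
# [(2.0, 2.0)]
```

Good replicates give a cloud of points centered on M = 0, with the spread usually widest at low A where counts are small.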

Tables of Data

  • AllTranscriptReadCounts-sql.tab - initial raw data
  • AllTranscriptReadCounts.tab - data filtered to just transcripts
  • Averages.tab - averages over conditions with fold-changes for all comparisons
  • Details-Lg2-Qn.tab - quantile normalized values for individual samples
  • Kmeans-04-clusters.tab - details of genes in each cluster.
  • Kmeans-05-clusters.tab
  • Kmeans-06-clusters.tab
  • ...
  • Kmeans-28-clusters.tab
  • Kmeans-29-clusters.tab
  • Kmeans-30-clusters.tab