Thursday, August 30, 2012

Analysis - Tools - ConnectSpanToGene

Introduction

We routinely run the ConnectSpanToGene tool to associate ChIP-Seq peaks with genes, but it can be run to associate any set of regions with genes.

ConnectSpanToGene takes a gene track, a span track (for ChIP-Seq, the peaks), and three parameters (MaxDistBp, ToleranceBp, TolerancePct) that control how far from a span it looks for genes.  It writes the results to a tab-delimited file, which we usually convert to an Excel file and load into the database, attached to the span track.

How Does it Work?

The program considers each span in the span track in turn.  First, it reports any gene that overlaps the span.  It then sorts the nearby genes by the distance from the span to their transcription start site (TSS), closest first.  If the distance (D) to the closest TSS is less than the distance threshold (MaxDistBp), that gene is reported, along with any other genes whose TSS is closer to the span than both D * (1 + TolerancePct/100) and D + ToleranceBp.  D is recomputed for each span.
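The selection rule above can be sketched in Python as follows. This is a minimal illustration, not the ConnectSpanToGene source: the Gene and Span classes and the function names are hypothetical, the genes are assumed to be on the span's chromosome already, the TSS is assumed to be the begin coordinate of '+' genes and the end coordinate of '-' genes, distance is assumed to be measured from the nearest edge of the span, and the tolerance rule is read as "closer than both limits".

```python
# Hypothetical sketch of the ConnectSpanToGene selection rule; not the tool's source.
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    begin_bp: int
    end_bp: int
    strand: str  # '+' or '-'

    @property
    def tss_bp(self) -> int:
        # Assumption: the TSS is begin_bp for '+' genes and end_bp for '-' genes.
        return self.begin_bp if self.strand == "+" else self.end_bp


@dataclass(frozen=True)
class Span:
    begin_bp: int
    end_bp: int


def signed_distance_bp(span: Span, gene: Gene) -> int:
    """Distance from the nearest span edge to the TSS (0 if the span covers it).
    Positive means the span is downstream of the TSS on the gene's strand."""
    tss = gene.tss_bp
    if span.begin_bp <= tss <= span.end_bp:
        return 0
    if span.begin_bp > tss:
        d = span.begin_bp - tss  # span at higher coordinates than the TSS
    else:
        d = span.end_bp - tss  # span at lower coordinates (negative value)
    return d if gene.strand == "+" else -d


def overlaps(span: Span, gene: Gene) -> bool:
    return span.begin_bp <= gene.end_bp and gene.begin_bp <= span.end_bp


def connect_span_to_genes(span, genes, max_dist_bp=100_000,
                          tolerance_bp=50_000, tolerance_pct=50):
    """Return (gene, DistanceBp, Overlaps) tuples for one span, closest first."""
    # Rank genes by absolute distance from the span to their TSS.
    ranked = sorted(genes, key=lambda g: abs(signed_distance_bp(span, g)))
    # Genes that overlap the span are always reported.
    picked = {g for g in ranked if overlaps(span, g)}
    if ranked:
        d = abs(signed_distance_bp(span, ranked[0]))  # D: closest TSS distance
        if d < max_dist_bp:
            # Assumption: a gene must be closer than BOTH limits.
            cutoff = min(d * (1 + tolerance_pct / 100), d + tolerance_bp)
            for g in ranked:
                dist = abs(signed_distance_bp(span, g))
                if dist == d or dist < cutoff:  # dist == d keeps the closest gene(s)
                    picked.add(g)
    return [(g, signed_distance_bp(span, g), overlaps(span, g))
            for g in ranked if g in picked]
```

For example, with the default arguments a span at 3,000-3,500 whose closest TSS is 2,000 bp away gets a cutoff of min(3,000, 52,000) = 3,000 bp, so a second gene whose TSS is 2,500 bp away is also reported, while one 26,500 bp away is not.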

How do We Usually Run it?

We usually run with the following settings:

MaxDistBp    = 100,000
ToleranceBp  =  50,000
TolerancePct =      50

So the closest TSS must be within 100 kb.  Note that if the span overlaps a gene, that gene is reported even when its TSS is farther away.
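With the usual settings, the cutoff for additional genes can be worked out directly from D. A small sketch, where the helper name cutoff_bp is hypothetical and a gene is again assumed to need to be closer than both limits:

```python
from typing import Optional

# The usual settings described above.
MAX_DIST_BP = 100_000
TOLERANCE_BP = 50_000
TOLERANCE_PCT = 50


def cutoff_bp(d_bp: int) -> Optional[float]:
    """Distance within which additional TSSs are reported, given the closest
    TSS distance D; None if D is not below MaxDistBp (only overlaps reported).
    Assumption: a gene must be closer than both tolerance limits."""
    if d_bp >= MAX_DIST_BP:
        return None
    return min(d_bp * (1 + TOLERANCE_PCT / 100), d_bp + TOLERANCE_BP)


for d in (0, 10_000, 40_000, 99_000, 120_000):
    print(d, cutoff_bp(d))
```

Under that reading, because D must be below 100,000 bp, D * 1.5 is always smaller than D + 50,000, so with these settings TolerancePct is the limit that actually applies: D = 40,000 bp gives a cutoff of 60,000 bp rather than 90,000 bp.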


What Does the Output Look Like?

Here are the columns in the output file:
  1. Span-GenomeRelease - genome of the spans (and the genes)
  2. Span-Chromosome - chromosome of spans
  3. Span-BeginBp - beginning of span
  4. Span-EndBp - end of span
  5. Span-Strand - strand of span, '+' or '-'
  6. Span-Score - score of span; its meaning depends on the span table
  7. Span-Name - name of span
  8. GeneI - index of the reported gene for this span (1, 2, 3, etc.); empty if there are no nearby genes
  9. AbsDistanceBp - absolute value of distance from span to gene TSS
  10. DistanceBp - signed distance from span to gene TSS; positive values mean the span is downstream of the TSS
  11. Overlaps - yes/no, whether the span overlaps the gene
  12. Gene-GenomeRelease - genome of the gene
  13. Gene-Chromosome - chromosome of the gene
  14. Gene-BeginBp - beginning of the gene
  15. Gene-EndBp - end of the gene
  16. Gene-Strand - strand of gene, '+' or '-'
  17. Gene-Score - score of gene; usually meaningless, but could be a differential expression value
  18. Gene-Name - name of gene
  19. Span-Pvalue - p-value for detection of the span; may be empty
  20. Span-FDR - FDR for detection of the span; may be empty
  21. Span-ContentTag - additional information about the span; contents vary with the span table
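Because the output is tab-delimited, it can also be processed directly rather than through Excel. The sketch below groups the reported genes by span. It assumes the columns appear in exactly the order listed above and that a header row, if present, starts with Span-GenomeRelease; the function name genes_by_span is hypothetical.

```python
import csv

# Column order as listed above (assumption: the file follows exactly this order).
COLUMNS = [
    "Span-GenomeRelease", "Span-Chromosome", "Span-BeginBp", "Span-EndBp",
    "Span-Strand", "Span-Score", "Span-Name", "GeneI", "AbsDistanceBp",
    "DistanceBp", "Overlaps", "Gene-GenomeRelease", "Gene-Chromosome",
    "Gene-BeginBp", "Gene-EndBp", "Gene-Strand", "Gene-Score", "Gene-Name",
    "Span-Pvalue", "Span-FDR", "Span-ContentTag",
]


def genes_by_span(lines):
    """Map (chromosome, begin, end, span name) -> [(gene name, DistanceBp), ...].
    `lines` is any iterable of text lines, e.g. an open file."""
    result = {}
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields or fields[0] == COLUMNS[0]:
            continue  # skip blank lines and a header row, if present
        row = dict(zip(COLUMNS, fields))
        key = (row["Span-Chromosome"], int(row["Span-BeginBp"]),
               int(row["Span-EndBp"]), row["Span-Name"])
        genes = result.setdefault(key, [])  # spans with no nearby genes map to []
        if row.get("GeneI"):  # GeneI is empty when there are no nearby genes
            genes.append((row["Gene-Name"], int(row["DistanceBp"])))
    return result
```

Usage would be along the lines of `with open(path, newline="") as f: spans = genes_by_span(f)`, where `path` is wherever the output file was written.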


Monday, August 27, 2012

FAQ-GettingStarted

Getting Started

Here is what you need to do to get started with the core.
  1. Lab PI makes an account.
  2. Experiment investigators make accounts under the PI.
  3. The next weekday morning the core staff activate the accounts.
  4. Create a New Experiment.
  5. The core staff read your description, then either load the experiment or contact you for clarification.
  6. Once the core has loaded the experiment, you can bring your samples.
  7. We do quality checks on your samples and let you know how they look.
  8. We sequence the samples.
  9. We do the data analysis you requested.
For subsequent experiments, pick up at step 4.


Thursday, August 9, 2012

How the Core Works

Overview

Here are the major steps involved in doing an experiment with the NGSC.

  1. Create accounts at the NGSC or FGC website using the Create New Account link.
    • The PI of the lab should create the first account.
    • The investigator(s) who will actually do the experiment should then create their account(s).
    • Creating a PI's account works best in Safari on a Mac - we are fixing this!
    • Investigators should make sure they pick their PI from the list.
    • We will activate the accounts within a day or so.
  2. If necessary, meet with the technical director for help designing the experiment.
    • Use the Appointment Calendar link to identify a time to meet.
    • Send him an email to propose the time.
  3. Create an experiment using the Create New Experiment form.
    • This is a basic description of the experiment you want to do.
    • The web page indicates the info we are looking for.
    • Please include your billing information.
  4. We will formalize the experiment description and load it into the database.
    • This process takes a few days.
    • We may send emails to clarify details of the experiment.
  5. Once we have loaded the experiment, bring your samples to the core.
    • Please do not bring the samples until the experiment has been loaded.
    • Be prepared to provide a source ID and a sample name for each sample.
  6. We will check the quality of the samples and report any problems.
  7. We will schedule the libraries for sequencing and report our progress.
  8. Data analysis will follow sequencing as soon as possible.