Thursday, August 30, 2012

Analysis - Tools - ConnectSpanToGene

Introduction

We routinely run the ConnectSpanToGene tool to associate ChIP-Seq peaks with genes, but it can associate any set of regions with genes.

ConnectSpanToGene takes a gene track, a span track (which contains the peaks), and three parameters (MaxDistBp, ToleranceBp, TolerancePct) that control how far from a span we look for a gene. It outputs a tab-delimited file containing the results, which we usually convert to an Excel file and load into the database (attached to the region track).

How Does it Work?

The program considers each span in the span track. It first reports any gene that overlaps the span. It then sorts the nearby genes by the distance from the span to the transcription start site (TSS), with the closest TSS first. If the distance (D) to the closest TSS is less than the distance threshold (MaxDistBp), then that gene is reported. Any other genes whose TSS is closer to the span than both D * (1 + TolerancePct/100) and (D + ToleranceBp) are also reported. This process is repeated for each span in the span track, and D is recomputed for each span.

How do We Usually Run it?
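The selection rule for a single span can be sketched in Python as follows. This is only an illustration of the description above, not the tool's actual code: the `Gene` class, the function names, and the way the span-to-TSS distance is measured (zero if the TSS lies inside the span, otherwise the gap to the nearest span edge) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """Hypothetical gene record; not the tool's real data structure."""
    name: str
    begin: int
    end: int
    strand: str  # '+' or '-'

    @property
    def tss(self):
        # The TSS is the gene start on '+' and the gene end on '-'.
        return self.begin if self.strand == "+" else self.end


def tss_distance(span_begin, span_end, gene):
    """Assumed distance: 0 if the TSS is inside the span, else gap to nearest edge."""
    if gene.tss < span_begin:
        return span_begin - gene.tss
    if gene.tss > span_end:
        return gene.tss - span_end
    return 0


def overlaps(span_begin, span_end, gene):
    return gene.begin <= span_end and span_begin <= gene.end


def connect_span(span_begin, span_end, genes,
                 max_dist_bp, tolerance_bp, tolerance_pct):
    """Return the genes reported for one span, closest TSS first."""
    def dist(g):
        return tss_distance(span_begin, span_end, g)

    ranked = sorted(genes, key=dist)
    if not ranked:
        return []
    d = dist(ranked[0])  # D: distance to the closest TSS
    # A gene must be closer than BOTH tolerance bounds (assumed reading of "and").
    limit = min(d * (1 + tolerance_pct / 100), d + tolerance_bp)
    reported = []
    for i, g in enumerate(ranked):
        near = d < max_dist_bp and (i == 0 or dist(g) < limit)
        if near or overlaps(span_begin, span_end, g):
            reported.append(g)
    return reported
```

Calling `connect_span` once per span, with D recomputed inside each call, mirrors the per-span loop described above.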

We usually run with the following settings:

MaxDistBp    = 100,000
ToleranceBp  =  50,000
TolerancePct =      50

So the closest TSS must be within 100 kb of the span. Note that if the span overlaps a gene, that gene is reported as well, even when its TSS is more distant.
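To see what these settings mean in practice, here is a small sketch that computes the cutoff for additional genes at a few values of D. It assumes that a gene must be closer than both tolerance bounds, so the effective cutoff is the smaller of the two; `tolerance_limit` is a hypothetical helper, not part of the tool.

```python
def tolerance_limit(d, tolerance_bp=50_000, tolerance_pct=50):
    """Smaller of the two tolerance bounds for a closest-TSS distance d."""
    return min(d * (1 + tolerance_pct / 100), d + tolerance_bp)


for d in (10_000, 40_000, 80_000):
    print(d, tolerance_limit(d))
# prints:
# 10000 15000.0
# 40000 60000.0
# 80000 120000.0
```

Under that reading, the percentage bound is the tighter one whenever D is below 100,000 bp, so with our usual settings ToleranceBp never becomes the limiting factor.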


What Does the Output Look Like?

Here are the columns in the output file:
  1. Span-GenomeRelease - genome of the spans (and the genes)
  2. Span-Chromosome - chromosome of spans
  3. Span-BeginBp - begin of span
  4. Span-EndBp - end of span
  5. Span-Strand - '+' or '-' of span
  6. Span-Score - score of span - the meaning of the value depends on the span table.
  7. Span-Name - name of span
  8. GeneI - empty, or 1, 2, 3, etc. Empty if there are no nearby genes.
  9. AbsDistanceBp - absolute value of distance from span to gene TSS
  10. DistanceBp - distance from span to gene TSS. Positive values are downstream of the TSS.
  11. Overlaps - yes/no, whether the span overlaps the gene
  12. Gene-GenomeRelease - genome of the gene
  13. Gene-Chromosome - chromosome of the gene
  14. Gene-BeginBp - beginning of the gene
  15. Gene-EndBp - end of the gene
  16. Gene-Strand - '+' or '-' of gene
  17. Gene-Score - score of gene. Usually meaningless, but could be a differential expression value.
  18. Gene-Name - name of gene
  19. Span-Pvalue - may be empty, p-value for detection of the span
  20. Span-FDR - may be empty, FDR for detection of the span
  21. Span-ContentTag - additional information about the span. Varies with the span table.
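As a rough sketch of how the tab-delimited output could be read before conversion, the following Python groups the reported gene names by span. The column names are taken from the list above; whether the file starts with a header row is an assumption controlled by the `has_header` flag, and the function names are hypothetical.

```python
import csv

COLUMNS = [
    "Span-GenomeRelease", "Span-Chromosome", "Span-BeginBp", "Span-EndBp",
    "Span-Strand", "Span-Score", "Span-Name", "GeneI", "AbsDistanceBp",
    "DistanceBp", "Overlaps", "Gene-GenomeRelease", "Gene-Chromosome",
    "Gene-BeginBp", "Gene-EndBp", "Gene-Strand", "Gene-Score", "Gene-Name",
    "Span-Pvalue", "Span-FDR", "Span-ContentTag",
]


def read_rows(lines, has_header=False):
    """Parse tab-delimited lines into dicts keyed by the column names above."""
    reader = csv.reader(lines, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row if the file has one (assumed)
    return [dict(zip(COLUMNS, row)) for row in reader]


def genes_by_span(rows):
    """Map (chromosome, begin, end) of each span to its reported gene names."""
    result = {}
    for row in rows:
        key = (row["Span-Chromosome"],
               int(row["Span-BeginBp"]),
               int(row["Span-EndBp"]))
        genes = result.setdefault(key, [])
        if row["GeneI"]:  # GeneI is empty when the span has no nearby gene
            genes.append(row["Gene-Name"])
    return result
```

With a real file, `read_rows(open(path, newline=""))` would supply the lines; spans with no nearby gene still appear in the result, with an empty list.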

