Introduction
We routinely run the ConnectSpanToGene tool to associate ChIP-Seq peaks with genes, but it can be run to associate any set of regions with genes.
ConnectSpanToGene takes a gene track, a span track (which has the peaks), and parameters (MaxDistBp, ToleranceBp, TolerancePct) controlling how far away from a span we will look for a gene. It outputs a tab-delimited file containing the results, which we usually convert to an Excel file and load into the database (attached to the span track).
How Does it Work?
The program considers each span in the span track. It first reports any gene that overlaps the span. It then orders the nearby genes by the distance from the span to the transcription start site (TSS), with the closest TSS first. If the distance (D) to the first gene is less than the distance threshold (MaxDistBp), then that gene is reported. Any other genes that are closer to the span than both D * (1 + TolerancePct/100) and (D + ToleranceBp) are also reported. This process is repeated for each span in the span track, and D is recomputed for each span.
How Do We Usually Run it?
We usually run with the following settings:
MaxDistBp = 100,000
ToleranceBp = 50,000
TolerancePct = 50
So the first TSS must be within 100 kb of the span. Note that if the span overlaps a gene, that gene is reported even if its TSS is farther away than this.
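To make the selection rule concrete, here is a minimal Python sketch of how one span's candidate genes might be filtered with the settings above. This is not the tool's actual code: the `Candidate` type and the `select_genes` function are hypothetical names, and the sketch assumes that "closer than D * (1 + TolerancePct/100) and (D + ToleranceBp)" means closer than the smaller of the two limits, and that MaxDistBp applies only to the closest TSS.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """One gene considered for one span (hypothetical helper type)."""
    name: str
    abs_distance_bp: int  # absolute distance from the span to the gene's TSS
    overlaps: bool        # True if the span overlaps the gene


def select_genes(candidates: List[Candidate],
                 max_dist_bp: int = 100_000,
                 tolerance_bp: int = 50_000,
                 tolerance_pct: float = 50.0) -> List[Candidate]:
    """Return the genes that would be reported for a single span."""
    ordered = sorted(candidates, key=lambda c: c.abs_distance_bp)

    # Genes that overlap the span are always reported.
    selected = [c for c in ordered if c.overlaps]

    # D is the distance to the closest TSS; it must be under MaxDistBp.
    if ordered and ordered[0].abs_distance_bp < max_dist_bp:
        d = ordered[0].abs_distance_bp
        # Assumption: a gene must be under BOTH tolerance limits.
        limit = min(d * (1 + tolerance_pct / 100), d + tolerance_bp)
        for i, c in enumerate(ordered):
            if c.overlaps:
                continue  # already reported above
            if i == 0 or c.abs_distance_bp < limit:
                selected.append(c)

    return sorted(selected, key=lambda c: c.abs_distance_bp)


# Made-up candidates for one span, for illustration only.
example = [
    Candidate("A", 20_000, False),   # closest TSS, so D = 20,000
    Candidate("B", 28_000, False),   # under min(30,000, 70,000)
    Candidate("C", 40_000, False),   # over the 30,000 limit
    Candidate("D", 500_000, True),   # distant TSS, but the span overlaps the gene
]
print([c.name for c in select_genes(example)])  # ['A', 'B', 'D']
```

With D = 20,000 the percentage limit (30,000) is tighter than the base-pair limit (70,000), so gene C is dropped. For large D the base-pair limit becomes the tighter one.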
What Does the Output Look Like?
Here are the columns in the output file.
- Span-GenomeRelease - genome of the spans (and the genes)
- Span-Chromosome - chromosome of spans
- Span-BeginBp - begin of span
- Span-EndBp - end of span
- Span-Strand - '+' or '-' of span
- Span-Score - score of span - the meaning of the value depends on the span table.
- Span-Name - name of span
- GeneI - empty, or 1, 2, 3 etc. Is empty if there are no nearby genes
- AbsDistanceBp - absolute value of distance from span to gene TSS
- DistanceBp - distance from span to gene TSS. Positive values are downstream of the TSS.
- Overlaps - yes/no, does the span overlap the gene.
- Gene-GenomeRelease - genome of the gene
- Gene-Chromosome - chromosome of the gene
- Gene-BeginBp - beginning of the gene
- Gene-EndBp - end of the gene
- Gene-Strand - '+' or '-' of gene
- Gene-Score - score of gene. Usually meaningless, but could be a differential expression value.
- Gene-Name - name of gene
- Span-Pvalue - may be empty, p-value for detection of the span
- Span-FDR - may be empty, FDR for detection of the span
- Span-ContentTag - extra stuff about the span. Varies with the span table.
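If you want to work with the tab-delimited output directly rather than through Excel, something like the following Python sketch may help. It assumes the file starts with a header row that uses the column names listed above, and that a span with no nearby genes gets a single row with empty gene columns. Both are assumptions about the file layout, and `genes_by_span` is a hypothetical helper, not part of the tool. The sample rows are made up and include only a few of the columns.

```python
import csv
import io


def genes_by_span(handle):
    """Map (chromosome, begin, end) of each span to its (gene name, DistanceBp) pairs."""
    result = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row["Span-Chromosome"],
               int(row["Span-BeginBp"]),
               int(row["Span-EndBp"]))
        genes = result.setdefault(key, [])  # spans without genes still appear
        if row["GeneI"].strip():            # empty GeneI means no nearby gene
            genes.append((row["Gene-Name"], int(row["DistanceBp"])))
    return result


# Made-up sample with a subset of the columns, for illustration only.
HEADER = ["Span-Chromosome", "Span-BeginBp", "Span-EndBp",
          "GeneI", "DistanceBp", "Gene-Name"]
ROWS = [
    ["chr1", "1000", "1500", "1", "2500", "GeneA"],
    ["chr1", "1000", "1500", "2", "-4000", "GeneB"],
    ["chr2", "500", "900", "", "", ""],
]
SAMPLE = "\n".join("\t".join(r) for r in [HEADER] + ROWS) + "\n"

for span, genes in genes_by_span(io.StringIO(SAMPLE)).items():
    print(span, genes)
```

For a real file, pass an open file handle (opened with `newline=""`) instead of the `io.StringIO` sample.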