Friday, March 29, 2013

Analysis - Tools - ngsc-hitsclip-RLB

Our data processing for a single library follows the same steps whether it is a HITS-CLIP or a miRNA-Seq library.

Trim adapter sequence from reads

Because miRNA sequences are shorter than the 36 or 50 nt that we sequence, the reads run into the 3' adapter, so we trim the adapter sequence from the 3' end of each read. We allow up to 1 mismatch per 4 nt of adapter sequence.
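
As a rough illustration of the trimming rule, here is a minimal Python sketch. It is not the trimmer we actually run: the function name, the `min_overlap` parameter, and the adapter sequence used in the example are assumptions made for illustration. It scans for the leftmost position where the rest of the read matches the start of the adapter, allowing 1 mismatch per 4 nt of adapter compared.

```python
def trim_3prime_adapter(read, adapter, min_overlap=4):
    """Return the read with the 3' adapter (and anything after it) removed.

    Illustrative sketch only. min_overlap is an assumed parameter: the
    minimum number of adapter bases that must be visible at the end of
    the read before we consider trimming.
    """
    for start in range(len(read) - min_overlap + 1):
        # Number of adapter bases that can be compared at this position.
        overlap = min(len(adapter), len(read) - start)
        # Allow up to 1 mismatch per 4 nt of adapter sequence compared.
        allowed = overlap // 4
        mismatches = 0
        for i in range(overlap):
            if read[start + i] != adapter[i]:
                mismatches += 1
                if mismatches > allowed:
                    break
        else:
            # Never exceeded the allowance: treat this as the adapter start.
            return read[:start]
    return read


# Example with a hypothetical adapter: a 22 nt insert followed by the
# first 14 nt of the adapter in a 36 nt read.
ADAPTER = "TCGTATGCCGTCTTCTGCTTG"
read = "TGAGGTAGTAGGTTGTATAGTT" + ADAPTER[:14]
trimmed = trim_3prime_adapter(read, ADAPTER)
```

Because mismatches are tolerated, a short stretch at the end of a read can resemble the start of the adapter by chance, so a larger `min_overlap` trades sensitivity for fewer over-trimmed reads.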

Align Trimmed Reads

The trimmed reads are aligned using bowtie to (1) miRNA hairpins (miRBase), (2) RefSeq transcripts, and (3) the whole genome. We allow up to 3 mismatches and require a unique best match. The alignment counts and percentages are tallied to assess the quality of the library, i.e., how much of it comes from miRNAs versus degraded mRNA.
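
The tally itself is simple bookkeeping. The sketch below assumes that each trimmed read has already been given exactly one label saying which reference it aligned to; the label names and the function name are hypothetical, and the bowtie runs themselves are not shown.

```python
from collections import Counter

# Hypothetical labels for the three alignment targets plus unaligned reads.
CATEGORIES = ("mirna_hairpin", "refseq", "genome", "unaligned")


def tally_alignments(assignments):
    """Return {category: (count, percent of all reads)}.

    assignments: iterable with one label per trimmed read (assumed input).
    """
    counts = Counter(assignments)
    total = sum(counts.values())
    summary = {}
    for category in CATEGORIES:
        n = counts.get(category, 0)
        pct = 100.0 * n / total if total else 0.0
        summary[category] = (n, pct)
    return summary


# Toy example: 10 reads, 6 of them from miRNA hairpins.
labels = ["mirna_hairpin"] * 6 + ["refseq"] * 2 + ["genome"] + ["unaligned"]
summary = tally_alignments(labels)
```

A library dominated by the miRNA hairpin category would be read as good quality, while a large RefSeq fraction would point toward degraded mRNA.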

Quantitate Mature Forms

We count the number of aligned trimmed reads that overlap the annotated mature forms. If no mature form is annotated for either arm of a precursor hairpin, we make a naive inference and take the mature form to be half of the hairpin.
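
The counting step can be sketched as follows. This is an illustration under stated assumptions, not our production code: coordinates are taken to be 0-based, half-open positions on the hairpin, the function names and the "-inferred" naming are hypothetical, and the fallback is interpreted as applying only when a hairpin has no annotated mature form at all, in which case each half of the hairpin is treated as an inferred mature form.

```python
def infer_mature_arms(hairpin_id, hairpin_length):
    """Naive fallback: split the hairpin into a 5' half and a 3' half."""
    mid = hairpin_length // 2
    return [
        (hairpin_id + "-5p-inferred", 0, mid),
        (hairpin_id + "-3p-inferred", mid, hairpin_length),
    ]


def count_mature_reads(alignments, mature_annotations, hairpin_lengths):
    """Count reads overlapping each mature form.

    alignments: iterable of (hairpin_id, start, end) for aligned reads.
    mature_annotations: dict hairpin_id -> list of (name, start, end).
    hairpin_lengths: dict hairpin_id -> hairpin length in nt.
    """
    counts = {}
    for hairpin_id, start, end in alignments:
        matures = mature_annotations.get(hairpin_id)
        if not matures:
            matures = infer_mature_arms(hairpin_id, hairpin_lengths[hairpin_id])
        for name, m_start, m_end in matures:
            # Half-open intervals overlap when each starts before the other ends.
            if start < m_end and end > m_start:
                counts[name] = counts.get(name, 0) + 1
    return counts


# Toy example: hp1 has an annotated mature form, hp2 has none.
annotations = {"hp1": [("miR-1-5p", 5, 27)]}
lengths = {"hp1": 80, "hp2": 70}
reads = [("hp1", 6, 28), ("hp1", 50, 72), ("hp2", 3, 25), ("hp2", 40, 62)]
counts = count_mature_reads(reads, annotations, lengths)
```

Note that in this sketch a read spanning the midpoint of an unannotated hairpin is counted for both inferred halves.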

Differential Analysis

Differential expression or differential loading analysis uses the read counts, which are converted to reads per million (RPM) and quantile normalized, then analyzed with samr (older data sets) or edgeR (newer data sets, when there are at least 2 replicates) to generate p-values and FDRs.
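
The two normalization steps can be sketched in plain Python. This is a minimal illustration with hypothetical function names, assuming one list of counts per library with the miRNAs in the same order in every list; ties are broken by position rather than averaged, and the statistical testing itself is not reproduced here.

```python
def to_rpm(counts):
    """Convert raw read counts for one library to reads per million."""
    total = sum(counts)
    return [c * 1_000_000 / total for c in counts]


def quantile_normalize(samples):
    """Quantile normalize a list of equal-length lists (one per library).

    Each value is replaced by the mean, across libraries, of the values
    that share its rank.
    """
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Mean of the i-th smallest value across all libraries.
    rank_means = [
        sum(s[i] for s in sorted_samples) / len(samples) for i in range(n)
    ]
    normalized = []
    for s in samples:
        # Indices of this library's values, from smallest to largest.
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy example: two libraries, three miRNAs each.
rpm = to_rpm([1, 3])
normalized = quantile_normalize([[5, 2, 3], [4, 1, 6]])
```

After quantile normalization every library has the same distribution of values, so remaining differences reflect changes in rank rather than in sequencing depth.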

Footprint