Friday, March 29, 2013

Analysis - Tools - ngsc-hitsclip-RLB

Our data processing for a single HITS-CLIP or miRNA-Seq library follows the same steps.

Trim adapter sequence from reads

Since the miRNA sequences are shorter than the 36 or 50 nt that we sequence, we trim the 3' adapter sequence from the 3' end of the reads.  We allow up to 1 mismatch per 4 nt of adapter sequence.
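
As an illustration only, here is a minimal Python sketch of this kind of trimming. It is not the code we run: the function name, the example adapter string, and the minimum overlap of 4 nt are assumptions made for the example. It scans the read for the first position where the rest of the read matches the start of the adapter, allowing 1 mismatch per 4 nt of overlap.

```python
def trim_3prime_adapter(read, adapter, nt_per_mismatch=4, min_overlap=4):
    """Return the read with the 3' adapter (or its leading part) removed.

    min_overlap is an assumption for this sketch; the text above only
    specifies the mismatch rate (1 mismatch per 4 nt of adapter).
    """
    for start in range(len(read)):
        # Number of adapter bases that can be compared at this position.
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_overlap:
            break  # remaining suffix is too short to call an adapter match
        allowed = overlap // nt_per_mismatch
        mismatches = sum(
            1
            for a, b in zip(read[start:start + overlap], adapter[:overlap])
            if a != b
        )
        if mismatches <= allowed:
            return read[:start]
    return read  # no adapter found; keep the read as is


# Arbitrary adapter sequence used only for this example.
EXAMPLE_ADAPTER = "TGGAATTCTCGGGTGCCAAGG"
trimmed = trim_3prime_adapter("AAAATGGTATTC", EXAMPLE_ADAPTER)
```

In the usage line, the read ends with the first 8 nt of the example adapter carrying one mismatch, which is within the 2 mismatches allowed for an 8 nt overlap.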

Align Trimmed Reads

The trimmed reads are aligned using bowtie to (1) miRNA hairpins (miRBase), (2) RefSeq transcripts, and (3) the whole genome.  We allow up to 3 mismatches and require a unique best match. The alignment counts and percentages are tallied to assess the quality of the library, i.e., how much is coming from miRNAs versus degraded mRNA.
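
The tally itself is simple arithmetic. The sketch below is a hypothetical example, not our production code: the function name and the read counts are made up, and it assumes each trimmed read is counted under at most one target.

```python
def tally_alignments(counts, total_trimmed):
    """Return {target: (count, percent of trimmed reads)} plus an 'unaligned' row.

    Assumes each read is counted under at most one target (an assumption
    of this sketch, not something stated in the text above).
    """
    table = {
        name: (n, 100.0 * n / total_trimmed) for name, n in counts.items()
    }
    unaligned = total_trimmed - sum(counts.values())
    table["unaligned"] = (unaligned, 100.0 * unaligned / total_trimmed)
    return table


# Made-up counts for a library of 1000 trimmed reads.
example = tally_alignments(
    {"miRNA hairpins": 600, "RefSeq transcripts": 250, "genome": 100},
    total_trimmed=1000,
)
for target, (count, percent) in example.items():
    print(f"{target:20s} {count:8d} {percent:6.1f}%")
```

A library dominated by the miRNA hairpin row is the desired outcome; a large RefSeq fraction suggests degraded mRNA.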

Quantitate Mature Forms

We count the number of aligned trimmed reads that overlap with the annotated mature forms. If there is no mature form annotated for either arm of a precursor hairpin, then we do a naive inference and take the mature form as half of the hairpin.
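
A rough Python sketch of the counting and of the naive inference follows. The function names, the 0-based half-open coordinates, and the exact midpoint split are assumptions for the example, not a description of our actual code.

```python
def infer_mature_arms(hairpin_length):
    """Naively split a hairpin into two halves, one per arm.

    Coordinates are 0-based, half-open, relative to the hairpin.
    Splitting exactly at the midpoint is an assumption of this sketch.
    """
    mid = hairpin_length // 2
    return {"5p": (0, mid), "3p": (mid, hairpin_length)}


def count_overlapping(read_intervals, mature):
    """Count reads whose aligned interval overlaps the mature interval."""
    start, end = mature
    return sum(
        1 for r_start, r_end in read_intervals if r_start < end and r_end > start
    )


# Made-up alignments on an 80 nt hairpin with no annotated mature forms.
arms = infer_mature_arms(80)
reads = [(5, 27), (38, 60), (50, 72)]
counts = {arm: count_overlapping(reads, span) for arm, span in arms.items()}
```

Note that a read spanning the midpoint, like the second one here, is counted for both arms under this simple overlap rule.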

Differential Analysis

Differential expression or differential loading analysis uses the read counts, which are converted to RPM and quantile normalized, then analyzed with samr (older data sets) or edgeR (newer data sets, when there are at least 2 replicates) to generate p-values and FDRs.
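
For readers unfamiliar with the first two steps, here is a small pure-Python sketch of RPM conversion and quantile normalization. It is only meant to show the arithmetic: the function names are hypothetical, ties are broken by position rather than averaged, and the statistical testing in samr or edgeR is not shown.

```python
def to_rpm(counts):
    """Convert raw read counts for one library to reads per million."""
    total = sum(counts)
    return [c * 1_000_000 / total for c in counts]


def quantile_normalize(samples):
    """Quantile normalize equal-length lists of values, one list per library.

    Each value is replaced by the mean, across libraries, of the values
    holding the same rank. Ties are broken by position (a simplification).
    """
    n = len(samples[0])
    # For each library, the indices of its values in ascending order.
    orders = [sorted(range(n), key=s.__getitem__) for s in samples]
    # Mean of the k-th smallest value across libraries.
    rank_means = [
        sum(s[order[k]] for s, order in zip(samples, orders)) / len(samples)
        for k in range(n)
    ]
    normalized = []
    for order in orders:
        out = [0.0] * n
        for k, i in enumerate(order):
            out[i] = rank_means[k]
        normalized.append(out)
    return normalized


# Made-up values for two libraries and three miRNAs.
normalized = quantile_normalize([[5, 2, 3], [4, 1, 6]])
```

After normalization every library has the same set of values (here 1.5, 3.5 and 5.5), assigned according to each miRNA's rank within its own library.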

Footprint




Wednesday, February 20, 2013

WebSite - Charges

The Charges tab on the front page of the website lists the charges that have been incurred for the investigations the core is conducting.

Scope

You only see charges for investigations that you have access to.  PIs will therefore see charges for all of their investigations.  Post-docs and grad students will only see charges for their own projects, not for others in the lab, unless they are part of the investigation.

Timing

After an initial break-in period, we will load charges fairly soon after the service has been provided. Charges may be loaded before it has been decided whether the service is billable, i.e., whether a run or lane has failed or not.  They will be adjusted on an ongoing basis until, or even after, an invoice has been created.

What do we charge for?

The cores charge for quality assessments, making libraries, microarrays, sequencing, basic data processing (RLB), more advanced informatics, as well as CPU charges from the PGFI cluster.

What's in a charge?

The core, FGC or NGSC, is set depending on the PI's affiliation and the service, e.g., microarrays are always through FGC, but other services may be FGC or NGSC.

The invoice number is set once an invoice is prepared for billing.

The investigation should be clear as well as the Service code (what we did) and the service type.

A service is free if it was performed, but failed due to a problem with the sequencers or something we did wrong.

The % Billed is used to (1) handle the fractional charges from PGFI CPU usage and (2) split services between invoices.

The '$' is the list price.  '$/Item' is the charge after the Free flag and % Billed are taken into account. This is what you will be charged.
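
As a worked example of how these columns relate, here is a short Python sketch. The function name and the rounding to cents are assumptions for illustration; it simply treats a free service as costing nothing and otherwise applies the % Billed to the list price.

```python
def charge_per_item(list_price, free, percent_billed):
    """Return the '$/Item' amount from the list price, Free flag, and % Billed.

    Rounding to cents is an assumption of this sketch.
    """
    if free:
        return 0.0  # failed services are not charged
    return round(list_price * percent_billed / 100.0, 2)


# Made-up prices: a lane split evenly between two invoices, and a failed lane.
split_lane = charge_per_item(1200.0, free=False, percent_billed=50)
failed_lane = charge_per_item(1200.0, free=True, percent_billed=100)
```

With these made-up numbers the split lane comes to 600.0 on each invoice and the failed lane to 0.0.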

The Instance Key is the name of the thing that the service was performed on or with, e.g.,
FGC0396/1 for a lane in a sequencing run
7890/7890 for bioA of sample 7890
msrx_FGC0350_3_hg18_rd for aligning and loading data from FGC0350/3 to hg18

The description will usually be fairly cryptic unless we edit it by hand.

Started On and Ended On are the dates over which the service was performed.




Friday, February 8, 2013

FAQ Sample Queue

Introduction 

The sample queue currently covers only the steps from sample drop off to sequencing.  It does not yet automatically handle resequencing, but we are working on that. There is a diagram of our process at the bottom of the page.

This queue contains samples for both the NGSC and the FGC cores. Due to the time it takes to complete the various steps and the amount of manual review required, we will only update the queue once or twice a day.

QC

Quality control involves performing at least an Agilent bioanalyzer run and a qubit run. Additional runs may be needed if these fail or give contradictory results.  We are evaluating the use of the Kapa system for QC but this is still preliminary and is only being applied to samples selectively.

We track the various evaluations in a spreadsheet, then upload the values once all checks have been done and we are confident in the results.  Thus in many cases work is being done on a sample that may not yet be reflected in the queue.

This queue covers RNA and DNA, libraries that are just being sequenced, and libraries that need to be resequenced due to bad sequencing results.

After the bioanalyzer step, you will receive an email indicating the results, but further processing may be ongoing to refine the concentration.

After QC, samples are either marked BAD and processed no further, or are marked GOOD and move on to library prep, pooling, or waiting to be sequenced, depending on the sample type.

Extraction

Extraction of RNA or DNA from cells is rare.  Any RNA or DNA extracted will become a new sample and move to the QC queue.

Library Prep

Library prep covers making libraries from RNA or genomic DNA. These are typically time-consuming processes that take a day or two to process 8 samples.

The libraries produced are new samples that move to the QC queue.  Usually a bioA is done immediately, so the QC is to get the precise molarities.

Microarrays

Microarrays are performed by the FGC core. These typically take about 2 to 3 person-days to perform. Currently Agilent is having trouble with their manufacturing and is not shipping any arrays.

Pooling

Pooling covers the dilution and mixing steps necessary to sequence one or more libraries.  We are very careful with this step as it is essential to achieving maximum read counts and even distribution across all libraries in a pool.  We qubit the dilutions and redilute if necessary.  Entries in this stage represent individual libraries which will be pooled to take up just a lane or two, usually.

Waiting to be Sequenced

This queue is the set of pools or individual samples that are ready to go. When a sample or pool will be sequenced in more than one lane, the queue entry will indicate this.

Sequencing Now

These samples are in runs that are going on 'now'.  This includes runs that are just finishing or have recently finished.  The end of a run may also be defined manually when a run is having trouble.

Recently Sequenced

Recently sequenced is just a list of the recent runs, in case you missed your sample going through the queue.

Outline of sample flow through the NGSC or FGC Cores.


Wednesday, January 23, 2013

Survey - NGSC/FGC Sample Queue Privacy

This survey was conducted from Jan 22 to Jan 25 to determine what information NGSC/FGC users wanted displayed about samples in the sample queue, a feature that we are adding to the website.  By Jan 23 we had 86 respondents (which is fantastic!), which gives us a very clear picture of what users want.

To summarize,
  1. most people want to see the principal investigator's and investigator's names.
  2. a significant percentage do not want the investigation name or the experiment or assay names visible.
  3. a significant percentage do not want the sample name visible.
  4. most people do want to see the date the sample was submitted.
Just a few people used the comment box and none of the comments altered the results above.  The details are below. Despite the relatively large percentage of Don't Care responses, very few respondents answered Don't Care to every question.

So we will be including just these four sample identification columns in the queue.  Other data, such as time in the queue or estimated time until finishing will be added once we can estimate them with reasonable accuracy.
  1. PI
  2. investigator
  3. submission date
  4. anonymous sample ID
With these answers in hand, we will implement the sample queue as quickly as we can.

Thanks for your feedback!

Question         | NO!       | no        | eh        | yes       | YES!
1. PI            |  2.3%  2  |  5.8%  5  | 25.6% 22  | 47.7% 41  | 18.6% 16
2. Investigator  |  2.3%  2  |  5.8%  5  | 29.1% 25  | 40.7% 35  | 22.1% 19
3. Investigation | 14.0% 12  | 25.6% 22  | 25.6% 22  | 26.7% 23  |  8.1%  7
4. Cond & Assay  | 19.8% 17  | 22.1% 19  | 26.7% 23  | 24.4% 21  |  7.0%  6
5. Sample Name   | 10.6%  9  | 21.2% 18  | 25.9% 22  | 29.4% 25  | 12.9% 11
6. Submit Date   |  0.0%  0  |  4.7%  4  | 22.1% 19  | 33.7% 29  | 39.5% 34
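
For anyone who wants to check the summary against the raw counts, here is a small Python sketch that collapses the five answers into "against" (NO! plus no) and "in favor" (yes plus YES!). The grouping and the function name are our choice for this example; the counts are copied from the table.

```python
# Counts per question in the order NO!, no, eh, yes, YES! (from the table).
RESPONSES = {
    "PI": [2, 5, 22, 41, 16],
    "Investigator": [2, 5, 25, 35, 19],
    "Investigation": [12, 22, 22, 23, 7],
    "Cond & Assay": [17, 19, 23, 21, 6],
    "Sample Name": [9, 18, 22, 25, 11],
    "Submit Date": [0, 4, 19, 29, 34],
}


def summarize(counts):
    """Return (percent against, percent in favor), each rounded to 0.1."""
    total = sum(counts)
    against = counts[0] + counts[1]
    in_favor = counts[3] + counts[4]
    return round(100 * against / total, 1), round(100 * in_favor / total, 1)


for question, counts in RESPONSES.items():
    against, in_favor = summarize(counts)
    print(f"{question:15s} against {against:5.1f}%   in favor {in_favor:5.1f}%")
```

Percentages are computed per question, since the Sample Name question has one fewer answer than the others.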


Monday, January 21, 2013

FAQ Barcodes

When multiplexing libraries, it is essential to pick barcodes that work together.

Check the instructions in your library kit carefully to find the barcode selection guide.  It may not be obvious at first.

If you are using the FGC/NGSC's multiplexed ChIP-seq protocol, you will find the legal combinations on page 3.