Wednesday, October 24, 2012

FAQ What Does It Cost

FAQ - What Does It Cost?

The FGC and NGSC cores each have a price sheet posted under the 'Pricing' option in the getting started menu.

The FGC only serves clients in the IDOM/DRC, except for microarrays, which are open to UPenn and other academic institutions.

In both cases, academic institutions outside of Penn pay a higher rate to make up for the lack of grant overhead.

For maximum flexibility there are separate charges for:
  1. sample and library quantification 
  2. library preparation
  3. sequencing, charged by the lane
  4. standard analyses, charged by the sample
  5. advanced analyses, only from the FGC.
The exact cost of an experiment depends on many factors, so it is best estimated with the help of the FGC or NGSC technical director after submitting an experiment request.

FAQ Old TessLA Browser

Never fear, it's here.

We also have a button on the front page and a link in the Results menu at the top of the page.

Users should begin to use only the portals that are named for the PI, e.g., Eric-Thepi-Lab.

FAQ Where Is My Old Data

  1. The cores keep tape backups of every run.
  2. Most runs are kept on our sequencing server for a few weeks.
  3. The fastq files and subsequent analyses are stored on the PGFI cluster.
    1. The website can be used to download any of these files.
    2. Except for very old microarray data that we have archived, almost all data is available this way.
    3. People with PGFI accounts can access these files directly.
  4. Many analyses are attached to tracks in the browser and can be downloaded using our genome browser.

What the Future Holds

The volume of data is quite large and is now a major part of our operating expenses.  Therefore, we will be moving to more aggressive purging of online data storage.  We will advise you as these plans move forward.

FAQ Downloading Data

See this entry.

We have also added a big Data Download button on the home page to take you to your (PI's) download area.

FAQ How Are My Samples Doing?

We will be extending the website and our notification system to provide more information but for now, here's what to expect.
  1. Sample Quality Checks
    1. takes a few days
    2. core staff will usually send an email with the results
  2. Scheduling a Run
    1. you may have to wait for a flow cell to be full before it can be sequenced
    2. this time can vary widely depending on the type of run and our workload.
  3. Sequencing
    1. once the flow cell is scheduled here are the approximate times
      1. clustering - 0.5 days
      2. sequencing
        1. 50SR - 2 days
        2. 100SR - 4 days
        3. 50PE - 4 days
        4. 100PE - 11 to 14 days
  4. Fastq generation
    1. usually done on the same weekday the run finishes
  5. Basic Analysis
    1. ChIP-Seq or HITS-CLIP alignments and peak calling
      1. takes 1 to 2 days
    2. RNA-Seq
      1. takes about a week
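
As a rough illustration, the approximate times listed above can be added up for a given run type and analysis. This is only a sketch: the names are hypothetical, "about a week" is taken as 7 days, fastq generation is assumed to add no extra days, and the sample quality checks and the wait for a full flow cell are left out because they vary too much to estimate.

```python
# Approximate durations in days, copied from the list above.
# All names here are illustrative, not part of any core software.
CLUSTERING_DAYS = 0.5

# (shortest, longest) sequencing time per run type
SEQUENCING_DAYS = {
    "50SR": (2, 2),
    "100SR": (4, 4),
    "50PE": (4, 4),
    "100PE": (11, 14),
}

# (shortest, longest) basic analysis time; "about a week" assumed to be 7 days
ANALYSIS_DAYS = {
    "ChIP-Seq": (1, 2),
    "HITS-CLIP": (1, 2),
    "RNA-Seq": (7, 7),
}


def days_after_scheduling(run_type, analysis):
    """Return (low, high) days from flow cell scheduling to basic analysis."""
    seq_lo, seq_hi = SEQUENCING_DAYS[run_type]
    ana_lo, ana_hi = ANALYSIS_DAYS[analysis]
    return (CLUSTERING_DAYS + seq_lo + ana_lo,
            CLUSTERING_DAYS + seq_hi + ana_hi)


low, high = days_after_scheduling("100PE", "RNA-Seq")
print(f"100PE RNA-Seq: roughly {low} to {high} days once the flow cell is scheduled")
```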

FAQ When Can I Bring My Samples In?

  1. Do not bring samples in until we have told you that the investigation is ready for samples.
  2. We accept samples Tue-Thurs from 11-12 and 3-4. On Mondays we only accept samples from 3-4 and on Fridays we only accept samples from 11-12. Also, check the Lab Calendar (found on our website) before coming in to see if there are any events scheduled which would prevent us from accepting samples.
  3. If you already have an investigation and want to extend it to include new conditions or assays, then submit a new experiment request.  We will process the request the next morning and once you hear from us you can bring the samples in.

FAQ Does the Core Make Libraries?

We offer library preparation services for:

  1. RNA-Seq using
    1. Illumina truSeq kits (200ng+ of total RNA)

  2. Agilent SureSelect Exome Capture
We can train you to make other types of libraries, but we do not prepare them as a service.

Why You Should Make Your Own Libraries

We are planning on purchasing a robot to automate this process, but for now library prep is a time-consuming task that can take a while for us to complete.

FAQ How to Make Libraries

Wow, that's a big question!  It's too much to handle in one FAQ, so here are links to individual pages for various library types.


FAQ Getting Started

FAQ - How do I Get Started?

The basic things you need to do are:

  1. Read the other FAQs about experiment design and library prep.
  2. If you still have questions, check the Consultation Calendar and propose a time to meet to discuss them.
  3. Have your PI create an account (this only needs to be done once).
  4. Create an account for yourself (this only needs to be done once).
  5. Submit an experiment request.
  6. The next morning we will review experiment requests and contact you to resolve any questions.
  7. We will notify you that we can accept samples.
  8. Bring samples by at either 11-12 or 2-3 Monday to Friday.

FAQ Which Core To Use


There are a few Cores at Penn that do DNA sequencing.  Here is what we offer:


NGSC

The NGSC has 3 Illumina hiSeq2000s and a miSeq. Here is what these machines are good for:


hiSeq2000

The hiSeq2000 is good for these techniques (and their many variations): RNA-Seq, ChIP-Seq, miR-Seq, HITS-CLIP, exome capture, BIS-Seq, and whole genome sequencing in mammals.
Two aspects of ultra-high throughput sequencing are important: counts and coverage.  Counts are important for RNA-Seq, ChIP-Seq, miR-Seq, and HITS-CLIP.  Coverage is important for exome capture, BIS-Seq, and whole genome sequencing.  The hiSeq2000 generates sequence for about 200 million fragments per lane.  For each fragment the hiSeq can produce single or paired-end 50bp or 100bp sequences.  Using 100bp paired-end sequencing, you can get up to 40Gb per lane.
In many cases a single lane can generate more counts or coverage than a sample needs.  In that case, it is important to use multiplexed adapters so we can sequence multiple samples per lane.  Multiplexed adapters are generally a good idea, as they allow samples to be test sequenced for quality, then sequenced deeper as needed.
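
The 40Gb figure follows from simple arithmetic on the numbers quoted above. Here is a minimal sketch of that calculation; the function and constant names are made up for illustration, and real yields will vary from run to run.

```python
# "about 200 million fragments per lane", as quoted above for the hiSeq2000
FRAGMENTS_PER_LANE = 200_000_000


def lane_yield_gb(read_length_bp, paired_end, fragments=FRAGMENTS_PER_LANE):
    """Approximate bases sequenced per lane, in gigabases (1 Gb = 1e9 bases)."""
    reads_per_fragment = 2 if paired_end else 1
    return fragments * read_length_bp * reads_per_fragment / 1e9


for label, length, paired in [("50SR", 50, False), ("100SR", 100, False),
                              ("50PE", 50, True), ("100PE", 100, True)]:
    print(f"{label}: about {lane_yield_gb(length, paired):.0f} Gb per lane")
```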

Technique            Typical Volume             Samples per Lane
RNA-Seq              30 to 200 million reads    1 to 6
ChIP-Seq             30 to 100 million reads    2 to 6
miR-Seq              10 million reads           20
HITS-CLIP            30 million reads           6
Exome capture        20-30x coverage            5 to 20
BIS-Seq              20-30x coverage            1/3
Genome Sequencing    20-30x coverage            1/3
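
The "Samples per Lane" figures above can be approximated from the per-lane numbers quoted earlier. The sketch below is an illustration only: the function names are hypothetical, the 3.1Gb genome size is an assumed round figure for a mammalian genome, and it ignores duplicates, unmapped reads, and uneven multiplexing.

```python
import math

FRAGMENTS_PER_LANE = 200_000_000  # quoted above for the hiSeq2000
GB_PER_LANE_100PE = 40            # quoted above for 100bp paired-end runs


def samples_per_lane(reads_per_sample):
    """Count-based assays: how many samples fit in one lane."""
    return FRAGMENTS_PER_LANE // reads_per_sample


def lanes_for_coverage(genome_size_gb, coverage):
    """Coverage-based assays: lanes needed for one sample at the given depth."""
    return math.ceil(genome_size_gb * coverage / GB_PER_LANE_100PE)


print(samples_per_lane(30_000_000))   # RNA-Seq or ChIP-Seq at 30 million reads
print(samples_per_lane(10_000_000))   # miR-Seq at 10 million reads
print(lanes_for_coverage(3.1, 30))    # assumed 3.1Gb genome at 30x
```

Under these assumptions a 30x genome needs about three lanes, which is what "1/3" samples per lane means in the table.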

miSeq

The miSeq uses the same libraries as the hiSeq2000. It generates only about 15 million fragments per lane, but runs very quickly, and can generate reads as long as 150bp (or longer).
It is good for sample testing, miR-Seq, amplicon sequencing, or the techniques above applied to small (e.g., bacterial) genomes.

Thursday, October 4, 2012

Analysis - ChIP-Seq - Regional Enrichment

Introduction

This tool is used to identify statistically significant enrichment on regions that are defined relative to annotated regions of the genome, rather than on regions defined by the ChIP-Seq data itself.  It is typically used when the pattern of the ChIP-Seq target is so diffuse that standard peak callers have a difficult time identifying regions of enrichment.  In this case we use regions that are of a priori interest, such as promoters, gene bodies, CpG islands, etc., that are likely to contain enrichment for the ChIP target.  The ngsc-chipseq-RegionalEnrichment tool counts reads on these pre-defined regions of interest for both a ChIP and a control (usually input) sample, then uses a Fisher exact test and Benjamini-Hochberg correction to assess the enrichment of the ChIP signal on each region.  A new track will be loaded with the enrichment ratio as the score, along with the p-value and FDR.

Details

A pseudocount of 1 is added to each region when computing the Fisher test.
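
To make the idea concrete, here is a minimal sketch of this kind of calculation. It is not the ngsc-chipseq-RegionalEnrichment code: the layout of the 2x2 table, the one-sided test, the way the pseudocount is applied, and all function names and read counts are assumptions made for illustration.

```python
from math import exp, lgamma


def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the table [[a, b], [c, d]]:
    probability of seeing a or more in the top-left cell, margins fixed."""
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = log_choose(total, row1)
    p = sum(exp(log_choose(col1, x) + log_choose(total - col1, row1 - x) - denom)
            for x in range(a, min(row1, col1) + 1))
    return min(p, 1.0)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from largest p to smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def regional_enrichment(regions, chip_total, control_total, pseudocount=1):
    """regions: list of (name, chip_reads, control_reads).
    Returns a list of (name, enrichment_ratio, p_value, fdr)."""
    rows = []
    for name, chip, control in regions:
        # Assumed table: reads on the region (plus pseudocount) vs. reads elsewhere
        a, c = chip + pseudocount, control + pseudocount
        b, d = chip_total - chip, control_total - control
        ratio = (a / chip_total) / (c / control_total)
        rows.append((name, ratio, fisher_greater(a, b, c, d)))
    fdrs = benjamini_hochberg([p for _, _, p in rows])
    return [(name, ratio, p, fdr) for (name, ratio, p), fdr in zip(rows, fdrs)]


# Made-up read counts, for illustration only
example_regions = [("promoter_A", 60, 20), ("promoter_B", 25, 24)]
for name, ratio, p, fdr in regional_enrichment(example_regions, 10_000, 10_000):
    print(f"{name}: ratio={ratio:.2f} p={p:.3g} FDR={fdr:.3g}")
```

With real library sizes in the millions, the explicit sum in `fisher_greater` can become slow, so a statistics library would be the more practical choice there.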