Saturday, April 28, 2012

Downloading Data

The Basics

We have configured the website to use your account username and password to control access to files: by PI (if you are a PI or lab member) or by investigation (if you have collaborator status).

We have updated the download area to provide all data at a URL like this, where PI is replaced by your PI's last name:

  https://fgc.genomics.upenn.edu/Experiments/PI

When prompted for a user name and password, enter the credentials you use to log in to the rest of the site.
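If you prefer to script downloads instead of clicking through a browser, here is a minimal Python sketch using only the standard library. It assumes the site's password prompt is HTTP Basic authentication (likely given the browser prompt, but not confirmed here). The script name fetch.py and its arguments are hypothetical.

```python
import getpass
import shutil
import sys
import urllib.request


def download(url: str, username: str, password: str, dest: str) -> None:
    """Fetch one file, answering the server's login prompt with site credentials.

    Assumes the server uses HTTP Basic authentication (an assumption).
    """
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None: use these credentials for whatever realm the server names.
    password_mgr.add_password(None, url, username, password)
    auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    opener = urllib.request.build_opener(auth_handler)
    # Stream the response to disk so large files are not held in memory.
    with opener.open(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)


if __name__ == "__main__":
    if len(sys.argv) != 4:
        print("usage: fetch.py URL USERNAME DEST", file=sys.stderr)
    else:
        url, user, dest = sys.argv[1:4]
        # Prompt for the password so it does not end up in shell history.
        download(url, user, getpass.getpass("Password: "), dest)
```

For example, `python fetch.py https://fgc.genomics.upenn.edu/Experiments/PI/<path to file> USERNAME localfile`, with PI and the path filled in for your data.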

If you don't find the data you are looking for, check the deprecated links described below and let us know about the omission.

How the Files are Organized

Under each PI is a set of folders corresponding either to experiments (the old style) or to investigations and, beneath them, studies (the new style).

Within each experiment or investigation, there are a few common places to look for data.

'Raw' Data

This data is not truly raw, but it has not undergone any analysis beyond alignment.

  • basic/Fastq - files of read sequence in FASTQ format
  • basic/Fasta - files of read sequence in FASTA format
  • basic/Solexa - older style sequence or ELAND output files
  • basic/Export - alignment information split into unique and non-usable (repeat) reads
  • basic/BedFiles - BED file format of uniquely aligning reads and perhaps SHP output files

Analysis Results

These are typically placed under the Analysis folder. Different types of analysis are organized in subfolders under it.

Tips and Tricks

WIG files for Profiles

At the moment, we generate profile data at full resolution, i.e., including the change caused by each read. Sometimes these files can be found in basic/BedFiles with names of the form FGCNNNN_s_L-ucsc.ushp.tab.gz, e.g., FGC0138_s_7-ucsc.ushp.tab.gz. These files are not in WIG format; they contain just four columns: chromosome, begin, end, score. They can be converted to WIG format with relatively simple programs.
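One such program might look like the Python sketch below. It rests on assumptions about the .tab.gz files: the four columns are whitespace-separated, begin/end are 0-based half-open coordinates as in a BED file (WIG positions are 1-based), and any `#`, `track`, or `browser` lines are headers to skip. Each interval becomes its own fixedStep record, which is valid WIG but verbose. The function and script names are illustrative, not an existing FGC tool.

```python
import gzip
import sys
from typing import Iterable, Iterator, TextIO


def tab_to_wig(lines: Iterable[str]) -> Iterator[str]:
    """Turn 'chromosome begin end score' lines into WIG fixedStep lines.

    Assumes begin/end are 0-based, half-open (BED-style) coordinates.
    """
    for line in lines:
        line = line.strip()
        # Skip blank lines and assumed header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, begin, end, score = line.split()[:4]
        start = int(begin) + 1        # WIG positions are 1-based
        span = int(end) - int(begin)  # interval length in bases
        # One fixedStep declaration per interval, followed by its single value.
        yield f"fixedStep chrom={chrom} start={start} step={span} span={span}"
        yield score


def convert_file(in_path: str, out: TextIO) -> None:
    """Read a gzip-compressed .tab.gz profile file and write WIG to `out`."""
    out.write("track type=wiggle_0\n")
    with gzip.open(in_path, "rt") as fh:
        for wig_line in tab_to_wig(fh):
            out.write(wig_line + "\n")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: tab2wig.py FILE.tab.gz > FILE.wig", file=sys.stderr)
    else:
        convert_file(sys.argv[1], sys.stdout)
```

For example, `python tab2wig.py FGC0138_s_7-ucsc.ushp.tab.gz > FGC0138_s_7.wig`. If the files turn out to use 1-based or inclusive coordinates, the start and span arithmetic needs adjusting.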

WIG format data can also be obtained using the download feature in the TessLA browser. However, because modern data sets are so large, this approach does not work well.

Deprecated URLs

Because the files sit on two different filesystems, you have to look at two slightly different URLs:

  https://fgc.genomics.upenn.edu/Experiments-1/PI

or

  https://fgc.genomics.upenn.edu/Experiments-2/PI

where you replace PI with your PI's last name. Note the https; that's important.

2 comments:

  1. Is it possible to access the data through FTP?

  2. We do not offer FTP access, but many graphical and command-line programs can accomplish downloads using HTTP.
