Monday, April 9, 2012

Bacterial Contamination

The Problem

A ChIP-Seq, RNA-Seq, or other library type looks good and sequences well, but has a very low alignment percentage, i.e., much less than 50%.  Sometimes this is due to poor library construction that produces artefactual sequences, but once in a while the library itself is fine and simply contains a large amount of bacterial sequence and relatively little from the intended species.
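As a quick illustration of the check described above, here is a minimal sketch that turns read counts into an alignment percentage and flags a library for a closer look. The function names are hypothetical, and the 50% cutoff is just the rough figure mentioned above, not a hard rule.

```python
def alignment_percentage(aligned_reads, total_reads):
    """Return the percentage of reads that aligned to the intended genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * aligned_reads / total_reads


def needs_contamination_check(aligned_reads, total_reads, threshold=50.0):
    """True when the alignment percentage falls below the threshold.

    The 50% default is an assumption taken from the rough figure in the
    text. A low value only says the library deserves a closer look; it
    does not tell you whether the cause is contamination or artefacts.
    """
    return alignment_percentage(aligned_reads, total_reads) < threshold


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(alignment_percentage(3_000_000, 12_000_000))
    print(needs_contamination_check(3_000_000, 12_000_000))
```

The counts themselves would come from whatever summary your aligner reports.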

Confirming the Diagnosis

In the blastn section of the NCBI BLAST website there is a 'Whole-genome shotgun contigs (wgs)' database, which contains sequence from a wide range of organisms.  If you convert 20 or so of your reads to FASTA format, you can paste them into the search window and see which species they match.  If you get a bunch of excellent matches to a species other than the one you had hoped for, then that's the problem.  The exact species depends on the source of the contamination, e.g., local water, reagents, or bacteria carried by the host species (gut flora, dirt, etc.).
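Converting a handful of reads to FASTA is easy to script. The sketch below assumes standard four-line FASTQ records (header, sequence, '+', qualities) and simply takes the first 20; the function name and the example file name are hypothetical. Reads at the very start of a file are not always representative, so you may prefer to pull them from further in.

```python
from itertools import islice


def fastq_to_fasta(fastq_lines, max_reads=20):
    """Convert the first `max_reads` four-line FASTQ records to FASTA text."""
    records = []
    lines = iter(fastq_lines)
    while len(records) < max_reads:
        # Each FASTQ record is assumed to span exactly four lines.
        chunk = [line.rstrip("\r\n") for line in islice(lines, 4)]
        if len(chunk) < 4:
            break  # end of input, or a truncated final record
        header, sequence = chunk[0], chunk[1]
        if not header.startswith("@"):
            raise ValueError(f"unexpected FASTQ header: {header!r}")
        records.append(f">{header[1:]}\n{sequence}")
    return "\n".join(records) + ("\n" if records else "")


if __name__ == "__main__":
    # Tiny in-memory example; for a real file you would pass an open
    # handle instead, e.g. open("reads.fastq") (hypothetical file name).
    example = [
        "@read1\n", "ACGTACGT\n", "+\n", "IIIIIIII\n",
        "@read2\n", "TTGGCCAA\n", "+\n", "IIIIIIII\n",
    ]
    print(fastq_to_fasta(example), end="")
```

The resulting text can be pasted straight into the BLAST search window.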

Tracking Down the Source

For bacterial contamination, the easiest way to find the source is to do PCR on water samples to see if you can find ribosomal sequence.  The primers below hit the 16S rDNA gene in a wide variety of bacterial species but do not match mammals.

The forward primer, 5'-TCCTACGGGAGGCAGCAGT-3'
the reverse primer, 5'-GGACTACCAGGGTATCTAATCCTGTT-3'
and the probe, (6-FAM)-5'-CGTATTACCGCGGCTGCTGGCAC-3'
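If you already have a candidate sequence for the contaminant (for example a 16S sequence from the BLAST hit), a quick in-silico check can show whether these primers would be expected to sit on it. The sketch below is only a sanity check under simplifying assumptions: it looks for exact matches on one strand, whereas real primers tolerate some mismatches, and it is no substitute for running the PCR. The function names are hypothetical.

```python
# Primer and probe sequences as listed above (5' to 3').
FORWARD = "TCCTACGGGAGGCAGCAGT"
REVERSE = "GGACTACCAGGGTATCTAATCCTGTT"
PROBE = "CGTATTACCGCGGCTGCTGGCAC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_amplicon(template, forward=FORWARD, reverse=REVERSE):
    """Return the predicted amplicon for exact primer matches, or None.

    Only the given strand is searched: the forward primer must appear
    as written, followed downstream by the reverse complement of the
    reverse primer.
    """
    template = template.upper()
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


def probe_binds(amplicon, probe=PROBE):
    """True if the probe matches the amplicon exactly on either strand."""
    amplicon = amplicon.upper()
    return probe in amplicon or reverse_complement(probe) in amplicon
```

To cover both orientations of a candidate sequence, run `find_amplicon` on the sequence and on its `reverse_complement`.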

Something's Fishy!

Another possible problem is the use of salmon DNA as a blocking agent in ChIP-Seq experiments.  This is rare these days, as we warn against it and the issue is more widely reported.  The BLAST search above should identify this problem as well.
