[mira_talk] multiple bacteria strains in my sequencing run

  • From: Scott Christley <schristley@xxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Wed, 24 Jun 2015 15:35:52 -0500

Hello,

I have an Illumina paired-end 2x150 sequencing run of about 30 million reads
for a wild-type bacterial sample. The sample came from a gut microbiome, and
Enterococcus faecalis was isolated using a selective culture plate. I believe
this sample actually contains a mixture of multiple strains of E. faecalis.
That is fine; in fact, it is exactly what I’m interested in. I want to study
this natural mixture of strains and analyze the genomic variation. I have a
question about MIRA’s output and whether my interpretation of the assembly is
correct. I’m also curious whether anybody has comments on my process.

I first aligned all my reads to a reference genome with bowtie2; about 70% of
the reads aligned. I then took the unaligned reads and aligned them to a set of
plasmids, etc., to remove those sequences as well. I gave the remaining
unaligned reads to MIRA to assemble. The result is more than 20k contigs, of
which the default long-contig filter keeps a few hundred. I have aligned many
of these contigs to the reference genome, and quite a few mapped to genes.
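
For concreteness, here is a minimal sketch of the two-stage read filtering
described above, written as Python that only builds the bowtie2 command lines.
The index names (ref_index, plasmid_index) and all file names are hypothetical
placeholders. It also assumes "unaligned" means pairs that fail to align
concordantly (bowtie2's --un-conc option), which may not match the exact
criterion used in the run.

```python
import shlex


def bowtie2_filter_cmd(index, reads_1, reads_2, unaligned_tmpl, sam_out, threads=4):
    """Build a bowtie2 command that also writes non-concordant pairs to disk.

    In the --un-conc path, bowtie2 replaces '%' with 1 or 2 to name the
    per-mate output files.
    """
    return [
        "bowtie2",
        "-p", str(threads),           # number of alignment threads
        "-x", index,                  # basename of the bowtie2 index
        "-1", reads_1,                # mate 1 FASTQ
        "-2", reads_2,                # mate 2 FASTQ
        "--un-conc", unaligned_tmpl,  # pairs that did not align concordantly
        "-S", sam_out,                # SAM output for the aligned reads
    ]


def mate_files(template):
    """Return the two file names bowtie2 derives from a '%' template."""
    return template.replace("%", "1"), template.replace("%", "2")


def two_stage_commands(reads_1, reads_2):
    """Stage 1 filters against the reference, stage 2 against plasmids."""
    stage1_tmpl = "unaligned_ref.%.fastq"      # hypothetical file name
    stage2_tmpl = "unaligned_plasmid.%.fastq"  # hypothetical file name
    s1_r1, s1_r2 = mate_files(stage1_tmpl)
    commands = [
        bowtie2_filter_cmd("ref_index", reads_1, reads_2, stage1_tmpl, "ref.sam"),
        bowtie2_filter_cmd("plasmid_index", s1_r1, s1_r2, stage2_tmpl, "plasmid.sam"),
    ]
    # The stage 2 leftovers are the reads that would go to the assembler.
    return commands, mate_files(stage2_tmpl)


if __name__ == "__main__":
    commands, leftovers = two_stage_commands("reads_R1.fastq", "reads_R2.fastq")
    for cmd in commands:
        print(shlex.join(cmd))
    print("reads for assembly:", leftovers)
```

The second command takes the first command's leftover pairs as input, so only
reads that matched neither the reference nor the plasmid set reach the
assembly step.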

My question is: am I correct in assuming that these assemblies are valid
alternative sequences for genes? That is, could they be sequences from other
strains in my sample?

If I had a sample that I knew contained (say) 5 strains, each with a different
sequence for a gene, would MIRA give me 5 separate assemblies of that gene
(presuming the sequences were distinct enough)?

Thanks!
Scott


--
You have received this mail because you are subscribed to the mira_talk mailing
list. For information on how to subscribe or unsubscribe, please visit
http://www.chevreux.org/mira_mailinglists.html
