Hello,
I have an Illumina paired-end 2x150 sequencing run of about 30 million reads
from a wild-type bacterial sample. The sample came from a gut microbiome, and
Enterococcus faecalis was isolated using a selective culture plate. I believe
the sample actually contains a mixture of multiple E. faecalis strains. That
is fine; in fact, it is exactly what I’m interested in. I want to study this
natural mixture of strains and analyze its genomic variation. I have a
question about Mira’s output and whether my interpretation of the assembly is
correct. I’d also welcome any comments on my process.
I first aligned all my reads to a reference genome with bowtie2; about 70% of
the reads aligned. I then took the unaligned reads and aligned them to a set of
plasmids, etc., to filter those sequences out as well. The reads that were
still unaligned I gave to Mira to assemble. The result is more than 20k
contigs, of which the default long-contig filter keeps a few hundred. I have
aligned many of these contigs back to the reference genome, and quite a few
map to genes.
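
The two filtering passes described above could be sketched roughly as follows.
This is only an illustration that builds the bowtie2 command lines without
running them. The index names (ref_index, plasmid_index), the file names and
the thread count are placeholders, not the values from the actual run, and
using --un-conc (keep every pair that did not align concordantly) is an
assumption about how the unaligned reads were collected.

```python
import shlex


def bowtie2_filter_cmd(index, reads_1, reads_2, kept_prefix, sam_out, threads=4):
    """Build a bowtie2 command that writes pairs which did NOT align
    concordantly to <kept_prefix>.1.fastq and <kept_prefix>.2.fastq."""
    return [
        "bowtie2",
        "-p", str(threads),
        "-x", index,            # prefix of a prebuilt bowtie2 index
        "-1", reads_1,
        "-2", reads_2,
        # bowtie2 replaces '%' with 1 or 2 to name the per-mate files
        "--un-conc", f"{kept_prefix}.%.fastq",
        "-S", sam_out,          # alignments of the reads that did map
    ]


def build_pipeline(reads_1, reads_2):
    """Two passes: reference genome first, then plasmids (hypothetical names)."""
    pass_1 = bowtie2_filter_cmd(
        "ref_index", reads_1, reads_2, "not_ref", "vs_ref.sam")
    pass_2 = bowtie2_filter_cmd(
        "plasmid_index", "not_ref.1.fastq", "not_ref.2.fastq",
        "leftover", "vs_plasmids.sam")
    return [pass_1, pass_2]


if __name__ == "__main__":
    for cmd in build_pipeline("reads_R1.fastq", "reads_R2.fastq"):
        print(shlex.join(cmd))
```

In this sketch, leftover.1.fastq and leftover.2.fastq would be the reads handed
to the assembler. Note that --un-conc also keeps pairs where only one mate
aligned or where the mates aligned discordantly, so it is a fairly permissive
definition of "unaligned".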
My question is: am I correct in assuming that these assemblies are valid
alternative sequences for genes? That is, could they be sequences from other
strains in my sample?
If I had a sample that I knew contained (say) 5 strains, each with a different
sequence for a gene, would Mira give me 5 separate assemblies (assuming the
gene sequences were distinct enough)?
Thanks!
Scott
--
You have received this mail because you are subscribed to the mira_talk mailing
list. For information on how to subscribe or unsubscribe, please visit
http://www.chevreux.org/mira_mailinglists.html