[mira_talk] Re: multiple bacteria strains in my sequencing run

  • From: Scott Christley <schristley@xxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Mon, 29 Jun 2015 15:03:19 -0500

Thanks Chris! When I hear the word repeat, I tend to think about
low-complexity regions, transposons and so forth in eukaryotic genomes, which
made me want to double check. I will definitely give those sections of the
manual a careful read.

cheers
Scott

On Jun 29, 2015, at 12:35 PM, Chris Hoefler <hoeflerb@xxxxxxxxx> wrote:

<quote>If I had a sample which I knew had (say) 5 strains within, where each
strain had a different sequence for a gene, will Mira provide me with 5
separate assemblies (presuming each gene was distinct enough)?</quote>

Short answer: yes

Longer answer:
Mira is designed to distinguish between repeats with single nucleotide
variations. In the context of a single organism, Mira will assemble
repetitive regions into separate contigs if it detects differences in those
repeats. In the context of multiple organisms (or a single organism with
multiple chromosome copies), nearly identical contigs that differ by single
nucleotides and originate from different organisms (chromosomal copies) will
be assembled separately. The caveat to this is a combination of coverage
depth, coverage consistency, and sequencing errors. To distinguish between
sequencing error and true variations, Mira relies on k-mer frequencies, which
are heavily influenced by coverage variations. So if you have too much
coverage, too little coverage, or large differences in coverage,
repeats/variations can be missed or sequencing errors can be called as
repeats. So while in an ideal scenario you would get 5 assembled genes for 5
organisms in a pool, in reality you will likely get more or fewer than that.
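
To make the coverage caveat concrete, here is a toy sketch in Python. It is
not how Mira implements this internally; it only shows why a fixed k-mer
frequency cutoff can confuse a low-coverage strain variant with a sequencing
error. The function names, k=5, the toy sequences, and the cutoffs are all
made up for the illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def split_by_frequency(counts, min_count):
    """Split k-mers into 'solid' (trusted) and 'weak' (suspected error)."""
    solid = {kmer for kmer, n in counts.items() if n >= min_count}
    weak = set(counts) - solid
    return solid, weak


# Toy pool (hypothetical data): a major strain at 6x, a minor strain with one
# SNP at 2x, and a single read carrying a sequencing error.
major = "ACGTACGGTCA"
minor = "ACGTATGGTCA"  # true variant: C -> T at position 5
error = "ACGTACGGACA"  # sequencing error: T -> A at position 8
reads = [major] * 6 + [minor] * 2 + [error]

counts = count_kmers(reads, k=5)

# A cutoff of 3 discards the minor strain's k-mers along with the error.
solid_strict, weak_strict = split_by_frequency(counts, min_count=3)

# A cutoff of 2 keeps the minor strain and still rejects the error.
solid_loose, weak_loose = split_by_frequency(counts, min_count=2)
```

With the stricter cutoff the variant k-mer CGTAT (seen twice) lands in the
same "weak" bin as the error k-mer ACGGA (seen once); with the looser cutoff
they separate. If coverage of the minor strain dropped to 1x, no cutoff would
tell them apart, which is the situation described above.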

That said, Mira will do everything it can to avoid misassemblies. So if there
is sufficient evidence of two non-identical gene copies, it won't assemble
them together. Mira also makes heavy use of tags to let you know how it makes
decisions regarding contig building and breaking. So definitely look at the
tags when you do your analysis (SRMc and SROr are probably the ones to focus
on the most).

There is a lot of good information in the manual about the tags and how Mira
makes decisions regarding potential repeats. Sections of particular interest:
3.7 Tags used in the assembly by MIRA and EdIt.
3.8 Where reads end up: contigs, singlets, debris
3.9 Detection of bases distinguishing non-perfect repeats and SNP discovery
3.11.2 Ploidy and repeats
3.11.3 Handling of repeats
9.2 First look: the assembly info
9.5 Places of importance in a de-novo assembly



On Wed, Jun 24, 2015 at 3:35 PM, Scott Christley <schristley@xxxxxxx> wrote:
Hello,

I have an Illumina paired-end 2x150 sequencing run of about 30 million reads
for a wild-type bacterial sample. The sample came from a gut microbiome and
Enterococcus faecalis was extracted using a selection culture plate. It is
my belief that this sample actually contains a mixture of multiple strains of
E. faecalis. This is okay though, in fact this is very much what I’m
interested in. I want to be able to study this natural mixture of strains
and analyze the genomic variation. I have a question about Mira’s output and
whether my interpretation of the assembly is correct. Also I’m curious if
anybody has comments on my process.

I first aligned all my reads to a reference genome with bowtie2; about 70% of
the reads aligned. Then I took the unaligned reads and aligned them to a set
of plasmids, etc., to remove that material. I gave the remaining unaligned
reads to Mira to assemble. The result is 20k+ contigs; the default long
contig filter gives a few hundred contigs. I’ve gone and aligned many of these
contigs to the reference genome, and quite a few mapped to genes.
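
For anyone following along, the "take the unaligned reads" step can be
sketched as below. This is a minimal illustration and not necessarily how it
was done for this run: it assumes the aligner output is plain-text SAM, and it
relies only on the SAM convention that bit 0x4 of the FLAG field marks an
unmapped read. The function names and the two example records are
hypothetical.

```python
UNMAPPED = 0x4  # SAM FLAG bit: this read is unmapped


def is_unmapped(sam_line):
    """Return True if a SAM alignment line has the 'unmapped' flag bit set."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # column 2 of a SAM record is FLAG
    return bool(flag & UNMAPPED)


def unmapped_read_names(sam_lines):
    """Collect names of unmapped reads, skipping '@' header lines."""
    names = []
    for line in sam_lines:
        if line.startswith("@"):
            continue
        if is_unmapped(line):
            names.append(line.split("\t")[0])  # column 1 is the read name
    return names


# Two made-up records: flag 99 is a mapped first mate in a proper pair,
# flag 77 is an unmapped first mate whose mate is also unmapped.
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "read1\t99\tref\t100\t42\t4M\t=\t300\t204\tACGT\tIIII",
    "read2\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
names = unmapped_read_names(example)
```

For paired-end data you would probably want to keep a pair only when both
mates are unmapped, so that mates stay together going into the assembler.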

My question is, am I correct in assuming that these assemblies are valid
alternative sequences for genes? That is, they could be sequences for other
strains in my sample?

If I had a sample which I knew had (say) 5 strains within, where each strain
had a different sequence for a gene, will Mira provide me with 5 separate
assemblies (presuming each gene was distinct enough)?

thanks!
Scott


--
You have received this mail because you are subscribed to the mira_talk
mailing list. For information on how to subscribe or unsubscribe, please
visit http://www.chevreux.org/mira_mailinglists.html



--
Chris Hoefler, PhD
Postdoctoral Research Associate
Straight Lab
Texas A&M University
2128 TAMU
College Station, TX 77843-2128
