[mira_talk] Assembly rearrangements in the face of repeats

  • From: Robert Bruccoleri <bruc@xxxxxxxxxxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Sun, 07 Aug 2011 22:39:50 -0400

I have an interesting and difficult assembly that I'm attempting with Mira. I'm working with a bacterium that has a large number of Non-Ribosomal Peptide Synthetases (NRPS) and Polyketide Synthases (PKS), and many domain and gene duplications have occurred during the course of its evolution. The bacterium has a GC content in excess of 70%.


I have one gene in this bacterium that has a large number of domains, some of which are exact duplicates (>500 bp) of one another within the gene. From the chemical structure of the compound made by this gene, I have a good idea of what the domain structure ought to be.

We have an extensive collection of data, both 454 and Illumina, for this bacterium. For Illumina, we have paired-end data of various lengths. I've been experimenting with different combinations of data to see if I can get a complete assembly of the gene of interest described above.

Just recently, I started a 'normal' Mira run using 3.4rc2, and I enabled intermediate FASTA output at every pass. On the second pass, Mira generated my gene with the expected pattern of domains. However, on subsequent passes, it eliminated some of the repetitive sequences, and by the end of the run, I had lost about 30% of the expected domains.
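
One way to pin down the pass at which the domains disappear is to count exact copies of a known domain sequence in each pass's FASTA output. The following is only a minimal sketch in Python, not anything provided by Mira: the function names are made up for illustration, it looks for exact matches only (on both strands), and it assumes plain ACGT sequence, so contigs with IUPAC codes or small differences from the query would be undercounted.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Complement table for plain ACGT bases (assumption: no IUPAC ambiguity codes).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a plain ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from the lines of a FASTA file."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first word of the header as the contig name.
            name, chunks = (line[1:].split() or [""])[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def count_occurrences(contig: str, domain: str) -> int:
    """Count exact, possibly overlapping matches of domain on both strands."""
    contig, domain = contig.upper(), domain.upper()
    # A set avoids double counting when the domain is its own reverse complement.
    targets = {domain, reverse_complement(domain)}
    total = 0
    for target in targets:
        start = contig.find(target)
        while start != -1:
            total += 1
            start = contig.find(target, start + 1)
    return total


def domain_counts(fasta_lines: Iterable[str], domain: str) -> Dict[str, int]:
    """Map each contig name to the number of exact copies of domain it holds."""
    return {
        name: count_occurrences(seq, domain)
        for name, seq in parse_fasta(fasta_lines)
    }
```

To use it, open each pass's FASTA file in turn (the file names depend on the project and are not shown here), pass the open file object to `domain_counts` together with the domain sequence, and compare the per-contig counts from one pass to the next.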

Has anyone else run into issues like these? How can I control the decision making with regard to repeats? Is there any way of having Mira report a graph of the possible assemblies (like Allpaths)? (BTW, I don't have data that is suitable for Allpaths.)

Thanks. --Bob

begin:vcard
fn:Robert Bruccoleri
n:Bruccoleri;Robert
org:Audacious Energy, LLC and Congenomics, LLC
adr:;;;;;;USA
email;internet:bruc@xxxxxxx
title:President
version:2.1
end:vcard
