[mira_talk] 454/PacBio hybrid & PacBio mapping assemblies with Mira

  • From: Stephen LeGrande <stlegrande@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Sun, 05 May 2013 23:32:01 +0200

Hi,

I have to assemble three plant mitochondrial genomes (3 lines from the same species). The expected genome size is around 500 kb (either as a single circular molecule and/or a few smaller sub-genomic ones). We have good-quality 454 data for all three genotypes (250,000 to 350,000 reads for each line, up to 150x genome coverage). Assembling the 454 reads results in 8 to 15 large contigs, the largest being around 200 kb in each line. In addition to the 454 data, we have recently obtained PacBio sequences at about 200x genome coverage for all three lines. The mean length of the PacBio reads lies between 1 and 1.5 kb; the longest reads are nearly 10 kb long.
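As a rough sanity check on these figures, coverage is approximately (number of reads x mean read length) / genome size. The following is only a minimal Python sketch of that arithmetic; the function names are made up for illustration, and the read counts it prints are estimates derived from the approximate numbers above, not measured values.

```python
def estimated_coverage(n_reads, mean_read_len, genome_size):
    """Approximate fold coverage from read count and mean read length (bp)."""
    return n_reads * mean_read_len / genome_size


def reads_for_coverage(coverage, mean_read_len, genome_size):
    """Approximate number of reads needed to reach a given fold coverage."""
    return coverage * genome_size / mean_read_len


GENOME_SIZE = 500_000  # ~500 kb expected mitochondrial genome size

# PacBio: about 200x coverage with a mean read length between 1 and 1.5 kb
for mean_len in (1_000, 1_500):
    n = reads_for_coverage(200, mean_len, GENOME_SIZE)
    print(f"mean read length {mean_len} bp: about {n:,.0f} reads for 200x")
```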

I am using error-corrected PacBio reads for the assemblies.
(Error correction is a separate issue, and I may start a new thread about it later. My question now concerns mapping assemblies using PacBio data.)

Hybrid assemblies using 454 plus error-corrected PacBio data work fine with Mira. The contigs from the hybrid assemblies are generally longer than those from the 454-only assemblies. It is very nice to see how PacBio reads bridge sequences that ended up on separate, shorter contigs when using just 454 data. Interestingly, I get even longer contigs from PacBio-only assemblies. However, some gaps still could not be closed, and several discrepancies show up when comparing PacBio-only and hybrid (PB+454) contigs.

My idea is to investigate the different versions of the problematic contigs by re-mapping PacBio reads onto them and checking which configuration is better supported.
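To make "better supported" concrete, one simple measure would be the number of reads that span a disputed junction with a reasonable amount of sequence on both sides. This is just a sketch in Python of what I have in mind, not something Mira provides: it assumes the read alignments are already available as (start, end) intervals on the contig (0-based, end exclusive), and the function name, the 100 bp minimum flank, and the example coordinates are all hypothetical.

```python
def count_spanning_reads(alignments, junction, min_flank=100):
    """Count reads that cover `junction` with at least `min_flank` bases
    aligned on each side of it.

    alignments: iterable of (start, end) intervals on the contig,
                0-based with `end` exclusive (an assumed input format).
    junction:   contig position of the disputed join.
    """
    return sum(
        1
        for start, end in alignments
        if start <= junction - min_flank and end >= junction + min_flank
    )


# Hypothetical alignments on one version of a contig, junction at 2000
example_alignments = [(0, 1200), (900, 2500), (1950, 3000), (1500, 2050)]
support = count_spanning_reads(example_alignments, junction=2000)
print(f"reads spanning the junction: {support}")
```

Running the same count for each alternative version of a contig would give numbers that can be compared directly, provided the same reads and the same flank threshold are used.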

And now, finally, to my actual question:
Mapping assemblies using long contigs as backbones and 454 sequences as short reads generally work fine. Until now, however, I have been unable to map PacBio reads onto the same backbones, even when using just a single contig as reference.

Mira stops every time at the same stage of the assembly:
==================================.
.
Filtering forward skims.
.
.
Done.
Filtering complement skims.
.
.
Done.
Done all filtering.
.
Making alignments.

Aligning possible forward matches:
[0%]
====================================================
Mira quits with a core dump every time at this point. This only happens with PacBio reads.

In the manifest file I mostly have just the default PacBio settings. I have played around a little with the alignment and backbone parameters, without any success.

I am using Mira 3.9.15 and have plenty of RAM on a Linux cluster.

