[mira_talk] 454/PacBio hybrid & PacBio mapping assemblies with Mira

From: Stephen LeGrande <stlegrande@xxxxxxxxx>
To: mira_talk@xxxxxxxxxxxxx
Date: Sun, 05 May 2013 23:32:01 +0200

Hi,

I have to assemble three plant mitochondrial genomes (three lines of the same species). The expected genome sizes are around 500 kb (either as a single circular molecule and/or a few smaller sub-genomic ones). We have good-quality 454 data for all three genotypes (250,000 to 350,000 reads for each line, up to 150x genome coverage). Assembling the 454 reads results in 8 to 15 large contigs, the largest being around 200 kb in each line. In addition to the 454 data, we recently obtained PacBio sequences at about 200x genome coverage for all three lines. The mean length of the PacBio reads lies between 1 and 1.5 kb; the longest reads are nearly 10 kb long.
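
As a rough sanity check on the figures above, coverage can be related to read count with the usual approximation coverage = reads x mean read length / genome size. The sketch below uses only the numbers quoted in this mail; the function names are made up for illustration and the results are back-of-the-envelope estimates, not measured values.

```python
def total_bases(coverage, genome_size_bp):
    """Total sequenced bases implied by a coverage depth and a genome size."""
    return coverage * genome_size_bp


def implied_read_count(coverage, genome_size_bp, mean_read_len_bp):
    """Approximate number of reads needed to reach the given coverage."""
    return total_bases(coverage, genome_size_bp) / mean_read_len_bp


GENOME_BP = 500_000  # expected mitochondrial genome size (~500 kb)

# PacBio: ~200x coverage, mean read length between 1 kb and 1.5 kb
pacbio_reads_low = implied_read_count(200, GENOME_BP, 1_500)
pacbio_reads_high = implied_read_count(200, GENOME_BP, 1_000)

# 454: up to 150x coverage from at most 350,000 reads,
# which implies this mean read length in bp
mean_454_len = total_bases(150, GENOME_BP) / 350_000

print(round(pacbio_reads_low), round(pacbio_reads_high), round(mean_454_len))
```

Under these assumptions, the PacBio data set would hold somewhere between roughly 67,000 and 100,000 reads per line.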


I am using error corrected PacBio reads for the assemblies.

(Error correction is a separate issue, and maybe later on I will start a new thread about it. My question now concerns mapping assemblies using PacBio data.)

Hybrid assemblies using 454 plus error-corrected PacBio data work fine with Mira. The contigs from the hybrid assemblies are generally longer than those from the 454-only assemblies. It is very nice to see how PacBio reads bridge sequences that were on separate, shorter contigs when using just 454 data. Interestingly, I get even longer contigs from PacBio-only assemblies. However, some gaps still could not be filled, and several discrepancies can be seen when comparing PacBio-only and hybrid (PB+454) contigs.

I came up with the idea of investigating different versions of problematic contigs by re-mapping PacBio reads onto them and checking which configurations are better supported.
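
One possible way to quantify "better supported", sketched here under assumptions that are not part of the original workflow: once reads have been mapped, count how many alignments span the disputed junction in each candidate contig with a minimum flank on both sides. The function name, the 50 bp flank, and the toy coordinates below are all hypothetical; real coordinates would come from whatever mapper output is available.

```python
def spanning_reads(alignments, junction, min_flank=50):
    """Count alignments that cover `junction` with at least `min_flank`
    aligned bases on each side.

    alignments: iterable of (start, end) positions on the contig,
                0-based, end exclusive.
    """
    return sum(
        1
        for start, end in alignments
        if start <= junction - min_flank and end >= junction + min_flank
    )


# Hypothetical toy data: read alignments against two candidate versions
# of the same region, with the disputed junction at position 1000.
junction = 1000
config_a = [(100, 1400), (900, 2100), (950, 1020), (1200, 2500)]
config_b = [(100, 990), (1010, 2100), (980, 1030)]

support_a = spanning_reads(config_a, junction)
support_b = spanning_reads(config_b, junction)
print(support_a, support_b)
```

In this toy case the first configuration would be the better supported one, since no read crosses the junction of the second with enough flanking sequence.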


And now I come to my actual question:

While mapping assemblies using long contigs as backbones and 454 sequences as short reads generally work fine, I have so far been unable to map PacBio reads onto the same backbones, even when using just one single contig as reference.


Mira stops every time at the same stage of the assembly:
==================================.
.
Filtering forward skims.
.
.
Done.
Filtering complement skims.
.
.
Done.
Done all filtering.
.
Making alignments.

Aligning possible forward matches:
[0%]
====================================================

Mira quits with a core dump every time at this point. This only happens with PacBio reads.

In the manifest file I mostly have just the default PacBio settings. I have played around a little with changing alignment and backbone parameters, without any success.


I am using Mira 3.9.15 and have plenty of RAM on a Linux cluster.



