[mira_talk] Re: misassembly problems

Thank you Jan, Bastien,

The -highlyrepetitive switch helped a lot in improving my assembly and
reducing misassemblies.
I am now happy to swim in "correct" repeats.
Bye,

Giuseppe




> On Friday 27 March 2009 Giuseppe D'Auria wrote:
> > I assembled a really complicated microbial genome full of IS (by full I
> > mean really full). I found several reads that I think are misassembled.
> > The project is a half-plate GS-FLX20 paired-end assembly. Not very
> > complicated for mira (less than 4h in accurate mode); these are the
> > parameters:
> 
> Hi Giuseppe,
> 
> I learned the hard way that some bacteria really are almost as awful as
> eukaryotic genomes. IS can be one cause, multiple phages/prophage in high
> copy number within the genome another.
> 
> > [...]
> > I decided to increase 'nop' to 12 and 'rbl' to 6 in the hope that this
> > would improve on my previous attempt, which used just the standard
> > parameters (accurate mode).
> 
> Presently, going beyond 7 or 8 passes probably does not help too much; the
> bacteria I've seen tend to stabilise quite quickly. To be honest, the 7
> passes of "accurate" 454 assemblies are also more a feature that was pretty
> useful for GS20 sequences; FLX generally would need fewer (but I still keep
> it as many people still have some GS20 data).
> 
> 
> > Now to the problem.
> > I found several contigs with reads that were probably assembled
> > erroneously (look at the light-blue A at position 8430).
> 
> That's a weakness of the current assembly engine: if it does not recognise
> a repeat correctly, it is too lenient in handling the sequences and the
> result is what you see.
> 
> Actually, this has been my development focus for the past few weeks, and I
> think that sometime in April I'll be ready to launch a version with a new
> assembly engine which should be ... pretty good, according to first
> results.
> 
> > I said misassembled because the respective forward or reverse partner
> > is in another contig, and if I disassemble and try to join again
> > (manually) it makes sense. The problem is that these events cause wrong
> > contigs, which become a big problem when I go to Gap4 (people call it
> > finishing .... ironic ???).
> > Can I set parameters to avoid these kinds of errors in the contigs? If
> > yes, which ones?
> 
> At the moment, your best option for the assembly is, as Jan suggested, the
> -highlyrepetitive switch. It can help a lot there. One of the major helper
> flags that gets set by -highlyrepetitive is the option to mask nasty
> repeats during skim (-SK:mnr). You might perhaps want to adapt -SK:rt, as
> the current default value of "8" could be a bit too harsh (try 4 first; if
> there are still too many misassemblies, increase in steps of 2).
> 
> There's another thing you could do: if you find such cases of obvious
> misassembly in gap4, mark the bases of *all* reads in the column that shows
> a misassembly with the tag "SRMr". Then, once done, convert the gap
> database back to CAF and use this as input for a de-novo assembly
> (switching off all clipping this time). MIRA will use the newly marked
> bases as repeat markers and won't make the same mistake.
> 
> Regards,
>   Bastien
> 
> 
> 
> -- 
> You have received this mail because you are subscribed to the
> mira_talk mailing list. For information on how to subscribe or
> unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html
> 
> 


--
******************************************
Dr. Giuseppe D'Auria

Cavanilles Institute for
Biodiversity and Evolutionary Biology
University of Valencia
"Poligono de la Coma" s/n
46980 Paterna (Valencia),Spain

web: http://www.uv.es/cavanilles/genevol/
tel: +34 9635 43646
******************************************



