[mira_talk] Re: misassembly problems
- From: "Giuseppe D'Auria"<Giuseppe.Dauria@xxxxx>
- To: mira_talk@xxxxxxxxxxxxx
- Date: Tue, 31 Mar 2009 08:53:18 +0200 (CEST)
Thank you Jan, Bastien,
the -highlyrepetitive switch helped a lot in improving my assembly and
reducing misassemblies.
I am now happy to swim in "correct" repeats.
bye
Giuseppe
> On Friday 27 March 2009 Giuseppe D'Auria wrote:
> > I assembled a really complicated microbial genome full of IS (by full I
> > mean really full). I found several reads that I think are misassembled.
> > The project is a half-plate GS-FLX20 Paired-Ends assembly. Not very
> > complicated for mira (less than 4h in accurate mode), these are the
> > parameters:
>
> Hi Giuseppe,
>
> I learned the hard way that some bacteria really are almost as awful as
> eukaryotic genomes. IS can be one cause, multiple phages/prophages in high
> copy number within the genome another.
>
> > [...]
> > I decided to increase the 'nop' to 12 and 'rbl' to 6 with the hope that
> > this can improve on my previous attempt, where I just used the standard
> > parameters (accurate mode).
>
> Presently, going beyond 7 or 8 passes probably does not help too much,
> bacteria I've seen tend to stabilise quite quickly. To be honest, the 7
> passes of "accurate" 454 assemblies are also more a feature that was pretty
> useful for GS20 sequences, FLX generally would need less (but I still keep
> it as many people still have some GS20 data).
>
>
> > Now to the problem.
> > I found several contigs with reads probably erroneously assembled (look
> > at the light-blue A at position 8430).
>
> That's a weakness of the current assembly engine: if it does not recognise
> a repeat correctly, it is too lenient in handling the sequences and the
> result is what you see.
>
> Actually, this has been my current development area for a few weeks and I
> think that sometime in April, I'll be ready to launch a version with a new
> assembly engine which should be ... pretty good, according to first results.
>
> > I said misassembled because the
> > respective forward or reverse partner is in another contig and if I
> > disassemble and try to join again (manually) it makes sense. The problem
> > is that these events cause wrong contigs, with big problems when I go to
> > Gap4 (people call it finishing .... ironic ???).
> > Can I set parameters in order to avoid these kinds of errors in the
> > contigs and, if yes, which ones?
>
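The check described above (a read whose forward or reverse partner ended up in another contig) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two dictionaries, the read names and the contig names are hypothetical, and how you extract them from your assembly output is left open. A split pair is just a hint, since pairs also legitimately span contig ends.

```python
def split_pairs(mate_of, contig_of):
    """List read pairs whose two mates were placed in different contigs."""
    seen = set()
    split = []
    for read, mate in mate_of.items():
        pair = frozenset((read, mate))
        if pair in seen:
            continue  # pair already handled from the other mate's side
        seen.add(pair)
        a, b = contig_of.get(read), contig_of.get(mate)
        # Only report pairs where both mates were actually assembled.
        if a is not None and b is not None and a != b:
            split.append((read, a, mate, b))
    return split


# Hypothetical toy data: read names and contig names are made up.
mate_of = {"r1.f": "r1.r", "r1.r": "r1.f", "r2.f": "r2.r", "r2.r": "r2.f"}
contig_of = {"r1.f": "c1", "r1.r": "c1", "r2.f": "c1", "r2.r": "c7"}

for read, a, mate, b in split_pairs(mate_of, contig_of):
    print(f"{read} in {a}, but its partner {mate} is in {b}")
```

Reads reported this way that sit in the interior of a contig, rather than near its ends, are the ones worth inspecting first.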
> At the moment, your best option for the assembly is, as Jan suggested, the
> -highlyrepetitive switch. It can help a lot there. One of the major helper
> flags that get set by -highlyrepetitive is the option to mask nasty repeats
> during skim (-SK:mnr). You might perhaps want to adapt -SK:rt as the
> current default value of "8" could be a bit too harsh (try 4 first, if
> there are still too many misassemblies, increase in steps of 2).
>
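The "try 4 first, then increase in steps of 2" advice for -SK:rt can be written down as a small Python sketch that only builds the argument lists and does not run MIRA. The `=value` spelling of the options (`-SK:mnr=yes`, `-SK:rt=4`) and the upper bound of 8 are assumptions made here for illustration, so check the manual of your MIRA version before using them.

```python
def rt_schedule(start=4, step=2, stop=8):
    """Candidate -SK:rt values: `start` first, then upwards in steps of `step`.

    `stop` is an arbitrary upper bound chosen for this sketch.
    """
    return list(range(start, stop + 1, step))


def mira_args(rt, base_args=("-highlyrepetitive",)):
    """Build the extra arguments for one attempt (the assembly is not run here)."""
    # Assumed option spelling: "-SECTION:param=value".
    return list(base_args) + ["-SK:mnr=yes", f"-SK:rt={rt}"]


for rt in rt_schedule():
    print(" ".join(mira_args(rt)))
```

Each printed line would be appended to your usual mira call, stopping at the first value that gives an acceptable number of misassemblies.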
> There's another thing you could do: if you find such cases of obvious
> misassembly in gap4, mark the bases of *all* reads in the column that shows
> a misassembly with the tag "SRMr". Then, once done, convert the gap
> database back to CAF and use this as input for a de-novo assembly
> (switching off all clippings this time). MIRA will use the newly marked
> bases as repeat markers and won't make the same mistake.
>
> Regards,
> Bastien
>
>
>
> --
> You have received this mail because you are subscribed to the mira_talk
> mailing list. For information on how to subscribe or unsubscribe, please
> visit http://www.chevreux.org/mira_mailinglists.html
>
>
--
******************************************
Dr. Giuseppe D'Auria
Cavanilles Institute for
Biodiversity and Evolutionary Biology
University of Valencia
"Poligono de la Coma" s/n
46980 Paterna (Valencia),Spain
web: http://www.uv.es/cavanilles/genevol/
tel: +34 9635 43646
******************************************