[mira_talk] Re: mira 3.0.1 crashes in mapping mode

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 30 Mar 2010 01:04:13 +0200

On Monday 29 March 2010 Tony Travis wrote:
> Looks like I spoke too soon: We are still having problems doing mapping
> assemblies of ~40K Sanger reads + 1.2M 454 reads. MIRA crashes with an
> [...]
> error complaining that 'rr_####<n>####' is longer than 29900 base-pairs:
> [...]
> Are 'rr_####<n>####' synthetic reads from suspected repeat regions?

Errrm, not quite. RR stands for "Rail-Reads": synthetic reads which make 
life in mapping a bit easier when (mis)using an assembly engine that was 
originally built for de-novo assembly.

But ... I'm wondering about the length of the rails MIRA built. Say, some of 
your Sanger/454 reads wouldn't happen to be artificial "reads" with a length 
of >= 15k or so?
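
One quick way to check for such over-long "reads" is to scan the input FASTA for records at or above that length. The sketch below is only an illustration: the function names and the demo data are made up, the 15000 threshold is simply the "15k or so" mentioned above, and it assumes plain FASTA input.

```python
import io


def read_lengths(handle):
    """Yield (name, length) for each record in a FASTA text stream."""
    name = None
    length = 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, length
            parts = line[1:].split()
            name = parts[0] if parts else ""
            length = 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def suspiciously_long(handle, threshold=15000):
    """Return the records whose length is >= threshold (assumed cut-off)."""
    return [(n, l) for n, l in read_lengths(handle) if l >= threshold]


if __name__ == "__main__":
    # Hypothetical demo data: one normal read and one artificial long "read".
    demo = io.StringIO(">read1\nACGT\nACGT\n>fake_contig\n" + "A" * 16000 + "\n")
    for name, length in suspiciously_long(demo):
        print(name, length)
```

To check a real file, pass an opened file handle instead of the in-memory demo stream.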

Can you please send me the complete output log? Then I might know more ... I 
don't think it's a bug but it might be something where I can put in some 
additional checks.

In the meantime: restart the assembly and force -SB:brl:bro to fixed values. 
For "real" data with a Sanger/454 mix I think that -SB:brl=2000:bro=1000 
should be ok.
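
To see why fixed values avoid the length complaint, here is a toy model of tiling a backbone with overlapping rails. It assumes that brl is the rail length and bro the overlap between neighbouring rails. It is not MIRA's actual rail-building code, and the function name is made up.

```python
def rail_spans(reference_length, rail_length=2000, rail_overlap=1000):
    """Return (start, end) spans of overlapping rails over a backbone.

    Toy model only: rail_length and rail_overlap stand in for the
    values passed as brl and bro (an assumption about their meaning).
    """
    if rail_length <= 0:
        raise ValueError("rail_length must be positive")
    if not 0 <= rail_overlap < rail_length:
        raise ValueError("rail_overlap must be >= 0 and < rail_length")

    step = rail_length - rail_overlap
    spans = []
    start = 0
    while start < reference_length:
        end = min(start + rail_length, reference_length)
        spans.append((start, end))
        if end == reference_length:
            break
        start += step
    return spans


if __name__ == "__main__":
    for start, end in rail_spans(5000):
        print(start, end, end - start)
```

With a rail length of 2000, no rail in this model can come anywhere near the 29900 base-pair limit quoted in the error, whatever the backbone length.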

> [...]
> I'm not surprised that the number of padded bases is different, but I am
> surprised that MIRA has deleted bases from the read! 

MIRA has an integrated editor. It actually *edits* reads (Sanger, 454 and 
Solexa) if the situation allows for it.

On the other hand, the ACE / phdball combo has no way whatsoever to actually 
model these edits. This is the main reason why ACE is *evil* and should not be 
used. 

> This makes it
> impossible to use a 'phd.ball' created from the original 'phd' files, or
> the fasta.screen and fasta.screen.qual created by phredPhrap, to load
> quality values for the Sanger reads when using Consed to view an 'ace'
> file produced by a MIRA assembly. Consed complains that the bases in the

Welcome to the club. Perhaps I should try to contact David about supporting 
other input formats ...

> The chromatogram 'scf' files are not accessible to MIRA, and I did not
> expect any automatic editing of the reads to be done:
> >   Edit options (-ED):
> >         Automatic contig editing (ace)              :  [san]  no
> >                                                        [454]  yes
> >      Sanger only:
> >         Strict editing mode (sem)                   : no
> >         Confirmation threshold in percent (ct)      : 50
> 
> Am I misunderstanding something about how MIRA manages its reads?

No, not at all. I might need to look up the exact circumstances, but I think 
what happened was this:

- a hybrid contig was built.
- MIRA sees it is allowed to use contig editing with 454 reads
- it uses 454 reads (only) to build editing hypotheses
- some of them are: "delete this whole column"
- unfortunately, Sanger reads also cover this column and therefore the
  corresponding base there also gets deleted.
- hence, some Sanger reads also get edited.
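
The steps above can be illustrated with a toy model of a hybrid contig. This is only a sketch: the data layout, the '*' pad character and the function name are made up, and MIRA's real editor is certainly more involved. The point is that deleting a whole column removes a base from every read covering it, whichever technology the hypothesis came from.

```python
def delete_column(reads, column):
    """Delete one contig column from every read that covers it.

    reads maps a read name to (offset, bases), where offset is the
    contig position of the read's first base. Toy model only.
    """
    edited = {}
    for name, (offset, bases) in reads.items():
        local = column - offset
        if 0 <= local < len(bases):
            # The read covers the column: drop its base there.
            bases = bases[:local] + bases[local + 1:]
        elif local < 0:
            # The read starts after the deleted column: shift it left.
            offset -= 1
        edited[name] = (offset, bases)
    return edited


if __name__ == "__main__":
    # Hypothetical hybrid contig: the 454 reads carry a pad ('*') in
    # column 4, the Sanger read carries a real base there.
    contig = {
        "sanger_1": (0, "ACGTTACG"),
        "454_1": (2, "GT*AC"),
        "454_2": (3, "T*ACG"),
    }
    for name, (offset, bases) in delete_column(contig, 4).items():
        print(name, offset, bases)
```

In this example the "delete column 4" hypothesis is supported by the 454 reads only, yet the Sanger read loses a real base as well.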

I do agree that this is somewhat surprising ... need to make this clearer in 
the docs.

> What I want to do is examine a MIRA assembly in Consed with library and
> quality information included: Normally, a 'phd.ball' file is used for
> this purpose. Newbler, for example, creates a 'phd.ball' automatically
> when producing an 'ace' file. Could MIRA create a 'phd.ball' to load
> quality scores and library info into Consed for the MIRA ace file?

I suppose it could. I never found the time to analyse the format of these 
things though ... there are quite a number of other things I need to tackle 
first. I'm spoiled: as I use gap4 with CAF (and caf2gap), I don't have any of 
these problems :-)

Regards,
  Bastien

--
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html
