On Monday 29 March 2010 Tony Travis wrote:
> Looks like I spoke too soon: We are still having problems doing mapping
> assemblies of ~40K Sanger reads + 1.2M 454 reads. MIRA crashes with an
> [...]
> error complaining that 'rr_####<n>####' is longer than 29900 base-pairs:
> [...]
> Are 'rr_####<n>####' synthetic reads from suspected repeat regions?

Errrm, not quite. RR stands for "Rail-Reads": synthetic reads which make
overall life in mapping a bit easier when (mis)using an assembly engine
that was originally built for de-novo.

But ... I'm wondering about the length of the rails MIRA built. Say, some
of your Sanger/454 reads wouldn't happen to be artificial "reads" with a
length of >= 15k or so?

Can you please send me the complete output log? Then I might know more ...
I don't think it's a bug, but it might be something where I can put in
some additional checks.

In the meantime: restart an assembly and force -SB:brl:bro to fixed
values. For "real" data with a Sanger/454 mix I think that
-SB:brl=2000:bro=1000 should be ok.

> [...]
> I'm not surprised that the number of padded bases is different, but I am
> surprised that MIRA has deleted bases from the read!

MIRA has an integrated editor. It actually *edits* reads (Sanger, 454 and
Solexa) if the situation allows for it. On the other hand, the ACE /
phdball combo has no way whatsoever to actually model these edits. This is
the main reason why ACE is *evil* and should not be used.

> This makes it
> impossible to use a 'phd.ball' created from the original 'phd' files, or
> the fasta.screen and fasta.screen.qual created by phredPhrap, to load
> quality values for the Sanger reads when using Consed to view an 'ace'
> file produced by a MIRA assembly. Consed complains that the bases in the

Welcome to the club. I perhaps should try to contact David about
supporting other input formats ...
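
To see which reads are affected before Consed complains, one could compare each original read with its unpadded counterpart from the assembly. The following is only a minimal sketch, not anything MIRA or Consed provides: the function names are made up for illustration, and it assumes both sequences are already in the same orientation and cover the same (unclipped) stretch, with '*' as the pad character as used in ACE files.

```python
def unpad(padded: str, pad: str = "*") -> str:
    """Remove alignment pads from a read taken out of the assembly."""
    return padded.replace(pad, "")


def read_was_edited(original: str, assembled_padded: str) -> bool:
    """Return True if the assembled read, once unpadded, differs from the original.

    Assumption: same orientation and same clipping on both sides; real data
    would need reverse-complementing and clip handling first.
    """
    return unpad(assembled_padded).upper() != original.upper()


if __name__ == "__main__":
    # Pads alone do not count as an edit ...
    print(read_was_edited("ACGT", "AC*GT"))
    # ... but a base that vanished from the read does.
    print(read_was_edited("ACGGT", "AC*GT"))
```

Reads flagged this way are the ones whose original quality values can no longer be matched base-for-base against the assembly.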
> The chromatogram 'scf' files are not accessible to MIRA, and I did not
> expect any automatic editing of the reads to be done:
> > Edit options (-ED):
> >     Automatic contig editing (ace) : [san] no
> >                                      [454] yes
> >   Sanger only:
> >     Strict editing mode (sem) : no
> >     Confirmation threshold in percent (ct) : 50
>
> Am I misunderstanding something about how MIRA manages its reads?

No, not at all. I might need to look up the exact circumstances, but I
think what happened was this:
- a hybrid contig was built.
- MIRA sees it is allowed to use contig editing with 454 reads.
- it uses 454 reads (only) to build editing hypotheses.
- some of them are: "delete this whole column".
- unfortunately, Sanger reads also cover this column and therefore the
  corresponding base there also gets deleted.
- hence, some Sanger reads also get edited.

I do agree that this is somewhat surprising ... I need to make this
clearer in the docs.

> What I want to do is examine a MIRA assembly in Consed with library and
> quality information included: Normally, a 'phd.ball' file is used for
> this purpose. Newbler, for example, creates a 'phd.ball' automatically
> when producing an 'ace' file. Could MIRA create a 'phd.ball' to load
> quality scores and library info into Consed for the MIRA ace file?

I suppose it could. I never found the time to analyse the format of these
things though ... there are quite a number of other things I need to
tackle first.

I'm spoiled: as I use gap4 with CAF (and caf2gap), I don't have any of
these problems :-)

Regards,
  Bastien

--
You have received this mail because you are subscribed to the mira_talk mailing list. For information on how to subscribe or unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html
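
The column-deletion scenario from the list above (hypotheses built from 454 reads only, yet Sanger reads covering the same column lose a base) can be illustrated with a toy model. This is emphatically not MIRA's editing algorithm: the majority rule, the 0.5 threshold, the data layout and all names are assumptions made up for the illustration.

```python
from typing import List, Tuple

# One aligned read: (technology, padded sequence). All sequences have the
# same length; '*' is a pad, ' ' means the read does not cover the column.
Read = Tuple[str, str]


def columns_to_delete(reads: List[Read], threshold: float = 0.5) -> List[int]:
    """Toy hypothesis builder: look at 454 reads only and propose deleting
    every column where more than `threshold` of the covering 454 reads
    show a pad."""
    width = len(reads[0][1])
    doomed = []
    for col in range(width):
        calls = [seq[col] for tech, seq in reads
                 if tech == "454" and seq[col] != " "]
        if calls and calls.count("*") / len(calls) > threshold:
            doomed.append(col)
    return doomed


def delete_columns(reads: List[Read], cols: List[int]) -> List[Read]:
    """Apply the deletion to *every* read in the contig, Sanger included."""
    drop = set(cols)
    return [(tech, "".join(b for i, b in enumerate(seq) if i not in drop))
            for tech, seq in reads]


if __name__ == "__main__":
    contig = [
        ("454",    "ACG*TA"),
        ("454",    "ACG*TA"),
        ("454",    "ACGGTA"),
        ("sanger", "ACGGTA"),
    ]
    cols = columns_to_delete(contig)
    for tech, seq in delete_columns(contig, cols):
        print(tech, seq)
```

In this toy contig the Sanger read never contributed to the hypothesis, but it still ends up one base shorter, which is the effect seen in the ACE file.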