On Sunday 06 March 2011 15:15:03 Martin Mokrejs wrote:
> 2. I sometimes see that actually a nucleotide from a homopolymer is moved
> into +2 position, like: GGGGGA becomes GGGGAG. Again, I suspect this is
> because the basecaller tries to reflect signal from previous flows.

That's a "carry forward". See
http://www.454.com/downloads/enabling-technologies/454_nature_article.pdf
and from a later time perhaps
http://genomebiology.com/content/pdf/gb-2007-8-7-r143.pdf

> I am thinking of just dropping all n and N's from my 454 data and see what
> happens with the assembly. ;-)

Hmmm, nice experiment. Please tell whether you see an improvement.

> What happens if user provides xml info as exported by e.g. sff_extract but
> provides fasta sequences subsequently changed (converted more to lowercase
> or vice versa).

MIRA will take all clipping info it can and apply "rightmost leftclip /
leftmost rightclip".

> Is it better to ignore the xml file or after interpreting
> the xml clip points also try to extend the clippings based on the
> lower-casing in fasta sequence files? (I suppose if I want to just use
> lower-case clipping I would NOT provide any xml traceinfo).

Depends on your use case.

> That is bad, even worse because after me running mdust I have sequences
> like:
>
> tgatgtgctgactgtgactgcAAATGCXXXXGATGCTGACTAAAtgcatcagXXXXXactgactgtgac

Yup, this is why I suddenly got suspicious, went back to the code to look
and then corrected for that ... and told about it.

> I wonder if mira could print a note into its logfiles that the input
> sequences contain N and X in upper/lower case and that the casing does
> make a difference. Could save us a bit.

No.

> Similarly, if the clip positions seen in xml traceinfo do not match
> lowercasing positions in fasta files ... Again, a good sanity check is
> always helpful.

Nope ... "rightmost leftclip / leftmost rightclip". There *could* be very
good reasons for differing clip info.

> I think that did not answer my question.
> But from your example above, it
> looks the internal, low-quality region is excised and the flanking
> sequences are joined.

Wow. No.

> Or is the sequence within the masked region and
> everything downstream clipped away? Or is the whole read discarded? These
> were my questions. ;)

Either upstream or downstream is discarded. Or none. It depends on how far
within the read the stretch is and whether MIRA reaches it when observing
-CL:mbcmfg and -CL:mbcmeg. Those stretches which cannot be reached remain
as is in the sequence.

B.
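
For illustration only, the "rightmost leftclip / leftmost rightclip" rule mentioned above can be sketched in a few lines of Python. This is not MIRA's code. The function names, the 0-based half-open (left, right) convention, and the idea of deriving a clip from leading and trailing lowercase runs in the FASTA are all assumptions made for this sketch.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical convention for this sketch: keep seq[left:right] (0-based, half-open).
Clip = Tuple[int, int]


def case_clip(seq: str) -> Clip:
    """Assumed helper: treat leading/trailing lowercase runs as clipped ends."""
    upper = [i for i, base in enumerate(seq) if base.isupper()]
    if not upper:
        # No uppercase base at all: nothing is kept.
        return (len(seq), len(seq))
    return (upper[0], upper[-1] + 1)


def combine_clips(clips: Iterable[Clip]) -> Optional[Clip]:
    """Apply 'rightmost leftclip / leftmost rightclip' to several clip sources.

    Returns None when the clips leave no bases between them.
    """
    clips = list(clips)
    if not clips:
        raise ValueError("need at least one clip")
    left = max(l for l, _ in clips)    # rightmost left clip wins
    right = min(r for _, r in clips)   # leftmost right clip wins
    return (left, right) if left < right else None


if __name__ == "__main__":
    seq = "acgtACGTACGTac"
    xml_clip = (2, 10)                 # made-up clip points, as if from traceinfo XML
    merged = combine_clips([xml_clip, case_clip(seq)])
    print(merged, seq[merged[0]:merged[1]])
```

With differing clip information from two sources, the result is simply the most restrictive window that both agree on, which is why mismatching XML and lowercase clips do not need to be treated as an error.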