[mira_talk] Re: polyA masking using mira internal routines

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Sun, 6 Mar 2011 16:27:13 +0100

On Sunday 06 March 2011 15:15:03 Martin Mokrejs wrote:
> 2. I sometimes see that actually a nucleotide from a homopolymer is moved
> into +2 position, like: GGGGGA becomes GGGGAG. Again, I suspect this is
> because the basecaller tries to reflect signal from previous flows.

That's a "carry forward". See 

  http://www.454.com/downloads/enabling-technologies/454_nature_article.pdf

and from a later time perhaps 

  http://genomebiology.com/content/pdf/gb-2007-8-7-r143.pdf

> I am thinking of just dropping all n and N's from my 454 data and see what
> happens with the assembly. ;-)

Hmmm, nice experiment. Please tell whether you see an improvement.
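
For anyone who wants to try that experiment, a minimal Python sketch follows. The function name drop_n and the idea of passing quality values as a plain list are assumptions made for illustration; this is not a MIRA routine. It removes n/N from a read and keeps the quality values in step with the remaining bases.

```python
def drop_n(seq, quals=None):
    """Remove all n/N bases from a read (hypothetical helper, not part of MIRA).

    seq   -- the read sequence as a string
    quals -- optional list of per-base quality values, same length as seq
    Returns (new_seq, new_quals); new_quals is None if quals was None.
    """
    # Indices of every base that is not an N in either case.
    keep = [i for i, base in enumerate(seq) if base not in "nN"]
    new_seq = "".join(seq[i] for i in keep)
    if quals is None:
        return new_seq, None
    if len(quals) != len(seq):
        raise ValueError("sequence and quality lengths differ")
    return new_seq, [quals[i] for i in keep]
```

Note that removing bases shifts every position downstream, so clip coordinates from an XML traceinfo file would no longer match the edited reads.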

> What happens if user provides xml info as exported by e.g. sff_extract but
> provides fasta sequences subsequently changed (converted more to lowercase
> or vice versa).

MIRA will take all the clipping info it can get and apply "rightmost leftclip / 
leftmost rightclip".
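
As a rough illustration of that rule, here is a minimal Python sketch. The function names (case_clips, combine_clips) and the 0-based, half-open coordinate convention are assumptions made for the example and are not taken from the MIRA code. The idea is that each source of clipping info yields a (left, right) pair and the most restrictive value on each side wins.

```python
def case_clips(seq):
    """Derive (left, right) clips from lowercase ends of a read.

    Hypothetical helper: the good region is seq[left:right], 0-based, half-open.
    """
    left = 0
    while left < len(seq) and seq[left].islower():
        left += 1
    right = len(seq)
    while right > left and seq[right - 1].islower():
        right -= 1
    return left, right


def combine_clips(*clips):
    """Apply "rightmost leftclip / leftmost rightclip" to several (left, right) pairs."""
    left = max(c[0] for c in clips)    # rightmost left clip
    right = min(c[1] for c in clips)   # leftmost right clip
    return left, right


xml_clip = (2, 11)                        # e.g. taken from an XML traceinfo file
fasta_clip = case_clips("acgtACGTACgt")   # lowercase ends of the FASTA sequence
combined = combine_clips(xml_clip, fasta_clip)
```

If the combined left clip ends up at or beyond the right clip, nothing of the read is left as good sequence.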

> Is it better to ignore the xml file or after interpreting
> the xml clip points also try to extend the clippings based on the
> lower-casing in fasta sequence files? (I suppose if I want to just use
> lower-case clipping I would NOT provide any xml traceinfo).

Depends on your use case.

> That is bad, even worse because after me running mdust I have sequences
> like:
> 
> tgatgtgctgactgtgactgcAAATGCXXXXGATGCTGACTAAAtgcatcagXXXXXactgactgtgac

Yup, this is why I suddenly got suspicious, went back to the code to look and 
then corrected for that ... and told about it.

> I wonder if mira could print a note into its logfiles that the input
> sequences contain N and X in upper/lower case and that the casing does
> make a difference. Could save us a bit.

No.

> Similarly, if the clip positions seen in xml traceinfo do not match
> lowercasing positions in fasta files ... Again, a good sanity check is
> always helpful.

Nope ...  "rightmost leftclip / leftmost rightclip". There *could* be very 
good reasons for differing clip info.

> I think that did not answer my question. But from your example above, it
> looks like the internal, low-quality region is excised and the flanking
> sequences are joined. Wow.

No.

> Or is the sequence within the masked region and
> everything downstream clipped away? Or is the whole read discarded? These
> were my questions. ;)

Either upstream or downstream is discarded. Or neither. It depends on how far 
within the read the stretch is and whether MIRA can reach it while observing 
-CL:mbcmfg and -CL:mbcmeg. Stretches which cannot be reached remain as is in 
the sequence.
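
To make the "reachable or not" idea concrete, here is a hedged Python sketch. It is not the MIRA implementation: the function clip_masked, its parameters, and the step-by-step reaching logic are assumptions, with max_front_gap and max_end_gap standing in for what -CL:mbcmfg and -CL:mbcmeg might control. A masked stretch is clipped only if the unmasked gap between the current read end and the stretch is small enough; anything further inside is left alone.

```python
import re


def clip_masked(seq, max_front_gap, max_end_gap, mask="X"):
    """Return (left, right) so that seq[left:right] is kept (0-based, half-open).

    Hypothetical model: masked stretches are clipped from either end as long
    as the unmasked gap before them does not exceed the allowed maximum.
    Only the exact mask character given is treated as masked (case matters).
    """
    runs = [m.span() for m in re.finditer(re.escape(mask) + "+", seq)]
    left, right = 0, len(seq)

    # Walk in from the front: clip each stretch that can still be reached.
    for start, end in runs:
        if start - left <= max_front_gap:
            left = end
        else:
            break

    # Walk in from the end, never crossing what the front pass already clipped.
    for start, end in reversed(runs):
        if end <= left:
            break
        if right - end <= max_end_gap:
            right = start
        else:
            break

    return left, right


read = "ACGT" + "XXXX" + "ACGT" * 5 + "XXX" + "ACGTACGT" + "XX" + "AC"
kept = clip_masked(read, max_front_gap=5, max_end_gap=5)
```

In this example the stretches near both ends are clipped together with the few bases outside them, while the stretch in the middle of the read cannot be reached and stays in the sequence.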

B.

-- 
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html
