[mira_talk] Re: 5' trimming of partial adapters

  • From: Shaun Tyler <Shaun.Tyler@xxxxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 19 Jul 2011 18:35:43 -0500

Thanks for the credit, John, but that wasn't quite what I was getting at.
Pyrosequencing is error prone, as are most comparable technologies.  The 454
software compensates for this to some extent and allows something like a 2 bp
mismatch when detecting MID tag sequences.  This might also be the case for
detecting the B adaptor sequence if the data were generated with older
chemistry (i.e. prior to the Rapid Library MID tags).  My understanding is
that they specifically chose the RL MID sequences so that even with a 2 bp
mismatch they could still be unambiguously assigned.  However, we've been
messing around with protocol modifications and have come across instances
where the ligation messes up the end of the adaptor sequence so that it
gets truncated and has more than the 2 bp of mismatch that the filtering
tolerates.  Consequently these don't get trimmed automatically and need
some tender loving care to make things right.  That said, your solution is
ultimately the best one to take.  If the first or last few bases of the
sequences look like they are crap, they probably are, so just get rid of
them.
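
To make the mismatch idea concrete, here is a rough Python sketch of
tolerance-based adaptor clipping.  It is not what the 454 software does
internally; the function names, the example adaptor, the 2-mismatch
threshold and the minimum overlap are all assumptions for illustration.
It also assumes the leftover sits at the 5' end of the read and is a
suffix of the adaptor (i.e. the adaptor lost bases from its own 5' end).

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def clip_5prime_adaptor(read, adaptor, max_mismatches=2, min_overlap=8):
    """Remove an adaptor remnant from the start of a read.

    Hypothetical helper: tries the full adaptor first, then progressively
    shorter suffixes of it, so a truncated adaptor can still be found.
    Returns the read unchanged if nothing matches within the tolerance.
    """
    for k in range(len(adaptor), min_overlap - 1, -1):
        suffix = adaptor[-k:]
        if len(read) >= k and hamming(read[:k], suffix) <= max_mismatches:
            return read[k:]
    return read


# Made-up adaptor sequence, not a real 454 adaptor.
ADAPTOR = "AACCGGTTACGT"

full = clip_5prime_adaptor(ADAPTOR + "GGGAAA", ADAPTOR)
truncated = clip_5prime_adaptor("CGGTTACGT" + "GGGAAA", ADAPTOR)
untouched = clip_5prime_adaptor("TTTTTTTTTTTTTTTT", ADAPTOR)
```

A full-length comparison alone would miss the truncated case, which is
the situation described above.  Shorter overlaps raise the risk of
clipping genuine sequence, so the minimum overlap should not be set too
low.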

Bob - when it comes to Illumina data we're in the same boat.  We just had
ours installed and are working up our first run.  Until I've had a chance
to work with the data I really can't say much.  The rest of the group have
far more experience in this area than I do.

Shaun





From:   Robert Bruccoleri <bruc@xxxxxxxxxxxxxxxxxxxxx>
To:     mira_talk@xxxxxxxxxxxxx
Date:   2011-07-19 05:26 PM
Subject:        [mira_talk] Re: 5' trimming of partial adapters
Sent by:        mira_talk-bounce@xxxxxxxxxxxxx



Dear John,
    Any suggestions with regard to Illumina reads?

    Regards,
    Bob

John Nash wrote:
      My colleague, Shaun Tyler (also on this list), tells me that with 454
      sequencing, there can be concatenation of the end adaptors to make
      dimers. In my hands, the second mer is often missing a base or two,
      and it's not removed by the primary clipping.  sff_extract usually
      screams at me when that happens, so I re-invoke it with
      "--min_left_clip=16" or somesuch.

      John



      On 2011-07-19, at 6:00 PM, Robert Bruccoleri wrote:

            In some of the genome assembly projects that I'm working on, I
            see an uneven GC content at the beginning (first 10 bases) of
            my reads. Since the library preparation is expected to be
            unbiased, uneven GC content suggests that there is a
            contaminant sequence at the beginning of some of my reads.

            Let's assume for the sake of argument that the contaminant
            sequence is a short subsequence of an adapter, but it's too
            short to identify by sequence similarity. Does anyone have any
            ideas about how to handle the problem besides trimming the 5'
            end? Does the option -CL:possible_vector_leftover_clip handle
            this type of problem?

            Thanks. --Bob
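
The uneven GC content described above can be checked with a small
per-position tally.  This is only a sketch: the function name and the toy
reads are made up for illustration, and real input would come from a
FASTA/FASTQ parser rather than a hard-coded list.

```python
def gc_by_position(reads, n_positions=10):
    """Return the GC fraction at each of the first n_positions bases.

    Reads shorter than n_positions only contribute to the positions
    they actually cover.
    """
    gc_counts = [0] * n_positions
    totals = [0] * n_positions
    for read in reads:
        for i, base in enumerate(read[:n_positions].upper()):
            totals[i] += 1
            if base in "GC":
                gc_counts[i] += 1
    return [g / t if t else 0.0 for g, t in zip(gc_counts, totals)]


# Toy reads, purely illustrative.
toy_reads = ["GATT", "CATT", "GGTA"]
profile = gc_by_position(toy_reads, n_positions=4)
```

Positions whose GC fraction departs sharply from the rest of the read are
candidates for a fixed 5' clip.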

