[mira_talk] Re: 5' trimming of partial adapters

  • From: Robin Kramer <kodream@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Wed, 20 Jul 2011 06:53:52 -0600

This is news to me.  I have seldom found Illumina adapters in anything but
the standard orientation.  Sometimes there are reads that consist entirely
of an adapter and whatever else can be found.  Illumina isn't like 454, which
has free-floating adapter chemistry that can get chained together in any
orientation, possibly multiple times.  Illumina adapters are fixed to a
surface prior to ligation.  Generally with Illumina it isn't a problem: if
adapters are found, we discard the whole read, since the proportion of
adapter-contaminated sequences is so low.  The exception is a small RNA type
experiment, in which we would expect adapters in every read; then something
like 454 adapter trimming has to be applied.
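
A minimal sketch of the discard-the-whole-read approach, in Python.  The
adapter string, the 12-base seed length, and the function names are
assumptions for illustration only and are not taken from MIRA or any Illumina
software.  It does exact matching only, so an adapter containing a sequencing
error would slip through.

```python
# Sketch only: exact-match adapter screen that drops whole reads.
# ADAPTER is a placeholder; substitute the adapter used in your library prep.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_adapter(read, adapter, min_len=12):
    """True if the first min_len adapter bases, or their reverse
    complement, occur anywhere in the read (exact match)."""
    seed = adapter[:min_len].upper()
    read = read.upper()
    return seed in read or reverse_complement(seed) in read


def discard_contaminated(reads, adapter, min_len=12):
    """Keep only reads with no detectable adapter seed."""
    return [r for r in reads if not has_adapter(r, adapter, min_len)]


ADAPTER = "AGATCGGAAGAGC"  # placeholder sequence (assumption)
reads = [
    "ACGTACGTTTGACCA",        # clean
    "TTTTAGATCGGAAGAGCACAC",  # adapter in forward orientation
    "GGGCTCTTCCGATCTAAA",     # adapter seed as reverse complement
]
kept = discard_contaminated(reads, ADAPTER)
print(kept)
```

Checking the reverse complement as well is deliberately conservative; if
adapters really only turn up in the standard orientation, that branch will
simply never fire.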

For assembly this poses much less of a problem, since the forward and
reverse complement of an adapter seldom, if ever, cause misjoined
assemblies.  It is still annoying, though: if adapters aren't trimmed or
filtered in some way, they can end up at the ends of contigs.

Sincerely yours,

Robin

On Wed, Jul 20, 2011 at 2:03 AM, WATSON Mick <mick.watson@xxxxxxxxxxxxxxx> wrote:

> I can confirm that you will almost certainly find Illumina adapters and
> primers, sometimes merged together, sometimes whole, sometimes partial, in
> different orientations and all throughout the reads :)
>
> If you find a nice solution, let me know ;)
>
> From: mira_talk-bounce@xxxxxxxxxxxxx [mailto:mira_talk-bounce@xxxxxxxxxxxxx] On Behalf Of Shaun Tyler
> Sent: 20 July 2011 00:36
> To: mira_talk@xxxxxxxxxxxxx
> Subject: [mira_talk] Re: 5' trimming of partial adapters
>
> Thanks for the credit John but that wasn't quite what I was getting at.
> Pyrosequencing is error prone as are most comparable technologies. The 454
> software makes some compensation for this and allows something like a 2 bp
> mismatch when detecting MID tag sequences. This might also be the case for
> detecting the B adaptor sequence if the data was generated with older
> chemistry (i.e. prior to the Rapid Library MID tags). My understanding is
> that they specifically chose the RL MID sequences so that even with a 2 bp
> mismatch they could still be unambiguously assigned. However, we've been
> messing around with protocol modifications and have come across instances
> where the ligation messes up the end of the adaptor sequence so that it gets
> truncated and has more than the 2 bp mismatch that the filtering tolerates.
> Consequently these don't get trimmed automatically and need some tender
> loving care to make things right. However, your solution is ultimately the
> best one to take. If the first or last few bases of the sequences look like
> they are crap, they probably are, so just get rid of them.
>
> Bob - when it comes to Illumina data we're in the same boat. We just had
> ours installed and are working up our first run. Until I've had a chance to
> work with the data I really can't say much. The rest of the group have far
> more experience in this area than I do.
>
> Shaun
>
> From: Robert Bruccoleri <bruc@xxxxxxxxxxxxxxxxxxxxx>
> To: mira_talk@xxxxxxxxxxxxx
> Date: 2011-07-19 05:26 PM
> Subject: [mira_talk] Re: 5' trimming of partial adapters
> Sent by: mira_talk-bounce@xxxxxxxxxxxxx
> ------------------------------
>
>
>
>
> Dear John,
> Any suggestions with regard to Illumina reads?
>
> Regards,
> Bob
>
> John Nash wrote:
>
> My colleague, Shaun Tyler (also on this list), tells me that with 454
> sequencing, there can be concatenation of the end adaptors to make dimers.
> In my hands, the second mer is often missing a base or two, and it's not
> removed by the primary clipping. sff_extract usually screams at me when that
> happens, and so I re-invoke it with "--min_left_clip=16" or somesuch.
>
> John
>
>
>
> On 2011-07-19, at 6:00 PM, Robert Bruccoleri wrote:
>
> In some of the genome assembly projects that I'm working on, I see an
> uneven GC content at the beginning (first 10 bases) of my reads. Since the
> library preparation is expected to be unbiased, uneven GC content suggests
> that there is a contaminant sequence at the beginning of some of my reads.
>
> Let's assume for the sake of argument that the contaminant sequence is a
> short subsequence of an adapter, but it's too short to identify by sequence
> similarity. Does anyone have any ideas about how to handle the problem
> besides trimming the 5' end? Does the option
> -CL:possible_vector_leftover_clip handle this type of problem?
>
> Thanks. --Bob