[mira_talk] Re: 454 cleaning

  • From: Jordi Durban <jordi.durban@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Thu, 11 Nov 2010 21:41:02 +0100

Hi,
I have sent some sff files to the SRA database (not released yet). In my
experience, sff files are not trimmed for low-quality base calls or for the
putative MID tags of a multiplexed run; that is a downstream step in 454 data
processing. But I think all adaptor sequences are trimmed in the 454
output.
You should ask the people who submitted the files.
I thought UniVec could be useful. What about SeqClean?
Hope this helps.
2010/11/11 Robin Kramer <kodream@xxxxxxxxx>

> Thanks,
>
> I think we might be able to contact the people who submitted the data
> to find out more.
>
> I did use those sequences to search through the assembled contigs, and
> they did occur in more places, mostly at the ends of the contigs. Of
> course, when I BLASTed those against NR, the ends matched some other
> odd sequence that was not similar to the rest of the contig.
>
> I also ran the suspected adapters against UniVec, but there were no
> significant matches.
>
> Looking at the pileups, it does look like in most cases the adapter
> was trimmed, because there is a sharp drop in coverage at the
> suspected adapter site, leaving what looks to be adapters that had
> errors in them during sequencing.
>
> I don't know at what stage that trimming happened.
>
> Sincerely yours,
>
> Robin
>
>
>
> On 11/11/10, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:
> > On Thursday 11 November 2010 Robin Kramer wrote:
> >> [...]
> >> Is there any consensus on recleaning the 454 adapters?
> >
> > Hello Robin,
> >
> > I'd use either SSAHA2 as described in the manual or, perhaps, SMALT (the
> > successor). Note that only the next public release will include a patch to
> > read SMALT results.
> >
> >> I don't even know what the sequences would be to expect.
> >
> > For that I also have no idea. I'd look at the contigs you described, take
> > the "adaptor looking" sequence and compare it to known adaptors / linkers;
> > perhaps you get lucky and find a hit.
> >
> > If not, there are several possibilities:
> > 1) if the adaptor is long enough and present often enough, it may have been
> > flagged by the repeat recognising routines. With the fragments you have,
> > search the "*_info_readrepeats.lst" file to see whether you find them more
> > often, extract similar sequences "by hand" and you should be able to
> > reconstruct the polluting sequence(s) pretty quickly.
> > 2) if the above yields nothing, you can try to ignore them by telling MIRA to
> > use only Smith-Waterman overlaps of a certain length (-AL:mo=40). On the
> > downside, this might split low coverage / weakly linked transcripts.
> > 3) optionally (together with 2 above), also try switching on -CL:pec=yes.
> > It's switched off by default for EST assemblies because you will lose most
> > of the low coverage transcripts and probably also the ends of transcripts,
> > but you will get rid of a lot of noise.
> >
> > Hope that helps,
> >   Bastien
> >
> > --
> > You have received this mail because you are subscribed to the mira_talk
> > mailing list. For information on how to subscribe or unsubscribe, please
> > visit http://www.chevreux.org/mira_mailinglists.html
> >
>
>
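
For the check Robin describes above (searching the assembled contigs for a
suspected adaptor and seeing whether the hits pile up at contig ends), a
minimal Python sketch might look like the following. It is only an
illustration under stated assumptions: the function names, the exact-match
search and the 50 bp end window are all hypothetical choices, not anything
MIRA or the 454 software does. Real adaptor remnants with sequencing errors
would need an approximate matcher such as SSAHA2 or SMALT instead.

```python
def revcomp(seq):
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def find_hits(contig, adaptor):
    """Sorted 0-based start positions of exact matches of the adaptor,
    or of its reverse complement, in the contig (overlaps allowed)."""
    contig = contig.upper()
    adaptor = adaptor.upper()
    hits = set()
    for pattern in {adaptor, revcomp(adaptor)}:
        start = contig.find(pattern)
        while start != -1:
            hits.add(start)
            start = contig.find(pattern, start + 1)
    return sorted(hits)


def classify_hits(contig, adaptor, end_window=50):
    """Split hit positions into those touching either contig end
    (within end_window bases, an arbitrary assumption) and the rest."""
    ends, interior = [], []
    for pos in find_hits(contig, adaptor):
        near_start = pos < end_window
        near_end = pos + len(adaptor) > len(contig) - end_window
        if near_start or near_end:
            ends.append(pos)
        else:
            interior.append(pos)
    return ends, interior
```

Calling classify_hits(contig_sequence, suspected_adaptor) on each contig and
comparing the sizes of the two lists gives a rough idea of whether the
sequence behaves like an untrimmed adaptor (mostly end hits) or like a
genuine repeat (hits spread through the interior).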



-- 
Jordi
