Hi,

I sent some sff files to the SRA database (not released yet). In my
experience, sff files are not trimmed for low-quality base calls or for
putative MID tags of a multiplex assay; that is a downstream step in 454
data processing. But I think all adaptor sequences are trimmed in the 454
output. You should ask the people who sent the files.

I thought UniVec could be useful. What about SeqClean?

Hope this helps.

2010/11/11 Robin Kramer <kodream@xxxxxxxxx>
> Thanks,
>
> I think we might be able to contact the people who submitted the data
> to find out more.
>
> I did use those sequences to search through the assembled contigs, and
> they did occur in more places, mostly at the ends of the contigs. Of
> course, when I blast those against NR, the ends were matching some other
> odd sequence, not similar to the rest of the sequence.
>
> I also sent the suspected adapters through UniVec, but there were no
> significant matches.
>
> Looking at the pileups, it does look like in most cases the adapter
> was trimmed, because there is a sharp drop in coverage at the
> suspected adapter site, leaving what look to be adapters that had
> errors in them during sequencing.
>
> I don't know at what stage that trimming happened.
>
> Sincerely yours,
>
> Robin
>
> On 11/11/10, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:
> > On Thursday 11 November 2010 Robin Kramer wrote:
> >> [...]
> >> Is there any consensus on recleaning the 454 adapters?
> >
> > Hello Robin,
> >
> > I'd use either SSAHA2 as described in the manual or, perhaps, SMALT (the
> > successor). Note that only the next public release will include a patch
> > to read SMALT results.
> >
> >> I don't even know what sequences to expect.
> >
> > For that I also have no idea. I'd look at the contigs you described, take
> > the "adaptor looking" sequence and compare it to known adaptors/linkers;
> > perhaps you get lucky and find a hit.
> >
> > If not, there are several possibilities:
> > 1) If the adaptor is long enough and present often enough, it may have
> > been flagged by the repeat recognising routines. With the fragments you
> > have, search the "*_info_readrepeats.lst" file to see whether they occur
> > more often, extract similar sequences "by hand", and you should be able
> > to reconstruct the polluting sequence(s) pretty quickly.
> > 2) If the above yields nothing, you can try to ignore them by telling
> > MIRA to use only Smith-Waterman overlaps of a certain length
> > (-AL:mo=40). On the downside, this might split low-coverage or weakly
> > linked transcripts.
> > 3) Optionally (together with 2 above), also try switching on
> > -CL:pec=yes. It's switched off by default for EST assemblies because you
> > will lose most of the low-coverage transcripts and probably also the
> > ends of transcripts, but you will get rid of a lot of noise.
> >
> > Hope that helps,
> >   Bastien
> >
> > --
> > You have received this mail because you are subscribed to the mira_talk
> > mailing list. For information on how to subscribe or unsubscribe, please
> > visit http://www.chevreux.org/mira_mailinglists.html
> >
>
--
Jordi
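Robin's check was to search the contig ends for a suspected adapter while tolerating the sequencing errors visible in the pileups. That check could be sketched in Python as follows. This is a minimal sketch under several assumptions:

- The contigs are already loaded into a dict of name to sequence.
- The adapter string is a hypothetical placeholder, not a real 454 adapter.
- `window` and `max_edits` are arbitrary illustrative values. They do not come from MIRA, SSAHA2, SMALT or 454 documentation.
- The search uses plain semi-global edit distance (Sellers' algorithm), not BLAST, SSAHA2 or SMALT.

```python
# Hypothetical helper: scan contig ends for a suspected adapter, allowing
# a few sequencing errors (substitutions and indels).
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def best_approx_match(pattern: str, text: str) -> Tuple[int, int]:
    """Semi-global edit distance (Sellers' algorithm).

    Returns (fewest edits, exclusive end position in text) for the best
    occurrence of `pattern` anywhere inside `text`.
    """
    m = len(pattern)
    prev = list(range(m + 1))  # column for the empty text prefix
    best = (prev[m], 0)
    for j, t in enumerate(text, start=1):
        curr = [0] * (m + 1)  # row 0 stays 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == t else 1
            curr[i] = min(prev[i - 1] + cost,  # match / substitution
                          prev[i] + 1,         # extra base in text
                          curr[i - 1] + 1)     # base missing from text
        if curr[m] < best[0]:
            best = (curr[m], j)
        prev = curr
    return best


def scan_contig_ends(contigs: Dict[str, str], adapter: str,
                     window: int = 60, max_edits: int = 3
                     ) -> List[Tuple[str, str, str, int]]:
    """Report (contig, end, strand, edits) for adapter hits near contig ends.

    `window` (assumed > 0) is how many bases at each end are searched.
    """
    probes = (("+", adapter), ("-", reverse_complement(adapter)))
    hits = []
    for name, seq in contigs.items():
        for end, region in (("5'", seq[:window]), ("3'", seq[-window:])):
            for strand, probe in probes:
                edits, _ = best_approx_match(probe, region)
                if edits <= max_edits:
                    hits.append((name, end, strand, edits))
    return hits


if __name__ == "__main__":
    ADAPTER = "ACGTTGCAAGGCTTAC"  # hypothetical placeholder, not a real adapter
    contigs = {
        "contig1": "GGGG" + ADAPTER + "A" * 80,
        "contig2": "C" * 100,
    }
    for hit in scan_contig_ends(contigs, ADAPTER):
        print(hit)
```

The search is limited to a window at each end because the thread reports the suspected adapter mostly at contig ends. An edit-distance threshold, rather than exact matching, is used because the leftover adapters appeared to carry sequencing errors. Both strands are checked since a contig may be assembled in either orientation.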