> Bonus question: are PB adaptor sequences listed somewhere on the net? The
> only place I found some are in the metadata XML files, and they told me
> ATCTCTCTCttttcctcctcctccgttgttgttgttGAGAGAGAT
>
> Are there others?

I think that is the only adapter in use at the moment. Do either of these help?

https://s3.amazonaws.com/files.pacb.com/pdf/Guide_Pacific_Biosciences_Template_Preparation_and_Sequencing.pdf
http://www.smrtcommunity.com/servlet/servlet.FileDownload?file=00P7000000HYU49EAH

This is the only official documentation from PacBio that I could find about their adapter sequences and barcodes.

> Background: I'm working on the read improvement routines atm and I think
> that in the 49 PB reads I took as initial test set (out of >30k from the
> E.coli Nature paper), already two reads show such an inversion where there
> should be none … ergo it's a sequencing artefact and 4% of reads like this
> will wreak havoc with most assembly algorithms. I hate situations like
> these.

How long are these chimeras? The worst offenders can probably be removed by filtering on read length and quality score. But apparently these artifacts also appear in longer reads at a non-negligible level, as a result of the way the libraries are constructed.

http://www.microbiomejournal.com/content/1/1/10

The PacBioToCA paper puts the number at ~2.5%.

HGAP gets rid of these during the preassembly step by looking at the quality of the error correction. If there is a chimeric seed read, the short reads won't align across the junction of the inversion, resulting in a "coverage gap" in the preassembler alignment. These gaps show up as low consensus quality in the middle of the read. A filtering script splits the read at this low-quality region and trims the ends back to the high-quality region. That way you don't have to get rid of the read entirely and can still make use of the non-inverted portions.
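To make the split-and-trim idea concrete, here is a rough sketch in Python. This is not HGAP's actual filtering script; the function name `split_at_low_quality`, the per-base threshold `min_q`, and the minimum fragment length `min_len` are all made up for illustration, and the default values are arbitrary. It assumes you already have one consensus quality value per base of the corrected read.

```python
def split_at_low_quality(seq, quals, min_q=20, min_len=500):
    """Split a read at low-quality stretches and keep the good pieces.

    seq     : read sequence (str)
    quals   : per-base consensus quality values, same length as seq
    min_q   : hypothetical threshold; bases below it count as low quality
    min_len : hypothetical minimum length for a fragment to be kept

    Returns a list of (start, end, subsequence) tuples, end exclusive.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")

    segments = []
    start = None  # start of the current high-quality run, if any
    for i, q in enumerate(quals):
        if q >= min_q:
            if start is None:
                start = i  # a new high-quality run begins here
        elif start is not None:
            # Run ended at a low-quality base: keep it if long enough.
            if i - start >= min_len:
                segments.append((start, i, seq[start:i]))
            start = None
    # Close a run that extends to the end of the read.
    if start is not None and len(seq) - start >= min_len:
        segments.append((start, len(seq), seq[start:]))
    return segments


# Toy read: two good halves separated by a 4-base low-quality junction.
seq = "ACGTACGTAC" + "NNNN" + "GGCCTTAAGG"
quals = [40] * 10 + [3] * 4 + [40] * 10
for start, end, sub in split_at_low_quality(seq, quals, min_q=20, min_len=5):
    print(start, end, sub)
```

Because the low-quality bases are simply dropped, each fragment is automatically trimmed back to its high-quality region. A real filter would probably smooth the qualities over a window first, since a hard per-base threshold like this one also splits a read at any single low-quality base.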

On Fri, Aug 16, 2013 at 2:50 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:

> On Aug 15, 2013, at 1:34 , Matthew D. Pagel <pagel@xxxxxxx> wrote:
> > Is there a quick-and-dirty algorithm out there for identifying
> > inversions from one subread to the next within a single PB read
>
> I'd have a more pressing, but similar question at the moment: is there a
> way of easily identifying reads which for such a FR structure but where the
> PB algorithms apparently did not recognise an adapter?
>
> Background: I'm working on the read improvement routines atm and I think
> that in the 49 PB reads I took as initial test set (out of >30k from the
> E.coli Nature paper), already two reads show such an inversion where there
> should be none … ergo it's a sequencing artefact and 4% of reads like this
> will wreak havoc with most assembly algorithms. I hate situations like
> these.
>
> Bonus question: are PB adaptor sequences listed somewhere on the net? The
> only place I found some are in the metadata XML files, and they told me
> ATCTCTCTCttttcctcctcctccgttgttgttgttGAGAGAGAT
>
> Are there others?
>
> B.
> --
> You have received this mail because you are subscribed to the mira_talk
> mailing list. For information on how to subscribe or unsubscribe, please
> visit http://www.chevreux.org/mira_mailinglists.html

--
Chris Hoefler, PhD
Postdoctoral Research Associate
Straight Lab
Texas A&M University
2128 TAMU
College Station, TX 77843-2128