Is this an argument for setting the -maxGap command line parameter (or the maxUncorrectedGap value in the spec file) to a length 1-2 bp less than the length of the adaptor when using PacBioToCA (which you'll have to build from source) to correct the filtered reads with CCS or NGS data?

On Fri, Aug 16, 2013 07:58 PM Chris Hoefler <hoeflerb@xxxxxxxxx> wrote:
>
>>Bonus question: are PB adaptor sequences listed somewhere on the net? The
>>only place I found some are in the metadata XML files, and they told me
>> ATCTCTCTCttttcctcctcctccgttgttgttgttGAGAGAGAT
>>
>>Are there others?
>
>I think that is the only adapter in use at the moment.
>
>Do either of these help?
>https://s3.amazonaws.com/files.pacb.com/pdf/Guide_Pacific_Biosciences_Template_Preparation_and_Sequencing.pdf
>http://www.smrtcommunity.com/servlet/servlet.FileDownload?file=00P7000000HYU49EAH
>
>This is the only official documentation from PacBio that I could find about
>their adapter sequences and barcodes.
>
>>Background: I'm working on the read improvement routines atm and I think
>>that in the 49 PB reads I took as initial test set (out of >30k from the
>>E.coli Nature paper), already two reads show such an inversion where there
>>should be none … ergo it's a sequencing artefact and 4% of reads like this
>>will wreak havoc with most assembly algorithms. I hate situations like
>>these.
>
>How long are these chimeras? The worst offenders can probably be removed by
>filtering read lengths and quality scores. But apparently these artifacts
>do appear in longer reads at a non-negligible level as a result of the way
>the libraries are constructed.
>http://www.microbiomejournal.com/content/1/1/10
>
>The PacBioToCA paper puts the number at ~2.5%. HGAP gets rid of these
>during the preassembly step by looking at the quality of the error
>correction. If there is a chimeric seed read, the short reads won't align
>across the junction of the inversion, resulting in a "coverage gap" in the
>preassembler alignment.
>These gaps are identified by a low consensus
>quality in the middle of the read. A filtering script splits the read at
>this low quality region and trims the ends back to the high quality region.
>That way you don't have to get rid of the read entirely and can still make
>use of the non-inverted portions.
>
>
>On Fri, Aug 16, 2013 at 2:50 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:
>
>> On Aug 15, 2013, at 1:34, Matthew D. Pagel <pagel@xxxxxxx> wrote:
>> > Is there a quick-and-dirty algorithm out there for identifying
>> > inversions from one subread to the next within a single PB read
>>
>> I'd have a more pressing, but similar question at the moment: is there a
>> way of easily identifying reads which form such an FR structure but where the
>> PB algorithms apparently did not recognise an adapter?
>>
>> Background: I'm working on the read improvement routines atm and I think
>> that in the 49 PB reads I took as initial test set (out of >30k from the
>> E.coli Nature paper), already two reads show such an inversion where there
>> should be none … ergo it's a sequencing artefact and 4% of reads like this
>> will wreak havoc with most assembly algorithms. I hate situations like
>> these.
>>
>> Bonus question: are PB adaptor sequences listed somewhere on the net? The
>> only place I found some are in the metadata XML files, and they told me
>> ATCTCTCTCttttcctcctcctccgttgttgttgttGAGAGAGAT
>>
>> Are there others?
>>
>> B.
>> --
>> You have received this mail because you are subscribed to the mira_talk
>> mailing list. For information on how to subscribe or unsubscribe, please
>> visit http://www.chevreux.org/mira_mailinglists.html
>>
>
>
>
>--
>Chris Hoefler, PhD
>Postdoctoral Research Associate
>Straight Lab
>Texas A&M University
>2128 TAMU
>College Station, TX 77843-2128

_______________________________________________________
Matt Pagel
Graduate Student - Laboratory of Don Bryant
Penn State Biochemistry, Microbiology and Molecular Biology
104 S.
Frear (mail); 223 S. Frear (biological material)
230 S. Frear (office)
State College, PA 16802
--
You have received this mail because you are subscribed to the mira_talk
mailing list. For information on how to subscribe or unsubscribe, please
visit http://www.chevreux.org/mira_mailinglists.html
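
On the question quoted above about spotting reads where the adapter went unrecognised: one quick-and-dirty option is to scan each read for an approximate copy of the adapter from the metadata XML, on both strands. The sketch below is only an illustration and not what the PB software, PacBioToCA, HGAP or MIRA actually do. The function names (`best_adapter_hit`, `find_unflagged_adapter`) and the 25% error cutoff are assumptions made up for this example; the only thing taken from the thread is the adapter sequence. It uses a plain edit-distance scan with a free start position in the read, so insertions and deletions are tolerated.

```python
# Adapter sequence as quoted in the thread (from the PB metadata XML).
ADAPTER = "ATCTCTCTCttttcctcctcctccgttgttgttgttGAGAGAGAT".upper()

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def best_adapter_hit(read, adapter=ADAPTER):
    """Best approximate occurrence of `adapter` anywhere in `read`.

    Returns (edit_distance, end_position), where end_position is the
    1-based read coordinate at which the best match ends. The match may
    start anywhere in the read at no cost (semi-global alignment).
    """
    read = read.upper()
    adapter = adapter.upper()
    m = len(adapter)
    # column[i]: fewest edits to align adapter[:i] to a read substring
    # ending at the current read position.
    column = list(range(m + 1))
    best = (m, 0)  # aligning the adapter to nothing costs m edits
    for j, base in enumerate(read, start=1):
        prev_diag = column[0]
        column[0] = 0  # free start: the match may begin at any read position
        for i in range(1, m + 1):
            cost = 0 if adapter[i - 1] == base else 1
            new = min(prev_diag + cost,   # match or substitution
                      column[i] + 1,      # extra base in the read
                      column[i - 1] + 1)  # adapter base missing from the read
            prev_diag = column[i]
            column[i] = new
        if column[m] < best[0]:
            best = (column[m], j)
    return best


def find_unflagged_adapter(read, max_error_rate=0.25):
    """List (strand, edit_distance, end_position) for adapter-like hits.

    max_error_rate is a hypothetical cutoff chosen for illustration only.
    """
    hits = []
    for strand, query in (("+", ADAPTER), ("-", reverse_complement(ADAPTER))):
        dist, end = best_adapter_hit(read, query)
        if dist <= max_error_rate * len(query):
            hits.append((strand, dist, end))
    return hits
```

A read that returns a hit in the middle could then be split around the reported end position. The cutoff would need tuning against real data, since raw PB error rates are high and the adapter stems are short and repetitive.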