On Aug 17, 2013, at 1:58, Chris Hoefler <hoeflerb@xxxxxxxxx> wrote:
> Do either of these help?
> https://s3.amazonaws.com/files.pacb.com/pdf/Guide_Pacific_Biosciences_Template_Preparation_and_Sequencing.pdf
> http://www.smrtcommunity.com/servlet/servlet.FileDownload?file=00P7000000HYU49EAH
>
> This is the only official documentation from PacBio that I could find about
> their adapter sequences and barcodes.

See? And I didn't find them, even though I tried Google with quite a number of different keywords. Probably the wrong ones. My thanks to Matthew and you :-)

> How long are these chimeras? The worst offenders can probably be removed by
> filtering read lengths and quality scores. But apparently these artifacts do
> appear in longer reads at a non-negligible level as a result of the way the
> libraries are constructed.
> http://www.microbiomejournal.com/content/1/1/10
>
> The PacBioToCA paper puts the number at ~2.5%. HGAP gets rid of these during
> the preassembly step by looking at the quality of the error correction. If
> there is a chimeric seed read, the short reads won't align across the
> junction of the inversion, resulting in a "coverage gap" in the preassembler
> alignment. These gaps are identified by a low consensus quality in the middle
> of the read.

Does it show that I have not read the PacBioToCA paper (yet) (intentionally)? I want to develop my own ideas for "best practice" while learning the characteristics of a new sequencing technology, before looking at what others have done. But feel free to cite from papers when appropriate :-)

Incidentally, the above strategy crossed my mind sometime this week when I discovered those chimeras. I discarded it after some more thought because I think it will lead to too many "false positives," i.e., one would break reads that are otherwise perfect.

I do have an idea how to do it differently, but I'll need to work out a couple of things first.

B.
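
For illustration, the coverage-gap check described in the quoted passage could be sketched roughly as below. This is not HGAP's actual implementation: the function name, the per-base coverage list as input, and all thresholds (`min_cov`, `min_gap`, `edge_margin`) are assumptions made up for this example. It only looks for a run of low coverage in the interior of a seed read.

```python
from typing import List, Optional, Tuple


def find_interior_gap(
    coverage: List[int],
    min_cov: int = 1,       # hypothetical: coverage below this counts as "uncovered"
    min_gap: int = 20,      # hypothetical: minimum run length to call a gap
    edge_margin: int = 50,  # hypothetical: ignore runs this close to the read ends
) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first interior low-coverage run, else None.

    `coverage` holds the number of reads aligned over each base of a seed
    read. The returned interval is half-open: [start, end).
    """
    n = len(coverage)
    run_start = None
    # Append a sentinel so a run reaching the last base is closed as well.
    for i, cov in enumerate(list(coverage) + [min_cov]):
        if cov < min_cov:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            start, end = run_start, i
            run_start = None
            long_enough = end - start >= min_gap
            interior = start >= edge_margin and end <= n - edge_margin
            if long_enough and interior:
                return (start, end)
    return None


if __name__ == "__main__":
    # Toy seed read: well covered on both sides, uncovered for 30 bases in the middle.
    toy = [5] * 100 + [0] * 30 + [5] * 100
    print(find_interior_gap(toy))
```

Note that any interior stretch with low coverage is flagged, whatever its cause, so a sketch like this would also split reads that are not chimeric. That is the "false positive" concern raised above.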