[mira_talk] Re: PacBio CCS questions

From: "Matthew D. Pagel" <pagel@xxxxxxx>
To: Chris Hoefler <hoeflerb@xxxxxxxxx>, mira_talk@xxxxxxxxxxxxx
Date: Sat, 17 Aug 2013 04:22:55 -0400

Is this an argument for setting the -maxGap command line parameter (or the
maxUncorrectedGap value in the spec file) to a length 1-2 bp less than the
length of the adaptor when using PacBioToCA (which you'll have to build from
source) to correct the filtered reads with CCS or NGS data?
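
To put a number on "1-2 bp less than the adaptor": a minimal Python sketch,
assuming the adaptor quoted further down in this thread is the one in the
library. The helper name candidate_max_gap is made up for illustration, and
nothing here calls PacBioToCA itself; it only does the arithmetic.

```python
# Adaptor sequence as quoted from the metadata XML files later in this thread.
ADAPTER = "ATCTCTCTCttttcctcctcctccgttgttgttgttGAGAGAGAT"


def candidate_max_gap(adapter, margin):
    """Return a gap threshold that is `margin` bp shorter than the adaptor.

    Hypothetical helper: the idea is that an uncorrected stretch as long as
    a missed adaptor would then exceed the threshold.
    """
    if margin < 0 or margin >= len(adapter):
        raise ValueError("margin must be between 0 and len(adapter) - 1")
    return len(adapter) - margin


print("adaptor length:", len(ADAPTER))
for margin in (1, 2):
    print("margin", margin, "-> threshold", candidate_max_gap(ADAPTER, margin))
```

Whether a threshold in that range behaves as hoped depends on how PacBioToCA
treats uncorrected stretches, which is exactly the open question above.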

On Fri, Aug 16, 2013 07:58 PM Chris Hoefler <hoeflerb@xxxxxxxxx> wrote:
>
>>Bonus question: are PB adaptor sequences listed somewhere on the net? The
>>only place I found some are in the metadata XML files, and they told me
>>   ATCTCTCTCttttcctcctcctccgttgttgttgttGAGAGAGAT
>>
>>Are there others?
>
>I think that is the only adapter in use at the moment.
>
>Do either of these help?
>https://s3.amazonaws.com/files.pacb.com/pdf/Guide_Pacific_Biosciences_Template_Preparation_and_Sequencing.pdf
>http://www.smrtcommunity.com/servlet/servlet.FileDownload?file=00P7000000HYU49EAH
>
>This is the only official documentation from PacBio that I could find about
>their adapter sequences and barcodes.
>
>>Background: I'm working on the read improvement routines atm and I think
>>that in the 49 PB reads I took as initial test set (out of >30k from the
>>E.coli Nature paper), already two reads show such an inversion where there
>>should be none … ergo it's a sequencing artefact and 4% of reads like this
>>will wreak havoc with most assembly algorithms. I hate situations like
>>these.
>
>How long are these chimeras? The worst offenders can probably be removed by
>filtering read lengths and quality scores. But apparently these artifacts
>do appear in longer reads at a non-negligible level as a result of the way
>the libraries are constructed.
>http://www.microbiomejournal.com/content/1/1/10
>
>The PacBioToCA paper puts the number at ~2.5%. HGAP gets rid of these
>during the preassembly step by looking at the quality of the error
>correction. If there is a chimeric seed read, the short reads won't align
>across the junction of the inversion, resulting in a "coverage gap" in the
>preassembler alignment. These gaps are identified by a low consensus
>quality in the middle of the read. A filtering script splits the read at
>this low quality region and trims the ends back to the high quality region.
>That way you don't have to get rid of the read entirely and can still make
>use of the non-inverted portions.
>
>
>On Fri, Aug 16, 2013 at 2:50 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:
>
>> On Aug 15, 2013, at 1:34 , Matthew D. Pagel <pagel@xxxxxxx> wrote:
>> > Is there a quick-and-dirty algorithm out there for identifying
>> > inversions from one subread to the next within a single PB read
>>
>> I'd have a more pressing, but similar question at the moment: is there a
>> way of easily identifying reads which form such a FR structure but where the
>> PB algorithms apparently did not recognise an adapter?
>>
>> Background: I'm working on the read improvement routines atm and I think
>> that in the 49 PB reads I took as initial test set (out of >30k from the
>> E.coli Nature paper), already two reads show such an inversion where there
>> should be none … ergo it's a sequencing artefact and 4% of reads like this
>> will wreak havoc with most assembly algorithms. I hate situations like
>> these.
>>
>> Bonus question: are PB adaptor sequences listed somewhere on the net? The
>> only place I found some are in the metadata XML files, and they told me
>>    ATCTCTCTCttttcctcctcctccgttgttgttgttGAGAGAGAT
>>
>> Are there others?
>>
>> B.
>> --
>> You have received this mail because you are subscribed to the mira_talk
>> mailing list. For information on how to subscribe or unsubscribe, please
>> visit http://www.chevreux.org/mira_mailinglists.html
>>
>
>
>
>-- 
>Chris Hoefler, PhD
>Postdoctoral Research Associate
>Straight Lab
>Texas A&M University
>2128 TAMU
>College Station, TX 77843-2128
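
The split-and-trim step Chris describes for HGAP can be sketched roughly as
below. This is not the actual HGAP filtering script, just a minimal Python
illustration of the idea: keep every stretch of the corrected read whose
consensus quality stays at or above a cutoff, and drop whatever lies in
between. The function name and the min_qual and min_len defaults are
assumptions chosen for the example.

```python
def split_at_low_quality(seq, quals, min_qual=20, min_len=3):
    """Split a read into its high-quality stretches.

    seq      -- read sequence (string)
    quals    -- per-base consensus quality values, same length as seq
    min_qual -- bases below this value count as low quality (assumed cutoff)
    min_len  -- high-quality stretches shorter than this are discarded
    """
    if len(seq) != len(quals):
        raise ValueError("seq and quals must have the same length")

    pieces = []
    start = None  # start index of the current high-quality stretch
    for i, q in enumerate(quals):
        if q >= min_qual:
            if start is None:
                start = i
        elif start is not None:
            # A low-quality base ends the current stretch.
            if i - start >= min_len:
                pieces.append(seq[start:i])
            start = None
    # Close a stretch that runs to the end of the read.
    if start is not None and len(seq) - start >= min_len:
        pieces.append(seq[start:])
    return pieces


# Toy read with a low-quality "coverage gap" in the middle.
seq = "ACGTACGTTTGGCCAAGG"
quals = [40] * 7 + [5] * 4 + [40] * 7
print(split_at_low_quality(seq, quals))
```

On real data a single low-quality base should probably not split a read, so
some smoothing or a minimum width for the low-quality region would be needed;
the sketch leaves that out.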


_______________________________________________________
Matt Pagel
Graduate Student - Laboratory of Don Bryant
Penn State Biochemistry, Microbiology and Molecular Biology
104 S. Frear (mail); 223 S. Frear (biological material)
230 S. Frear (office)
State College, PA 16802


