[mira_talk] Re: BAC vector sequence masking for de novo assembly using PacBio C2

  • From: Juan Pascual Anaya <jpascualanaya@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Mon, 27 May 2013 09:23:08 +0900

Thank you, guys, for all this information!

I'll reply point by point.

@bastien & peter: 1234xxxxxxxxxxx5678 is exactly what I have. Taking
into account that xxxxxxx is the sequence of the BAC vector (not the
sequencing platform vector), 1234 and 5678 are kind of paired, yes... but
"absolutely" paired, since they are the two extremes of the BAC insert
(~120 kb long). I could use that information somehow to check whether the
final assembly is correct, i.e., the sequences of 1234 and 5678 should be at
the ends, but in reverse-complement for one of them, e.g.,
5'end-5678yyyyyyyyyyyyyyyyyyyyy4321-3'end.
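
A minimal sketch of such a check, in Python. It only reports where a flank
(1234 or 5678) is found in a contig and on which strand; the function names
are hypothetical, and exact string matching is a simplifying assumption
(with real PacBio reads the flanks would have to be placed with an aligner,
since errors prevent exact matches).

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def locate_flank(contig, flank):
    """Find the first exact occurrence of `flank` in `contig`.

    Returns ("+", start) for a forward-strand match, ("-", start) when only
    the reverse complement matches, or None when neither is found.
    `start` is a 0-based position on the contig.
    """
    pos = contig.find(flank)
    if pos != -1:
        return ("+", pos)
    pos = contig.find(reverse_complement(flank))
    if pos != -1:
        return ("-", pos)
    return None
```

If both flanks come back close to the two ends of the same contig, that
would support the assembly of that BAC insert being complete.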

For me, just treating 1234 and 5678 as separate reads would be enough,
I think.

@Juan: Sequencing every BAC independently would have required bar-coding
each sample and making a library for each one. I pooled the BACs to save
money. I'll do exactly what you propose: align with SSAHA2 and use the
coordinates to split the extremes into two reads.
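
A minimal sketch of the splitting step, in Python. It assumes the vector
match on each read is already known as a 0-based, half-open interval
(vec_start, vec_end); parsing the SSAHA2 output and converting its
coordinates to that convention is not shown. The function name, the
"_left"/"_right" suffixes and the min_len cutoff are hypothetical choices.

```python
def split_read_at_vector(read_id, seq, vec_start, vec_end, min_len=50):
    """Cut the vector match out of a read and return the flanking pieces.

    vec_start, vec_end: 0-based, half-open interval of the vector match
    on the read (assumed to be derived from the aligner output).
    Pieces shorter than min_len are dropped.
    Returns a list of (new_read_id, sequence) tuples.
    """
    left = seq[:vec_start]    # insert sequence before the vector (1234)
    right = seq[vec_end:]     # insert sequence after the vector (5678)
    pieces = []
    if len(left) >= min_len:
        pieces.append((read_id + "_left", left))
    if len(right) >= min_len:
        pieces.append((read_id + "_right", right))
    return pieces
```

For the toy read above, split_read_at_vector("r1", "1234" + "x" * 11 + "5678",
4, 15, min_len=1) gives two reads, "r1_left" (1234) and "r1_right" (5678).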

@Evan: I don't think you can get raw PacBio reads. Anyway, I have no
problem with the vector/adaptors used for sequencing, but with my specific
BAC vector (I have up to 15 clones from a BAC library that I would like to
sequence). There are several datasets to download on the PacBio website:
http://pacbiodevnet.com/

Having said this, my level of programming is (below) beginner, so it will
take time until I manage to write a decent script, but as soon as I do,
I'll post it here so everyone can use it.

Thank you very much guys!!
Champi (everyone calls me that, so can you :)

PS: I got a long reply on SEQanswers that seems interesting, if you
want to have a look:
http://seqanswers.com/forums/showthread.php?t=30439&referrerid=23528
