[mira_talk] BAC vector sequence masking for de novo assembly using PacBio C2

  • From: Juan Daniel Montenegro Cabrera <jdmontenegroc@xxxxxxxxx>
  • To: "mira_talk@xxxxxxxxxxxxx" <mira_talk@xxxxxxxxxxxxx>
  • Date: Sat, 25 May 2013 08:13:35 -0500

Hi Juan,

I feel your pain. I understand you want to assemble all your BACs together.
Would it be too crazy to assemble each BAC separately and remove the vector
only after the assembly is complete? Then you could simply align your clean
assembled BACs to produce the larger contigs you are looking for.

I also agree with Peter about playing around with sff_extract. I have found
that using ssaha2 to mask unwanted large sequences can be very helpful. You
could also write your own script that uses the ssaha2 output to split your
vector-containing sequences and rename the pieces as paired ends (like .f
and .r, or whatever suffix you like that does not trigger special behavior
within mira).
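
A minimal sketch of the splitting step could look like the Python below. It
assumes you have already parsed the vector hit coordinates out of the ssaha2
output yourself (the parsing is not shown, since it depends on the output
format you choose), and that the coordinates are 1-based and inclusive on the
read. The function name split_at_vector, the min_len cutoff and the .f/.r
suffixes are only placeholders for illustration. Both pieces are kept on the
original strand; whether one of them should be reverse-complemented depends on
how you declare the pairs to the assembler.

```python
def split_at_vector(name, seq, hit_start, hit_end, min_len=50):
    """Split one read around a single vector hit.

    hit_start / hit_end are assumed to be 1-based, inclusive positions
    of the vector match on the read (hypothetical input, e.g. parsed
    from an alignment report). Returns a list of (new_name, fragment)
    tuples, dropping fragments shorter than min_len.
    """
    left = seq[:hit_start - 1]   # everything before the vector hit
    right = seq[hit_end:]        # everything after the vector hit
    parts = []
    if len(left) >= min_len:
        parts.append((name + ".f", left))
    if len(right) >= min_len:
        parts.append((name + ".r", right))
    return parts


if __name__ == "__main__":
    # Toy read: 10 bases of insert, 5 of "vector", 8 of insert.
    read = "A" * 10 + "C" * 5 + "G" * 8
    for frag_name, frag_seq in split_at_vector("read1", read, 11, 15, min_len=1):
        print(">" + frag_name)
        print(frag_seq)
```

Reads with more than one vector hit, or with a hit at the very start or end,
would need extra handling that this sketch leaves out.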

You will surely find a solution soon enough.

Regards,

Juan Montenegro

On Saturday, 25 May 2013, Bastien Chevreux wrote:

> On May 25, 2013, at 14:35, Peter Cock <p.j.a.cock@xxxxxxxxxxxxxx> wrote:
> > Would it make sense to think of these two fragments on either side of
> > the vector as paired reads? Much like Roche 454, where the sequence is
> > circularised with an adapter - one might even be able to repurpose
> > existing code for handling that (e.g. sff_extract pair extract logic).
>
> Oh, neat idea. Indeed, exploring sff_extract might be a good idea, though
> I cannot predict how good it would really be.
>
> B.
