[mira_talk] Re: BAC vector sequence masking for de novo assembly using PacBio C2

  • From: Peter Cock <p.j.a.cock@xxxxxxxxxxxxxx>
  • To: "mira_talk@xxxxxxxxxxxxx" <mira_talk@xxxxxxxxxxxxx>
  • Date: Sat, 25 May 2013 13:35:56 +0100

On Saturday, May 25, 2013, Bastien Chevreux wrote:

> On May 25, 2013, at 13:55, Juan Pascual Anaya
> <jpascualanaya@xxxxxxxxx> wrote:
>
> However, how would MIRA use a read with a sequence masked in the middle?
> Would it use the two ends as independent reads (wanted effect)? Or
> would it join the two ends (unwanted effect)?
>
>
> MIRA would use it as one read with some undetermined sequence in the
> middle, which, I presume, is also unwanted.
>
> If I understood you correctly, you have something like this:
>
> 1234xxxxxxxxxxx5678
>
> with the numbers being (wanted) bases and the x being masked unwanted
> cloning vector. If I further understood you correctly, the above should be
> treated as two independent reads, i.e.
>
> >r1
> 1234
> >r2
> 5678
>
> As you pointed out, joining the reads to "12345678" is not correct, but
> would "56781234" (or, in the case of slight overlaps, "5678234") be correct?
>
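To make the "two independent reads" option concrete, here is a minimal Python sketch. It assumes the vector has been masked with `x`/`X` characters (the real masking character depends on the screening step), and `split_on_mask`, `rotate_join`, `to_fasta` and the `min_len` cut-off are hypothetical helpers, not part of MIRA. `rotate_join` illustrates the "56781234" alternative by placing the right flank before the left one and merging an exact suffix/prefix overlap if there is one.

```python
import re
from typing import List, Tuple

# Assumption: vector bases were masked with 'x' or 'X'; the actual
# masking character depends on the screening step used.
MASK_RUN = re.compile(r"[xX]+")

Record = Tuple[str, str]  # (read name, sequence)


def split_on_mask(name: str, seq: str, min_len: int = 1) -> List[Record]:
    """Split a read at every masked run into independent reads.

    Kept unmasked stretches are named <name>_1, <name>_2, ...; stretches
    shorter than min_len (a hypothetical cut-off) are dropped.
    """
    pieces = [p for p in MASK_RUN.split(seq) if len(p) >= min_len]
    return [(f"{name}_{i}", p) for i, p in enumerate(pieces, start=1)]


def rotate_join(left: str, right: str) -> str:
    """Join the right flank before the left one ("5678" + "1234"),
    merging the longest suffix of `right` that equals a prefix of `left`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if right.endswith(left[:k]):
            return right + left[k:]
    return right + left


def to_fasta(records: List[Record]) -> str:
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{n}\n{s}\n" for n, s in records)
```

For example, `split_on_mask("read1", "1234xxxxxxxxxxx5678")` returns `[("read1_1", "1234"), ("read1_2", "5678")]`, which `to_fasta` writes as two records, as in the r1/r2 example.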

Would it make sense to think of these two fragments on either side of the
vector as paired reads? This is much like Roche 454 paired-end reads, where
the sequence is circularised around an adapter, so one might even be able to
repurpose existing code for handling that (e.g. the pair-extraction logic in
sff_extract).
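The paired-read idea could be sketched along these lines, again assuming `x`/`X` masking. `split_as_pair`, the `min_flank` threshold and the `/1`/`/2` mate suffixes are hypothetical choices for illustration, not the actual behaviour of sff_extract or MIRA. A read is only converted when exactly one internal masked run separates two long-enough flanks; masked bases at the read ends are simply trimmed.

```python
import re
from typing import Optional, Tuple

# Assumption: vector bases were masked with 'x' or 'X'.
MASK_RUN = re.compile(r"[xX]+")

Record = Tuple[str, str]  # (read name, sequence)


def split_as_pair(name: str, seq: str, min_flank: int = 20) -> Optional[Tuple[Record, Record]]:
    """Turn the flanks around one internal masked vector run into mates.

    Masked bases at either end of the read are trimmed first. Returns
    None unless exactly one masked run remains and both flanks are at
    least min_flank bases long (min_flank is a hypothetical threshold).
    """
    core = seq.strip("xX")
    flanks = MASK_RUN.split(core)
    if len(flanks) != 2:
        return None
    left, right = flanks
    if len(left) < min_flank or len(right) < min_flank:
        return None
    # Hypothetical mate naming: "/1" and "/2" suffixes on the original name.
    return (f"{name}/1", left), (f"{name}/2", right)


if __name__ == "__main__":
    pair = split_as_pair("read1", "1234xxxxxxxxxxx5678", min_flank=4)
    if pair is not None:
        for mate_name, mate_seq in pair:
            print(f">{mate_name}\n{mate_seq}")
```

An assembler would also need an expected distance between such mates; for a read running from the end of a BAC insert through the vector into its start, that would presumably be close to the insert length.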

Peter
