[mira_talk] Re: SSAHA2 vector screen

  • From: Andy <mirabilis@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 23 Mar 2010 16:37:46 -0700

There are no N's at the beginning of the sequences.

Cheers

On Tue, Mar 23, 2010 at 2:00 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:

> On Tuesday 23 March 2010 Andy wrote:
> > [...]
> > Error! The length of read SOLEXA1_0001:1:74:16647:1029#0/1 (101) does not
> > match the length given in the SSAHA2 file (101)
> > SSAHA2 line: ALIGNMENT::08 27 SOLEXA1_0001:1:74:16647:1029#0/1 pFLC-I 29 2 735 762 C 28 100 101
> >
> > Error! The length of read SOLEXA1_0001:1:74:17068:1020#0/1 (101) does not
> > match the length given in the SSAHA2 file (101)
> > SSAHA2 line: ALIGNMENT::50 100 SOLEXA1_0001:1:74:17068:1020#0/1 pCMVSPORT6 101 2 811 910 C 100 100 101
> >
> > Error! The length of read SOLEXA1_0001:1:74:18037:1025#0/1 (101) does not
> > match the length given in the SSAHA2 file (101)
> > SSAHA2 line: ALIGNMENT::14 32 SOLEXA1_0001:1:74:18037:1025#0/1 pFLC-I 33 2 739 770 C 32 100 101
> >
> > Error! The length of read SOLEXA1_0001:1:74:18037:1025#0/1 (101) does not
> > match the length given in the SSAHA2 file (101)
> > SSAHA2 line: ALIGNMENT::14 27 SOLEXA1_0001:1:74:18037:1025#0/1 pFLC-I 29 2 735 762 C 28 100 101
> >
> > Error! The length of read SOLEXA1_0001:1:74:18070:1023#0/1 (101) does not
> > match the length given in the SSAHA2 file (101)
> > SSAHA2 line: ALIGNMENT::50 84 SOLEXA1_0001:1:74:18070:1023#0/1 pCMVSPORT6 18 101 797 880 F 84 100 101
>
> Uh oh ... I have a bad feeling that something is broken in the logic I
> implemented. Just to be sure: can you please look at the reads in question
> and tell me whether they start with an 'N'?
>
> > I'm guessing that SSAHA2 thinks that the reads are 101bp long but mira
> > thinks that they're 100bp?
>
> Long story short: for aesthetic reasons, mira adds an 'n' in front of
> most Solexa reads ... except when there's already an 'n'. And the clipping
> routine doesn't account for this special case ... yet. I just need to be
> sure before I start fixing this.
>
> > I also noticed that mira is doing some filtering of the Solexa reads.
> > What do these mean?
> > Solexa: Filter out T (hard) SOLEXA1_0001:1:15:12586:2781#0/1
> > Solexa: Filter out T (hard) SOLEXA1_0001:1:15:12588:8448#0/1
> > Solexa: Filter out (A hard) SOLEXA1_0001:1:15:12595:11576#0/1
> > Solexa: Filter out (A hard) SOLEXA1_0001:1:15:12611:16949#0/1
> > Solexa: Filter out T (hard) SOLEXA1_0001:1:15:12628:14575#0/1
> > Solexa: Filter out (A hard) SOLEXA1_0001:1:15:12634:12867#0/1
>
> Need to document that.
>
> Hard: a run of 20 consecutive A or 20 consecutive T leads to the read being
> discarded. You see this a lot with bad / low-quality Solexa reads.
>
> Soft: same as above, but with a run of 12 bases, and that same base must
> make up >= 80% of the complete read.
>
> Yes, I know, this sometimes discards good reads, especially at poly-A and
> poly-T sites. Then again: not clipping creates all sorts of very
> interesting problems I prefer not to have :-)
>
> Regards,
>   Bastien
>
> --
> You have received this mail because you are subscribed to the mira_talk
> mailing list. For information on how to subscribe or unsubscribe, please
> visit http://www.chevreux.org/mira_mailinglists.html
>
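
To illustrate the off-by-one that Bastien describes in the quoted message,
here is a minimal Python sketch of how a prepended 'n' shifts hit
coordinates. It is not mira's actual code: the function names
normalise_read and shift_hit are hypothetical, and the coordinates in the
usage note are purely illustrative.

```python
def normalise_read(seq):
    """Prepend 'n' unless the read already starts with n/N.

    Returns the stored sequence and the offset (0 or 1) that any
    externally computed coordinates must be shifted by.
    Hypothetical helper, not taken from mira.
    """
    if seq[:1] in ("n", "N"):
        # Special case: nothing is added, so coordinates stay as they are.
        return seq, 0
    return "n" + seq, 1


def shift_hit(q_start, q_end, offset):
    """Shift 1-based hit coordinates by the offset from normalise_read."""
    return q_start + offset, q_end + offset
```

With this sketch, normalise_read("ACGT") gives ("nACGT", 1), so a hit
reported at 18..101 on the original read would sit at 19..102 on the stored
read, while a read that already starts with 'N' needs no shift at all.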

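The hard and soft filter rules quoted above can be sketched in Python as
follows. This only restates the thresholds given in the message (a run of
20 for hard; a run of 12 plus >= 80% of the read for soft). The names
longest_run and classify_read are hypothetical, treating the soft rule as
A/T only and matching case-insensitively are assumptions, and mira's real
implementation may differ.

```python
def longest_run(seq, base):
    """Length of the longest run of `base` in `seq`."""
    best = cur = 0
    for ch in seq:
        cur = cur + 1 if ch == base else 0
        best = max(best, cur)
    return best


def classify_read(seq, hard_run=20, soft_run=12, soft_fraction=0.80):
    """Return ("hard", base), ("soft", base) or None for a read.

    Hypothetical sketch of the rules described in the thread,
    not mira's code.
    """
    seq = seq.upper()  # assumption: case does not matter
    if not seq:
        return None
    # Hard: a run of 20 consecutive A or T discards the read.
    for base in "AT":
        if longest_run(seq, base) >= hard_run:
            return ("hard", base)
    # Soft: a run of 12, and that base makes up >= 80% of the read.
    for base in "AT":
        if (longest_run(seq, base) >= soft_run
                and seq.count(base) / len(seq) >= soft_fraction):
            return ("soft", base)
    return None
```

For example, a read of 20 A's followed by mixed sequence comes back as
("hard", "A"), while a 20-base read made of 12 A's, one C and 7 more A's
comes back as ("soft", "A").
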