There are no N's at the beginning of the sequences.

Cheers

On Tue, Mar 23, 2010 at 2:00 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:
> On Tuesday 23 March 2010 Andy wrote:
> > [...]
> > Error! The length of read SOLEXA1_0001:1:74:16647:1029#0/1 (101) does not
> > match the length given in the SSAHA2 file (101)
> > SSAHA2 line: ALIGNMENT::08 27 SOLEXA1_0001:1:74:16647:1029#0/1 pFLC-I 29 2 735 762 C 28 100 101
> >
> > Error! The length of read SOLEXA1_0001:1:74:17068:1020#0/1 (101) does not
> > match the length given in the SSAHA2 file (101)
> > SSAHA2 line: ALIGNMENT::50 100 SOLEXA1_0001:1:74:17068:1020#0/1 pCMVSPORT6 101 2 811 910 C 100 100 101
> >
> > Error! The length of read SOLEXA1_0001:1:74:18037:1025#0/1 (101) does not
> > match the length given in the SSAHA2 file (101)
> > SSAHA2 line: ALIGNMENT::14 32 SOLEXA1_0001:1:74:18037:1025#0/1 pFLC-I 33 2 739 770 C 32 100 101
> >
> > Error! The length of read SOLEXA1_0001:1:74:18037:1025#0/1 (101) does not
> > match the length given in the SSAHA2 file (101)
> > SSAHA2 line: ALIGNMENT::14 27 SOLEXA1_0001:1:74:18037:1025#0/1 pFLC-I 29 2 735 762 C 28 100 101
> >
> > Error! The length of read SOLEXA1_0001:1:74:18070:1023#0/1 (101) does not
> > match the length given in the SSAHA2 file (101)
> > SSAHA2 line: ALIGNMENT::50 84 SOLEXA1_0001:1:74:18070:1023#0/1 pCMVSPORT6 18 101 797 880 F 84 100 101
>
> Uh oh... I have a bad feeling that something is broken in the logic I
> implemented. Just to be sure: can you please look at the reads in question
> and tell me whether they start with an 'N'?
>
> > I'm guessing that SSAHA2 thinks that the reads are 101bp long but mira
> > thinks that they're 100bp?
>
> Long story short: for aesthetic reasons, mira adds an 'n' in front of
> most Solexa reads ... except when there's already an 'n'. And the clipping
> routine doesn't account for this special case ... yet. Just need to be sure
> before I start fixing this.
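Bastien's check (do the reads named in the error messages start with an 'N'?) can be scripted instead of done by eye. The sketch below is a hypothetical helper, not part of mira or SSAHA2: it assumes plain, unwrapped four-line FASTQ records, and it assumes the ALIGNMENT layout inferred from the lines quoted above (score, read name, subject, read start/end, subject start/end, direction, aligned bases, identity, read length), so the read name is the third whitespace-separated field and the read length is the last. The function names `read_fastq`, `parse_ssaha2_alignment` and `check_reads` are made up for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def read_fastq(text: str) -> Dict[str, str]:
    """Map read name -> sequence for plain 4-line FASTQ records (assumed: no wrapping)."""
    lines = [line for line in text.splitlines() if line.strip()]
    reads: Dict[str, str] = {}
    for i in range(0, len(lines) - 3, 4):
        # Header is '@name [comment]'; keep only the name.
        name = lines[i][1:].split()[0]
        reads[name] = lines[i + 1].strip()
    return reads


def parse_ssaha2_alignment(line: str) -> Tuple[str, int]:
    """Return (read name, read length) from an SSAHA2 ALIGNMENT line.

    Assumed layout: ALIGNMENT score read subject qstart qend sstart send
    direction bases identity read_length.
    """
    fields = line.split()
    return fields[2], int(fields[-1])


def check_reads(fastq_text: str,
                ssaha2_lines: Iterable[str]) -> List[Tuple[str, int, int, bool]]:
    """Report (name, FASTQ length, SSAHA2 length, starts with N) per aligned read."""
    reads = read_fastq(fastq_text)
    report = []
    for line in ssaha2_lines:
        if not line.startswith("ALIGNMENT"):
            continue  # skip anything that is not an alignment line
        name, ssaha2_len = parse_ssaha2_alignment(line)
        seq = reads.get(name)
        if seq is None:
            continue  # read not present in this FASTQ file
        report.append((name, len(seq), ssaha2_len, seq[:1].upper() == "N"))
    return report


if __name__ == "__main__":
    # Tiny in-memory example with made-up reads r1 and r2.
    fastq = "@r1\nNACGT\n+\nIIIII\n@r2\nACGTA\n+\nIIIII\n"
    hits = ["ALIGNMENT::08 4 r1 vec 2 5 10 13 F 4 100 5",
            "ALIGNMENT::08 5 r2 vec 1 5 20 24 F 5 100 5"]
    for row in check_reads(fastq, hits):
        print(row)  # ('r1', 5, 5, True) then ('r2', 5, 5, False)
```

For real data, the FASTQ text and the SSAHA2 output lines would be read from the files used in the run; a row where the last field is `True` is exactly the special case Bastien describes (a read that already begins with 'n').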
>
> > I also noticed that mira is doing some filtering of the Solexa reads. What
> > do these mean?
> > Solexa: Filter out T (hard) SOLEXA1_0001:1:15:12586:2781#0/1
> > Solexa: Filter out T (hard) SOLEXA1_0001:1:15:12588:8448#0/1
> > Solexa: Filter out (A hard) SOLEXA1_0001:1:15:12595:11576#0/1
> > Solexa: Filter out (A hard) SOLEXA1_0001:1:15:12611:16949#0/1
> > Solexa: Filter out T (hard) SOLEXA1_0001:1:15:12628:14575#0/1
> > Solexa: Filter out (A hard) SOLEXA1_0001:1:15:12634:12867#0/1
>
> Need to document that.
>
> Hard: a run of 20 consecutive A's or 20 consecutive T's leads to the read
> being discarded. You see this a lot with bad / low-quality Solexa reads.
>
> Soft: same as above, but the run only needs to be 12 bases long, and that
> same base must make up >= 80% of the complete read.
>
> Yes, I know, this sometimes discards good reads, especially at poly-A /
> poly-T sites. Then again: not clipping creates all sorts of very interesting
> problems I prefer not to have :-)
>
> Regards,
> Bastien
>
> --
> You have received this mail because you are subscribed to the mira_talk
> mailing list. For information on how to subscribe or unsubscribe, please
> visit http://www.chevreux.org/mira_mailinglists.html
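The hard and soft rules Bastien describes can be written down as a small predicate. This is a sketch of the rules as stated in the message, not mira's own code: the thresholds (20, 12, 80%) come from the text, while the function names, the case-insensitive comparison, the A-before-T check order, and the reading of "total % of the same base" as "share of the run's base in the whole read" are assumptions.

```python
from typing import Optional, Tuple

HARD_RUN = 20      # hard: 20 consecutive A or T discards the read
SOFT_RUN = 12      # soft: a run of 12 is enough ...
SOFT_PERCENT = 80  # ... if that base makes up >= 80% of the whole read


def longest_run(seq: str, base: str) -> int:
    """Length of the longest stretch of consecutive `base` in `seq`."""
    best = current = 0
    for b in seq:
        current = current + 1 if b == base else 0
        best = max(best, current)
    return best


def polyat_filter(seq: str) -> Optional[Tuple[str, str]]:
    """Return (base, 'hard' or 'soft') if the read would be discarded, else None."""
    s = seq.upper()
    # Hard rule first: a long homopolymer run alone is enough.
    for base in "AT":
        if longest_run(s, base) >= HARD_RUN:
            return base, "hard"
    # Soft rule: shorter run, but the base must dominate the read.
    for base in "AT":
        # Integer arithmetic avoids float rounding at the 80% boundary.
        if (longest_run(s, base) >= SOFT_RUN
                and s.count(base) * 100 >= SOFT_PERCENT * len(s)):
            return base, "soft"
    return None


if __name__ == "__main__":
    print(polyat_filter("A" * 20 + "CGT" * 10))  # ('A', 'hard')
    print(polyat_filter("T" * 15 + "CCC"))       # ('T', 'soft')
    print(polyat_filter("ACGT" * 25))            # None
```

The sketch also shows the trade-off mentioned in the message: a genuine read spanning a poly-A site with 20 or more A's in a row is discarded by the hard rule no matter how good the rest of the read is.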