On Tuesday 08 June 2010 bio5yz wrote:
> Here are 2 examples, from our previous pairwise alignment (done after
> known repeats were masked):
>
> Read: SETTMV_2SR9W01AWCRM obtained 4675 hits in region (107,289) and 4488
> hits in region (329,400).
> This was assembled to 4 member contig SETTMV_rep_c7735 by MIRA.
>
> Read: SETTMV_2SR9W01B13M0 obtained 5171 hits in region (0,132) and 7138
> hits in region (155,472).
> This read was assembled to 2 member contig SETTMV_rep_c7339 by MIRA.
>
> The purpose of this exercise was originally to determine chimeras in a
> dataset that would be used for expression analysis later.
>
> A large number of the reads that these 2 hit were found in the debris
> files. There are definitely repeat regions within all these sequences

That is the nasty repeat mask at work, I'm pretty sure. Here you have it:

> RD SETTMV_2SR9W01AWCRM
> [...]
> RT MNRr 41 99
> RT MNRr 117 170
> RT MNRr 172 406
> [...]

Of the 406 bases, only a couple of stretches are not masked as nasty
(the first 40 bases, 18 bases at pos 99 and 2 bases at 170). The rest is
masked. I suppose that the non-masked areas contain rare splices,
sequencing errors or adaptor remnants (at the front).

> RD SETTMV_2SR9W01CBHGG
> RT MNRr 15 80
> RT MNRr 98 403
> RT MNRr 405 446

Same thing, almost completely masked. The same goes for the remaining
reads.

MIRA doesn't skim masked areas (but it does run Smith-Waterman (SW)
alignment on them), so if some reads share rare events (SNPs, splice
variants, errors etc.), it will find an overlap there, but not on the
rest.

Look in the "*_assembly/*_d_log/*nasty*" files for more info on what was
masked in which reads.

> I was wondering if there was a way to trace what reads were debried and for
> what reason.

Long-standing feature request, but I currently have no time to implement
such a thing.

B.
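
To check how much of a read escapes the nasty repeat mask, the "RT MNRr" tag lines quoted above can be turned into a list of unmasked stretches. What follows is only a minimal sketch and not part of MIRA: the helper name unmasked_stretches is hypothetical, and it assumes the tag coordinates are 1-based and inclusive and that the read length is known (406 for the first read). Depending on how the end coordinates are read, the stretch lengths can differ by one from the figures given in the mail.

```python
import re

def unmasked_stretches(tag_lines, read_length, tag="MNRr"):
    """Return (start, end) stretches of a read not covered by `tag`.

    Assumption: coordinates are 1-based and inclusive on both ends.
    Lines may carry a leading "> " quote marker; other tags are ignored.
    """
    masked = []
    for line in tag_lines:
        m = re.match(r"\s*(?:>\s*)?RT\s+(\S+)\s+(\d+)\s+(\d+)", line)
        if m and m.group(1) == tag:
            masked.append((int(m.group(2)), int(m.group(3))))
    masked.sort()

    gaps = []
    pos = 1  # first base not yet accounted for
    for start, end in masked:
        if start > pos:
            gaps.append((pos, start - 1))
        pos = max(pos, end + 1)  # also handles overlapping masks
    if pos <= read_length:
        gaps.append((pos, read_length))
    return gaps

# Tags of read SETTMV_2SR9W01AWCRM as quoted in the mail.
tags = ["RT MNRr 41 99", "RT MNRr 117 170", "RT MNRr 172 406"]
for start, end in unmasked_stretches(tags, 406):
    print(start, end, end - start + 1)
```

Reads for which this leaves only a few short stretches are the ones that can only find overlaps in those stretches, which is consistent with many of them ending up in the debris.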