Thanks very much for your explanation, Bastien! My main issue now is that I cannot find any *nasty* files in my MIRA directories for any of the 3 trial runs I made. None of the *_log directories contains a file with *nasty* in its name. I guess these are the files I really need to determine the locations, stretches, and members of the masked repeats. Would it be possible to generate this log after MIRA has run to completion? Do I need to turn on a specific verbose parameter for these files to be written?

Again, my log file says "Mask nasty repeats (mnr) : no", although MIRA does seem to be masking the repeat regions. Should I be turning this on manually?

Cheers,
Michael

On Tue, Jun 8, 2010 at 4:32 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:
> On Tuesday 08 June 2010 bio5yz wrote:
> > Here are 2 examples from our previous pairwise alignment (done after
> > known repeats were masked):
> >
> > Read: SETTMV_2SR9W01AWCRM obtained 4675 hits in region (107,289) and 4488 hits in region (329,400).
> > This read was assembled into the 4-member contig SETTMV_rep_c7735 by MIRA.
> >
> > Read: SETTMV_2SR9W01B13M0 obtained 5171 hits in region (0,132) and 7138 hits in region (155,472).
> > This read was assembled into the 2-member contig SETTMV_rep_c7339 by MIRA.
> >
> > The purpose of this exercise was originally to detect chimeras in a dataset that would later be used for expression analysis.
> >
> > Many of the reads that these 2 reads hit were found in the debris files. There are definitely repeat regions within all these sequences.
>
> It's the nasty repeat masking, I'm pretty sure. Here you have it:
>
> > RD SETTMV_2SR9W01AWCRM
> > [...]
> > RT MNRr 41 99
> > RT MNRr 117 170
> > RT MNRr 172 406
> > [...]
>
> Of the 406 bases, only a couple of stretches are not masked as nasty (the first 40 bases, 18 bases at position 99, and 2 bases at position 170); the rest is masked. I suppose that the non-masked areas contain rare splice variants, sequencing errors, or adaptor remnants (at the front).
>
> > RD SETTMV_2SR9W01CBHGG
> > RT MNRr 15 80
> > RT MNRr 98 403
> > RT MNRr 405 446
>
> Same thing: almost completely masked, and so are the remaining reads. MIRA doesn't skim masked areas (but does do Smith-Waterman (SW) alignment on them), so if some reads have rare events (SNPs, splice variants, errors, etc.), it will find an overlap there, but not on the rest.
>
> Look in the "*_assembly/*_d_log/*nasty*" files for more information on what was masked in which reads.
>
> > I was wondering if there was a way to trace which reads ended up in the debris and for what reason.
>
> Long-standing feature request, but I currently have no time to implement such a thing.
>
> B.
>
> --
> You have received this mail because you are subscribed to the mira_talk mailing list. For information on how to subscribe or unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html
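For anyone who wants the masked and unmasked stretches as coordinates rather than reading them off by eye, here is a minimal Python sketch. It assumes the tags look like the "RD <read>" / "RT MNRr <from> <to>" lines quoted in the reply (the actual layout of MIRA's *nasty* log files may differ) and that coordinates are 1-based and inclusive. The helpers `parse_mnr_tags` and `unmasked_stretches` are hypothetical, not part of MIRA.

```python
import re
from collections import defaultdict

# Assumption: input lines look like the excerpts quoted in this thread,
# i.e. "RD <read name>" followed by one or more "RT MNRr <from> <to>" lines.
# The real layout of MIRA's *nasty* log files may differ.
RD_RE = re.compile(r"^\s*RD\s+(\S+)")
MNR_RE = re.compile(r"^\s*RT\s+MNRr\s+(\d+)\s+(\d+)")


def parse_mnr_tags(lines):
    """Map each read name to the list of (from, to) MNRr intervals seen after it."""
    masked = defaultdict(list)
    read = None
    for line in lines:
        m = RD_RE.match(line)
        if m:
            read = m.group(1)
            continue
        m = MNR_RE.match(line)
        if m and read is not None:
            masked[read].append((int(m.group(1)), int(m.group(2))))
    return dict(masked)


def unmasked_stretches(intervals, read_length):
    """Return the (from, to) stretches of a read not covered by any interval.

    Assumes 1-based, inclusive coordinates (a hypothetical convention;
    verify it against MIRA's documentation before trusting exact lengths).
    """
    gaps = []
    pos = 1  # first base not yet known to be masked
    for start, end in sorted(intervals):
        if start > pos:
            gaps.append((pos, start - 1))
        pos = max(pos, end + 1)
    if pos <= read_length:
        gaps.append((pos, read_length))
    return gaps


if __name__ == "__main__":
    excerpt = [
        "RD SETTMV_2SR9W01AWCRM",
        "[...]",
        "RT MNRr 41 99",
        "RT MNRr 117 170",
        "RT MNRr 172 406",
    ]
    tags = parse_mnr_tags(excerpt)
    print(unmasked_stretches(tags["SETTMV_2SR9W01AWCRM"], 406))
```

Exact stretch lengths depend on the coordinate convention, so the results can differ by a base from a quick subtraction of neighbouring tag coordinates.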