[mira_talk] Re: repeat clusters

  • From: bio5yz <bio5yz@xxxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Wed, 9 Jun 2010 11:31:21 -0600

Thanks very much for your explanation, Bastien!
My main issue now is that I cannot find any *nasty* files in my MIRA
directories for any of my 3 trial runs. There are no files with "nasty" in
their names in the *_log directories. I guess these are the files I really
need to determine the location, stretches and members of the masked repeats.
Would it be possible to generate this log after MIRA has run to completion?
Do I need to turn on a specific verbose parameter for it to be written?

Again, my log file says

    Mask nasty repeats (mnr)                    : no

although MIRA does seem to be masking the repeat regions. Should I be turning
this on manually?
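
A minimal sketch for checking where, or whether, such files were written,
assuming Python 3 and that it is pointed at the directory MIRA was run in. It
first tries the "*_assembly/*_d_log/*nasty*" pattern from Bastien's reply and
then falls back to a recursive search for any file with "nasty" in its name.
The function name find_nasty_files is only illustrative; it is not part of
MIRA.

```python
from pathlib import Path


def find_nasty_files(project_dir="."):
    """Return a sorted list of files with 'nasty' in their name below project_dir.

    Hypothetical helper, not part of MIRA.
    """
    root = Path(project_dir)
    # Pattern suggested in the thread: *_assembly/*_d_log/*nasty*
    hits = [p for p in root.glob("*_assembly/*_d_log/*nasty*") if p.is_file()]
    if not hits:
        # Broader fallback: any file with "nasty" in its name, at any depth.
        hits = [p for p in root.rglob("*nasty*") if p.is_file()]
    return sorted(hits)
```

Calling find_nasty_files("/path/to/project") returns a sorted list of paths;
an empty list means no such file exists anywhere below that directory.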

Cheers,

Michael


On Tue, Jun 8, 2010 at 4:32 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:

> On Tuesday 08 June 2010 bio5yz wrote:
> > Here are 2 examples, from our previous pairwise alignment (done after
> > known repeats were masked):
> >
> > Read: SETTMV_2SR9W01AWCRM obtained 4675 hits in region (107,289) and 4488
> > hits in region (329,400).
> > This read was assembled to the 4-member contig SETTMV_rep_c7735 by MIRA.
> >
> > Read: SETTMV_2SR9W01B13M0 obtained 5171 hits in region (0,132) and 7138
> > hits in region (155,472).
> > This read was assembled to the 2-member contig SETTMV_rep_c7339 by MIRA.
> >
> > The purpose of this exercise was originally to determine chimeras in a
> > dataset that would be used for expression analysis later.
> >
> > Many of the reads hit by these 2 were found in the debris files.
> > There are definitely repeat regions within all these sequences.
>
> It's the nasty repeat masking, I'm pretty sure of that. Here you have it:
>
> > RD      SETTMV_2SR9W01AWCRM
> > [...]
> > RT      MNRr 41 99
> > RT      MNRr 117 170
> > RT      MNRr 172 406
> > [...]
>
> Of the 406 bases, only a couple of stretches are not masked as nasty (the
> first 40 bases, 18 bases at pos 99 and 2 bases at 170). The rest is masked.
> I suppose that the non-masked areas contain rare splices, sequencing errors
> or adaptor remnants (at the front).
>
> > RD      SETTMV_2SR9W01CBHGG
> > RT      MNRr 15 80
> > RT      MNRr 98 403
> > RT      MNRr 405 446
>
> Same thing, almost completely masked, and the remaining reads too. MIRA
> doesn't skim masked areas (but does do Smith-Waterman alignment on them), so
> if some reads have rare events (SNPs, splice variants, errors, etc.) it will
> find an overlap there, but not on the rest.
>
> Look in the "*_assembly/*_d_log/*nasty*" files for more info on what was
> masked in which reads.
>
> > I was wondering if there was a way to trace which reads ended up in the
> > debris and for what reason.
>
> Long-standing feature request, but currently no time to implement such a
> thing.
>
> B.
>
>
>
> --
> You have received this mail because you are subscribed to the mira_talk
> mailing list. For information on how to subscribe or unsubscribe, please
> visit http://www.chevreux.org/mira_mailinglists.html
>
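
The arithmetic in the reply above (which stretches of a read are not covered
by MNRr tags) can be sketched in Python. This assumes the "RT <tag> <start>
<end>" line layout quoted in the thread and that positions are 1-based and
inclusive; the thread does not state the convention, so stretch lengths may
differ by a base from a hand count. The function names are hypothetical, and
the "[...]" lines elided in the quote are not accounted for.

```python
import re

# Matches tag lines such as "RT      MNRr 41 99" as quoted in the thread.
# Assumption: tag name, then start and end as 1-based, inclusive positions.
TAG_LINE = re.compile(r"^RT\s+(\S+)\s+(\d+)\s+(\d+)")


def masked_intervals(lines, tag="MNRr"):
    """Collect (start, end) pairs for every RT line carrying the given tag."""
    intervals = []
    for line in lines:
        match = TAG_LINE.match(line.strip())
        if match and match.group(1) == tag:
            intervals.append((int(match.group(2)), int(match.group(3))))
    return intervals


def unmasked_stretches(intervals, read_length):
    """Return 1-based inclusive (start, end) stretches covered by no interval."""
    gaps = []
    next_free = 1  # first position not yet known to be masked
    for start, end in sorted(intervals):
        if start > next_free:
            gaps.append((next_free, start - 1))
        next_free = max(next_free, end + 1)  # overlapping tags are merged
    if next_free <= read_length:
        gaps.append((next_free, read_length))
    return gaps


if __name__ == "__main__":
    # Tags shown for read SETTMV_2SR9W01AWCRM (406 bases); "[...]" lines omitted.
    tags = ["RT      MNRr 41 99", "RT      MNRr 117 170", "RT      MNRr 172 406"]
    print(unmasked_stretches(masked_intervals(tags), 406))
    # → [(1, 40), (100, 116), (171, 171)]
```

Sorting first and tracking the next unmasked position lets overlapping or
adjacent tags merge naturally; if the coordinates turn out to be 0-based or
half-open, only the boundary arithmetic needs shifting.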

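Why only the rare, unmasked differences lead to overlaps can be illustrated
with a generic k-mer seeding sketch that refuses to use any k-mer touching a
masked position. This is not MIRA's own skim step; it only assumes, as stated
in the reply, that masked stretches are ignored when searching for candidate
overlaps. The later Smith-Waterman alignment, which does cover masked areas,
is not modelled. Masks are given as hypothetical sets of 0-based positions.

```python
def unmasked_kmers(seq, masked, k):
    """Return the set of k-mers in seq that do not overlap any masked position.

    masked: collection of 0-based positions treated as masked.
    """
    masked = set(masked)
    kmers = set()
    for i in range(len(seq) - k + 1):
        # Skip any k-mer window that touches a masked position.
        if masked.isdisjoint(range(i, i + k)):
            kmers.add(seq[i:i + k])
    return kmers


def shares_seed(read_a, masked_a, read_b, masked_b, k=8):
    """True if the two reads share at least one k-mer outside their masks."""
    return not unmasked_kmers(read_a, masked_a, k).isdisjoint(
        unmasked_kmers(read_b, masked_b, k)
    )


if __name__ == "__main__":
    repeat = "ACGTTGCAAGGCTTAC" * 3      # identical repeat copy in both reads
    read_a = "TTTTTCCCCC" + repeat       # 10-base non-repeat prefix
    read_b = "TTTTTCCCCC" + repeat
    fully_masked = range(len(read_a))
    prefix_free = range(10, len(read_a))  # only the prefix stays unmasked
    print(shares_seed(read_a, fully_masked, read_b, fully_masked))  # False
    print(shares_seed(read_a, prefix_free, read_b, prefix_free))    # True
```

Two identical, fully masked reads yield no seed at all, while leaving a short
stretch unmasked (standing in for a SNP, splice site or error) is enough for
a candidate overlap to appear.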