[mira_talk] Re: big file in the log directory

  • From: Stephanie Pearl <pearlsa110@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Wed, 23 Mar 2011 17:29:09 -0400

So, it also has come to my attention that my "strains" aren't closely
related enough to be assembled in the manner in which I am trying to -- they
are actually closely related species, ~4000 years diverged. So I guess the
messy command line is now a moot point.

The goal for my project is to assemble 3 different closely related species
(1 of which has already been assembled by someone else -- this is the one
with the Sanger reads) for further analysis. I had thought that the mixed
assembly would use information from each set of ESTs and produce three
separate assemblies, one for each set of reads, but perhaps that's not
the case? Would you just recommend a de novo assembly for each of the three
sets of reads?

On Wed, Mar 23, 2011 at 5:15 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:

> On Wednesday 23 March 2011 20:21:04 Stephanie Pearl wrote:
>
> > I was talking some more with our computing staff and it turns out that my
> > job was actually running on a node with just 16 GB of RAM. Since my job was
> > using 19 GB, it was swapping memory to disk and therefore slowing the
> > program down. So it appears that my problem is solved. Thanks for the
> > assistance, though!
>
> Well, one problem down. However, I recommend that you really switch to at
> least 3.2.1. I do not think you would regret it, quite the contrary.
>
> > The command line that I entered for my assembly (which is still running)
> > was:
> >
> > mira -project=hybridmulti -job=denovo,est,accurate,sanger,454
> > -noclipping=454 -notraceinfo -fasta -CO:asir=yes -GE:not=4
> > SANGER_SETTINGS -LR:wqf=no -AS:bdq=30:epoq=no:mrl=50 -CL:qc=no:bsqc=no
> > -AL:egp=yes:egpl=10 -ED:ace=no 454_SETTINGS -LR:lsd=yes -AS:mrl=50
> > -AL:egp=no:mrs=94 -ED:ace=no -OUT:sssip=yes
>
> Hmmmmm ... I don't like that command line. At all. At least not for the job
> you are trying to do.
>
> You wrote you had 3 strains (2 in 454, one in Sanger). Just out of
> curiosity: may I ask how closely related these strains are? And now on to
> the real problematic areas:
>
> 1) Why do you use "-CO:asir=yes" but not "-SB:lsd=yes"? You should be able
> to assign a strain to each read, right? By doing that and telling MIRA about
> it ("-SB:lsd=yes"), you enable MIRA to find out all by itself what is a
> SNP and what is a real repeat. Then you do not need "-CO:asir=yes" anymore
> (it is counterproductive for most use cases).
>
> 2) Why are you switching off the automatic editor for 454? You rob MIRA of
> one of the most potent improvement tools it has, not good.
>
> 3) No qualities for the Sanger reads? Ouch ... why's that?
>
> 4) Maybe a problem: "454_SETTINGS -AL:egp=no" will cluster together repeats
> which have indels of >= 3 bases, e.g.:
>
> actgtgactgactgactgtgactgatgac
> actgtgactga******gtgactgatgac
>
> Depending on what you want to do, you may or may not want this. For a
> de-novo assembly of closely related strains, I would not do that though.
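
The indel example in point 4 can be checked with a few lines of Python. This is only an illustrative sketch: the helper name longest_gap_run is made up here, it assumes gaps are written as "*" as in the example, and it is not how MIRA itself evaluates alignments.

```python
import re

def longest_gap_run(aligned_read, gap_char="*"):
    """Return the length of the longest run of gap characters in an aligned read."""
    runs = re.findall(re.escape(gap_char) + "+", aligned_read)
    return max((len(run) for run in runs), default=0)

# The two aligned sequences from the example in point 4.
reference = "actgtgactgactgactgtgactgatgac"
gapped = "actgtgactga******gtgactgatgac"

if __name__ == "__main__":
    gap = longest_gap_run(gapped)
    # The example pair contains a single gap run; report whether it reaches
    # the ">= 3 bases" size mentioned in the reply.
    print(f"longest gap run: {gap} bases, reaches 3 bases: {gap >= 3}")
```

An alignment like this one is the kind of case the reply says "-AL:egp=no" would still cluster together.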
>
> B.
>
>
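
Point 1 of the quoted reply suggests assigning a strain to each read. A minimal sketch of preparing such a read-to-strain table is below. The two-column "read name, strain name" layout, the tab separator, the output file name and the read and strain names are all assumptions made for illustration; the MIRA manual for the installed version should be checked for the exact file name and format that "-SB:lsd=yes" expects.

```python
from pathlib import Path

def strain_table_lines(reads_by_strain):
    """Return one 'read_name<TAB>strain_name' line per read."""
    lines = []
    for strain, read_names in reads_by_strain.items():
        for name in read_names:
            lines.append(f"{name}\t{strain}")
    return lines

def write_strain_table(reads_by_strain, path):
    """Write the table to 'path' and return the number of reads written."""
    lines = strain_table_lines(reads_by_strain)
    Path(path).write_text("\n".join(lines) + "\n")
    return len(lines)

if __name__ == "__main__":
    # Hypothetical read and strain names, for illustration only.
    example = {
        "speciesA": ["readA_0001", "readA_0002"],
        "speciesB": ["readB_0001"],
    }
    # Hypothetical output file name; not taken from the MIRA documentation.
    count = write_strain_table(example, "strain_table_example.txt")
    print(f"wrote {count} read-to-strain assignments")
```

In practice the read names would come from the headers of the FASTA files for each data set rather than from a hand-written dictionary.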
