So, it also has come to my attention that my "strains" aren't closely related enough to be assembled in the manner in which I am trying to -- they are actually closely related species, ~4000 years diverged. So I guess the messy command line is now a moot point.

The goal for my project is to assemble 3 different closely related species (1 of which has already been assembled by someone else -- this is the one with the Sanger reads) for further analysis. I had thought that the mixed assembly would use information from each set of ESTs and produce 3 differently assembled outputs for each set of reads, but perhaps that's not the case? Would you just recommend a de novo assembly for each of the three sets of reads?

On Wed, Mar 23, 2011 at 5:15 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:

> On Wednesday 23 March 2011 20:21:04 Stephanie Pearl wrote:
> > I was talking some more with our computing staff and it turns out that my
> > job was actually running on a node with just 16 GB of RAM. Since my job was
> > using 19 GB, it was swapping memory to disk and therefore slowing the
> > program down. So it appears that my problem is solved. Thanks for the
> > assistance, though!
>
> Well, one problem down. However, I recommend that you really switch to at
> least 3.2.1. I do not think you would regret it, quite the contrary.
>
> > The command line that I entered for my assembly (which is still running)
> > was:
> >
> > mira -project=hybridmulti -job=denovo,est,accurate,sanger,454
> > -noclipping=454 -notraceinfo -fasta -CO:asir=yes -GE:not=4
> > SANGER_SETTINGS -LR:wqf=no -AS:bdq=30:epoq=no:mrl=50 -CL:qc=no:bsqc=no
> > -AL:egp=yes:egpl=10 -ED:ace=no 454_SETTINGS -LR:lsd=yes -AS:mrl=50
> > -AL:egp=no:mrs=94 -ED:ace=no -OUT:sssip=yes
>
> Hmmmmm ... I don't like that command line. At all. At least not for the job
> you are trying to do.
>
> You wrote you had 3 strains (2 in 454, one in Sanger).
> Just out of curiosity: may I ask how closely related these strains are? And
> now on to the real problematic areas:
>
> 1) Why do you use "-CO:asir=yes", but no "-SB:lsd=yes"? You should be able
> to assign a strain to each read, right? By doing that and telling MIRA about
> this ("-SB:lsd=yes"), you enable MIRA to find out all by itself what is a
> SNP and what is a real repeat. Then you do not need "-CO:asir=yes" anymore
> (it is counterproductive for most use cases).
>
> 2) Why are you switching off the automatic editor for 454? You rob MIRA of
> one of the most potent improvement tools it has, not good.
>
> 3) No qualities for the Sanger reads? Ouch ... why's that?
>
> 4) Maybe a problem: "454_SETTINGS -AL:egp=no" will cluster together repeats
> which have indels of >= 3 bases, e.g.:
>
>   actgtgactgactgactgtgactgatgac
>   actgtgactga******gtgactgatgac
>
> Depending on what you want to do, you may or may not want this. For a
> de-novo assembly of closely related strains, I would not do that though.
>
> B.
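
To make the indel example in point 4 concrete, here is a minimal Python sketch that measures the longest run of gap characters in the second row of that alignment. The function name `longest_gap_run` and the check against the 3-base figure are illustrative only and are not part of MIRA. The sketch just counts gap columns in the two rows quoted above; it does not reproduce how MIRA scores or clusters alignments.

```python
def longest_gap_run(aligned_read: str, gap_char: str = "*") -> int:
    """Return the length of the longest run of consecutive gap characters."""
    longest = 0
    current = 0
    for base in aligned_read:
        if base == gap_char:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


# The two alignment rows quoted in point 4 of the mail above.
reference = "actgtgactgactgactgtgactgatgac"
gapped = "actgtgactga******gtgactgatgac"

# Both rows of a pairwise alignment must have the same length.
assert len(reference) == len(gapped)

run = longest_gap_run(gapped)
print(f"longest indel: {run} bases; at or above the 3-base figure: {run >= 3}")
```

A check like this could be run over aligned reads to see how many pairs differ by long indels before deciding whether to keep "-AL:egp=no".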