[mira_talk] assembly parameters and more

  • From: "Davide Sassera (davide.sassera)" <davide.sassera@xxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Wed, 11 Mar 2009 11:10:02 +0100



Dear All,

I’m currently assembling a 1.5 Mb bacterial genome.

I have half a Titanium plate (520K reads), half a normal GS-FLX plate (270K reads) and a little bit of Sanger just to spice it up.

I know it’s a lot of sequences for such a small genome, but we were afraid of issues with contaminating DNA and with chimeras formed by whole genome amplification, both of which, in fact, we have.
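To put "a lot of sequences" in numbers, here is a rough depth estimate (total sequenced bases divided by genome size). It is a sketch under stated assumptions: the post gives read counts but not read lengths, so the mean lengths of about 400 bp for Titanium and 250 bp for GS-FLX are assumed typical values, and the Sanger reads are left out because their count is not given.

```python
# Rough sequencing-depth estimate: coverage = total sequenced bases / genome size.
# ASSUMPTION: the mean read lengths below are typical platform values,
# not figures reported in this thread; replace them with the real averages.

GENOME_SIZE_BP = 1_500_000  # 1.5 Mb bacterial genome

# (platform, number of reads, assumed mean read length in bp)
DATASETS = [
    ("454 Titanium (half plate)", 520_000, 400),  # read length assumed
    ("454 GS-FLX (half plate)", 270_000, 250),    # read length assumed
    # Sanger reads omitted: their count is not given in the post.
]


def coverage(reads: int, mean_len: int, genome_size: int) -> float:
    """Return the average fold coverage contributed by one dataset."""
    return reads * mean_len / genome_size


def total_coverage(datasets, genome_size: int) -> float:
    """Sum the per-dataset fold coverages."""
    return sum(coverage(n, length, genome_size) for _, n, length in datasets)


if __name__ == "__main__":
    for name, n, length in DATASETS:
        print(f"{name}: ~{coverage(n, length, GENOME_SIZE_BP):.0f}x")
    print(f"Total: ~{total_coverage(DATASETS, GENOME_SIZE_BP):.0f}x")
```

Under these assumed lengths the 454 data alone comes to roughly 180x, well above what a de novo bacterial assembly usually needs, which is consistent with the long run times described below.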

I’m working on a 3.16 GHz dual-core machine with 16 GB of RAM, and since I was sure my system could handle the data, I went with an ultra-accurate assembly:

    mira -project=050309 -job=denovo,accurate,454,sanger,genome -GE:not=2 \
        -SK:pr=97:mnr=YES -AS::rbl=8:nop=10:urdsip:8:klrs=1 \
        454_SETTINGS -AS:mrl=30

Now, it seems things are harder than I thought: after 6 days it is still on the second step, and it’s swapping 12 GB!!!

Now on with the questions:

1. Am I being too strict with the parameters I’m using?
2. Do all the assembly steps take the same time? It seems that step 2 is taking much longer than step 1.
3. Do all steps take the same amount of memory? Again, it seems the second step is more demanding.
4. If my assumptions are correct, I will either wait for months or the assembly will soon stop for lack of memory, right?
5. So what should I do now? Restart with softer parameters? Wait for 4-5 steps to be completed, quit MIRA and use the latest CAF file I get? Stop bothering you guys (Bastien above all) with stupid questions?

Thank you in advance. I really feel that the constant updates and this competent and relaxed mailing list make MIRA stand above all its competitors.

Davide

Davide Sassera
DIPAV
Università degli Studi di Milano
Milano, Italy



----- Original Message -----
From: Andreas Petzold <andpet@xxxxxxxxxxxxxx>
Date: Wednesday, March 11, 2009 10:56 am
Subject: [mira_talk] provide known repeats
To: mira_talk@xxxxxxxxxxxxx

> Hi Bastien,
> 
> the last assembly worked fine and now I have at least 3% of my
> fish genome (simply too low coverage and too little data, but I
> have to work with that). But I have another question (and maybe
> another feature request): Is it possible to provide a file that
> contains already known repeats that should be considered for the
> assembly? Or, if I mask the reads with RepeatMasker first, can
> MIRA use that information for the assembly?
> 
> On the other hand, is it necessary? Would it improve the
> handling of repeats?
> 
> Greets,
> 
> Andreas
> 
> -- 
> 
> Andreas Petzold
> Genome Analysis
> Fritz Lipmann Institute
> Beutenbergstrasse 11, D-07745 Jena
> voice : ++49-3641-656488
> fax   : ++49-3641-656488
> email : andpet@xxxxxxxxxxxxxx
> 
> -- 
> You have received this mail because you are subscribed to the 
> mira_talk mailing list. For information on how to subscribe or 
> unsubscribe, please visit 
> http://www.chevreux.org/mira_mailinglists.html
