[mira_talk] assembly parameters and more
- From: "Davide Sassera (davide.sassera)" <davide.sassera@xxxxxxxx>
- To: mira_talk@xxxxxxxxxxxxx
- Date: Wed, 11 Mar 2009 11:10:02 +0100
Dear All,
I’m currently assembling a 1.5 Mb bacterial genome.
I have half a Titanium plate (520k reads), half a normal GS FLX plate (270k reads) and a little bit of Sanger just to spice it up.
I know it’s a lot of sequence for such a small genome, but we were afraid of having issues with contaminating DNA and chimeras formed by whole genome amplification, both of which we do in fact have.
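As a rough back-of-envelope check of how much coverage these read counts imply, here is a minimal Python sketch. The average read lengths (about 400 bp for Titanium, about 250 bp for GS FLX) are assumptions, since they are not stated in this post; the small Sanger set is left out, and contaminant or chimeric reads would lower the usable figure.

```python
def expected_coverage(read_sets, genome_size_bp):
    """Return nominal fold coverage: total sequenced bases / genome size.

    read_sets is a list of (number_of_reads, average_read_length_bp) pairs.
    """
    total_bases = sum(n_reads * avg_len for n_reads, avg_len in read_sets)
    return total_bases / genome_size_bp


# Genome size and read counts are taken from the post.
GENOME_SIZE_BP = 1_500_000

# Average read lengths are ASSUMED typical values, not given in the post.
READ_SETS = [
    (520_000, 400),  # half Titanium plate, assumed ~400 bp average
    (270_000, 250),  # half GS FLX plate, assumed ~250 bp average
]

coverage = expected_coverage(READ_SETS, GENOME_SIZE_BP)
print(f"Nominal 454 coverage: ~{coverage:.0f}x")
```

Under these assumed read lengths the nominal coverage comes out well above 100x, which is consistent with the "a lot of sequence" remark above.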
I’m working on a 3.16 GHz dual core with 16 GB of RAM, and since I was sure my system could handle the data, I went with an ultra-accurate assembly:
mira -project=050309 -job=denovo,accurate,454,sanger,genome -GE:not=2 -SK:pr=97:mnr=YES -AS::rbl=8:nop=10:urdsip:8:klrs=1 454_SETTINGS -AS:mrl=30
Now, it seems things are harder than I thought: after 6 days it is still on the second step and it’s swapping 12 GB!
Now on with the questions:
1. Am I being too strict with the parameters I’m using?
2. Do all the assembly steps take the same time? It seems that step 2 is taking much longer than step 1.
3. Do all steps take the same amount of memory? Again, it seems the second step is more demanding.
4. If my assumptions are correct, I will either wait for months or the assembly will soon stop for lack of memory, right?
5. So what should I do now? Restart with softer parameters? Wait for 4-5 steps to be completed, quit mira and use the latest CAF I get? Stop bothering you guys (Bastien above all) with stupid questions?
Thank you in advance. I really feel that the constant updates and this competent and relaxed mailing list make MIRA stand above all the competitors.
Davide
Davide Sassera
DIPAV
Università degli Studi di Milano
Milano, Italy
----- Original Message -----
From: Andreas Petzold <andpet@xxxxxxxxxxxxxx>
Date: Wednesday, March 11, 2009 10:56 am
Subject: [mira_talk] provide known repeats
To: mira_talk@xxxxxxxxxxxxx
> Hi Bastien,
>
> the last assembly worked fine and now I have at least 3 % of my
> fish genome (simply too low coverage and too few data, but I have
> to work with that). But I have another question (and maybe
> another feature): Is it possible to provide a file that contains
> already known repeats that should be considered for assembly ?
> Or if I masked the read with RepeatMasker first can mira use the
> information for assembly ?
>
> On the other hand, is it necessary? Would it improve the
> repeat tackling ?
>
> Greets,
>
> Andreas
>
> --
>
> Andreas Petzold
> Genome Analysis
> Fritz Lipmann Institute
> Beutenbergstrasse 11, D-07745 Jena
> voice : ++49-3641-656488
> fax : ++49-3641-656488
> email : andpet@xxxxxxxxxxxxxx
>
> --
> You have received this mail because you are subscribed to the
> mira_talk mailing list. For information on how to subscribe or
> unsubscribe, please visit
> http://www.chevreux.org/mira_mailinglists.html