[mira_talk] Re: assembly options for non-redundant contigs
- From: Richard Gregory <R.Gregory@xxxxxxxxxxxxxxx>
- To: "mira_talk@xxxxxxxxxxxxx" <mira_talk@xxxxxxxxxxxxx>
- Date: Wed, 03 Jun 2009 19:15:45 +0100
Hi Laurent,
Thanks for the suggestion. Have tested your options on the pre-Titanium
dataset, which I'm using as the benchmark because I have an assembly
using V2.6.15 to compare to. Extending the table from my previous email:
number     total      number of
of reads   bases      contigs
169796     2865603    8540        V2.9.15
149758     6167756    24376       V2.9.43
132790     6409873    25099       V2.9.43_Laurent
Looking at contigs >500 bp, V2.9.49 with your options produced 1269
contigs, slightly fewer than the V2.9.49 options I was using.
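
For anyone wanting to repeat the >500 bp count on their own assembly, here is a minimal Python sketch. It assumes the contigs are in a plain FASTA file; the function name and the file name in the usage comment are hypothetical, not anything Mira provides.

```python
def count_long_contigs(fasta_lines, min_len=500):
    """Count FASTA records whose sequence is longer than min_len bases."""
    count = 0
    length = None  # stays None until the first header line is seen
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: close out the previous one, if any.
            if length is not None and length > min_len:
                count += 1
            length = 0
        elif line and length is not None:
            length += len(line)
    # Close out the last record in the file.
    if length is not None and length > min_len:
        count += 1
    return count


# Usage (hypothetical file name):
# with open("contigs.fasta") as handle:
#     print(count_long_contigs(handle))
```

The comparison is strict, so a contig of exactly 500 bp is not counted.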
The real test as far as I'm concerned is reassembly with something else,
such as cap3. Do the contigs assemble or are they kept separate? For this
V2.9.15 is easily the least redundant.
Mira      Mira      Cap3
Contigs   Contigs   Contigs
In        Used      Out
8540      2277      630         V2.9.15
24376     17545     1167        V2.9.43
25099     18261     1146        V2.9.43_Laurent
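
To put a number on the redundancy, one rough measure is the fraction of Mira contigs that cap3 pulled into its own contigs. The sketch below uses only the figures from the table above; reading "Used" as "Mira contigs merged by cap3" is an assumption, and the function name is hypothetical.

```python
# (Mira contigs in, Mira contigs used by cap3, cap3 contigs out)
results = {
    "V2.9.15": (8540, 2277, 630),
    "V2.9.43": (24376, 17545, 1167),
    "V2.9.43_Laurent": (25099, 18261, 1146),
}


def merged_fraction(contigs_in, contigs_used):
    """Fraction of input contigs that cap3 placed into a cap3 contig."""
    return contigs_used / contigs_in


for version, (n_in, n_used, n_out) in results.items():
    print(f"{version:16s} {merged_fraction(n_in, n_used):.1%} merged by cap3")
```

On these figures roughly 27% of the V2.9.15 contigs get merged, against roughly 72% and 73% for the two V2.9.43 runs.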
Richard
Laurent MANCHON wrote:
-- Hi Richard,
this is the command line I use with 454 cDNA Titanium sequences (with
Mira 2.9.43):
mira -project=myproject -job=denovo,est,normal,454 -SK:mnr=yes -SK:rt=4
-GE:not=2 -CO:asir=yes -CO:mr 454_SETTINGS
-AL:mrs=95:egp=yes:egpl=reject_codongaps:megpp=100 -LR:mxti=no
-CO:rodirs=10 -AL:mo=60 -CL:cpat=yes
input: 434802 sequences
output: 51559 contigs and 197939 singlets
Laurent --
Richard Gregory wrote:
Hi All,
We have been using Mira for a while now; it handles cDNA much better
than Newbler and gives us more confidence in the result.
The latest batch of data is proving to be a problem, the current
project requires contigs that contain all similar reads and not be
split into multiple contigs of minor (or maybe large) differences. We
are having trouble finding the options to achieve this. Does anybody
know if/how this can be done?
The sequence data is 1.5 plates of 454 Titanium cDNA and half a plate
of pre-Titanium cDNA. Have tried many options, genome or est,
--noclippings, -SK:mmhr=1, -DP:ure=no, -AS:ard=no, -AL:egp=no,
-AS:sd=no, -CO:mr=no, -AL:mrs=55, -SK:mnr=yes, -SK:hss=1:pr=70, but
the result is basically the same. The best option set was -fasta
-job=denovo,est,draft,454 --noclippings -SK:mmhr=1 -DP:ure=no
-AS:ard=no -AS:sd=no -GE:not=4 -SK:hss=1:pr=70 , but this was only
marginally better and doesn't achieve the desired result.
The only clue comes from previous assemblies with earlier versions of
Mira, which produced much less redundancy, i.e. ~8000 contigs where
V2.9.43 now produces ~18000. Mapping this onto a reference showed ~1500
contigs could be the same gene. Assembling the ~1500 contigs with
cap3 produced ~3 contigs, one containing hundreds of contigs.
Thanks,
Richard
--
You have received this mail because you are subscribed to the mira_talk mailing
list. For information on how to subscribe or unsubscribe, please visit
http://www.chevreux.org/mira_mailinglists.html