[mira_talk] Re: assembly options for non-redundant contigs

From: Laurent Manchon <lmanchon@xxxxxxxxxxxxxx>
To: mira_talk@xxxxxxxxxxxxx
Date: Wed, 03 Jun 2009 21:08:00 +0200

Richard,

Which parameters do you use with cap3?

In the past I ran tests comparing results from Mira and Tgicl (TIGR software that uses cap3) and, as you said, the results produced by Tgicl were very different in quality and quantity (fewer contigs than Mira). But to assemble 450000 cDNA reads cap3 needs a lot of memory, and I always got segmentation faults. So today I use Mira, which is able to handle large inputs. Many other assembler programs exist, and it takes too much time to compare them against each other to establish which is the best.

Laurent


Richard Gregory wrote:

Hi Laurent,
Thanks for the suggestion. I have tested your options on the pre-Titanium dataset, which I'm using as the benchmark because I have an assembly using V2.6.15 to compare to. Extending the table from my previous email:
No. of reads  Total bases  No. of contigs  Version
      169796      2865603            8540  V2.9.15
      149758      6167756           24376  V2.9.43
      132790      6409873           25099  V2.9.43_Laurent
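To compare the rows on a per-contig basis, the counts can be turned into mean contig length and mean reads per contig. This is a minimal sketch, assuming "total bases" is the summed length of all contigs and "number of reads" counts only reads placed in those contigs; neither definition is stated in the thread.

```python
# Rows copied from the table above: (reads, total bases, contigs).
runs = {
    "V2.9.15": (169796, 2865603, 8540),
    "V2.9.43": (149758, 6167756, 24376),
    "V2.9.43_Laurent": (132790, 6409873, 25099),
}


def per_contig(reads, bases, contigs):
    """Return (mean contig length in bp, mean reads per contig).

    Assumption: 'total bases' is the summed length of all contigs and
    'number of reads' counts only reads that were placed in contigs.
    """
    return bases / contigs, reads / contigs


for label, row in runs.items():
    mean_len, reads_per = per_contig(*row)
    print(f"{label:16s} {mean_len:6.1f} bp/contig  {reads_per:5.2f} reads/contig")
```

Under those assumptions V2.9.15 averages about 336 bp and roughly 20 reads per contig, against about 253 to 255 bp and 5 to 6 reads per contig for the two V2.9.43 runs.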
Looking at contigs >500 bp, V2.9.49 with your options produced 1269 contigs, slightly fewer than the V2.9.49 options I was using.
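Counting contigs above a length cutoff only needs the contig sequences. The sketch below assumes the contigs have been exported to a FASTA file (the file name `contigs.fasta` in the usage comment is hypothetical) and reads ">500 bp" as strictly longer than 500 bases.

```python
import sys


def contig_lengths(handle):
    """Yield (header, sequence length) for each record in a FASTA stream."""
    header, length = None, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, length
            header, length = line[1:], 0
        elif line:
            # Sequence lines may be wrapped, so accumulate their lengths.
            length += len(line)
    if header is not None:
        yield header, length


def count_longer_than(handle, min_len=500):
    """Count records strictly longer than min_len bases."""
    return sum(1 for _, n in contig_lengths(handle) if n > min_len)


if __name__ == "__main__":
    # Usage (hypothetical file name): python count_long.py contigs.fasta
    with open(sys.argv[1]) as fasta:
        print(count_longer_than(fasta))
```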
The real test, as far as I'm concerned, is reassembly with something else, such as cap3: do the contigs assemble, or are they kept separate? By this measure V2.9.15 is easily the least redundant.
Mira contigs in  Mira contigs used  Cap3 contigs out  Version
           8540               2277               630  V2.9.15
          24376              17545              1167  V2.9.43
          25099              18261              1146  V2.9.43_Laurent
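The reassembly test can be scripted and summarised. This is a sketch under stated assumptions: `cap3` is on the PATH and run with default parameters, it writes `<input>.cap.contigs` and `<input>.cap.singlets` next to the input, and "Mira contigs used" means input contigs that cap3 merged, i.e. input minus singlets. The rough non-redundant estimate (unmerged contigs plus cap3 contigs) is an illustrative measure, not one used in the thread.

```python
import subprocess


def count_fasta_records(path):
    """Count FASTA records (header lines starting with '>') in a file."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def run_cap3(contig_fasta):
    """Re-assemble a FASTA file of contigs with cap3 (default parameters).

    Assumes the `cap3` binary is on PATH. Returns (contigs_out, singlets)
    counted from the .cap.contigs and .cap.singlets files cap3 writes.
    """
    subprocess.run(["cap3", str(contig_fasta)], check=True,
                   stdout=subprocess.DEVNULL)
    contigs_out = count_fasta_records(f"{contig_fasta}.cap.contigs")
    singlets = count_fasta_records(f"{contig_fasta}.cap.singlets")
    return contigs_out, singlets


def reassembly_summary(n_in, n_used, n_out):
    """Summarise how much a contig set collapses when re-assembled.

    n_in:   contigs given to cap3
    n_used: input contigs cap3 merged into its own contigs (n_in - singlets)
    n_out:  contigs cap3 produced
    Assumes each input contig is either merged or left as a singlet.
    """
    return {
        "used_fraction": n_used / n_in,
        "est_nonredundant": (n_in - n_used) + n_out,
    }


# Rows copied from the table above: (in, used, out).
table = {
    "V2.9.15": (8540, 2277, 630),
    "V2.9.43": (24376, 17545, 1167),
    "V2.9.43_Laurent": (25099, 18261, 1146),
}
for label, row in table.items():
    s = reassembly_summary(*row)
    print(f"{label:16s} used {s['used_fraction']:.1%}  "
          f"~{s['est_nonredundant']} after reassembly")
```

With the table's numbers, cap3 merges about 27% of the V2.9.15 contigs but about 72% of the V2.9.43 contigs, which is the redundancy difference described above.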


Richard


Laurent MANCHON wrote:
Hi Richard,
this is the command line I use with 454 cDNA Titanium sequences (with Mira 2.9.43):
mira -project=myproject -job=denovo,est,normal,454 -SK:mnr=yes -SK:rt=4 -GE:not=2 -CO:asir=yes -CO:mr 454_SETTINGS -AL:mrs=95:egp=yes:egpl=reject_codongaps:megpp=100 -LR:mxti=no -CO:rodirs=10 -AL:mo=60 -CL:cpat=yes
input: 434802 sequences
output: 51559 contigs and 197939 singlets
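The input and output counts give a quick sense of how much of the data stays unassembled. A minimal sketch; the reads-per-contig figure is only an upper bound, assuming every non-singlet read was placed in a contig (reads removed by clipping or filtering would lower it).

```python
input_reads = 434802
contigs = 51559
singlets = 197939

singlet_fraction = singlets / input_reads
# Upper bound: assumes every read that is not a singlet ended up in a contig.
max_reads_in_contigs = input_reads - singlets
max_reads_per_contig = max_reads_in_contigs / contigs

print(f"singlets: {singlet_fraction:.1%} of input")
print(f"reads per contig: at most {max_reads_per_contig:.1f}")
```

This prints 45.5% for singlets and at most 4.6 reads per contig.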

Laurent



Richard Gregory wrote:
Hi All,
We've been using Mira for a while now; it handles cDNA much better than Newbler and gives us more confidence in the result.
The latest batch of data is proving to be a problem. The current project requires contigs that contain all similar reads, rather than having them split into multiple contigs because of minor (or maybe large) differences. We are having trouble finding the options to achieve this. Does anybody know if/how this can be done?
The sequence data is 1.5 plates of 454 Titanium cDNA and half a plate of pre-Titanium cDNA. We have tried many options (genome or est, --noclippings, -SK:mmhr=1, -DP:ure=no, -AS:ard=no, -AL:egp=no, -AS:sd=no, -CO:mr=no, -AL:mrs=55, -SK:mnr=yes, -SK:hss=1:pr=70), but the result is basically the same. The best option set was -fasta -job=denovo,est,draft,454 --noclippings -SK:mmhr=1 -DP:ure=no -AS:ard=no -AS:sd=no -GE:not=4 -SK:hss=1:pr=70, but it was only marginally better and doesn't achieve the desired result.
The only clue comes from previous assemblies with earlier versions of Mira, which produced much less redundancy, i.e. ~8000 contigs, whereas V2.9.43 now produces ~18000. Mapping these onto a reference showed that ~1500 contigs could be the same gene. Assembling those ~1500 contigs with cap3 produced ~3 contigs, one of them containing hundreds of contigs.
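The email does not say how the contigs were mapped to the reference. As a sketch, assuming each contig has already been assigned a single best-hit reference gene (a hypothetical input, e.g. parsed from an aligner report), redundant contigs are those whose gene is shared with at least one other contig.

```python
from collections import defaultdict


def shared_gene_hits(best_hits):
    """Group contigs by the reference gene they map to.

    best_hits: iterable of (contig_id, gene_id) pairs, one best hit per
    contig (hypothetical input format).
    Returns {gene_id: [contig_ids]} for genes hit by more than one contig.
    """
    by_gene = defaultdict(list)
    for contig, gene in best_hits:
        by_gene[gene].append(contig)
    return {g: cs for g, cs in by_gene.items() if len(cs) > 1}


def count_redundant_contigs(best_hits):
    """Number of contigs whose best-hit gene is shared with another contig."""
    return sum(len(cs) for cs in shared_gene_hits(best_hits).values())


hits = [("c1", "geneA"), ("c2", "geneA"), ("c3", "geneB"),
        ("c4", "geneA"), ("c5", "geneC"), ("c6", "geneC")]
print(count_redundant_contigs(hits))  # 5
```

The contig IDs in each group returned by `shared_gene_hits` are the natural candidates to pass to cap3 for the reassembly check.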
Thanks,

Richard




--
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html

Follow-Ups:
- [mira_talk] Re: assembly options for non-redundant contigs
  - From: Sven Klages

References:
- [mira_talk] assembly options for non-redundant contigs
  - From: Richard Gregory
- [mira_talk] Re: assembly options for non-redundant contigs
  - From: Laurent MANCHON
- [mira_talk] Re: assembly options for non-redundant contigs
  - From: Richard Gregory
