[mira_talk] Re: assembly options for non-redundant contigs

  • From: Richard Gregory <R.Gregory@xxxxxxxxxxxxxxx>
  • To: "mira_talk@xxxxxxxxxxxxx" <mira_talk@xxxxxxxxxxxxx>
  • Date: Wed, 03 Jun 2009 02:00:05 +0100

Hi Bastien,

The "good" results were from V2.9.15. To make sure this effect was real, I've just tried assembling an old dataset using 2.9.43 and exactly the same input reads. This example uses pre-Titanium reads, a pool of two samples of relatively degraded cDNA with an average read length of 120 bp.

number of reads   total bases   number of contigs   version
169796            2865603       8540                V2.9.15
149758            6167756       24376               V2.9.43

Looking at contigs >500 bp, V2.9.15 produced 1159 contigs and V2.9.43 produced 1298 contigs.
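
A count like the one above can be reproduced from a contig FASTA file with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and the file name in the usage note is a placeholder, not an actual Mira output name.

```python
def contig_lengths(handle):
    """Return the sequence length of every record in a FASTA stream."""
    lengths = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A header line starts a new contig.
            lengths.append(0)
        elif line and lengths:
            # Sequence line: add its length to the current contig.
            lengths[-1] += len(line)
    return lengths


def count_longer_than(lengths, min_len=500):
    """Count contigs strictly longer than min_len bases."""
    return sum(1 for n in lengths if n > min_len)
```

Usage would look something like `with open("contigs.fasta") as fh: print(count_longer_than(contig_lengths(fh)))`, where `contigs.fasta` stands in for whatever FASTA file the assembly wrote.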

V2.9.15 was assembled with:
mira -project=cmb -AS:nop=7:rbl=3 -SK:pr=80 -AL:mrs=80 -FN:xtii=dummy_traceinfo.xml -GE:mxti=yes -454data -454:l454d=yes -CL:msvs=no:qc=no:bsqc=no:pvlc=no:mbc=no:emlc=no -DP:ure=no -OUT:otc=yes

and V2.9.43 was assembled with (hopefully comparable):
mira -job=denovo,est,draft,454 -project=$project -AS:nop=7:rbl=3 -SK:pr=80 -AL:mrs=80 -FN:xtii=dummy_traceinfo.xml -LR:mxti=no -LR:l454d=yes -CL:msvs=no:qc=no:bsqc=no:pvlc=no:mbc=no:emlc=no -DP:ure=no -SK:mnr=yes -OUT:otc=yes

Running cap3 on the 24376 Mira contigs from V2.9.43 produces 1167 cap3 contigs, built from 17545 of the Mira contigs. Running cap3 on the 8540 Mira contigs from V2.9.15 produces 630 cap3 contigs, built from 2277 of the Mira contigs.
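
To put the two runs side by side, the figures quoted in this mail can be turned into a few ratios. This is a rough sketch rather than a proper redundancy measure: it assumes "total bases" means the summed length of the Mira contigs, and the dictionary keys and the `summarize` function are made up for illustration.

```python
# Figures copied from this mail; the key names are illustrative only.
assemblies = {
    "V2.9.15": {"total_bases": 2865603, "contigs": 8540,
                "cap3_contigs": 630, "merged_by_cap3": 2277},
    "V2.9.43": {"total_bases": 6167756, "contigs": 24376,
                "cap3_contigs": 1167, "merged_by_cap3": 17545},
}


def summarize(a):
    """Derive simple ratios from one assembly's counts."""
    return {
        # Assumes total_bases is the summed Mira contig length.
        "mean_contig_len": a["total_bases"] / a["contigs"],
        # Share of Mira contigs that cap3 could merge into something.
        "fraction_merged": a["merged_by_cap3"] / a["contigs"],
        # Average number of Mira contigs collapsed into one cap3 contig.
        "mira_per_cap3": a["merged_by_cap3"] / a["cap3_contigs"],
    }


if __name__ == "__main__":
    for version, counts in assemblies.items():
        stats = summarize(counts)
        print(version, {k: round(v, 2) for k, v in stats.items()})
```

With these numbers, cap3 merges roughly 27% of the V2.9.15 contigs and roughly 72% of the V2.9.43 contigs.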

The 454 Titanium reads for another project are of much better quality and are the expected length for the technology used. The same effect can be seen in these: Mira produces many contigs that cap3 can assemble with default options. Looking at the .ace file from cap3, the cap3 assembly is reasonable and leaves the impression that Mira only needs a single base difference to start a new contig.


Thanks for the quick response,

Richard

Bastien Chevreux wrote:
On Tuesday, 02 June 2009 Richard Gregory wrote:
[...]
The only clue comes from previous assemblies with earlier versions of Mira,
which produced much less redundancy, i.e. ~8000 contigs, where V2.9.43 now
produces ~18000. Mapping this onto a reference showed ~1500 contigs
could be the same gene.  Assembling the ~1500 contigs with cap3
produced ~3 contigs, one containing hundreds of contigs.

Hello Richard,

hmmm ... sounds funny, indeed. Could you tell me the last version of MIRA with which you got "good" results and which version gives you trouble?

I admit that I have been concentrating more on genome assemblies lately and perhaps a changed default parameter or a new algorithm behaves somewhat unexpectedly with cDNA.

Regards,
  Bastien



