[mira_talk] Re: assembly options for non-redundant contigs
- From: Richard Gregory <R.Gregory@xxxxxxxxxxxxxxx>
- To: "mira_talk@xxxxxxxxxxxxx" <mira_talk@xxxxxxxxxxxxx>
- Date: Tue, 09 Jun 2009 03:39:54 +0100
Hi Bastien,
I have prepared a mini example by picking out all reads (both
pre-Titanium and Titanium) that went into contigs that blasted to the
same reference gene. These reads were then assembled with cap3 and with
three versions of Mira (V2.9.15, V2.9.37 and V2.9.43), and the contigs
from each assembly were assembled again with cap3. The results aren't
quite as obvious as with the full data set, but the example does have
the advantage of assembling in minutes rather than days.
The results are:

First assembly:

  Used                Total
  reads    Contigs    bases
   2002         13     5696   Reads assembled with cap3
   2002         42    15643   Reads assembled with Mira V2.9.15
   2016         59    19234   Reads assembled with Mira V2.9.37
   2008         67    20981   Reads assembled with Mira V2.9.43

Reassembled with cap3 with default params:

     Mira       Used                      Cap3
  contigs    contigs    Singletons     contigs
       13          8             5           3   cap3
       42         40             2           3   v2.9.15
       59         51             1           4   v2.9.37
       67         59             6           4   v2.9.43

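
As a rough way to read the two mini-example tables side by side, the sketch below derives a mean contig length from the first assembly and a "collapse ratio" (used contigs per resulting cap3 contig) from the reassembly. The figures are copied from the tables; the function names and both derived metrics are illustrative assumptions, not something computed in the original runs.

```python
# Figures copied from the mini-example tables in this mail.
first_assembly = {
    "cap3":    {"reads": 2002, "contigs": 13, "bases": 5696},
    "v2.9.15": {"reads": 2002, "contigs": 42, "bases": 15643},
    "v2.9.37": {"reads": 2016, "contigs": 59, "bases": 19234},
    "v2.9.43": {"reads": 2008, "contigs": 67, "bases": 20981},
}
reassembly = {
    "cap3":    {"used": 8,  "singletons": 5, "cap3_contigs": 3},
    "v2.9.15": {"used": 40, "singletons": 2, "cap3_contigs": 3},
    "v2.9.37": {"used": 51, "singletons": 1, "cap3_contigs": 4},
    "v2.9.43": {"used": 59, "singletons": 6, "cap3_contigs": 4},
}


def mean_contig_length(label):
    """Total bases divided by contig count for the first assembly."""
    row = first_assembly[label]
    return row["bases"] / row["contigs"]


def collapse_ratio(label):
    """First-pass contigs merged into each cap3 contig (hypothetical metric)."""
    row = reassembly[label]
    return row["used"] / row["cap3_contigs"]


for label in first_assembly:
    print(f"{label:8s} mean contig length {mean_contig_length(label):6.1f} bp, "
          f"{collapse_ratio(label):5.2f} used contigs per cap3 contig")
```

A high collapse ratio means many first-pass contigs were merged by cap3, which is one way to express how redundant the first assembly was.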
Will send the reads via PM.
I should also mention these reads have been heavily trimmed for adapter
and poly A/T using several passes of blast. Poly A/T will be present,
but only when it was originally <10 bases long and within the middle
half of the read.
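
To make the poly A/T rule above concrete, here is a minimal sketch that lists the A/T runs in a read that would fit the description: shorter than 10 bases and lying within the middle half of the read. It is not the blast-based trimming actually used; the function name, the `min_run` cutoff for what counts as a run at all, and the reading of "within the middle half" as "entirely inside the middle 50%" are assumptions made for illustration.

```python
import re


def surviving_polyat_runs(read, min_run=4, max_run=10):
    """Return (start, end, base) for A or T runs that fit the description.

    min_run is a hypothetical lower bound for calling something a run;
    max_run=10 mirrors the "<10 bases" rule stated in the mail.
    """
    n = len(read)
    lo, hi = n / 4, 3 * n / 4  # bounds of the middle half of the read
    runs = []
    for m in re.finditer(r"A+|T+", read.upper()):
        length = m.end() - m.start()
        inside_middle = m.start() >= lo and m.end() <= hi
        if min_run <= length < max_run and inside_middle:
            runs.append((m.start(), m.end(), m.group()[0]))
    return runs


example = "ACGT" * 5 + "AAAAAA" + "CGCG" * 5
print(surviving_polyat_runs(example))
```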
I have also now tested 2.9.37 with the original pre-Titanium dataset I
was using to show the change in behaviour between Mira versions. This
run uses not-so-well-trimmed reads.

  Number       Total     Number of
  of reads     bases     contigs
    169796   2865603          8540   V2.9.15
    146840   5673833         22863   V2.9.37
    149758   6167756         24376   V2.9.43

     Mira                            Cap3                  Cap3
  contigs                      singletons               contigs
       in    Used      Bases      'reads'     Bases         out
     8540    2277    2021257         6263    459553         630   V2.9.15
    22863   16141    1918793         6722    656203        1116   V2.9.37
    24376   17545    2007147         6831    724953        1167   V2.9.43

This shows that the main difference is between V2.9.15 and V2.9.37.
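
One way to put a number on that jump is the fraction of Mira contigs that cap3 was able to merge further. The sketch below uses the figures from the reassembly table above; the dictionary layout and the `merged_fraction` name are illustrative assumptions, not part of the original analysis.

```python
# Figures copied from the cap3 reassembly table for the full dataset.
rows = {
    "V2.9.15": {"contigs_in": 8540,  "used": 2277,  "singletons": 6263, "cap3_out": 630},
    "V2.9.37": {"contigs_in": 22863, "used": 16141, "singletons": 6722, "cap3_out": 1116},
    "V2.9.43": {"contigs_in": 24376, "used": 17545, "singletons": 6831, "cap3_out": 1167},
}


def merged_fraction(row):
    """Share of Mira contigs that cap3 merged into its own contigs."""
    return row["used"] / row["contigs_in"]


for version, row in rows.items():
    # Sanity check: every input contig is either used or left as a singleton.
    assert row["used"] + row["singletons"] == row["contigs_in"]
    growth = row["contigs_in"] / rows["V2.9.15"]["contigs_in"]
    print(f"{version}: {merged_fraction(row):.1%} of contigs merged by cap3, "
          f"{growth:.2f}x the V2.9.15 contig count")
```

The merged fraction changes sharply between V2.9.15 and V2.9.37 and only slightly between V2.9.37 and V2.9.43, which is the pattern described above.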
Richard
Bastien Chevreux wrote:
On Wednesday 03 June 2009 Richard Gregory wrote:
The "good" results were from V2.9.15. To make sure this effect was real,
I've just tried assembling an old dataset using 2.9.43 and exactly the
same input reads. This example uses pre-Titanium reads, a pool of two
samples of relatively degraded cDNA with an average read length of 120 bp.
[...]
Hmmm ... 2.9.15 is from one and a half years ago. A lot has changed in the
meantime and I'll need to investigate that.
The question is: why does MIRA now keep apart things that it thinks do not
belong together? One idea I have is that, as I changed the poly-A/T clip
handling, it now sees more "differences" in these parts, which would produce
the effect you noticed.
In the end, it would be best if this could be looked at with some very specific
examples at hand. Would it be possible for you to make available for me one of
these data sets? If yes, you could show me a few cases that trouble you and I
would have a deeper look at what happened (or did not happen).
Regards,
Bastien
PS: Please note that this could probably only happen mid to end of next week
or later, as this weekend is already reserved for something else and then
I'm travelling.