[mira_talk] Re: assembly options for non-redundant contigs

  • From: Richard Gregory <R.Gregory@xxxxxxxxxxxxxxx>
  • To: "mira_talk@xxxxxxxxxxxxx" <mira_talk@xxxxxxxxxxxxx>
  • Date: Tue, 09 Jun 2009 03:39:54 +0100

Hi Bastien,

I have prepared a mini example by picking out all reads (both pre-Titanium and Titanium) that went into contigs that blasted to the same reference gene. These reads were then assembled with three versions of Mira (V2.9.15, V2.9.37 and V2.9.43) and with cap3, and the contigs from the Mira versions were assembled again with cap3. The results aren't quite as obvious as with the full data set, but the example does have the advantage of assembling in minutes rather than days.

The results are:-
First assembly:
Used            total
reads contigs   bases
2002   13       5696    Reads assembled with cap3
2002   42      15643    Reads assembled with Mira V2.9.15
2016   59      19234    Reads assembled with Mira V2.9.37
2008   67      20981    Reads assembled with Mira V2.9.43

Reassembled with cap3 using default parameters:
 Input    Used       Cap3      Cap3
contigs  contigs  singletons  contigs  First assembler
  13       8         5         3       cap3
  42      40         2         3       v2.9.15
  59      51         1         4       v2.9.37
  67      59         6         4       v2.9.43

Will send the reads via PM.

I should also mention these reads have been heavily trimmed for adapter and poly A/T using several passes of blast. Poly A/T will be present, but only when it was originally <10 bases long and within the middle half of the read.
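
The rule in the last sentence can be made concrete with a small Python sketch. This is not the blast-based trimming itself, only an illustration of which poly-A/T stretches would be expected to remain in a trimmed read. The function name, the minimum run length of 5, and reading "middle half" as the central 50% of read positions are all assumptions made for the example.

```python
import re

MIN_RUN = 5    # assumption: shortest stretch counted as poly-A/T
MAX_KEPT = 10  # from the text: only runs shorter than 10 bases remain


def surviving_polyat_runs(read, min_run=MIN_RUN, max_kept=MAX_KEPT):
    """Return (start, end, base) for poly-A/T runs expected to remain.

    A run is reported only if it is shorter than max_kept and lies
    entirely inside the central half of the read (assumed reading of
    "within the middle half of the read").
    """
    n = len(read)
    lo, hi = n / 4, 3 * n / 4
    pattern = r"A{%d,}|T{%d,}" % (min_run, min_run)
    runs = []
    for m in re.finditer(pattern, read.upper()):
        length = m.end() - m.start()
        if length < max_kept and m.start() >= lo and m.end() <= hi:
            runs.append((m.start(), m.end(), m.group()[0]))
    return runs
```

A short central run is reported, while a run at the read start or a run of 10 or more bases is not, since under the stated rule those would already have been trimmed.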

I have also now tested 2.9.37 on the original pre-Titanium dataset I was using to show the change in behaviour between Mira versions. The reads in this dataset are not as well trimmed.
 Number    Total    Number of
of Reads   Bases     Contigs
 169796   2865603      8540    V2.9.15
 146840   5673833     22863    V2.9.37
 149758   6167756     24376    V2.9.43


Reassembled with cap3:
      Mira           Cap3             Cap3
    Contigs        Singletons        Contigs
  In      Used    Bases  'reads'   Bases   Out
 8540     2277   2021257  6263    459553   630  V2.9.15
22863    16141   1918793  6722    656203  1116  V2.9.37
24376    17545   2007147  6831    724953  1167  V2.9.43

This shows that the main change in behaviour occurred between V2.9.15 and V2.9.37.
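
As a rough cross-check of the pre-Titanium cap3 table above, the following Python sketch derives two figures from it: how many sequences remain after the cap3 pass (cap3 singletons plus cap3 contigs) and what fraction of the Mira contigs cap3 merged. The numbers are copied from the table; the function name and the reading of "singletons" as Mira contigs that cap3 left unmerged are assumptions, although the latter is consistent with In = Used + Singletons holding for every row.

```python
# Figures copied from the pre-Titanium tables above.
# (version, Mira contigs in, used by cap3, cap3 singletons, cap3 contigs out)
rows = [
    ("V2.9.15", 8540, 2277, 6263, 630),
    ("V2.9.37", 22863, 16141, 6722, 1116),
    ("V2.9.43", 24376, 17545, 6831, 1167),
]


def summarize(rows):
    """Return (version, sequences after cap3, fraction of Mira contigs merged)."""
    out = []
    for version, contigs_in, used, singletons, cap3_contigs in rows:
        # Every Mira contig is either merged by cap3 or left as a singleton.
        assert used + singletons == contigs_in
        remaining = singletons + cap3_contigs
        out.append((version, remaining, used / contigs_in))
    return out


for version, remaining, merged in summarize(rows):
    print(f"{version}: {remaining} sequences after cap3, "
          f"{merged:.0%} of Mira contigs merged")
```

With these inputs the sequence counts after cap3 come out at 6893, 7838 and 7998, while the merged fraction rises from about 27% for V2.9.15 to about 71% and 72% for the two later versions.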


Richard


Bastien Chevreux wrote:
On Wednesday 03 June 2009 Richard Gregory wrote:
The "good" results were from V2.9.15. To make sure this effect was real,
I've just tried assembling an old dataset using 2.9.43 and exactly the
same input reads. This example uses pre-Titanium reads, a pool of two
samples of relatively degraded cDNA with an average read length of 120 bp.
[...]

Hmmm ... 2.9.15 is from one and a half years ago. A lot has changed in the meantime and I'll need to investigate that.

The question is: why does MIRA now put apart things that it thinks do not belong together? One idea I have is that, as I changed the poly-A/T clip handling, it now sees more "differences" in these parts, which would produce the effect you noticed.

In the end, it would be best if this could be looked at with some very specific examples at hand. Would it be possible for you to make available for me one of these data sets? If yes, you could show me a few cases that trouble you and I would have a deeper look at what happened (or did not happen).

Regards,
  Bastien

PS: Please note that this could probably only happen mid to end of next week or later, as this weekend is already reserved for something else and then I'm travelling.


