[mira_talk] Re: MIRA / large contigs

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 13 May 2014 20:17:10 +0200

On 13 May 2014, at 14:01 , Sabrina rodriguez 
<sabrina.rodriguez@xxxxxxxxxxxxxxxx> wrote:
> In some cases, even though I obtain large contigs (> 500 bp) as observed from 
> the <project>_info_assembly.txt file in the <project>_info directory, no 
> "LargeContigs" files were generated in the <project>_result directory.

Hello Sabrina,

I base my answer on the assumption you are using a MIRA 4.x version. If not, 
please upgrade.

There may be a couple of reasons for what you are seeing. Let’s go through them:

1. A bug in MIRA. Possible, but I’d be a little bit surprised.

2. Is the “large contigs” info file in the info directory populated, that is, 
does it contain contig names? If yes, then “something” went wrong “somewhere” 
when MIRA, after the main assembly, called itself to extract the contigs. 
However, I think this scenario is unlikely, as you observe the problem only 
“for some cases / datasets”.

3. If the “large contigs” info file in the info directory is not populated, 
your data set is … weird, and fools the heuristic which determines what to 
consider a “large contig.” The heuristic works like this: during assembly, 
MIRA looks at all contigs >= 5 kb and determines their average coverage. Then, 
at the end of the assembly, it defines as “large” all contigs >= 500 bp whose 
coverage is at least 50% of that previously calculated average (33% for 
projects with a coverage < 40x).
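
As a rough illustration of the heuristic described above, here is a minimal 
Python sketch. It is not MIRA's actual code. The names (Contig, large_contigs) 
and the sample contigs are hypothetical, and three details are assumptions: 
the average is a plain unweighted mean, the "project coverage" used for the 
40x switch is approximated by that same average, and nothing is reported as 
large when no contig reaches the 5 kb statistics limit.

```python
from typing import List, NamedTuple


class Contig(NamedTuple):
    name: str
    length: int       # contig length in bp
    coverage: float   # average coverage of this contig


def large_contigs(contigs: List[Contig],
                  min_len: int = 500,        # size limit for "large" contigs
                  stats_len: int = 5000,     # size limit for the coverage statistics
                  low_cov_limit: float = 40.0) -> List[str]:
    """Return the names of contigs that this sketch would call 'large'."""
    # Step 1: average coverage over the contigs used for statistics.
    reference = [c for c in contigs if c.length >= stats_len]
    if not reference:
        # Assumption: without any contig >= stats_len there is no
        # coverage estimate, so no contig qualifies.
        return []
    avg_cov = sum(c.coverage for c in reference) / len(reference)

    # Step 2: required fraction of the average coverage.
    # Assumption: avg_cov stands in for the project coverage here.
    fraction = 0.33 if avg_cov < low_cov_limit else 0.50
    threshold = fraction * avg_cov

    return [c.name for c in contigs
            if c.length >= min_len and c.coverage >= threshold]


if __name__ == "__main__":
    # Hypothetical contigs, for illustration only.
    example = [
        Contig("c1", 8000, 30.0),
        Contig("c2", 800, 12.0),
        Contig("c3", 600, 5.0),
        Contig("c4", 300, 40.0),
    ]
    print(large_contigs(example))
```

In this sketch, a data set whose longest contig is below the 5 kb limit yields 
an empty list, no matter how many contigs exceed 500 bp.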

BTW: you can change the 500 bp and 5000 bp limits via the -MI:lcs and 
-MI:lcs4s parameters; you may want to test lcs=500 and lcs4s=2000. But please 
read on.

> In one example, I have obtained contig lengths going from 120 bp to 4543 bp.
> In a second example, I have obtained contig lengths between 107 and 9432 bp.

So, in the first example I can totally understand why MIRA did not extract any 
contig as a “large” contig: there was none >= 5 kbp to calculate statistics on, 
hence no average coverage could be estimated. In the second example, however, 
I wonder a little bit what other effect prevented at least the 9 kbp contig 
from being regarded as large.

However, unless you were assembling viral data, your assembly stats point to 
a deeper problem: projects with a max contig size of 9 kbp (let alone 4 kbp) 
are a total catastrophe. Something feels very wrong there.

Best,
  Bastien


--
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html