[mira_talk] Re: Multiple long repeats in genome.

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Sun, 20 Mar 2011 14:11:19 +0100

On Thursday 17 March 2011 18:40:27 Andrzej N wrote:
> Short question. In genome I'm working now there are long repeats (~5 kbp)
> multiple times. How to make MIRA distribute that sequences from "within"
> that region equal, instead collect them in one place, with like 200x
> coverage? Uniform read distribution is not doing it :(. 454 data, one end,
> coverage about 20x.

What you are describing only affects contigs made of repeats longer than a 
read length (or than the insert size of paired reads).

In the past, MIRA indeed made multiple copies of repeats and stored them 
separately ... because I too feel that this is the right approach.

Until people complained that those versions of MIRA were a bad program because 
they made "much more" contigs than other assemblers. When I then started to 
see assembler comparisons involving MIRA based just on N50 and number of 
contigs, I mailed the authors of some to point out the important difference. I 
either got no response, or responses from people I then had to assume were 
undergrad students who had no idea about the underlying problem and had 
this as "homework", or responses from people with enough knowledge who told me 
that they "needed something easy to measure and they had no time for more 
in-depth analysis of the assembly quality".

I then grudgingly reverted my decision and made MIRA again collapse repetitive 
contigs.

Sad, but true.
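
As a rough sanity check on such a collapsed contig (a back-of-the-envelope 
sketch, not something MIRA computes or reports): reads from every copy of the 
repeat pile up in one place, so the coverage there should be roughly the copy 
number times the average coverage. The function name below is made up for 
illustration, and the numbers are the ones from the question (about 200x in 
the repeat, about 20x overall).

```python
def estimate_copy_number(repeat_coverage, average_coverage):
    """Rough copy-number estimate for a collapsed repeat contig.

    Assumes reads are sampled uniformly across the genome, so a repeat
    present N times collects about N times the average coverage.
    """
    if average_coverage <= 0:
        raise ValueError("average_coverage must be positive")
    return round(repeat_coverage / average_coverage)


# Values taken from the question: ~200x in the repeat, ~20x on average.
copies = estimate_copy_number(200, 20)
print(f"estimated copies of the repeat: {copies}")
```

With those numbers the estimate comes out at about 10 copies. Real coverage is 
noisy, so treat this as an order-of-magnitude figure only.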

> I hope MIRA is not deleting that repeats. How to keep them together with my
> other contigs?

Erm, what do you mean by that?

B.
