[mira_talk] Re: N50 decrease while sequencing depth increase ?

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Mon, 3 Aug 2009 23:06:25 +0200

On Monday 03 August 2009 johann JOETS wrote:
> maybe I made something wrong using Mira ?

Hello Johann,

the command line was exactly as it should be for such a test case.

> Hope you will have an idea about this.

A few. Let's see what's been biting you.

My first question: does metasim generate reads in both forward and reverse 
direction? (I have never used that program.) That might be important later.

Second question: did you tell metasim to also simulate paired-ends, or did you 
just simulate unpaired reads? You wrote that you'll want to have a look at a 
piece of a plant genome. Plant genomes tend to be quite repetitive, so going 
with unpaired sequencing might not be the best strategy there.

> [...]
> The N50 is as follow :
>       N50     av cov
> 5X    257     0
> 10X   5419    8,95
> 15X   111471  13,74
> 20X   108664  18,29
> 25X   27526   22,21
> 50X   446     40,08

Third question: how did you calculate the N50? Did you do that yourself, or 
did you take the numbers from the file "*_info_assembly.txt"? If the latter, 
did you take the numbers for 'large contigs' or for 'all contigs'?
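
For reference, N50 is commonly defined as the length of the contig at which 
the contigs, sorted from longest to shortest, first add up to at least half of 
all assembled bases. A minimal Python sketch follows; the contig lengths in it 
are made up for illustration and are not taken from the data sets in this 
thread. It shows how strongly the number reacts to many short contigs:

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # First contig at which the running sum reaches half of all bases.
        if running >= half_total:
            return length
    return 0


# Hypothetical assembly: one long contig plus 300 short ones.
all_contigs = [100_000] + [400] * 300
long_only = [n for n in all_contigs if n >= 500]

print(n50(all_contigs))  # 400: the short contigs hold more than half the bases
print(n50(long_only))    # 100000
```

So the same assembly can report a very different N50 depending on whether the 
short contigs are counted.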

If the numbers above are for all contigs, then I'm not too troubled. You 
would need to filter the results first to get rid of contig debris. Please 
also see http://chevreux.org/uploads/media/mira3_usage.html#section_14 (What 
to do with MIRA result files) regarding this.
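
As a rough illustration of such a filtering step, the Python sketch below 
drops short contigs from FASTA-formatted text. The function names and the 
500 bp threshold are assumptions made for this example only; they are not 
MIRA's own definition of 'large contigs' and not part of MIRA itself.

```python
def read_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            # Use the first word of the header as the contig name.
            name, chunks = (parts[0] if parts else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def keep_long_contigs(text, min_length=500):
    """Return the (name, sequence) pairs of at least min_length bases.

    min_length is an illustrative threshold, chosen for this example.
    """
    return [(n, s) for n, s in read_fasta(text) if len(s) >= min_length]


# Hypothetical input: one 600-base contig and one 4-base fragment.
example = ">c1 first\n" + "A" * 300 + "\n" + "A" * 300 + "\n>c2\nACGT\n"
kept = keep_long_contigs(example)
print([(name, len(seq)) for name, seq in kept])  # [('c1', 600)]
```

Statistics such as N50 would then be computed on the contigs that remain.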

Now, if the numbers above are for large contigs, then I am troubled, and I'd 
ask you to make the sets for 20x to 100x available to me via FTP if possible.

> You may notice that the average coverage is roughly as expected. However I
> was surprised by the decrease of n50 for datasets deeper than 15X. This is
> also true for the length of the largest contig.
> As I know were reads should have been assembled I can check assembly
> quality (roughly I count breakages in contigs). According to these tests,
> the quality of the assembly also drop down.

Now, this is a bit strange. I would expect that starting with 20x, the whole 
150kb should be covered ... with 50x certainly more so. I would really like to 
have a look at those data sets.
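
The expectation about full coverage can be made concrete with the usual 
idealised Poisson (Lander-Waterman style) estimate: if reads are placed 
uniformly at random, a given base is left uncovered with probability of about 
e^(-c) at average coverage c. The sketch below is only that back-of-the-envelope 
model; it assumes uniform read placement and ignores repeats, the minimum 
overlap needed to join reads, and sequencing errors. The function name is made 
up for this example.

```python
import math


def expected_uncovered_bases(target_length, coverage):
    """Expected number of bases hit by no read under the Poisson model."""
    return target_length * math.exp(-coverage)


# 150 kb target, at the coverages from the table quoted above.
for c in (5, 10, 15, 20, 25, 50):
    print(f"{c}x: about {expected_uncovered_bases(150_000, c):.3g} uncovered bases")
```

Under these assumptions roughly a thousand bases stay uncovered at 5x, while 
from 20x on the expected number is far below one base, so missing coverage 
alone would not explain shorter contigs at higher depth.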

Regards,
  Bastien

