[mira_talk] Re: N50 decrease while sequencing depth increase ?

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 4 Aug 2009 20:27:17 +0200

On Tuesday 04 August 2009 Brian Forde wrote:
> Not sure if this is relevant
> I believe that with 454 sequences the assembly can become saturated at a
> certain point and I believe that this can depend on the size of the genome.
> MWG did a presentation on it at FEMS this year. They showed that with a 2.7
> Mb genome the assembly becomes saturated at a coverage of 21-22x; above this
> threshold the assembly quality decreases, i.e. you in fact get more contigs.
> Also a representative of GATC said something similar at a seminar where I
> work recently.
> I am currently looking for papers on this and will pass them on if/when I
> find them.

I know the claims from MWG and GATC and they're mostly right. The problem is 
more due to computational complexity: the number of overlaps to check grows 
roughly with the square of the coverage, so for non-repetitive sequence a 
coverage of 100 means ~16 times as many overlaps as a coverage of 25.
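
As a back-of-the-envelope check of that factor, here is a minimal sketch. It 
assumes uniformly placed reads of a fixed length and no repeats; the function 
name and the read length of 250 are made up for illustration, and the 2.7 Mb 
genome size is simply borrowed from the quoted example. This is not how MIRA 
counts overlaps internally.

```python
def expected_overlaps(genome_size, read_length, coverage):
    """Rough count of pairwise read overlaps in non-repetitive sequence.

    Simplifying assumption: reads of equal length, placed uniformly.
    """
    n_reads = coverage * genome_size / read_length
    # Reads starting within one read length on either side overlap a given
    # read; at coverage/read_length read starts per base, that is roughly
    # 2 * coverage neighbours per read.
    neighbours_per_read = 2 * coverage
    # Each overlap is shared by two reads, so halve the product.
    return n_reads * neighbours_per_read / 2


low = expected_overlaps(2_700_000, 250, 25)    # hypothetical 454-like reads
high = expected_overlaps(2_700_000, 250, 100)
print(high / low)
```

Genome size and read length cancel in the ratio, so only the squared coverage 
ratio (100/25)^2 remains.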

For repeats, things then get *really* ugly. Even a small and easy bacterium 
with, say, 10 rRNA islands suddenly generates overlap numbers that are a major 
pain in both time and space requirements.
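
To give a feeling for why repeats hurt, here is a deliberately simplified 
sketch. It assumes the repeat copies are exact, so reads from every copy pile 
up as if a single locus had been sequenced at copies * coverage. The function 
name, the 5 kb island length and the read length of 250 are illustrative 
assumptions, not measured values or MIRA internals.

```python
def pileup_overlaps(region_length, read_length, coverage, copies=1):
    """Rough overlap count among reads from `copies` identical regions.

    Simplifying assumption: exact repeats, so reads from all copies
    overlap each other as if they came from one locus.
    """
    effective_coverage = copies * coverage
    n_reads = effective_coverage * region_length / read_length
    # About 2 * effective_coverage neighbours per read, each overlap
    # counted once.
    return n_reads * (2 * effective_coverage) / 2


coverage = 100
# 10 identical rRNA islands of an assumed 5 kb each ...
repeats = pileup_overlaps(5_000, 250, coverage, copies=10)
# ... versus the same 50 kb of sequence if it were all unique.
unique = pileup_overlaps(50_000, 250, coverage, copies=1)
print(repeats / unique)
```

In this model the repeated regions produce `copies` times as many overlaps as 
the same amount of unique sequence, on top of the quadratic growth with 
coverage. Real rRNA islands are not perfectly identical, so treat this as an 
upper-bound intuition only.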

On the other hand, what Johann describes looks different ... let's see.

Regards,
  Bastien


-- 
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html
