[mira_talk] Fwd: Extraction of large contigs

Re-posted as the original mail did not get through.

> From 98447122@xxxxxxxxxxxxxx  Mon Mar 23 12:34:16 2009
> Does anyone know why the number of large contigs in the
> assembly info file may be 169 but when contigs of this size
> (>=500) are extracted the number increases (in this case to
> 400)? Is there another factor influencing the number of
> large contigs, e.g. coverage, quality?
> Brian

Hi Brian,

Yes, there is: coverage. In the initial assembly, MIRA treats a contig as a
"large contig" only if it is >= 500 bases long AND (its average total coverage
is >= 1/3 of the average total coverage of all contigs >= 5000 bases, OR its
average coverage for one sequencing technology is >= 1/3 of the average
coverage for that technology over all contigs >= 5000 bases).

Hmmm, long sentence, and not really comprehensible. Let's have a look at an
example. Here's the head of a typical assembly info file for a 454-only
assembly:

------------------------------- snip ------------------------------------
Localtime: Tue Mar 24 01:04:33 2009

Assembly information:
=====================

Num. reads assembled: 294854
Num. singlets: 1136

Large contigs:
--------------
With    Contig size             >= 500
        AND (Total avg. Cov     >= 8
             OR Cov(san)        >= 0
             OR Cov(454)        >= 8
             OR Cov(sxa)        >= 0
             OR Cov(sid)        >= 0
            )

  Length assessment:
  ------------------
  Number of contigs:    35
  Total consensus:      3286877
  Largest contig:       306978
  N50 contig size:      198400
  N90 contig size:      103100
  N95 contig size:      95220

  Coverage assessment:
  --------------------
  Max coverage (total): 35
  Max coverage
        Sanger: 0
        454:    47
        Solexa: 0
        Solid:  0
  Avg. total coverage (size >= 5000): 23.92
  Avg. coverage (contig size >= 5000)
        Sanger: 0.00
        454:    23.91
        Solexa: 0.00
        Solid:  0.00

------------------------------- snap ------------------------------------

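Here, contigs >= 5000 bases had an average total coverage of 23.92, so the
minimum average coverage for a contig to be seen as "large" was set to 8
(a third of the average, rounded to the nearest integer:
int((23.92/3)+0.5) = 8). The 454 threshold of 8 follows the same way from
the 454 average of 23.91.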
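
For illustration, the threshold calculation and the "large contig" test can be
sketched in Python. This is only a sketch of the rule as described in this
mail, not MIRA's actual code: the function names are made up, and skipping
technologies whose threshold is 0 is an assumption (otherwise the unused
"Cov(san) >= 0" style terms would make the OR trivially true).

```python
def min_large_coverage(avg_coverage):
    """One third of the average coverage, rounded as in the mail."""
    return int((avg_coverage / 3) + 0.5)


def is_large_contig(length, total_cov, cov_by_tech,
                    min_total, min_by_tech, min_size=500):
    """Size criterion AND (total coverage OR any per-technology coverage)."""
    if length < min_size:
        return False
    if total_cov >= min_total:
        return True
    # Assumption: a threshold of 0 means "technology not used" and is skipped.
    return any(
        threshold > 0 and cov_by_tech.get(tech, 0.0) >= threshold
        for tech, threshold in min_by_tech.items()
    )


# Averages for contigs >= 5000 bases, copied from the info file above.
avg_total = 23.92
avg_by_tech = {"san": 0.00, "454": 23.91, "sxa": 0.00, "sid": 0.00}

min_total = min_large_coverage(avg_total)
min_by_tech = {tech: min_large_coverage(avg)
               for tech, avg in avg_by_tech.items()}

print("Total avg. Cov >=", min_total)
for tech, threshold in min_by_tech.items():
    print("Cov(%s) >= %d" % (tech, threshold))
```

With the numbers from the info file this gives 8 for the total and the 454
thresholds and 0 for the unused technologies, matching the "Large contigs"
header shown above.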

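Now, when using convert_project, the minimum average coverage criterion is
switched off by default, so every contig >= 500 bases is extracted regardless
of its coverage. That is why you get 400 contigs instead of 169. You can set
a minimum average coverage via "-y" to apply the criterion there as well.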
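
To see why the two counts differ, here is a small Python sketch that counts
contigs once by size only and once by size plus minimum average coverage. It
does not reproduce convert_project itself: count_contigs is a hypothetical
helper, the (length, average coverage) pairs are invented for the example, and
only the total coverage is checked to keep it short.

```python
def count_contigs(contigs, min_size=500, min_avg_cov=0):
    """Count contigs passing the size and (optional) coverage criterion.

    min_avg_cov=0 mimics the coverage criterion being switched off.
    """
    return sum(
        1
        for length, avg_cov in contigs
        if length >= min_size and avg_cov >= min_avg_cov
    )


# Invented example data: (contig length, average coverage).
contigs = [
    (306978, 24.1),
    (5200, 22.0),
    (1500, 9.3),
    (900, 2.1),   # long enough, but low coverage
    (650, 1.4),   # long enough, but low coverage
    (480, 12.0),  # good coverage, but too short
]

size_only = count_contigs(contigs)
size_and_cov = count_contigs(contigs, min_avg_cov=8)

print("size only:        ", size_only)
print("size and coverage:", size_and_cov)
```

The size-only count is always at least as large as the count with the
coverage criterion, which is the effect seen with 400 versus 169 contigs.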

Hope this helps,
  Bastien

