[mira_talk] Fwd: Extraction of large contigs
- From: Bastien Chevreux <bach@xxxxxxxxxxxx>
- To: mira_talk@xxxxxxxxxxxxx
- Date: Wed, 25 Mar 2009 22:52:27 +0100
Re-posted as the original mail did not get through.
> From 98447122@xxxxxxxxxxxxxx Mon Mar 23 12:34:16 2009
> Does anyone know why the number of large contigs in the
> assembly info file may be 169, but when contigs of this size
> (>=500) are extracted the number increases (in this case to
> 400)? Is there another factor influencing the number of
> large contigs, e.g. coverage or quality?
> Brian
Hi Brian,
yes, there is: coverage. In the initial assembly, MIRA will treat as "large
contig" any contig >= 500 bases AND (having an average coverage >= 1/3
of the average coverage of all contigs >= 5000 bases OR an average coverage for
one sequencing technology >= 1/3 of the average coverage for this technology of
all contigs >= 5000 bases).
Hmmm, long sentence. And not really comprehensible. Let's have a look at an
example. Here's the head of a typical assembly info file for a 454-only
assembly:
------------------------------- snip ------------------------------------
Localtime: Tue Mar 24 01:04:33 2009
Assembly information:
=====================
Num. reads assembled: 294854
Num. singlets: 1136
Large contigs:
--------------
With Contig size >= 500
AND (Total avg. Cov >= 8
OR Cov(san) >= 0
OR Cov(454) >= 8
OR Cov(sxa) >= 0
OR Cov(sid) >= 0
)
Length assessment:
------------------
Number of contigs: 35
Total consensus: 3286877
Largest contig: 306978
N50 contig size: 198400
N90 contig size: 103100
N95 contig size: 95220
Coverage assessment:
--------------------
Max coverage (total): 35
Max coverage
Sanger: 0
454: 47
Solexa: 0
Solid: 0
Avg. total coverage (size >= 5000): 23.92
Avg. coverage (contig size >= 5000)
Sanger: 0.00
454: 23.91
Solexa: 0.00
Solid: 0.00
------------------------------- snap ------------------------------------
Here, contigs >= 5000 bases had an average total coverage of 23.92, so the
minimum average coverage for a contig to be seen as "large" was set to 8
(a third of 23.92, rounded to the nearest integer: int((23.92/3)+0.5) = 8).
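For anyone who wants to redo that arithmetic on their own info file, here is
a minimal Python sketch of the rounding. The function name is made up for
illustration and this is not MIRA code, just the formula from above applied
to the two averages in the example.

```python
def min_large_contig_coverage(avg_coverage: float) -> int:
    """One third of the average coverage, rounded to the nearest integer.

    Hypothetical helper: it only mirrors int((avg/3)+0.5) from the mail.
    """
    return int((avg_coverage / 3) + 0.5)


# Averages taken from the example info file (contigs >= 5000 bases).
print(min_large_contig_coverage(23.92))  # total coverage
print(min_large_contig_coverage(23.91))  # 454 coverage
print(min_large_contig_coverage(0.00))   # a technology that was not used
```

Both the total and the 454 average give 8, and the unused technologies give 0,
which matches the thresholds printed in the "Large contigs" block.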
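Now, when using convert_project, the minimum average coverage criterion is
switched off by default, so every contig >= 500 bases is extracted regardless
of its coverage. That is why you get more contigs (400) than the info file
reports (169). You can set a minimum coverage value via "-y".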
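To illustrate why the two counts differ, here is a rough Python sketch of the
"large contig" test with and without the coverage criterion. Everything in it
is an assumption for illustration: the function name, the contig list and its
coverage values are invented, and a threshold of 0 is treated as "not set"
(otherwise a line like "Cov(san) >= 0" would let every contig through). It is
not how MIRA or convert_project are implemented.

```python
def is_large_contig(length, total_cov, tech_cov,
                    min_total_cov=0, min_tech_cov=None, min_length=500):
    """Size AND (total coverage OR any per-technology coverage)."""
    if length < min_length:
        return False
    # Assumption: thresholds of 0 mean "criterion not set".
    active = {t: m for t, m in (min_tech_cov or {}).items() if m > 0}
    if min_total_cov <= 0 and not active:
        return True  # coverage criterion switched off: size alone decides
    if min_total_cov > 0 and total_cov >= min_total_cov:
        return True
    return any(tech_cov.get(t, 0.0) >= m for t, m in active.items())


# Invented contigs: (name, length, avg. total coverage, coverage per technology)
contigs = [
    ("c1", 306978, 24.1, {"454": 24.1}),
    ("c2", 1200, 3.2, {"454": 3.2}),    # long enough, but thin coverage
    ("c3", 650, 9.0, {"454": 9.0}),
    ("c4", 420, 30.0, {"454": 30.0}),   # good coverage, but too short
]

# Size only, i.e. the coverage criterion switched off.
size_only = [c[0] for c in contigs if is_large_contig(c[1], c[2], c[3])]

# Size plus the thresholds from the example info file (8 total, 8 for 454).
with_cov = [c[0] for c in contigs
            if is_large_contig(c[1], c[2], c[3],
                               min_total_cov=8, min_tech_cov={"454": 8})]

print(len(size_only), size_only)
print(len(with_cov), with_cov)
```

In this toy data set the size-only filter keeps three contigs, while the filter
with the coverage thresholds keeps two, because the low-coverage contig c2
drops out.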
Hope this helps,
Bastien