On 5/16/2011 1:17 PM, Bastien Chevreux wrote:
> On May 16, 2011, at 15:16, Phillip San Miguel wrote:
> > I tried MIRA V3.2.1.15 on a 70% GC bacterial genome (Deinococcus) at around 100x coverage with Solexa PE 101-base reads. My N50 contig size was 4630 bases. That seems short to me, but it might be a result of the 70% GC. So I decided to de novo assemble a 50% GC data set from the same run.
>
> That's bad, really bad. Yours is the second report I have received that MIRA apparently has problems with high-GC Solexa data sets. The first was a supersecret bug from a big company, so I cannot get the data to see what's causing the havoc. Would it be possible for me to have a look at that thing? No promises, but it might help.
>
> B.
Probably, just let me check with the owner of the sequences.

However, the short contig lengths may derive from something trivial: read distribution bias. An Eland/Gerald mapping of our Illumina Salmonella reads produces a reasonably even coverage depth across the genome. A similar mapping of our Illumina Deinococcus reads shows mostly 50-150x coverage, but also frequent regions with very low coverage (a few x, or even zero).
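
For anyone who wants to check their own mapping for the same pattern, here is a minimal sketch of how the low-coverage stretches could be listed. It assumes the per-base depths have already been extracted from the mapping into a plain list of integers; the function name `low_coverage_regions`, the threshold of 5x, and the toy depth values are illustrative assumptions, not anything produced by Eland/Gerald or MIRA.

```python
def low_coverage_regions(depths, threshold=5, min_length=1):
    """Return half-open (start, end) intervals where depth < threshold.

    depths     -- per-base coverage depths, one integer per reference position
    threshold  -- depths strictly below this value count as "low" (assumed 5x)
    min_length -- ignore low-coverage runs shorter than this many bases
    """
    regions = []
    start = None  # start of the current low-coverage run, if any
    for pos, depth in enumerate(depths):
        if depth < threshold:
            if start is None:
                start = pos
        elif start is not None:
            # The run ended at pos; keep it if it is long enough.
            if pos - start >= min_length:
                regions.append((start, pos))
            start = None
    # Close a run that extends to the end of the reference.
    if start is not None and len(depths) - start >= min_length:
        regions.append((start, len(depths)))
    return regions


# Toy example (made-up depths, not real data).
depths = [100, 120, 3, 0, 2, 90, 110, 1, 95]
print(low_coverage_regions(depths))  # [(2, 5), (7, 8)]
```

If the gaps reported this way line up with the contig ends from the assembly, that would point to read distribution bias rather than to the assembler.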
-- Phillip

-- 
You have received this mail because you are subscribed to the mira_talk mailing list. For information on how to subscribe or unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html