[mira_talk] High GC genomes and mira

  • From: Phillip San Miguel <pmiguel@xxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 17 May 2011 11:15:59 -0400

On 5/16/2011 1:17 PM, Bastien Chevreux wrote:
On May 16, 2011, at 15:16 , Phillip San Miguel wrote:

I tried MIRA V3.2.1.15 on a 70% GC bacterial genome (Deinococcus) at around 100x coverage with Solexa PE 101-base reads. My N50 contig size was 4630 bases. That seems short to me, but it might be a result of the 70% GC. So I decided to de novo assemble a 50% GC data set from the same run.

That's bad, really bad. Yours is the second report I have received that MIRA apparently has problems with high GC Solexa data sets. The first was a supersecret bug from a big company, so I cannot get the data to see what's causing havoc. Would it be possible for me to have a look at that thing? No promises, but it might help.

B.

Probably, just let me check with the owner of the sequences.
However, the short contig lengths may derive from something trivial: read distribution bias. An Eland/Gerald mapping of our Illumina Salmonella reads produces a reasonably even coverage depth across the genome. A similar mapping of our Illumina Deinococcus reads shows mostly 50-150x coverage, but also frequent regions with very low coverage (a few x, or zero).

--
Phillip
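
For anyone wanting to check their own data for the kind of coverage dropouts described above, here is a rough sketch in Python. It assumes you already have one depth value per reference position, from whatever mapper you use. The function names, the cutoff of 3x and the toy depth values are all made up for illustration and are not part of MIRA or Eland/Gerald.

```python
from typing import List, Sequence, Tuple


def low_coverage_regions(
    depths: Sequence[int], max_depth: int = 3, min_length: int = 1
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals where depth <= max_depth.

    depths     -- one coverage value per reference position (0-based)
    max_depth  -- hypothetical cutoff for "very low coverage"
    min_length -- ignore runs shorter than this many bases
    """
    regions = []
    start = None
    for pos, depth in enumerate(depths):
        if depth <= max_depth:
            if start is None:
                start = pos  # a low-coverage run begins here
        elif start is not None:
            if pos - start >= min_length:
                regions.append((start, pos))
            start = None
    # Close a run that extends to the end of the reference.
    if start is not None and len(depths) - start >= min_length:
        regions.append((start, len(depths)))
    return regions


def low_coverage_fraction(depths: Sequence[int], max_depth: int = 3) -> float:
    """Fraction of positions with depth <= max_depth."""
    if not depths:
        return 0.0
    covered = sum(end - start for start, end in low_coverage_regions(depths, max_depth))
    return covered / len(depths)


if __name__ == "__main__":
    # Toy depth profile, not real data.
    toy = [100, 120, 2, 0, 1, 90, 150, 0, 0, 80]
    print(low_coverage_regions(toy))
    print(low_coverage_fraction(toy))
```

If most of the contig ends fall inside or next to the intervals this reports, uneven read distribution is a more likely explanation for the short contigs than the assembler itself.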

--
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html
