[mira_talk] Re: Call for testing: MIRA 3.2.1.17 and Ion Torrent

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Mon, 16 May 2011 19:17:01 +0200

On May 16, 2011, at 15:16 , Phillip San Miguel wrote:
> I am now using MIRA V3.2.1.17 to de novo assemble 13 million Solexa reads 
> (101 base PE reads).  That is 1.3 billion bases of sequence. The genome size 
> is about 4.5 million bases (Salmonella). So that is 200x-300x coverage--more 
> than I intended.

Do yourself a favour: go with 6m reads. That should be plenty.
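
The coverage arithmetic behind the numbers above can be sketched as follows. 
This is only an illustration in Python, not part of MIRA; the function names 
are made up for this example, and the genome size is the approximate 4.5 Mb 
quoted above.

```python
import math


def coverage(num_reads, read_length, genome_size):
    """Average coverage: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def reads_for_coverage(target_coverage, read_length, genome_size):
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


GENOME_SIZE = 4_500_000  # approximate Salmonella genome size from the thread
READ_LENGTH = 101        # bases per read

# All 13 million reads versus the suggested 6 million reads.
full = coverage(13_000_000, READ_LENGTH, GENOME_SIZE)
subset = coverage(6_000_000, READ_LENGTH, GENOME_SIZE)
print(f"13m reads: {full:.0f}x")   # 13m reads: 292x
print(f"6m reads: {subset:.0f}x")  # 6m reads: 135x
```

With PE data, any subsampling has to keep both reads of a pair together; the 
sketch only does the counting.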

> Anyone want to predict the N50 contig length?

That depends on the genome itself, mainly on how repetitive it is. With PE 
reads I would hope for an N50 above 20 kb, though.
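
For anyone comparing numbers: N50 is the contig length such that contigs of 
that length or longer contain at least half of all assembled bases. A minimal 
Python sketch of that definition, assuming you already have a list of contig 
lengths (the function name and the example lengths are made up; this is not 
how MIRA itself reports the statistic):

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of all bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length


# Toy example: 32 bases in total, the two 8-base contigs cover half of them.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # 8
```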

> I tried MIRA V3.2.1.15 on a 70% GC bacterial genome (Deinococcus) at around 
> 100x coverage with Solexa PE 101-base reads. My N50 contig size was 4630 
> bases. That seems short to me, but it might be a result of the 70% GC. So I 
> decided to de novo assemble a 50% GC data set from the same run.

That's bad, really bad. Yours is the second report I have received that MIRA 
apparently has problems with high-GC Solexa data sets. The first was a 
supersecret bug from a big company, so I cannot get the data to see what's 
causing havoc. Would it be possible for me to have a look at your data set? 
No promises, but it might help.

B.
