[mira_talk] Library insert size distribution effects on MIRA 3.2.1.17

From: Phillip San Miguel <pmiguel@xxxxxxxxxx>
To: mira_talk@xxxxxxxxxxxxx
Date: Thu, 19 May 2011 11:18:29 -0400

On 5/16/2011 1:17 PM, Bastien Chevreux wrote:

On May 16, 2011, at 15:16 , Phillip San Miguel wrote:
I am now using MIRA V3.2.1.17 to de novo assemble 13 million solexa reads (101 base PE reads). That is 1.3 billion bases of sequence. The genome size is about 4.5 million bases (Salmonella). So that is 200x-300x coverage -- more than I intended.
Do yourself a favour: go with 6m reads, that should be plenty enough.
Anyone want to predict the N50 contig length?
Depends on the genome itself, how repetitive it is. With PE reads I would hope for N50 >20kb though.
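
The coverage figures in the exchange above can be checked with a few lines of arithmetic. This is only a sketch: it assumes "13 million reads" counts individual 101-base reads (not pairs) and takes the 4.5 Mb genome size at face value. The function names are made up for illustration.

```python
import math


def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average fold coverage: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def reads_for_coverage(target: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target fold coverage."""
    return math.ceil(target * genome_size / read_length)


GENOME_SIZE = 4_500_000   # approximate Salmonella genome size from the post
READ_LENGTH = 101         # bases per read

full_cov = coverage(13_000_000, READ_LENGTH, GENOME_SIZE)   # all reads
sub_cov = coverage(6_000_000, READ_LENGTH, GENOME_SIZE)     # suggested 6m reads

print(f"13M reads: {full_cov:.0f}x, 6M reads: {sub_cov:.0f}x")
```

Under those assumptions the full data set comes out just under 300x, and the suggested 6 million reads would still give well over 100x.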

We assembled two strains that appear to differ by only a handful of SNPs. One produced an N50 of ~150Kb and the other ~40Kb. (Details below.)

The main difference between the two samples was in a detail of TruSeq library construction. One isolate had size selection done using a Pippin Prep system. The other isolate used E-gels and combined 2 fractions. The actual size-selection systems are probably unimportant. However, the Pippin Prep size selection produced a library with "inserts" ranging from 340-440 bp with a mode at 381 bp, whereas the E-gel size-selected library resulted from the combination of 2 size fractions. The resulting insert size distribution was bimodal, with one mode at 320 and the other mode at 366.

Histogram:

--this is from ELAND2 alignment of the reads against the (nearly identical) reference sequence:

length bin (bp)    Pippin prep    E-gel
  0-140             1%             1%
141-160             0%             1%
161-180             0%             2%
181-200             0%             2%
201-220             0%             2%
221-240             0%             3%
241-260             0%             4%
261-280             0%             5%
281-300             0%             8%
301-320             1%            22%
321-340             2%            20%
341-360            11%            10%
361-380            32%            13%
381-400            34%             5%
401-420            13%             2%
421-999             3%             1%
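
As a rough cross-check of the "one mode vs. two modes" description, the binned percentages in the table can be scanned for local maxima. This is a minimal sketch, not how ELAND2 or MIRA report anything: `local_modes` and the 5% noise floor are assumptions chosen for illustration, and only the bin resolution of the table is available.

```python
BINS = [
    "0-140", "141-160", "161-180", "181-200", "201-220", "221-240",
    "241-260", "261-280", "281-300", "301-320", "321-340", "341-360",
    "361-380", "381-400", "401-420", "421-999",
]
# Percent of read pairs per bin, copied from the table above.
PIPPIN = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 11, 32, 34, 13, 3]
EGEL = [1, 1, 2, 2, 2, 3, 4, 5, 8, 22, 20, 10, 13, 5, 2, 1]


def local_modes(percentages, labels, min_pct=5):
    """Return labels of interior bins that exceed both neighbours.

    min_pct is an arbitrary floor (an assumption) to ignore noise in
    the sparsely populated tails.
    """
    modes = []
    for i in range(1, len(percentages) - 1):
        p = percentages[i]
        if p >= min_pct and p > percentages[i - 1] and p > percentages[i + 1]:
            modes.append(labels[i])
    return modes


pippin_modes = local_modes(PIPPIN, BINS)
egel_modes = local_modes(EGEL, BINS)
print("Pippin prep modes:", pippin_modes)
print("E-gel modes:", egel_modes)
```

At this resolution the Pippin prep column has a single peak and the E-gel column has two, which matches the modes quoted in the text (381 bp vs. 320 and 366 bp).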

Mira de novo assembly results:
For "Large" contigs
Number of contigs       145      371
Total consensus:    4916075  4919599
Largest contig:      311963   118228
N50 contig size:     149609    39379
N90 contig size:      41722     7578
N95 contig size:      18547     3588
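
For readers unfamiliar with the N50/N90/N95 figures, here is a small sketch of the usual definition: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches the given fraction of the total consensus. I am assuming MIRA uses this standard definition; the function name `nxx` and the toy contig lengths are invented for illustration and are not taken from these assemblies.

```python
def nxx(lengths, fraction=0.5):
    """Length of the contig at which the cumulative size, counted from the
    longest contig down, first reaches `fraction` of the total size."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]  # only reachable through floating-point rounding


# Toy example (made-up lengths, total = 300)
toy = [100, 80, 60, 40, 20]
print("N50:", nxx(toy, 0.50))
print("N90:", nxx(toy, 0.90))
print("N95:", nxx(toy, 0.95))
```

A higher N50 with a similar total consensus, as in the first column above, means the same amount of sequence is held in fewer, longer contigs.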

The assemblies were done on different computers, but I think the salient parameters were the same:

For the "Pippin prep" assembly:

--job=denovo,genome,accurate,solexa SOLEXA_SETTINGS -GE:tismin=200:tismax=700


For the "E-gel" assembly:
--job=genome,accurate,solexa SOLEXA_SETTINGS  -GE:tismin=200:tismax=700

--
Phillip
