[mira_talk] Re: Assembling 454 and Solexa mate-pair data - rethinking ...

  • From: "Martin A. Hansen" <mail@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 15 Sep 2009 11:23:26 +0200

I have tried yet another approach for this assembly. I assumed that the
Solexa data was contaminated, so I ran MIRA with the 454 contigs (used as
long Sanger reads) and the Solexa mate-pairs, but only those mate-pairs that
could be mapped to the contigs (using Bowtie and allowing for 3 mismatches).
This reduced the number of mate-pairs from 3M to 1M, and the set should still
include reads that span the gaps between the contigs.
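
For anyone wanting to reproduce the filtering step, here is a minimal Python
sketch of the idea, not the exact script used. It assumes that the IDs of the
reads Bowtie aligned have already been collected into a list, and that the two
mates of a pair share a name ending in /1 and /2 (both are assumptions; the
function names are made up for illustration). With the default settings a pair
is kept when at least one mate maps, so that pairs reaching into a gap survive.

```python
def read_fasta(lines):
    """Yield (read_id, sequence) tuples from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            # Keep only the first word of the header as the read ID.
            header, seq = line[1:].split()[0], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def pair_name(read_id):
    """Strip the assumed /1 or /2 mate suffix to get the pair name."""
    if read_id.endswith("/1") or read_id.endswith("/2"):
        return read_id[:-2]
    return read_id


def filter_pairs(records, mapped_ids, require_both=False):
    """Keep both mates of every pair with enough mapped mates.

    records      -- iterable of (read_id, sequence) tuples
    mapped_ids   -- IDs of reads that aligned to the contigs
    require_both -- if True, both mates must have mapped
    """
    hits = {}
    for rid in mapped_ids:
        hits.setdefault(pair_name(rid), set()).add(rid)
    needed = 2 if require_both else 1
    keep = {name for name, ids in hits.items() if len(ids) >= needed}
    return [(rid, seq) for rid, seq in records if pair_name(rid) in keep]


if __name__ == "__main__":
    fasta = [">p1/1", "ACGT", ">p1/2", "TTTT",
             ">p2/1", "GGGG", ">p2/2", "CCCC",
             ">p3/1", "AAAA", ">p3/2", "ACAC"]
    mapped = ["p1/1", "p1/2", "p2/1"]
    for rid, seq in filter_pairs(read_fasta(fasta), mapped):
        print(">" + rid)
        print(seq)
```

Setting require_both=True gives the stricter filter, which would drop pairs
where one mate falls into a gap between contigs.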

So I fed the following to MIRA:

mira -project=M1 -job=denovo,genome,normal,sanger,solexa
-GENERAL:number_of_threads=4 SOLEXA_SETTINGS -CO:msr=no
-GE:uti=no:tismin=2000:tismax=3000

M1_in.sanger.fasta
M1_in.solexa.fasta

And waited for a day and got:

Localtime: Tue Sep 15 10:51:42 2009

Assembly information:
=====================

Num. reads assembled: 1186418
Num. singlets: 20

Large contigs:
--------------
With    Contig size             >= 500
        AND (Total avg. Cov     >= 8
             OR Cov(san)        >= 0
             OR Cov(454)        >= 0
             OR Cov(sxa)        >= 8
             OR Cov(sid)        >= 0
            )

  Length assessment:
  ------------------
  Number of contigs:    915
  Total consensus:      696521
  Largest contig:       6010
  N50 contig size:      739
  N90 contig size:      538
  N95 contig size:      523

  Coverage assessment:
  --------------------
  Max coverage (total): 1625
  Max coverage
        Sanger: 0
        454:    0
        Solexa: 1625
        Solid:  0
  Avg. total coverage (size >= 5000): 23.75
  Avg. coverage (contig size >= 5000)
        Sanger: 0.00
        454:    0.00
        Solexa: 23.75
        Solid:  0.00

  Quality assessment:
  -------------------
  Average consensus quality:                    22
  Consensus bases with IUPAC (IUPc):            645     (you might want to check these)
  Strong unresolved repeat positions (SRMc):    0       (excellent)
  Weak unresolved repeat positions (WRMc):      0       (excellent)
  Sequencing Type Mismatch Unsolved (STMU):     0       (excellent)
  Contigs having only reads wo qual:            0       (excellent)
  Contigs with reads wo qual values:            0       (excellent)


All contigs:
------------
  Length assessment:
  ------------------
  Number of contigs:    28134
  Total consensus:      3312250
  Largest contig:       6010
  N50 contig size:      178
  N90 contig size:      48
  N95 contig size:      41

  Coverage assessment:
  --------------------
  Max coverage (total): 1625
  Max coverage
        Sanger: 0
        454:    0
        Solexa: 1626
        Solid:  0
  Avg. total coverage (size >= 5000): 23.75
  Avg. coverage (contig size >= 5000)
        Sanger: 0.00
        454:    0.00
        Solexa: 23.75
        Solid:  0.00

  Quality assessment:
  -------------------
  Average consensus quality:                    19
  Consensus bases with IUPAC (IUPc):            1542    (you might want to check these)
  Strong unresolved repeat positions (SRMc):    0       (excellent)
  Weak unresolved repeat positions (WRMc):      0       (excellent)
  Sequencing Type Mismatch Unsolved (STMU):     0       (excellent)
  Contigs having only reads wo qual:            0       (excellent)
  Contigs with reads wo qual values:            0       (excellent)


This strikes me as completely wrong. The long contigs are gone.
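
To compare runs independently of the report above, the N50/N90 figures can be
recomputed from a list of contig lengths. This is a minimal sketch using the
usual definition (the length of the contig at which the running total of
lengths, sorted from longest to shortest, first reaches the given fraction of
all bases); I have not checked that MIRA computes it exactly the same way, and
the function name nxx is made up.

```python
def nxx(lengths, fraction=0.5):
    """Return the Nxx contig size for a list of contig lengths.

    fraction=0.5 gives N50, fraction=0.9 gives N90, and so on.
    Returns 0 for an empty list.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    return 0


if __name__ == "__main__":
    # Toy lengths for illustration only, not taken from the assembly.
    toy = [2, 3, 4, 5, 6, 7, 8, 9, 10]
    print("N50:", nxx(toy, 0.5))
    print("N90:", nxx(toy, 0.9))
```

Note that the "Large contigs" block only counts contigs passing the size and
coverage filter, so the lengths fed in need the same filter for the numbers to
be comparable.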

According to the log, both Sanger and Solexa reads were loaded (I omitted the
quals on purpose, expecting a simple run).



Martin


On Thu, Sep 10, 2009 at 5:09 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:

> On Donnerstag 10 September 2009 Martin A. Hansen wrote:
> > So, why are qualities so important? If you have enough sequence it should
> > level out?
>
> The "non-perfect-repeat" detection routines heavily rely on qualities to tag
> bases that help discern the different repeats. The base calling algorithms
> also take qualities into consideration when confronted with unsure
> situations.
>
> Plus a few other places where qualities do matter quite a lot :-)
>
> B.
>
> --
> You have received this mail because you are subscribed to the mira_talk
> mailing list. For information on how to subscribe or unsubscribe, please
> visit http://www.chevreux.org/mira_mailinglists.html
>
