[mira_talk] Re: Assembling 454 and Solexa mate-pair data - rethinking ...

  • From: "Martin A. Hansen" <mail@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 8 Sep 2009 09:51:57 +0200

OK, so finally MIRA completed with the combined 454/Solexa data run.

Here are the call parameters and the assembly info:

mira -project=M1 -job=denovo,genome,normal,454,solexa
-GENERAL:number_of_threads=4 SOLEXA_SETTINGS -CO:msr=no
-GE:uti=no:tismin=2000:tismax=3000

Localtime: Mon Sep  7 12:46:26 2009

Assembly information:
=====================

Num. reads assembled: 3392921
Num. singlets: 33

Large contigs:
--------------
With    Contig size             >= 500
        AND (Total avg. Cov     >= 20
             OR Cov(san)        >= 0
             OR Cov(454)        >= 13
             OR Cov(sxa)        >= 6
             OR Cov(sid)        >= 0
            )

  Length assessment:
  ------------------
  Number of contigs:    1066
  Total consensus:      3770477
  Largest contig:       306903
  N50 contig size:      117364
  N90 contig size:      802
  N95 contig size:      638

  Coverage assessment:
  --------------------
  Max coverage (total): 2110
  Max coverage
        Sanger: 0
        454:    241
        Solexa: 2005
        Solid:  0
  Avg. total coverage (size >= 5000): 58.56
  Avg. coverage (contig size >= 5000)
        Sanger: 0.00
        454:    39.42
        Solexa: 18.93
        Solid:  0.00

  Quality assessment:
  -------------------
  Average consensus quality:                    22
  Consensus bases with IUPAC (IUPc):            634     (you might want to check these)
  Strong unresolved repeat positions (SRMc):    5       (you might want to check these)
  Weak unresolved repeat positions (WRMc):      0       (excellent)
  Sequencing Type Mismatch Unsolved (STMU):     0       (excellent)
  Contigs having only reads wo qual:            0       (excellent)
  Contigs with reads wo qual values:            0       (excellent)


All contigs:
------------
  Length assessment:
  ------------------
  Number of contigs:    50303
  Total consensus:      6877147
  Largest contig:       306903
  N50 contig size:      757
  N90 contig size:      35
  N95 contig size:      31

  Coverage assessment:
  --------------------
  Max coverage (total): 2546
  Max coverage
        Sanger: 0
        454:    241
        Solexa: 2548
        Solid:  0
  Avg. total coverage (size >= 5000): 58.56
  Avg. coverage (contig size >= 5000)
        Sanger: 0.00
        454:    39.42
        Solexa: 18.93
        Solid:  0.00

  Quality assessment:
  -------------------
  Average consensus quality:                    18
  Consensus bases with IUPAC (IUPc):            2382    (you might want to check these)
  Strong unresolved repeat positions (SRMc):    5       (you might want to check these)
  Weak unresolved repeat positions (WRMc):      0       (excellent)
  Sequencing Type Mismatch Unsolved (STMU):     0       (excellent)
  Contigs having only reads wo qual:            0       (excellent)
  Contigs with reads wo qual values:            0       (excellent)
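
For anyone wanting to recompute the figures above from a list of contigs, here is a minimal sketch of the "Large contigs" filter and the N50/N90/N95 statistics. It is not MIRA's own code. The dictionary keys (length, cov_total, cov_454, cov_sxa) are hypothetical names, and I am assuming that a threshold of 0 (as shown for Sanger and Solid) means that test is switched off rather than always true.

```python
def n_stat(lengths, fraction=0.5):
    """Nxx contig size: the length L such that contigs of length >= L
    together hold at least `fraction` of the total consensus."""
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return 0  # empty input


def is_large(contig, min_len=500, min_total=20, min_454=13, min_sxa=6):
    """Size test AND at least one coverage test, as in the report header.

    ASSUMPTION: thresholds reported as 0 (Sanger, Solid) are treated as
    disabled and therefore left out of the OR.
    """
    if contig["length"] < min_len:
        return False
    return (contig["cov_total"] >= min_total
            or contig["cov_454"] >= min_454
            or contig["cov_sxa"] >= min_sxa)


def summarise(contigs):
    """Return count, total consensus, largest contig and N50/N90/N95."""
    lengths = [c["length"] for c in contigs]
    return {
        "count": len(lengths),
        "total": sum(lengths),
        "largest": max(lengths, default=0),
        "N50": n_stat(lengths, 0.50),
        "N90": n_stat(lengths, 0.90),
        "N95": n_stat(lengths, 0.95),
    }
```

Calling summarise() once on all contigs and once on [c for c in contigs if is_large(c)] gives the two "Length assessment" blocks, under the assumptions stated.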


I find the result messier than the ~70 contigs from the 454-data-only
assembly. The longest contigs are a bit longer, but none of the big contigs
appear to have been joined, and a fair number of short contigs have
appeared.

Btw, I did check the integrity of the Solexa mate-pair data and it does look
OK.
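
One way such a pairing check could be sketched is below. It assumes the two mates of a template share a name and differ only in a trailing /1 or /2, which may not match how a given data set is named. The function name pair_summary is made up for illustration.

```python
from collections import defaultdict


def pair_summary(read_names):
    """Count complete pairs, orphaned mates and names without a mate suffix.

    ASSUMPTION: mates are named <template>/1 and <template>/2.
    """
    mates = defaultdict(set)
    unsuffixed = 0
    for name in read_names:
        template, sep, mate = name.rpartition("/")
        if sep and mate in ("1", "2"):
            mates[template].add(mate)
        else:
            unsuffixed += 1
    # A template is a complete pair only if both mates were seen.
    paired = sum(1 for seen in mates.values() if seen == {"1", "2"})
    orphans = len(mates) - paired
    return paired, orphans, unsuffixed
```

A data set with many orphans or unsuffixed names would suggest that the assembler sees mostly single reads.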


Martin


On Fri, Sep 4, 2009 at 9:15 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:

> On Freitag 04 September 2009 Martin A. Hansen wrote:
> > Btw, I have a feeling that the preparation of the mate-pair library was
> > somehow faulty, so that there are no mate-pair reads - only single
> > reads. Would that affect the run-time of MIRA?
>
> Using templates has a cost of O(n log(n)), with n being the number of
> paired-end reads in a contig during contig assembly. It should not be too
> noticeable unless one has contigs with a million reads or more.
>
> Hmmm ... which may be the case for hybrid assemblies. I'll have to rethink
> whether trading space for time is needed.
>
> Regards,
>  Bastien
