[mira_talk] Re: mira 3.0.1 crashes in mapping mode

From: Tony Travis <a.travis@xxxxxxxxxx>
To: mira_talk@xxxxxxxxxxxxx
Date: Mon, 29 Mar 2010 17:26:13 +0100

On 15/03/10 11:50, Tony Travis wrote:

[...]

Please note that mapping assemblies made with 3.0.1 and 3.0.2 will have broken
ACE, CAF and MAF files because of this error (provided they ran through).


Hello, Bastien.

OK, I re-ran the mapping assembly over the weekend using 3.0.3, and it
completed successfully.


Hello, Bastien.

Looks like I spoke too soon: We are still having problems doing mappingassemblies of ~40K Sanger reads + 1.2M 454 reads. MIRA crashes with anerror complaining that 'rr_####<n>####' is longer than 29900 base-pairs:

mira -project=map \
        -fasta \
        -job=mapping,genome,sanger,454 \
        -SB:lb=on -FN:bbin=all_backbone_in.fasta \
        SANGER_SETTINGS -LR:rns=fr,mxti=on -FN:xtii=all_traceinfo_in.sanger.xml 
\
        454_SETTINGS 
-FN:fai=all_in.454.fasta,fqui=all_in.454.fasta.qual,xtii=all_traceinfo_in.454.xml
This is MIRA V3.0.3 (production version).

...

Fatal Error (may be due to problems of the input data):
"Read rr_####409#### is longer than 29900 bases. SKIM cannot handle this, 
aborting.
"

->Thrown: uint32 Skim::computePartition(uint32 maxmemusage, bool 
computenumpartitions)

->Caught: main

Program aborted, probably due to error in the input data or parametrisation.
Please check the output log for more information.
For help, please write a mail to the mira talk mailing list.

CWD: /home/rri20/genome/Roseburia/hominis/MIRA/current
make: *** [map] Aborted


Are 'rr_####<n>####' synthetic reads from suspected repeat regions?

I tried using '-SK:bph=31' but it made no difference...

I also had problems trying to read Sanger quality scores for reads in aMIRA assembly into Consed from a phd.ball that is consistent with aPhrap assembly - I used the same FASTA sequence and quality files forboth assemblies, but when I compare the reads in the Phrap and MIRA acefiles they are slightly different and I can't explain why.

I'm using the phredPhrap pipeline to vector screen our Sanger reads,then assemble the fasta.screen and fasta.screen.qual files created bythe pipeline using both Phrap and MIRA to compare the assemblies. Iexpected the reads, not the contigs, to be the same in both ace filesbut they are not: I used "readseq -degap=\*" to reformat the sequencesfrom the same read in each ace file in unpadded GenBank format to makeit easier to compare them. For example:

diff Phrap.read.gb MIRA.read.gb
1,2c1,2
< LOCUS       RA10-p04-K04_f   1286 bp NUCLEOTIDE             UNA       
29-Mar-2010
< DEFINITION  RA10-p04-K04.f 1286 0 0
---
> LOCUS       RA10-p04-K04_f   1151 bp NUCLEOTIDE             UNA       
29-Mar-2010
> DEFINITION  RA10-p04-K04.f 1151 0 0
18,22c18,22
<       841 TTAATTCAAT GTCTAATCGT TCTATAGTTT TCTTGTTTTT CACTCCAGTC TTTACGTTTT
<       901 ATCACTTTTA ATTTTTGCTA AAGCCCTAAT AATCTCAATA TAATATTGCT TCCGTGCATT
<       961 ATCCGTATTC CGCATTAACA GCTCAAAACC ATTTTCTAAG CCCAAAGAAT GGTTTTTATT
<      1021 TTTTCAAAAG TCTTTAGTTT AAAATTTTTA ATCCCTAAAC CCAACCTCTC TCCCAATTAC
<      1081 CAATTAACTA ATTTCATTAA AAGGGAATTT AAAATTTCTT GGAATTACCC CGAAAAACA
---
>       841 TTAATTCAAT GTCTAATCGT TCTATAGTTT TCTTGTTTTC ACTCCAGTCT TTACGTTTTA
>       901 TCACTTTTAT TTTTGCTAAA GCCTAATAAT CTCAATATAA TATTGCTTCC GTGCATTATC
>       961 CGTATTCCGC ATTAACAGCT CAAAACCATT TTCTAAGCCC AAAGAATGGT TTTTATTTTT
>      1021 TCAAAAGTCT TTAGTTTAAA ATTTTTAATC CCTAAACCCA ACCTCTCTCC CAATTACCAA
>      1081 TTAACTAATT TCATTAAAAG GGAATTTAAA ATTTCTTGGA ATTACCCCGA AAAACA

I'm not surprised that the number of padded bases is different, but I amsurprised that MIRA has deleted bases from the read! This makes itimpossible to use a 'phd.ball' created from the original 'phd' files, orthe fasta.screen and fasta.screen.qual created by phredPhrap, to loadquality values for the Sanger reads when using Consed to view an 'ace'file produced by a MIRA assembly. Consed complains that the bases in theace file differ from the bases in the 'phd' file:

ace file: all_out.ace
Version 19.0 (090206)
exception thrown: ace file says base at nPaddedSequencePos 893 or nBases 880 is
c but PHD file says t which on top strand is t for read RA10-p04-K04.f at line 
23206 in file of phd files phd.ball

The reads in the MIRA ace file are consistent with the checkpointed MIRAread-pool, but the read-pool itself is different to the FASTA file readin by MIRA:

diff Fasta.read.gb chkpt.read.gb
1,4c1
< LOCUS       RA10-p04-K04_f   1139 bp    DNA             UNA       29-Mar-2010
< DEFINITION  RA10-p04-K04.f  CHROMAT_FILE: RA10-p04-K04.f PHD_FILE:
<             RA10-p04-K04.f.phd.1 CHEM: term DYE: big TIME: Fri Mar 12 
15:39:23<             2010 TEMPLATE: RA10-p04-K04 DIRECTION: fwd
---
> LOCUS       chkpt_read_1            1136 bp
20,24c17,25
<       841 TTAATTCAAT GTCTAATCGT TCTATAGTTT TCTTGTTTTT CACTCCAGTC TTTACGTTTT
<       901 ATCACTTTTA ATTTTTGCTA AAGCCCTAAT AATCTCAATA TAATATTGCT TCCGTGCATT
<       961 ATCCGTATTC CGCATTAACA GCTCAAAACC ATTTTCTAAG CCCAAAGAAT GGTTTTTATT
<      1021 TTTTCAAAAG TCTTTAGTTT AAAATTTTTA ATCCCTAAAC CCAACCTCTC TCCCAATTAC
<      1081 CAATTAACTA ATTTCATTAA AAGGGAATTT AAAATTTCTT GGAATTACCC CGAAAAACA
---
>       841 TTAATTCAAT GTCTAATCGT TCTATAGTTT TCTTGTTTTC ACTCCAGTCT TTACGTTTTA
>       901 TCACTTTTAT TTTTGCTAAA GCCTAATAAT CTCAATATAA TATTGCTTCC GTGCATTATC
>       961 CGTATTCCGC ATTAACAGCT CAAAACCATT TTCTAAGCCC AAAGAATGGT TTTTATTTTT
>      1021 TCAAAAGTCT TTAGTTTAAA ATTTTTAATC CCTAAACCCA ACCTCTCTCC CAATTACCAA
>      1081 TTAACTAATT TCATTAAAAG GGAATTTAAA ATTTCTTGGA ATTACCCCGA AAAACA
> //
> LOCUS       chkpt_read_2               1 bp
> ORIGIN
>         1 N

The chromatogram 'scf' files are not accessible to MIRA, and I did notexpect any automatic editing of the reads to be done:

  Edit options (-ED):
        Automatic contig editing (ace)              :  [san]  no
                                                       [454]  yes
     Sanger only:
        Strict editing mode (sem)                   : no
        Confirmation threshold in percent (ct)      : 50


Am I misunderstanding something about how MIRA manages its reads?

What I want to do is examine a MIRA assembly in Consed with library andquality information included: Normally, a 'phd.ball' file is used forthis purpose. Newbler, for example, creates a 'phd.ball' automaticallywhen producing an 'ace' file. Could MIRA create a 'phd.ball' to loadquality scores and library info into Consed for the MIRA ace file?


Thanks,

  Tony.
--
Dr. A.J.Travis, University of Aberdeen, Rowett Institute of Nutrition
and Health, Greenburn Road, Bucksburn, Aberdeen AB21 9SB, Scotland, UK
tel +44(0)1224 712751, fax +44(0)1224 716687, http://www.rowett.ac.uk
mailto:a.travis@xxxxxxxxxx, http://bioinformatics.rri.sari.ac.uk/~ajt

--
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html

Follow-Ups:
- [mira_talk] Re: mira 3.0.1 crashes in mapping mode
  - From: Bastien Chevreux

References:
- [mira_talk] Re: mira 3.0.1 crashes in mapping mode
  - From: Stefano Ghignone
- [mira_talk] Re: mira 3.0.1 crashes in mapping mode
  - From: Tony Travis
- [mira_talk] Re: mira 3.0.1 crashes in mapping mode
  - From: Bastien Chevreux
- [mira_talk] Re: mira 3.0.1 crashes in mapping mode
  - From: Tony Travis

[mira_talk] Re: mira 3.0.1 crashes in mapping mode

Other related posts: