On 15/03/10 11:50, Tony Travis wrote:
[...]Please note that mapping assemblies made with 3.0.1 and 3.0.2 will have broken ACE, CAF and MAF files because of this error (provided they ran through).Hello, Bastien. OK, I re-ran the mapping assembly over the weekend using 3.0.3, and it completed successfully.
Hello, Bastien.Looks like I spoke too soon: We are still having problems doing mapping assemblies of ~40K Sanger reads + 1.2M 454 reads. MIRA crashes with an error complaining that 'rr_####<n>####' is longer than 29900 base-pairs:
mira -project=map \ -fasta \ -job=mapping,genome,sanger,454 \ -SB:lb=on -FN:bbin=all_backbone_in.fasta \ SANGER_SETTINGS -LR:rns=fr,mxti=on -FN:xtii=all_traceinfo_in.sanger.xml \ 454_SETTINGS -FN:fai=all_in.454.fasta,fqui=all_in.454.fasta.qual,xtii=all_traceinfo_in.454.xml This is MIRA V3.0.3 (production version).
...
Fatal Error (may be due to problems of the input data): "Read rr_####409#### is longer than 29900 bases. SKIM cannot handle this, aborting. " ->Thrown: uint32 Skim::computePartition(uint32 maxmemusage, bool computenumpartitions) ->Caught: main Program aborted, probably due to error in the input data or parametrisation. Please check the output log for more information. For help, please write a mail to the mira talk mailing list. CWD: /home/rri20/genome/Roseburia/hominis/MIRA/current make: *** [map] Aborted
Are 'rr_####<n>####' synthetic reads from suspected repeat regions? I tried using '-SK:bph=31' but it made no difference...I also had problems trying to read Sanger quality scores for reads in a MIRA assembly into Consed from a phd.ball that is consistent with a Phrap assembly - I used the same FASTA sequence and quality files for both assemblies, but when I compare the reads in the Phrap and MIRA ace files they are slightly different and I can't explain why.
I'm using the phredPhrap pipeline to vector screen our Sanger reads, then assemble the fasta.screen and fasta.screen.qual files created by the pipeline using both Phrap and MIRA to compare the assemblies. I expected the reads, not the contigs, to be the same in both ace files but they are not: I used "readseq -degap=\*" to reformat the sequences from the same read in each ace file in unpadded GenBank format to make it easier to compare them. For example:
diff Phrap.read.gb MIRA.read.gb 1,2c1,2 < LOCUS RA10-p04-K04_f 1286 bp NUCLEOTIDE UNA 29-Mar-2010 < DEFINITION RA10-p04-K04.f 1286 0 0 --- > LOCUS RA10-p04-K04_f 1151 bp NUCLEOTIDE UNA 29-Mar-2010 > DEFINITION RA10-p04-K04.f 1151 0 0 18,22c18,22 < 841 TTAATTCAAT GTCTAATCGT TCTATAGTTT TCTTGTTTTT CACTCCAGTC TTTACGTTTT < 901 ATCACTTTTA ATTTTTGCTA AAGCCCTAAT AATCTCAATA TAATATTGCT TCCGTGCATT < 961 ATCCGTATTC CGCATTAACA GCTCAAAACC ATTTTCTAAG CCCAAAGAAT GGTTTTTATT < 1021 TTTTCAAAAG TCTTTAGTTT AAAATTTTTA ATCCCTAAAC CCAACCTCTC TCCCAATTAC < 1081 CAATTAACTA ATTTCATTAA AAGGGAATTT AAAATTTCTT GGAATTACCC CGAAAAACA --- > 841 TTAATTCAAT GTCTAATCGT TCTATAGTTT TCTTGTTTTC ACTCCAGTCT TTACGTTTTA > 901 TCACTTTTAT TTTTGCTAAA GCCTAATAAT CTCAATATAA TATTGCTTCC GTGCATTATC > 961 CGTATTCCGC ATTAACAGCT CAAAACCATT TTCTAAGCCC AAAGAATGGT TTTTATTTTT > 1021 TCAAAAGTCT TTAGTTTAAA ATTTTTAATC CCTAAACCCA ACCTCTCTCC CAATTACCAA > 1081 TTAACTAATT TCATTAAAAG GGAATTTAAA ATTTCTTGGA ATTACCCCGA AAAACA
I'm not surprised that the number of padded bases is different, but I am surprised that MIRA has deleted bases from the read! This makes it impossible to use a 'phd.ball' created from the original 'phd' files, or the fasta.screen and fasta.screen.qual created by phredPhrap, to load quality values for the Sanger reads when using Consed to view an 'ace' file produced by a MIRA assembly. Consed complains that the bases in the ace file differ from the bases in the 'phd' file:
ace file: all_out.ace Version 19.0 (090206) exception thrown: ace file says base at nPaddedSequencePos 893 or nBases 880 is c but PHD file says t which on top strand is t for read RA10-p04-K04.f at line 23206 in file of phd files phd.ball
The reads in the MIRA ace file are consistent with the checkpointed MIRA read-pool, but the read-pool itself is different to the FASTA file read in by MIRA:
diff Fasta.read.gb chkpt.read.gb 1,4c1 < LOCUS RA10-p04-K04_f 1139 bp DNA UNA 29-Mar-2010 < DEFINITION RA10-p04-K04.f CHROMAT_FILE: RA10-p04-K04.f PHD_FILE: < RA10-p04-K04.f.phd.1 CHEM: term DYE: big TIME: Fri Mar 12 15:39:23< 2010 TEMPLATE: RA10-p04-K04 DIRECTION: fwd --- > LOCUS chkpt_read_1 1136 bp 20,24c17,25 < 841 TTAATTCAAT GTCTAATCGT TCTATAGTTT TCTTGTTTTT CACTCCAGTC TTTACGTTTT < 901 ATCACTTTTA ATTTTTGCTA AAGCCCTAAT AATCTCAATA TAATATTGCT TCCGTGCATT < 961 ATCCGTATTC CGCATTAACA GCTCAAAACC ATTTTCTAAG CCCAAAGAAT GGTTTTTATT < 1021 TTTTCAAAAG TCTTTAGTTT AAAATTTTTA ATCCCTAAAC CCAACCTCTC TCCCAATTAC < 1081 CAATTAACTA ATTTCATTAA AAGGGAATTT AAAATTTCTT GGAATTACCC CGAAAAACA --- > 841 TTAATTCAAT GTCTAATCGT TCTATAGTTT TCTTGTTTTC ACTCCAGTCT TTACGTTTTA > 901 TCACTTTTAT TTTTGCTAAA GCCTAATAAT CTCAATATAA TATTGCTTCC GTGCATTATC > 961 CGTATTCCGC ATTAACAGCT CAAAACCATT TTCTAAGCCC AAAGAATGGT TTTTATTTTT > 1021 TCAAAAGTCT TTAGTTTAAA ATTTTTAATC CCTAAACCCA ACCTCTCTCC CAATTACCAA > 1081 TTAACTAATT TCATTAAAAG GGAATTTAAA ATTTCTTGGA ATTACCCCGA AAAACA > // > LOCUS chkpt_read_2 1 bp > ORIGIN > 1 N
The chromatogram 'scf' files are not accessible to MIRA, and I did not expect any automatic editing of the reads to be done:
Edit options (-ED): Automatic contig editing (ace) : [san] no [454] yes Sanger only: Strict editing mode (sem) : no Confirmation threshold in percent (ct) : 50
Am I misunderstanding something about how MIRA manages its reads?What I want to do is examine a MIRA assembly in Consed with library and quality information included: Normally, a 'phd.ball' file is used for this purpose. Newbler, for example, creates a 'phd.ball' automatically when producing an 'ace' file. Could MIRA create a 'phd.ball' to load quality scores and library info into Consed for the MIRA ace file?
Thanks, Tony. -- Dr. A.J.Travis, University of Aberdeen, Rowett Institute of Nutrition and Health, Greenburn Road, Bucksburn, Aberdeen AB21 9SB, Scotland, UK tel +44(0)1224 712751, fax +44(0)1224 716687, http://www.rowett.ac.uk mailto:a.travis@xxxxxxxxxx, http://bioinformatics.rri.sari.ac.uk/~ajt -- You have received this mail because you are subscribed to the mira_talk mailing list. For information on how to subscribe or unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html