On 2013-03-27, at 3:31 PM, Peter Cock wrote: > On Wed, Mar 27, 2013 at 7:18 PM, John Nash <john.he.nash@xxxxxxxxx> wrote: >> Good afternoon, >> >> I need to do some samtools mpileups and I have been trying to generate BAM >> files from mira's output without success. >> >> The data are from mapping assemblies using various versions of the 3.9.x >> series of mira. >> >> A. If I use convert_project to generate a sam file from a maf file, all >> goes well, and I can view the assembly in Tablet. However, either doing: >> >> $ samtools view -bS STRAIN.sam > STRAIN.bam >> >> or, after generating an index from the STRAIN_unpadded.fasta file using >> samtools faidx: >> >> $ samtools view -bt STRAIN_unpadded.fasta.fai STRAIN.sam > STRAIN.bam >> >> generates a Parse error at line 84: unmatched CIGAR operation > > What is line 84 of the file? Hi Peter, It's line 86 (not 84) in the case that I am reporting below because I am using a different assembly to the one mentioned above but this one also gave the same problem. I had 3 crashing at once and I cannot remember which crashed at line 84 and which at line 86, etc. - but it's the same story... I'm not a sam file expert but the offending line looks like a HUGE string, probably representing the genome I reference mapped to (it's S. enteritidis). It starts with: NC_011294 0 NC_011294_bb 527 50 43M1D23M2D10M2D76M1D26M1D10M1D27M1D8M1D27M1D4M1D8M1D11M1D33M2D2M1D10M1D17M1 D75M1D22M1D38M1D5M1D10M2D12M1D10M1D2M1D55M1D51M2D15M1D11M1D21M1D25M1D12M1D13M1D3M1D98M1D23M1D62M1D35M1D9M1D2M1D7M1D10M1D28M1D32M1D5 M1D8M1D114M1D27M1D61M1D8M1D24M1D6M1D11M1D7M1D42M1D30M1D37M1D14M1D41M1D10M1D11M1D27M1D20M1D38M1D58M1D66M1D307M1D4M1D17M1D8M1D12M1D18 1M1D36M1D5M1D4M1D24M1D7M1D50M1D23M1D7M1D7M1D25M1D4M1D6M1D10M1D24M1D2M1D15M1D5M1D45M1D16M1D21M1D7M1D10M1D18M1D16M1D40M1D75M1D15M1D20 M1D47M1D4M1D18M1D2M1D7M1D102M1D1M1D31M1D38M1D69M1D58M1D3M1D25M1D5M1D18M1D6M1D12M1D50M1D25M1D4M1D14M1D51M1D27M1D4M1D26M1D24M1D10M1D3 and goes on for several megabases. > Are you (accidentally) using a padded SAM with an unpadded FASTA? > If so, first you need to run 'samtools depad' to get a more traditional > SAM file using an unpadded reference. This should be in the latest > samtools 0.1.19 release (and can do SAM to BAM at the same time). I've usually used Mira's unpadded.fasta as a reference w/o issues but good advice should always be taken :) $ samtools depad -S SE_10-101_mira454MappingSE.sam -T SE_10-101_mira454MappingSE_out.padded.fasta -so fred.sam [samopen] SAM header is present: 37 sequences. Parse error at line 86: unmatched CIGAR operation Aborted (core dumped) *sigh... Using the unpadded fasta file caused it to tell me to use the padded.fasta file :) >> B. Peter Cock's maf2sam.py (version 2.0) hangs when I try to convert the >> maf file to a sam file. > > I never updated it to cope with the MAF from MIRA v3.9.x, but > if you fancy sharing the data I can have a look at make it at least > give a clear error. There seemed little point as MIRA v3.9.x has > native SAM output. Fair enough, I was grasping at straws by then. I don't want to give you extra work but if you would be kind enough to look at a 330 megabyte gzipped maf file, I can send you the link backchannel as soon as it uploads to Dropbox. > >> D. picard-tools SamFormatConverter crashes. >> >> I have tried it with at least 3 different reference assemblies. I tested this with a few more genomes, and it might be only occurring with 454 reads mapped to a reference - I have to check this further. The hybrid 454/illumina assembly that I just tried converted from maf to sam to bam fine. > Did that help? It's helping... I was using Mira because I had a bunch of Mira reference assemblies already done, and needed a reference assembly as a bam file so that I could run samtools mpileup. I ended up getting it to work using the MOSAIK tools. Thanks John -- You have received this mail because you are subscribed to the mira_talk mailing list. For information on how to subscribe or unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html