On 2011-10-17, at 3:13 PM, Bastien Chevreux wrote: > On Oct 17, 2011, at 19:27 , John Nash wrote: >> I just did a mapping assembly of the results of a de novo assembly. It's an >> E. coli strain with 40x 454 and 70X Illumina. After I did the de novo >> assembly, I mapped it against K12 and now I am going to BLAST the non-K12 >> contigs to see what I have. >> >> I took the caf file and did: >> >> $ convert_project -f caf -t fasta -t caf -x 500 -y 30 PROJ_out.caf >> PROJ_out_LargeContigs >> >> When I look at the results (ls -1 *.unpadded.fasta) so as not to list the >> padded results and the qual files, I get: >> PROJ_LargeContigs_AllStrains.unpadded.fasta >> PROJ_LargeContigs_ReferenceStrain.unpadded.fasta >> PROJ_LargeContigs_default.unpadded.fasta >> >> I understand what the ReferenceStrain is - one contig was a HUGE hint :) >> >> I also know that the other two files have a mapping to the backbone strain >> as an entry. What else is different between them. >> >> (I did do a Google search and read the manual, promise). > > Yeah, the manual is somewhat incomplete *cough* > > You probably did assembly and mapping with MIRA 3.2.x, right? Else there > would be "StrainX" somewhere as strain. Anyway, here's the short version: I thought I had updated all of my mira installations to 3.4 so I checked where keep my tarballs. It's from mira_3.4.0_prod_linux-gnu_x86_64_static.tar.bz2 My command params were: mira --project=PROJ_mapping --job=mapping,genome,accurate,sanger --caf=PROJ_mapping_in.sanger.caf COMMON_SETTINGS -GE:kpmf=15:not=40 -MI:somrnl=0 -SB:bft=gbf:abnc=yes In the contigs, the backbone strain is referred to as NC_000913_bb, and its sequence is the one in ReferenceStrain > - every read, be it backbone or normal shotgun, can have a strain name > attached as info to it (see parameters -SB:bsn:dsn:lsd) > - it is entirely possible to map more than one strain against a backbone, > hence the need to extract strain specific consensus > - in earlier versions of MIRA, shotgun data was assigned the strain name > "default". Not a good name, I've changed the default to StrainX in 3.4.x (see > also -SB:dsn). Backbone sequences get per default "ReferenceStrain" since > early 2.9 version I think. > > > So, assume you mapped a strain against a backbone, and you get in the > multiple alignment: > > ReferenceStrain gtacg*tcagtcg > default/StrainX read1 gtacgAtTag*cg > default/StrainX read2 gtacgAtTag*cg > default/StrainX read3 gtacgAtTag*cg > etc.pp > > (note the differences) > > Now, if you extract a consensus, convert_project will create one consensus > sequence for each strain and one for all strains together. In the case above > it will be three sequences: > > ReferenceStrains: gtacgtcagtcg > default/StrainX: gtacgAtTagcg > AllStrains: gtacgWYWRKcg > > There are of course interesting borderline cases. E.g.: what happens when a > strain is not covered at all? > > ReferenceStrain > gtacggtatatgtgtgtgcagttgcagtgtcagtactggtgtcagttggtcagtgtacgt > default/StrainX read1 gtacggtata > default/StrainX read2 gta > default/StrainX read3 > tgtacgt > default/StrainX read4 > cgt > > Here, you will get this: > > ReferenceStrain: > gtacggtatatgtgtgtgcagttgcagtgtcagtactggtgtcagttggtcagtgtacgt > default/StrainX: > gtacggtata@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@tgtacgt > AllStrains: > gtacggtatatgtgtgtgcagttgcagtgtcagtactggtgtcagttggtcagtgtacgt Thanks - this makes it clear to me. > > However, while the @-character was perhaps a nice idea to be very clear that > there's something to be careful, third party programs are not prepared for > that (understandably so). So, since MIRA 3.4, the @ has been replaced by "X". My PROJ_ReferenceStrain.unpadded.fasta has: >NC_000913_bb @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@agcttttcattctgactgcaacgggcaatatgtctctgtgtggattaaaaaaagagtg tctgatagcagcttctgaactggttacctgccgtgagtaaattaaaattttattgactta ggtcactaaatactttaaccaatataggcatagcgcacagacagataaaaattacagagt acacaacatccatgaaacgcattagcaccaccattaccaccaccatcaccattaccacag gtaacggtgcgggctgacgcgtacaggaaacacagaaaaaagcccgcacctgacagtgcg Is my 3.4.0 out of date? > > Hope this helps. > > B. You always help :) Thanks John