On Monday 28 March 2011 19:44:45 Arun Rawat wrote: > I assembled bacterial genome from 454 Titanium that resulted in around 500 > contigs overall. That is a fairly high number of contigs. Way to high to my likings. What's the coverage of these? > Now I am trying to use these sets of contigs to map paired reads to > generate larger contigs. To test the results, I ran with default > parameters: mira --project=mapSX_454 --job=mapping,genome,accurate,solexa > -GE:not=16 -AS:nop=1 -SB:bft=caf >&log_assembly.txt > > The results came pretty good with higher N50, lesser number of contigs > (~300) etc as mentioned in info_assembly.txt Ahmmm ... something's not right here. If 500 contigs came in, 500 must come out of a mapping assembly in the "All contigs" category: MIRA does not join contigs in mapping. Can you please do a count on your input CAF for the number of contigs with grep -c Is_contig mapSX_454_backbone_in.caf and tell what that gives you? > However when I try to access these contigs from *.padded.fasta, I get > around 120 contigs that are lot shorter rather than the ones I expect as > mentioned in the info_assembly. Which number in the assembly are you refewrring to? There are quite a few ... :-) > Can you please help me how/where can I > access the (solexa + 454 mapped) contigs that are mentioned in the > info_assembly file. All data is in the *_out.* files. If you want to filter your results for values, use convert_project on either a MAF (recommended, fast) or a CAF file (less recommended, slow). Like this to get all contigs which are at least 500bp long and have at least an average coverage of 10: convert_project -f maf -t caf -t fasta -x 500 -y 10 input.maf filteredproject Hope that helps, Bastien