Hi Jan,

From your plot, it seems that you have a lot of very small contigs that do not align to the reference. These are likely to be spurious, and I would recommend filtering them out. As a rule of thumb, any contig that is only slightly longer than the read length and has a coverage lower than 1/3 to 1/2 of the average coverage may be discarded.
You can use convert_project to do that. With an assembly of 454 Titanium data (read length ~400 bp) and ~60X average coverage, I use:
convert_project -f caf -t caf -x 500 -y 20 raw_assembly.caf filtered_assembly
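Here is a minimal sketch of how the two thresholds could be derived for other datasets. It assumes that -x is convert_project's minimum contig length and -y its minimum average contig coverage, which is what the 500/20 values above suggest. The helper convert_cmd, its default margin of 100 bases over the read length and its default coverage divisor of 3 are hypothetical choices, picked only to reproduce the command above. The helper just prints the command so it can be checked before running.

```shell
#!/bin/sh
# Hypothetical helper (not part of MIRA): derive convert_project filter
# thresholds from the read length and the average coverage, following the
# rule of thumb above, and print the resulting command line.
#   $1  read length (e.g. 400 for 454 Titanium)
#   $2  average coverage (e.g. 60)
#   $3  input CAF file
#   $4  output basename
#   $5  bases added to the read length (assumed default: 100)
#   $6  coverage divisor, 3 or 2 (assumed default: 3)
convert_cmd() {
    read_len=$1
    avg_cov=$2
    infile=$3
    outbase=$4
    margin=${5:-100}
    divisor=${6:-3}

    # Contigs shorter than this are "only slightly longer than the read length"
    min_len=$((read_len + margin))
    # 1/3 (or 1/2) of the average coverage, rounded down by integer division
    min_cov=$((avg_cov / divisor))

    echo "convert_project -f caf -t caf -x $min_len -y $min_cov $infile $outbase"
}

convert_cmd 400 60 raw_assembly.caf filtered_assembly
```

For example, convert_cmd 400 60 raw_assembly.caf filtered_assembly prints the same command as above, while adding 100 2 as the last two arguments applies the stricter 1/2-of-average cutoff (-y 30).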
On 9 Jul 2010, at 11:39, Jan van Haarst wrote:
Dear All,

I'm working on some benchmarks of assemblers [1], and one of those is MIRA. For that purpose, I have downloaded 3 datasets [2] and put them into MIRA after using sff_extract. I have run the assembly using the parameters mentioned on the MIRA website:

mira --project=$PROJECT --job=denovo,genome,accurate,454

I have used mira_3.2.0rc1_dev_linux-gnu_x86_64_static for this.

What I see is that the resulting consensus is twice the size of the reference E. coli genome! If I do a mummerplot of the consensus versus the reference (hopefully attached), I see that the complete reference is present, but also a lot of other data.

I would like to know what I can do to get MIRA to give results as good as (or better than) newbler or CABOG on this dataset.

--
Dag,
Jan

[1] https://wiki.nbic.nl/index.php/Raw_results_of_NGS_de_novo_assembly
[2] ENA SRR00086
    ENA SRR000870
    ENA SRR001028

<mira_vs_reference_filtered_SNP.png>
============================================
Lionel Guy
Thunmansgatan 25, SE-75421 Uppsala
phone: +46 (0)18 245596
mobile: +46 (0)73 9760618
email: guy.lionel@xxxxxxxxx
============================================