I am trying to assemble a bacterial genome from Sanger paired-end reads, but I have found that the XML and ancillary files obtained from NCBI contain completely untrustworthy (i.e. nearly all bogus) values for the quality and vector trimming coordinates. Given that I know the sequences of the vectors and adapters, what are good approaches for using these data with mira?

My reading of the available documentation is that the XML file contains the information needed to treat pairs as pairs. Assuming that I can clip away vectors (e.g. with SSAHA) and low-quality ends (somehow) before running mira, is there a parameter that will let me use the template information from the XML but ignore the vector and quality trim coordinates?

An alternative that comes to mind is to modify the XML file itself. If that is feasible, could I simply remove the CLIP_VECTOR and CLIP_QUALITY blocks? Does anyone have any suggestions as to how I might proceed?

...

For the record, here is the last command that I used; it produced an assembly with 1687 large contigs:

  >mira -project=e_hyb -fasta -job=denovo,genome,normal,sanger,454 \
     -highlyrepetitive -DP:ure=yes -CL:pvlc=yes \
     454_SETTINGS -CL:emrc=yes:qc=no SANGER_SETTINGS -CL:qc=yes

(The project also contains non-paired-end 454 reads and, yes, lots of plasmids with diverse copy numbers.)
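To make the XML-editing idea concrete, here is a minimal Python sketch of what I have in mind. It assumes the clip coordinates are stored as child elements whose tag names begin with clip_quality or clip_vector (for example clip_quality_left), compared case-insensitively; that naming, the helper name strip_clip_elements, and the sample record with its values are my own assumptions for illustration, not something I have confirmed against the mira documentation. Everything else in each trace record, including the template information, is left untouched.

```python
import xml.etree.ElementTree as ET

# Assumed tag-name prefixes for the clip coordinates (matched case-insensitively).
CLIP_PREFIXES = ("clip_quality", "clip_vector")

def strip_clip_elements(root):
    """Remove every clip element below root in place; return how many were removed."""
    removed = 0
    # Snapshot the traversal first so that removing children cannot disturb it.
    for parent in list(root.iter()):
        for child in list(parent):
            if not isinstance(child.tag, str):
                continue  # skip comments / processing instructions
            # Ignore any "{namespace}" prefix on the tag name.
            local_name = child.tag.rsplit("}", 1)[-1].lower()
            if local_name.startswith(CLIP_PREFIXES):
                parent.remove(child)
                removed += 1
    return removed

# Made-up record for illustration only; the values are not real data.
SAMPLE = """<trace_volume>
  <trace>
    <trace_name>read1.b1</trace_name>
    <template_id>tmpl1</template_id>
    <trace_end>F</trace_end>
    <clip_quality_left>1</clip_quality_left>
    <clip_quality_right>2</clip_quality_right>
    <clip_vector_left>1</clip_vector_left>
    <clip_vector_right>1</clip_vector_right>
  </trace>
</trace_volume>"""

root = ET.fromstring(SAMPLE)
n_removed = strip_clip_elements(root)
remaining = [el.tag for el in root.find("trace")]
print(n_removed, remaining)
```

For a real file, the same function could be applied to the root returned by ET.parse(...).getroot() and the tree written back out with ElementTree.write(). Whether mira then falls back to sensible clipping when these elements are absent is exactly what I am unsure about.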
Thanks mira talkers,
Eric C.