Hi,

I'm a software guy building an assembly for the cowpea. Our group has vector-trimmed, quality-trimmed, methyl-filtered gene-space reads (GSRs) from Sanger sequencing. MIRA does a good job assembling the 225,000 reads into approximately 61,000 contigs and 1,500 singlets. The MIRA docs are great, but as a software guy I'm a bit weak on stats and molecular biology. We've got a 16-processor machine with 128 GB of RAM, so I've got nice hardware to play with.

With the first pass done, I'd like to improve/extend the contigs with 140,000 cowpea ESTs from HarvEST and with 51,000 BAC end sequences (BES) from the Legume Information System. Several things "worked", but I'm not sure what I should have done, and I'm not sure how to evaluate the quality of the assembly.

Which is better, or more logical (I've tried both)?

1) Throwing the GSRs and ESTs into one big file, then running MIRA as "genome,denovo".
2) Two steps: (a) assemble the GSRs into contigs, then (b) map the ESTs onto them, using the unpadded.fasta from (a) as the backbone.

My third question is basically what to do about repeats in the BES. When I tried throwing the GSR contigs into one big FASTA file with the BES, MIRA complained about 1 megahub. I'm still adjusting nrr to see if I can clear that up. The combined FASTA file is simply

    cat gsr.fasta bes.fasta > mira_input.fasta

Thanks,
Tom
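
On the question of how to evaluate the assembly: one rough, software-side way to compare the one-step and two-step runs is to compute basic contiguity statistics (contig count, total bases, longest contig, N50) from each run's unpadded FASTA output. The sketch below is only an illustration, not anything MIRA provides; the function names are made up for this example, and the input file name is whatever your run produced. Note that these numbers describe contiguity only. A higher N50 does not by itself mean the contigs are correct, since repeats collapsed into misassemblies can also inflate it.

```python
import sys
from typing import Dict, Iterable, Iterator, List


def fasta_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the length of each sequence found in FASTA-formatted lines."""
    length = None  # None until the first '>' header is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for n in sorted(lengths, reverse=True):
        running += n
        if 2 * running >= total:
            return n
    return 0  # empty input


def summarize(lengths: List[int]) -> Dict[str, int]:
    """Collect a few simple contiguity statistics."""
    return {
        "count": len(lengths),
        "total_bases": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python contig_stats.py <unpadded FASTA from a MIRA run>
    with open(sys.argv[1]) as handle:
        print(summarize(list(fasta_lengths(handle))))
```

Running this on the unpadded FASTA from approach 1 and from approach 2 gives numbers that can be compared side by side, ideally alongside a biological check such as how many of the ESTs align to the resulting contigs.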