On Wednesday 16 December 2009 Tom wrote:
> [...]
> With the first pass done I'd like to improve/extend the contigs with
> 140,000 cowpea ESTs from HarvEST and with 51,000 BAC end sequences
> (BES) from the Legume Information System.

'Improving' is a pretty vague term :-)

Please remember that MIRA is an assembler and not a clustering program. As soon as it detects enough valid information for allelic SNPs, it will create two or more contigs. Rightly so, as these are truly different mRNAs in the cell.

The downside: throwing together ESTs from different sources is bound to create different contigs as soon as a SNP is detected. You can mitigate this a bit by using -CO:mr=no together with very stringent alignment values (90% and upwards in -AS:mrs), but my personal experience with this is mixed. Sometimes it works, sometimes not. A better way would be to use strain information.

> 1) Throwing the GSRs and ESTs into one big file, then run MIRA as
> "genome,denovo".

That's what I'd try first. Remember to use strain information so that MIRA can throw together sequences from different strains that have only a low number of differences.

> 2) Two steps: (a) contig the GSRs, (b) then map the ESTs on using the
> unpadded.fasta from (a) as the backbone.

Also possible, but 1) would be better if it works.

> My third question is basically what to do about repeats in the BES. When
> I tried throwing the GSR contigs into a big fasta file with the BES,
> MIRA complained about 1 megahub. I'm still adjusting nrr to see if I can
> clear that up.

Remember that everything masked as nasty will not contribute to finding alignments. As you only have 275k sequences, you might want to try just ignoring the megahubs (-SK:mmhr=1 or similar).

Regards,
  Bastien

-- 
You have received this mail because you are subscribed to the mira_talk mailing list. For information on how to subscribe or unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html