On Sep 13, 2011, at 14:46, Robert Bruccoleri wrote:
> The problem is the large size of the CAF file. There are two other
> possibilities. First, use convert_project to reduce the number of contigs in
> the caf file, e.g.:
>
> convert_project -f caf -t caf -x 2000 -y 10 source.caf dest
>
> will make a new CAF file with only contigs bigger than 2000 bp.

Brian is doing a mapping, so I have my doubts that the strategy of
filtering only for large contigs will work. BTW, using MAF as input for
the conversion is faster and uses a bit less memory.

> Second, use cafcat and the -fofn option to select contigs of interest.

Ummm, yes, cafcat is a possibility, though one I never use:
"convert_project -n" does the same there.

Brian: you might nevertheless want to try the splitting approach. In
fact, for many chromosomes/contigs of your reference sequence that
should work. However, if the reference sequence has collapsed repeats
(I've seen reference sequences with just one rRNA copy as a placeholder
for >100 rRNA copies across the genome), you might still hit the caf2gap
problem for that chromosome/contig if that repeat is quite polymorphic.

If that's the case, just drop a note here and I'll jot down an
undocumented trick which I developed to get around that problem.

B.

-- 
You have received this mail because you are subscribed to the mira_talk mailing list.
For information on how to subscribe or unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html
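
For what it's worth, the splitting approach could be scripted roughly as below. This is only a sketch and makes several assumptions: that "convert_project -n" takes a file listing the contig names to keep (please check the convert_project help for your MIRA version), and that the contig names, the ".names" files and the "split_" output basenames are all hypothetical placeholders. The script only assembles the command lines, one per contig; it does not run anything.

```python
import shlex


def plan_split(source_caf, contig_names):
    """Plan one convert_project call per contig.

    Returns a list of (names_file, names_file_content, command) tuples.
    Nothing is written to disk and nothing is executed.
    """
    plan = []
    for name in contig_names:
        names_file = f"{name}.names"  # hypothetical file naming
        dest = f"split_{name}"        # hypothetical output basename
        # ASSUMPTION: -n takes a file with the names of the contigs to keep.
        cmd = ["convert_project", "-f", "caf", "-t", "caf",
               "-n", names_file, source_caf, dest]
        plan.append((names_file, name + "\n", cmd))
    return plan


if __name__ == "__main__":
    # "chr1" and "chr2" are made-up names; use the contig names from your CAF.
    for names_file, content, cmd in plan_split("source.caf", ["chr1", "chr2"]):
        print(f"# put {content.strip()!r} into {names_file}, then run:")
        print(shlex.join(cmd))
```

Each resulting smaller CAF could then be fed to caf2gap on its own. As noted above, using MAF as input for the conversion should be faster; that would mean changing the "-f" value and the source file accordingly.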