[mira_talk] Re: caf2gap

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 13 Sep 2011 19:42:35 +0200

On Sep 13, 2011, at 14:46 , Robert Bruccoleri wrote:
>     The problem is the large size of the CAF file. There are two other 
> possibilities. First, use convert_project to reduce the number of contigs in 
> the caf file, e.g.:
> 
> convert_project -f caf -t caf -x 2000 -y 10 source.caf dest
> 
>     will make a new CAF file with only contigs bigger than 2000 bp.

Brian is doing a mapping, so I doubt that the strategy of filtering only for 
large contigs will work. BTW, using MAF as input for the conversion is faster 
and uses a bit less memory.

>     Second, use cafcat and the -fofn option to select contigs of interest.

Ummm, yes, cafcat is a possibility, though one I never use: "convert_project -n" 
does the same thing.
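
For illustration, a minimal sketch of the "convert_project -n" route. It 
assumes that -n takes a plain text file with one contig name per line (please 
check the help output of your convert_project version). The contig names and 
the file name contigs_of_interest.txt are hypothetical placeholders; source.caf 
and dest are taken from the example quoted above.

```shell
# Hypothetical contig names: replace them with names from your own assembly.
printf '%s\n' 'refseq_chr1' 'refseq_chr2' > contigs_of_interest.txt

# Assumption: -n expects a file of names (one per line) and keeps only those
# contigs. The conversion only runs if the tool and the input file are present.
if command -v convert_project >/dev/null 2>&1 && [ -f source.caf ]; then
  convert_project -f caf -t caf -n contigs_of_interest.txt source.caf dest
else
  echo "convert_project or source.caf not found, skipping the conversion"
fi
```

The resulting, smaller CAF file can then be given to caf2gap in place of the 
full one.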

Brian: you might nevertheless want to try the splitting approach. In fact, for 
many chromosomes/contigs of your reference sequence that should work. However, 
in case the reference sequence has collapsed repeats (I've seen reference 
sequences with just one rRNA copy as placeholder for >100 rRNA copies across 
the genome), you might still have the caf2gap problem for that 
chromosome/contig if that repeat is quite polymorphic.

If that's the case, just drop a note here and I'll jot down an undocumented 
trick which I developed to get around that problem.

B.


