[mira_talk] Re: Log files

On Sunday 29 March 2009 Jan van Haarst wrote:
> OK, here we go, the result of
> [...]
> I have attached the log.
> I hope you can fix this.

Hi Jan,

ouch. Ouch ouch.

You have 5.4M reads; the average read size makes me think there are about 
1.4M paired-end and around 4M FLX reads. Probably a lower eukaryote in the 
30 to 45MB range.

This log is ... disturbing. The 5.4M reads generate more than 4 *billion* 
possible overlaps in the SKIM phase. The two files with 85GB are not logs but 
temporary result files that MIRA needs, and are therefore unavoidable. I think 
I need to fix SKIM for such cases.
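
To put those numbers in perspective, here is a quick back-of-the-envelope 
calculation in Python. It uses only the two figures quoted above (5.4M reads, 
4 billion overlaps, the latter being a lower bound); the function names are 
made up for this illustration and are not part of MIRA.

```python
def overlaps_per_read(num_reads, num_overlaps):
    """Average number of candidate overlaps per read (overlaps / reads).

    Each overlap involves two reads, so the average number of overlap
    partners seen by a single read is about twice this value.
    """
    return num_overlaps / num_reads


def fraction_of_all_pairs(num_reads, num_overlaps):
    """Share of all possible read pairs that show up as candidate overlaps."""
    total_pairs = num_reads * (num_reads - 1) // 2
    return num_overlaps / total_pairs


if __name__ == "__main__":
    reads = 5_400_000          # from the log
    overlaps = 4_000_000_000   # "more than 4 billion", so a lower bound
    print(f"{overlaps_per_read(reads, overlaps):.0f} overlaps per read")
    print(f"{fraction_of_all_pairs(reads, overlaps):.2e} of all read pairs")
```

That works out to roughly 740 candidate overlaps per read on average, which is 
far more than the sequencing coverage alone would be expected to produce.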

Not many megahubs, which means it's probably not unclipped adaptor sequence 
causing trouble ... but rather that you have a hellishly repetitive genome.

I'm not sure MIRA will cope with that beast. One thing to try is to brutally 
mask the most repetitive sequence parts in the SKIM phase via -SK:mnr, but 
note that this will also put reads that are 100% highly repetitive into the 
debris file. On the other hand ... other assemblers do something similar.

You could play around with -SK:rt, starting at 20 just to see how it goes, 
then slowly adjusting it downwards. Also, raising -SK:pr to around 80 or even 
90 might help.

In case you decide to try -SK:mnr ... please have a look at 
http://www.freelists.org/post/mira_talk/assembly-parameters-and-more,5 
starting with "Now, what can you do?". The two files generated by -SK:mnr are 
quite interesting: one gives you a good feeling for what kind of repeats are 
causing harm, the other is a histogram file that can be used to estimate a 
good -SK:rt cutoff value. If you want, you can send me the histogram file and 
I'll give a walkthrough on how to do this.
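
As a rough idea of how a histogram can guide such a cutoff, here is a small 
Python sketch. Everything in it is an assumption made for illustration: the 
layout of the real histogram file is not described in this mail, so the data 
below is a toy dictionary {occurrence count: number of distinct sequence 
words}, and the cutoff is treated as a multiple of the most common occurrence 
count, which may not match exactly how -SK:rt is defined.

```python
def baseline_frequency(histogram):
    """Most common occurrence count, ignoring words seen only once.

    Words seen once are assumed to be mostly sequencing errors.
    """
    candidates = {freq: n for freq, n in histogram.items() if freq > 1}
    return max(candidates, key=candidates.get)


def fraction_above(histogram, multiple):
    """Share of all word occurrences whose count exceeds multiple * baseline."""
    limit = multiple * baseline_frequency(histogram)
    total = sum(freq * n for freq, n in histogram.items())
    above = sum(freq * n for freq, n in histogram.items() if freq > limit)
    return above / total


if __name__ == "__main__":
    # Toy data (hypothetical): occurrence count -> number of distinct words.
    toy = {1: 1000, 2: 50, 9: 200, 10: 400, 11: 200, 20: 30, 200: 10, 1000: 2}
    print("baseline occurrence count:", baseline_frequency(toy))
    for multiple in (20, 10, 5):
        share = fraction_above(toy, multiple)
        print(f"cutoff {multiple}x baseline would affect {share:.1%} of occurrences")
```

The idea is simply to start with a high multiple and see how much sequence 
would be affected before lowering it step by step.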

Regards,
  Bastien



