[mira_talk] Re: Log files
- From: Bastien Chevreux <bach@xxxxxxxxxxxx>
- To: mira_talk@xxxxxxxxxxxxx
- Date: Mon, 30 Mar 2009 23:37:36 +0200
On Sunday 29 March 2009 Jan van Haarst wrote:
> OK, here we go, the result of
> [...]
> I have attached the log.
> I hope you can fix this.
Hi Jan,
ouch. Ouch ouch.
You have 5.4M reads; the average read size makes me think there are about 1.4M
paired-end and around 4M FLX reads. Lower eukaryote in the 30 to 45 Mb range.
This log is ... disturbing. The 5.4M reads generate more than 4 *billion*
possible overlaps in the skim part. The two files with 85GB are not logs, but
needed temporary result files and therefore unavoidable. I think I need to fix
SKIM for such cases.
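To put the overlap count in perspective, here is a quick back-of-the-envelope
calculation using only the two numbers quoted above. Whether SKIM counts each
read pair once or twice is not stated here, so treat the result as an order of
magnitude, not an exact figure.

```python
# Numbers taken from the message above.
reads = 5_400_000            # total reads in the project
overlaps = 4_000_000_000     # lower bound: "more than 4 billion"

# Average number of candidate overlaps per read.
per_read = overlaps / reads

# Number of distinct read pairs, for scale.
all_pairs = reads * (reads - 1) // 2

print(f"candidate overlaps per read: at least {per_read:.0f}")
print(f"fraction of all possible read pairs: {overlaps / all_pairs:.2e}")
```

Several hundred candidate overlaps per read on average is what makes the
temporary files so large.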
Not many megahubs, which means it's probably not unclipped adaptor sequence
causing trouble ... but rather that you have a hellishly repetitive genome.
I'm not sure MIRA will cope with that beast. One thing to try is to brutally
mask the most repetitive sequence parts in the SKIM phase via -SK:mnr, but
note that this will also put reads that are 100% highly repetitive into the
debris file. On the other hand ... other assemblers do something similar.
You could play around with -SK:rt, starting at 20 just to see how it goes,
then slowly lowering it. Also, upping -SK:pr to around 80 or even 90
might help.
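If you want to plan a series of trial runs, a small helper like the following
can list the parameter combinations. This is only a sketch: the function name,
the stop value of 10 and the step of 2 are made up for illustration, and the
"=" value syntax is an assumption, so check the MIRA manual of your version
for the exact spelling.

```python
def skim_variants(rt_start=20, rt_stop=10, rt_step=2, pr=80):
    """Return one SKIM parameter string per trial run, highest -SK:rt first.

    rt_stop and rt_step are hypothetical choices, not MIRA defaults.
    The "-SK:name=value" spelling is assumed, not taken from the manual.
    """
    variants = []
    # Walk -SK:rt down from rt_start to rt_stop (inclusive).
    for rt in range(rt_start, rt_stop - 1, -rt_step):
        variants.append(f"-SK:rt={rt} -SK:pr={pr}")
    return variants


for line in skim_variants():
    print(line)
```

Calling skim_variants(pr=90) gives the same series with the higher -SK:pr
value.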
In case you decide to try -SK:mnr ... please have a look at
http://www.freelists.org/post/mira_talk/assembly-parameters-and-more,5
starting with "Now, what can you do?". The two files generated by -SK:mnr are
quite interesting: one gives you a good feeling for what kind of repeats
are causing harm, the other is a histogram file that can be used to
estimate a good -SK:rt cutoff value. If you want, you can send me the histogram
file and I'll give you a walkthrough on how to do this.
Regards,
Bastien