[mira_talk] Re: highly polymorphic species EST assembly

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Mon, 17 Jan 2011 22:19:02 +0100

On Thursday 13 January 2011 12:43:03 Jorge.DUARTE@xxxxxxxxxxxx wrote:
> Problem is, it is difficult to tune the parameters with such long assembly
> times... (the 1st pass with default parameters had run for one week, and I
> had to kill the process because of a shutdown of our server for
> maintenance)

I am a bit surprised that 450k reads need 1 week for a first pass. May I have 
the output log to see what takes so long? I'd guess that there are a couple of 
very highly expressed transcripts.
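
As a rough illustration of why a few very highly expressed transcripts can dominate the run time: if every read from one transcript is a potential overlap partner of every other read from it, the number of candidate pairs grows quadratically with that transcript's read count. The sketch below assumes hypothetical read distributions (not taken from the dataset in question) and only counts pairs; it does not model what MIRA actually does internally.

```python
def candidate_pairs(n_reads: int) -> int:
    """Number of unordered read pairs among n_reads reads (n choose 2)."""
    return n_reads * (n_reads - 1) // 2

# Hypothetical case A: 450k reads spread evenly over 45,000 transcripts (10 reads each).
even = 45_000 * candidate_pairs(10)

# Hypothetical case B: the same 450k reads, but one transcript contributes 50,000
# of them and the remaining 400,000 are spread over 40,000 transcripts (10 each).
skewed = candidate_pairs(50_000) + 40_000 * candidate_pairs(10)

print(even, skewed)  # 2025000 1251775000
```

With the same total number of reads, the skewed case yields roughly 600 times more candidate pairs, nearly all of them from the single dominant transcript.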

> So my question is, which parameters would be the best in order to have a
> good compromise between :
> - good quality assembly with nice SNPs and SNP flanking sequences (for
> which I would normally use mrs=95) on one side
> and
> - speed on the other side

The default parameters are meant to provide exactly that compromise, so at the
moment I have no better suggestion.

> I am planning to test mrpc=4, mrs=95 and mhpr=20, 10 or even 5
> 
> Can someone tell me what the impact of such settings will be on accuracy
> compared with the default behaviour of MIRA? (i.e. when decreasing mhpr
> too much)

Reducing mhpr will reduce the number of reads going into Smith-Waterman. If 
the SW phase takes up a lot of the time, that may be an option. However, 
reducing it too much is not recommended, as good joins will then not be found.
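
The trade-off can be pictured with a small sketch: a fast pre-filter proposes candidate partners for each read, only the best-scoring ones per read are passed on to the expensive Smith-Waterman step, and any true overlap ranked below the cap is never aligned. Everything here (the `hits` structure, the shared-k-mer score, the function names) is hypothetical and only illustrates an mhpr-like cap; it is not MIRA's actual implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical pre-filter output: for each read, its candidate partners with a
# rough similarity score (e.g. number of shared k-mers; higher is better).
Hits = Dict[str, List[Tuple[str, int]]]


def cap_hits_per_read(hits: Hits, max_hits_per_read: int) -> Hits:
    """Keep only the best-scoring candidates of each read (illustrative mhpr-like cap)."""
    capped: Hits = {}
    for read, partners in hits.items():
        ranked = sorted(partners, key=lambda p: p[1], reverse=True)
        capped[read] = ranked[:max_hits_per_read]
    return capped


def sw_workload(hits: Hits) -> int:
    """Count distinct unordered read pairs that would be aligned with Smith-Waterman."""
    pairs = set()
    for read, partners in hits.items():
        for partner, _score in partners:
            pairs.add(tuple(sorted((read, partner))))
    return len(pairs)


# Hypothetical highly expressed transcript: 200 reads that all hit each other,
# with reads closer in index sharing more k-mers.
n = 200
hits = {
    f"r{i}": [(f"r{j}", 1000 - abs(i - j)) for j in range(n) if j != i]
    for i in range(n)
}
capped = cap_hits_per_read(hits, 5)

print(sw_workload(hits))    # 19900 pairs without a cap
print(sw_workload(capped))  # at most 200 * 5 pairs with the cap
```

The cap turns a quadratic number of alignments into one that grows linearly with the read count, which is where the speed-up comes from; the cost is that a read's genuine best join can be discarded if many other candidates outscore it in the pre-filter.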

> what other parameters could I modify in order to get a quick draft
> assembly done on such highly polymorphic species data in order to be able to
> test other parameters in a reasonable time? (the memory is not a problem,
> as I have access to 1 TB)

Having the log to read would help to find out where the bottleneck is.

B.

