[mira_talk] Re: mira denovo assembly of solexa reads

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Mon, 17 Jan 2011 19:49:55 +0100

On Monday 17 January 2011 15:51:09 Lionel Guy wrote:
> I would take between an 8th and a 5th of your data. You could then either
> map the rest of your data to the de novo assembly to increase the quality,
> or do multiple assemblies with multiple subsets and compare the resulting
> assemblies.
> To subset (provided that you have fastq files, one line per sequence), you
> can:
> - use 'head -n X file1.fastq > file1_filt.fastq', where X is the number of
>   reads you want * 4 (don't forget to do that on both files).
> - use a script to do that. Attached is the perl script I use to do that.
>   One possible command would be
>   perl fastqSampler.pl -q file1.fastq -l 8 > file1_filt.fastq

Subsampling is one method which should work until I find a fix for these 
annoyingly overcovered projects.

Saima: first, may I encourage you to switch to 3.2.1? It will be significantly 
faster in the first phase (SKIM). Then you may want to try this parameter 
change:

  -CL:pecbph=27 SOLEXA_SETTINGS -SK:bph=27:pr=99:mhpr=100

If the above still does not work, use Lionel's subsampling script and reduce 
the number of reads by about 75% (take every 4th read).
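
For illustration, "take every 4th read" could be sketched in plain shell as 
below. This is not Lionel's fastqSampler.pl but a stand-in using awk; the 
function name subsample_fastq and the output file names are made up for this 
example. It assumes each read occupies exactly 4 lines in the fastq file 
(sequence and qualities not wrapped) and that both paired-end files list their 
reads in the same order.

```shell
# Hypothetical helper (not part of MIRA or of fastqSampler.pl):
# keep every Nth record of a FASTQ stream with 4 lines per record.
# Usage: subsample_fastq N < in.fastq > out.fastq
subsample_fastq() {
    # int((NR - 1) / 4) is the 0-based record index of the current line;
    # a line is printed when its record index is a multiple of N.
    awk -v n="$1" 'int((NR - 1) / 4) % n == 0'
}

# Example: keep every 4th read (drops about 75% of the data).
# Run it with the same N on both files so the read pairs stay in sync:
#   subsample_fastq 4 < file1.fastq > file1_sub.fastq
#   subsample_fastq 4 < file2.fastq > file2_sub.fastq
```

Because the selection depends only on a read's position in the file, applying 
the same N to both paired files keeps mates together, provided the two files 
really are in the same order.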

