[mira_talk] Re: reducing number of Illumina reads

  • From: Egon Ozer <e-ozer@xxxxxxxxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Thu, 21 Apr 2011 17:27:56 -0500

fastqSample, distributed with the wgs-assembler package, can subset your fastq 
files for you.  I haven't checked whether it picks reads at random or at regular 
intervals (e.g. every 3rd or every 4th read).  

There's a little blurb about it at the bottom of this page:
http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=FastqToCA#fastqSample

Otherwise it wouldn't be too tough to write a quick Perl script to output every 
fifth read from your data set to another file.
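
As a rough sketch of that idea (in Python rather than Perl): the function names 
and file paths below are made up for illustration, and it assumes plain 4-line 
FASTQ records with no wrapped sequence lines. It keeps every Nth read, and also 
shows a random-fraction variant in case random picking is preferred.

```python
import itertools
import random


def read_fastq_records(lines):
    """Yield 4-line FASTQ records (header, sequence, '+', qualities) as tuples.

    Assumes each record occupies exactly four lines.
    """
    it = iter(lines)
    while True:
        record = tuple(itertools.islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record at end of input")
        yield record


def every_nth(records, n):
    """Keep records 0, n, 2n, ... (every n-th read, starting with the first)."""
    return (rec for i, rec in enumerate(records) if i % n == 0)


def random_fraction(records, fraction, seed=None):
    """Keep each record independently with probability `fraction`.

    The number of reads kept is therefore only approximately
    fraction * total. A fixed seed makes the selection repeatable.
    """
    rng = random.Random(seed)
    return (rec for rec in records if rng.random() < fraction)


def subsample_file(in_path, out_path, n=5):
    """Write every n-th read of in_path to out_path (hypothetical helper)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for rec in every_nth(read_fastq_records(src), n):
            dst.writelines(rec)
```

Something like subsample_file("illumina.fastq", "illumina_20pct.fastq", n=5) 
would then keep roughly 20% of the reads.  For paired reads split across two 
files with mates in the same order, running the same every-Nth selection on both 
files keeps the mates together.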

- E


On Apr 21, 2011, at 5:08 PM, Robert Bruccoleri wrote:

> However, there is a problem with taking this approach -- you could 
> significantly change the statistics for the determination of repetitive 
> regions and cause misassemblies as a result. 
> 
> MIRA will generate coverage-equivalent reads, which can help in this situation.
> 
> --Bob
> 
> ALLO (Alfredo Lopez De Leon) wrote:
>> 
>> You can try to collapse your reads with the FASTX toolkit; this will leave 
>> you with a set of unique reads.
>> This method will preserve the sequence coverage and remove the redundancy.
>> http://hannonlab.cshl.edu/fastx_toolkit/
>>  
>> AlLo
>>  
>> From: mira_talk-bounce@xxxxxxxxxxxxx [mailto:mira_talk-bounce@xxxxxxxxxxxxx] 
>> On Behalf Of Goldman, Thomas
>> Sent: Thursday, April 21, 2011 2:27 PM
>> To: mira_talk@xxxxxxxxxxxxx
>> Subject: [mira_talk] reducing number of Illumina reads
>>  
>> Hello all,
>>  
>> I have a 24GB RHEL5 machine on which I was able to do a de novo assembly of 
>> 454 paired-end and fragment reads (~1.5 million reads). I also have about 6 
>> million 36bp Illumina reads.
>>  
>> I would like to:
>> 1)      Map the Illumina reads to the 454 backbone
>> 2)      Include the Illumina reads with the 454 reads for a de novo assembly
>> But I believe I don’t have enough memory to handle all the Illumina reads. I 
>> think my VM could handle maybe 20% of the Illumina reads. What is the best 
>> way to reduce the Illumina reads used for the mapping and/or the de novo 
>> assemblies? Would it be to just randomly pick 20% of the reads out of the 
>> fastq file? Is there a tool out there I could use for this?
>>  
>> Thanks,
>> Tom
