I'm working on a bacterium that has a lot of apparently repeated sequences. The file in the "_d_info" directory whose name ends in _readrepeats.lst is 340 MB. The repeat ratio tabulation in the log file shows that many reads contain sequences repeated hundreds of times relative to most of the genome.
My goal is to figure out what's repeated. Going through all the reads in the _readrepeats.lst file by hand isn't practical, so I want to assemble the data in that file instead. In preparation, I have written a script that removes all duplicated sequences in _readrepeats.lst, as well as all sequences that are subsequences of any other. This reduces the size of the problem many-fold: the final sequence file is now 40 MB.
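For what it's worth, the redundancy filter described above can be sketched in a few lines of Python. This is a minimal, naive version (the function name and the O(n^2) substring scan are my own; a real script over a 340 MB file would want a suffix- or k-mer-based index, and a proper FASTA parser):

```python
def remove_redundant(seqs):
    """Drop exact duplicates, then drop any sequence that is a
    subsequence (substring) of a longer surviving sequence."""
    # Deduplicate, then process longest-first so each candidate only
    # needs to be checked against sequences at least as long as itself.
    unique = sorted(set(seqs), key=len, reverse=True)
    kept = []
    for s in unique:
        # s is redundant if it already occurs inside a kept sequence
        if not any(s in k for k in kept):
            kept.append(s)
    return kept
```

For example, given reads ["ACGT", "ACGT", "CG", "TTTT"], the duplicate "ACGT" and the contained "CG" are dropped, leaving two sequences.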
The question is, what parameters should I use for assembling these nasty reads?
Here's the manifest file I've tried (backslashes have been removed for clarity's sake):
project = repeats
job = denovo,genome,accurate
parameters = COMMON_SETTINGS
    -SK:bph=31:mmhr=15
    -HS:mnr=no:ldn=yes:fenn=0.0001:fexn=100:fer=101:fehr=102:fecr=103
    -GE:not=1
    -NW:cnfs=no:cmrnl=no:cac=warn
    TEXT_SETTINGS
    -CL:pec=no
    -AS:epoq=no:mrpc=1

readgroup
data = fa::repeats.fasta
technology = text
strain = Illumina

but it's clear the contigs are chimeras. Blasting and reviewing the resulting output is very feasible -- there are only 350 contigs and singlets to look at. The final contig FASTA file is 214 kb.
I haven't tried to optimize this manifest -- it gives me results that I can interpret, but I wonder if there's a better solution out there.
If anyone has tried doing this, I'd appreciate hearing about your experience.
Thanks. --Bob
begin:vcard
fn:Robert Bruccoleri
n:Bruccoleri;Robert
org:Audacious Energy, LLC and Congenomics, LLC
adr:;;;;;;USA
email;internet:bruc@xxxxxxx
title:President
version:2.1
end:vcard