[mira_talk] Re: force MIRA to produce only singletons and no debris

Hello everybody,

I am trying a de novo "hybrid assembly" with 454 and Sanger reads,
where my Sanger reads are in fact high-quality contigs. If my "sanger"
reads do not assemble, I would like them to still end up as singlets
in the assembly file, as they contain fully trusted sequences.

I tried the following command on a reduced dataset, with all qualities
of the "sanger" reads artificially set to 40. The reduced dataset
contains ~5000 true 454 reads and ~5000 sanger reads.

/media/STORAGE/1b_ref_transcriptome/ref_assembly_4/mira_reference_assembly4/mira_3.2.1_prod_linux-gnu_x86_64_static/bin/mira
--project=cleaned --job=denovo,est,normal,454,sanger -notraceinfo
-GE:not=8 -AS:nop=4 454_SETTINGS -AL:mo=45,mrs=95 -CO:fnicpst=yes
-CL:cpat=no SANGER_SETTINGS -LR:wqf=no
-AS:mrl=50,epoq=no,ardml=200,ardgl=20,mrpc=1
-CL:cpat=yes,emlc=no,mbc=yes,qc=no -AL:mo=45,mrs=95
-OUT:sssip=yes,stsip=yes > screen

I am satisfied with the assembly itself, but a BLAST search tells me
that 1437 of the sanger reads I input do not have any counterpart in
the cleaned_out.unpadded.fasta file. How can that be? On what basis are
these sequences discarded? I will end up fetching all the sanger input
reads that have no hit in the final fasta file, but I am still curious.
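
For anyone wanting to do the same kind of fetching, here is a minimal
sketch of how the reads without a hit could be listed. It assumes the
BLAST search was run with the sanger reads as queries and written in
tabular format (query ID in the first column); the function names and
the two input file arguments are hypothetical and not part of MIRA or
BLAST.

```python
import sys


def fasta_ids(lines):
    """Return the set of sequence IDs (first word after '>') found in FASTA lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip header lines that carry no name
                ids.add(fields[0])
    return ids


def blast_query_ids(lines):
    """Return the set of query IDs (column 1) from tabular BLAST output lines."""
    ids = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank and comment lines
            continue
        ids.add(line.split("\t")[0])
    return ids


def reads_without_hit(fasta_lines, blast_lines):
    """Return a sorted list of input read names that never appear as a BLAST query."""
    return sorted(fasta_ids(fasta_lines) - blast_query_ids(blast_lines))


# Assumed usage: python missing_reads.py sanger_input.fasta blast_tabular.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fasta, open(sys.argv[2]) as blast:
        for name in reads_without_hit(fasta, blast):
            print(name)
```

The names printed this way can then be used to pull the matching
records out of the original sanger input file. Note that this only
reports reads with no hit at all; it does not check how good the
reported hits are.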

Best,

Yvan




On Thu, Feb 17, 2011 at 1:41 AM, Jeremy Volkening
<volkening@xxxxxxxxxxxxx> wrote:
> On Wed, 2011-02-16 at 22:19 +0100, Bastien Chevreux wrote:
>
>> In the current settings, two parameters are the main ones when it comes to
>> keeping/losing rare variants: -AS:mrpc (minimum reads per contig) and the
>> -OUT:sssip:stsip parameters.
>>
>> The current defaults are such that just the very rarest variants end up in
>> the debris file: -AS:mrpc defaults to 2 for Sanger, 2 for 454 and 4 for
>> Solexa when run in de-novo EST mode.
>
> This helps - at least, using these parameters, I can make sure that rare
> variants get into the assembly files.
>
>> Does
>>
>>   http://mira-assembler.sourceforge.net/docs/DefinitiveGuideToMIRA.html#sect1_est_difference_assembly_clustering
>>
>> help to clarify things?
>
> Somewhat. I'm still not entirely clear on how the interaction of the
> different parameters affects the decision to include a read in a contig
> or split it off into a new one. The defaults for -AL:mrs are 85/80/80,
> so a single base difference shouldn't prevent a read from being
> assembled into a contig, yet the manual section you referenced
> above, along with your earlier post in this thread, seems to indicate
> that SNPs are split into separate contigs. What happens after the initial
> alignment to cause this split, and are there parameters that will force
> mira to assemble reads containing SNPs into the same contig? I've tried
> setting -CO:mroir=1:asir=1 but this doesn't seem to have the desired
> effect.
>
> Jeremy
>
>
>
> --
> You have received this mail because you are subscribed to the mira_talk 
> mailing list. For information on how to subscribe or unsubscribe, please 
> visit http://www.chevreux.org/mira_mailinglists.html
>
