[mira_talk] Re: new version 2.9.45 and solexa

On Wednesday 13 May 2009 Jan Paces wrote:
> Previous versions of mira did not recognize the solexa naming scheme (/1
> /2), so I had to rename all reads (using .r .f). Now it seems mira can use
> the original naming scheme? Can the .r .f scheme also be used for solexa reads?

Yes. You just have to set the naming scheme you want for the sequencing type 
you want (-LR:rns=...).
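
For anyone who still has to rename reads by hand, as Jan did with earlier versions, that renaming can be sketched in a few lines of Python. This is only an illustration: it assumes /1 marks the forward mate and /2 the reverse mate, and the function name is made up and not part of MIRA.

```python
def rename_solexa_read(name):
    """Map a Solexa-style mate suffix (/1, /2) to a .f/.r suffix.

    Assumption: /1 is the forward mate and /2 the reverse mate.
    Names without either suffix are returned unchanged.
    """
    if name.endswith("/1"):
        return name[:-2] + ".f"
    if name.endswith("/2"):
        return name[:-2] + ".r"
    return name


print(rename_solexa_read("read_0001/1"))
print(rename_solexa_read("read_0001/2"))
```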

> Also, the quality score previously used by mira was the phred score. Now it
> looks like mira can use solexa quality scoring as well. Am I right? What is
> the default? Is it possible to explicitly tell mira which scoring scheme
> should be used? Because both schemes are very similar, can we expect no
> big changes in assembly?

MIRA has a kind of autodetection for the quality schemes. If negative values 
are present in the quality files, it's Solexa scoring. If there are no negative 
values, the scheme is guessed to be phred scoring. Forcing MIRA to treat the 
quality values as phred style from the start can be done with -LR:ssiqf=no
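
The detection rule described above can be sketched in a few lines of Python. This only illustrates the rule as stated in this mail; it is not MIRA's actual code, and the function name and return strings are hypothetical.

```python
def guess_quality_scheme(qualities):
    """Guess the scoring scheme from a collection of integer quality values.

    Rule as described in the mail: any negative value means Solexa
    scoring; otherwise the values are guessed to be phred scores.
    """
    if any(q < 0 for q in qualities):
        return "solexa"
    return "phred"


print(guess_quality_scheme([40, 35, -5, 12]))  # contains a negative value
print(guess_quality_scheme([40, 35, 2, 12]))   # no negative values
```

Note that a Solexa-scored file which happens to contain no negative values would be guessed as phred under this rule.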

I cannot imagine the scoring scheme having any impact on the assembly itself; 
the two schemes differ only in the very low quality scores. You might have a 
tiny difference in consensus quality scores for bad areas with low coverage and 
bad base qualities, but even that should be very rare.
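
To see why the two schemes differ only at the low end, here is a small Python sketch using the standard conversion between Solexa and phred scores, Q_phred = 10 * log10(10^(Q_solexa/10) + 1). It is not taken from MIRA's code, and the function name is chosen for illustration only.

```python
import math


def solexa_to_phred(q_solexa):
    """Convert a Solexa quality score to the equivalent phred score."""
    return 10.0 * math.log10(10.0 ** (q_solexa / 10.0) + 1.0)


# The gap between the two scales shrinks quickly as the score rises.
for q in (-5, 0, 10, 20, 40):
    print(q, round(solexa_to_phred(q), 2))
```

A Solexa score of 0 corresponds to a phred score of about 3, while from 20 upwards the two values differ by less than 0.05.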

> My first impression is that mapping with the new version is very slow.
> For a contig of size 3.3k and an average solexa read coverage of 10x
> it took 25 minutes:

Jan, would you be interested in shifting your focus away from eukaryotes with 
100MB towards something smaller, say ... a nice Bacillus with 4MB for example? 
:-) You are pushing MIRA a bit.

But these times hurt. I think I have an idea what caused this: setting up a 
contig as backbone and then preparing all reads for mapping is comparatively 
costly at the moment (it uses the same routines as de-novo). If you have only 
a couple of contigs it does not matter, but for unfinished genomes ...

Actually, can you please send me the whole log file for the building of contig 
1? I'd like to check a few things.

I cannot promise quick help, but I'll have a look.

> these are parameters I used for mapping:
>
> <code>
> mira
> -project=mi_v8 -job=mapping,genome,solexa,normal
> COMMON_SETTINGS
> -GE:not=16:uti=on:tismin=2300:tismax=2700
> -SB:lb=on:sbuip=1:bft=caf
> -AS:nop=3 -DP:ure=on -ED:ace=off
> -SK:mnr=on:bph=14:hss=4
> SOLEXA_SETTINGS
> -GE:uti=on:tismin=2300:tismax=2700
> </code>

For mapping of short Solexa reads, set -AS:nop=1. Furthermore, do not switch 
on -DP:ure (only do so if you have clipped longer reads in the lot).

Lastly, -SK:mnr is not needed for pure mapping assemblies: MIRA will evenly 
distribute repeats by itself and the whole megahub problem is virtually 
non-existent for mappings.
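
Taken together, the three suggestions would change the quoted parameter list roughly as follows. The Python snippet below only edits the list of arguments from Jan's mail and prints the result; it is a sketch, nothing in it has been run against MIRA, and the variable names are made up.

```python
# Arguments exactly as quoted in Jan's mail.
original = [
    "-project=mi_v8", "-job=mapping,genome,solexa,normal",
    "COMMON_SETTINGS",
    "-GE:not=16:uti=on:tismin=2300:tismax=2700",
    "-SB:lb=on:sbuip=1:bft=caf",
    "-AS:nop=3", "-DP:ure=on", "-ED:ace=off",
    "-SK:mnr=on:bph=14:hss=4",
    "SOLEXA_SETTINGS",
    "-GE:uti=on:tismin=2300:tismax=2700",
]

revised = []
for arg in original:
    if arg == "-AS:nop=3":
        revised.append("-AS:nop=1")  # as suggested in the reply
    elif arg == "-DP:ure=on":
        continue  # leave -DP:ure switched off
    elif arg.startswith("-SK:"):
        # drop only the mnr option, keep the other -SK settings
        revised.append(arg.replace("mnr=on:", ""))
    else:
        revised.append(arg)

print("mira " + " ".join(revised))
```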

Regards,
  Bastien


-- 
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html
