[mira_talk] miraSearchESTSNPs

  • From: "Ian Armstead [ipa]" <ipa@xxxxxxxxxx>
  • To: "mira_talk@xxxxxxxxxxxxx" <mira_talk@xxxxxxxxxxxxx>
  • Date: Tue, 15 Jun 2010 17:21:16 +0100

Hi Bastien

I've regressed from trying to analyse my 454 EST data to trying to get a handle
on how miraSearchESTSNPs calls SNPs, using a very simple artificial dataset.
However, I can't figure out why I get the results I do, and I'd appreciate any
explanations or pointers.

I've been using a 238 bp sequence (derived from a 454 run) and introduced a SNP
into it at position 107, so all the assembled reads are identical except at
this base. All bases are assigned the same quality score (50). I've then
assembled varying numbers of replicate reads of each SNP type, assigned to one
or two different strains, to see how miraSearchESTSNPs calls the SNPs. I'm
using version 3.0.5 with the command:

miraSearchESTSNPs  -project=xxxx -job=denovo,normal,454,esps1/2/3 -notraceinfo 
COMMON_SETTINGS -CO:asir=yes
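
A test set of this kind could be generated along the following lines. This is
only a sketch: the template is a random placeholder and not the 454-derived
sequence used here, and the function names and read names (make_reads,
to_fasta, to_qual, snp1_N, snp2_N) are hypothetical, chosen for illustration
only. How the output is named and how strains are assigned for MIRA is not
covered.

```python
import random


def make_reads(template, snp_pos, alt_base, n_snp1, n_snp2, qual=50):
    """Build replicate reads that differ only at one position.

    snp_pos is 1-based. Returns a list of (name, sequence, qualities).
    """
    i = snp_pos - 1
    if template[i] == alt_base:
        raise ValueError("alt_base must differ from the template base")
    variant = template[:i] + alt_base + template[i + 1:]
    reads = []
    for k in range(n_snp1):
        reads.append((f"snp1_{k + 1}", template, [qual] * len(template)))
    for k in range(n_snp2):
        reads.append((f"snp2_{k + 1}", variant, [qual] * len(variant)))
    return reads


def to_fasta(reads):
    """Format the reads as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq, _ in reads)


def to_qual(reads):
    """Format the per-base qualities as FASTA-style quality text."""
    return "".join(
        f">{name}\n{' '.join(str(q) for q in quals)}\n"
        for name, _, quals in reads
    )


# Placeholder 238 bp template (assumption: random bases, fixed seed).
rng = random.Random(0)
template = "".join(rng.choice("ACGT") for _ in range(238))

# Pick an alternative base that differs from the template at position 107.
alt = "A" if template[106] != "A" else "G"

# Example: the 10/5 case, i.e. 10 reads of SNP type 1 and 5 of SNP type 2.
reads = make_reads(template, 107, alt, n_snp1=10, n_snp2=5)
fasta_text = to_fasta(reads)
qual_text = to_qual(reads)
```

Changing n_snp1 and n_snp2 gives the other proportions (10/10, 10/9 and so
on); fasta_text and qual_text can then be written to whatever input files the
assembly expects.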

My expectation was that in step 1 all the reads would be assembled together to
give a single contig with a SNP called at the variable base, and that I would
be able to control the sensitivity using the -CO:mrpg parameter. What happens
is:

When I use 2 strains, with each SNP type in a different strain, it calls SNPs
differently depending on the relative numbers of each SNP type. When 10
SNP1-type and 10 SNP2-type reads are assembled together, step 1 identifies 2
contigs corresponding to the two SNP types, with no SNP tags, and step 3
identifies a single contig with an IUPC-tag at the SNP base position. I get
the same results using 10/9, 10/8, 10/7 or 10/6 reads of the two types.
However, when I use 10/5, 10/4, 10/3 or 10/2 proportions, both the step 1 and
step 3 assemblies identify a single contig with an SROC-tag at the SNP
position.

If I use just 1 strain (i.e. expecting to pick up SAOC tags), then for 10/10,
10/9, 10/8, 10/7 or 10/6 proportions of the two SNP types step 1 gives 2
contigs and step 3 an IUPC call, as for 2 strains. For 10/5, 10/4 etc., step 1
identifies 2 contigs and step 3 is empty - the step2_reads.caf file contains a
single contig with the quality score of the SNP base assigned a value of 0.

Also, I can't seem to get the -CO:mrpg switch to take effect. The default for
454 in step 1 is 4. However, whatever value I set, the assemblies do not
change. The log file reports the value given with -CO:mrpg, but it does not
seem to influence the assemblies - unless I've misunderstood its function.

Thanks for any help you can give

Ian


Ian Armstead
IBERS
Gogerddan Campus,
Aberystwyth University,
SY23 3EB
UK
email: ipa@xxxxxxxxxx
tel: +44 (0)1970 823108



