[mira_talk] Re: SNP mining from ESTs using mira--repost

On Friday 12 December 2008 00:34, Pavel Sova wrote:
> Is there a way of mining SNPs from EST databases?  I have an EST
> database that should contain "intraorganismal" SNPs (as most genes are
> expressed from both chromosomes; the EST collection wash obtained from
> one animal).  However, mira outputs "pristine" transcript contigs that
> have been sorted apart according to presence of SNP; I am looking for a
> way of putting them all together again to tag SNPs existing in ~50/50
> ratio and then extract this info.

Hello Pavel,

no, there is no dedicated mode that supports this in MIRA. The difficulty for 
this kind of job would be, I suppose, to separate SNPs from very similar 
homologues.

That being said, there are a few switches that could be twisted a bit to try 
and see whether it could be possible.

-CO:asir=yes  will tell MIRA to assume that differences are SNPs and not
              repeats, hence it will keep those reads together in a contig

-AL:mrs       set this to a high value (at least 90, I'd first even test 95 %)
              to force only very similar sequences to align in the hope that
              most homologues have a difference > than 5 to 10%. This is, of
              course, not always the case.

-AL:egp=yes:egpl=reject_codongaps:megpp=100
              Sometimes we get "lucky" and homologues that are virtually
              identical differ by one or a few additional amino acids, which
              translates to a gap in the DNA of at least three bases. The
              three switches above ensure that no alignment is made that
              includes a gap with three or more gaps. Of course, you are hosed
              if the "SNP" translates into this kind of difference.

-CO:rodirs    for EST assemblies, set this to a low value (10 or less) to
              prevent misalignments of very similar transcripts. Though you
              will separate a few sequences into singlets that should belong
              to contigs, but those will then mostly be sequences that have
              more sequencing errors than they should.

-CL:prc=yes   (for MIRA >= 2.9.37) if you are hunting SNPs anyway and don't
              care about the maximum consensus you could get out of
              transcripts, you might want to test this one. It implements a
              right clip that makes sure that the 3' end of your sequences are
              of high quality as it enforces the 30mer at the end to be
              present at least in one other sequence in your data set.
              The drawbacks: you will loose all singlets (they'll end up as
              debris) and many of the very lowly expressed sequences will be
              highly truncated in the contigs. Furthermore, you might loose
              a few SNPs at the very 3' end of your sequences, but this would
              happen only for low coverage SNPs in regions that have a high
              sequencing error anyway.
              Therefore, while this switch is not recommended for 'normal' EST
              assembly, for hunting SNPs as you do it might be a nice plus.

Don't forget that a few of the swicthes above are adjustable for every 
sequencing technology, you might need to set them for Sanger and 454 
sequences if you are using hybrid assembly.

I suggest to start MIRA like this for transcripts with Sanger reads:

 "mira ... --job=denovo,est,normal,sanger ... (the switches described above)"

With transcripts from 454:

 "mira ... --job=denovo,est,normal,454 ... 
       (the switches from above that are technology independent)
       454_SETTINGS (the switches described above that need technology
                     dependent settings)"

With transcripts from Sanger and 454:

 "mira ... --job=denovo,est,normal,sanger,454 ... 
       (the switches from above that are technology independent)
       SANGER_SETTINGS (the switches described above that need technology
                        dependent settings)
       454_SETTINGS (the switches described above that need technology
                     dependent settings)"

E.g. for the later case:

  mira --project=MyProject --job=denovo,est,normal,sanger,454
       -CO:asir=yes -CL:prc=yes
       SANGER_SETTINGS -AL:mrs=95:egp=yes:egpl=reject_codongaps:megpp=100
                       -CO:rodirs=10
       454_SETTINGS -AL:mrs=95:egp=yes:egpl=reject_codongaps:megpp=100
                     -CO:rodirs=10


Hope that helps.

Regards,
  Bastien

-- 
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html

Other related posts: