[mira_talk] Re: SNP mining from ESTs using mira--repost
- From: Bastien Chevreux <bach@xxxxxxxxxxxx>
- To: mira_talk@xxxxxxxxxxxxx
- Date: Sat, 13 Dec 2008 12:43:36 +0100
On Friday 12 December 2008 00:34, Pavel Sova wrote:
> Is there a way of mining SNPs from EST databases? I have an EST
> database that should contain "intraorganismal" SNPs (as most genes are
> expressed from both chromosomes; the EST collection wash obtained from
> one animal). However, mira outputs "pristine" transcript contigs that
> have been sorted apart according to presence of SNP; I am looking for a
> way of putting them all together again to tag SNPs existing in ~50/50
> ratio and then extract this info.
Hello Pavel,
no, there is no dedicated mode that supports this in MIRA. The difficulty for
this kind of job would be, I suppose, to separate SNPs from very similar
homologues.
That being said, there are a few switches that could be twisted a bit to try
and see whether it could be possible.
-CO:asir=yes will tell MIRA to assume that differences are SNPs and not
repeats, hence it will keep those reads together in a contig
-AL:mrs set this to a high value (at least 90, I'd first even test 95 %)
to force only very similar sequences to align in the hope that
most homologues have a difference > than 5 to 10%. This is, of
course, not always the case.
-AL:egp=yes:egpl=reject_codongaps:megpp=100
Sometimes we get "lucky" and homologues that are virtually
identical differ by one or a few additional amino acids, which
translates to a gap in the DNA of at least three bases. The
three switches above ensure that no alignment is made that
includes a gap with three or more gaps. Of course, you are hosed
if the "SNP" translates into this kind of difference.
-CO:rodirs for EST assemblies, set this to a low value (10 or less) to
prevent misalignments of very similar transcripts. Though you
will separate a few sequences into singlets that should belong
to contigs, but those will then mostly be sequences that have
more sequencing errors than they should.
-CL:prc=yes (for MIRA >= 2.9.37) if you are hunting SNPs anyway and don't
care about the maximum consensus you could get out of
transcripts, you might want to test this one. It implements a
right clip that makes sure that the 3' end of your sequences are
of high quality as it enforces the 30mer at the end to be
present at least in one other sequence in your data set.
The drawbacks: you will loose all singlets (they'll end up as
debris) and many of the very lowly expressed sequences will be
highly truncated in the contigs. Furthermore, you might loose
a few SNPs at the very 3' end of your sequences, but this would
happen only for low coverage SNPs in regions that have a high
sequencing error anyway.
Therefore, while this switch is not recommended for 'normal' EST
assembly, for hunting SNPs as you do it might be a nice plus.
Don't forget that a few of the swicthes above are adjustable for every
sequencing technology, you might need to set them for Sanger and 454
sequences if you are using hybrid assembly.
I suggest to start MIRA like this for transcripts with Sanger reads:
"mira ... --job=denovo,est,normal,sanger ... (the switches described above)"
With transcripts from 454:
"mira ... --job=denovo,est,normal,454 ...
(the switches from above that are technology independent)
454_SETTINGS (the switches described above that need technology
dependent settings)"
With transcripts from Sanger and 454:
"mira ... --job=denovo,est,normal,sanger,454 ...
(the switches from above that are technology independent)
SANGER_SETTINGS (the switches described above that need technology
dependent settings)
454_SETTINGS (the switches described above that need technology
dependent settings)"
E.g. for the later case:
mira --project=MyProject --job=denovo,est,normal,sanger,454
-CO:asir=yes -CL:prc=yes
SANGER_SETTINGS -AL:mrs=95:egp=yes:egpl=reject_codongaps:megpp=100
-CO:rodirs=10
454_SETTINGS -AL:mrs=95:egp=yes:egpl=reject_codongaps:megpp=100
-CO:rodirs=10
Hope that helps.
Regards,
Bastien
--
You have received this mail because you are subscribed to the mira_talk mailing
list. For information on how to subscribe or unsubscribe, please visit
http://www.chevreux.org/mira_mailinglists.html
Other related posts: