[mira_talk] Re: quantity vs quality of SNP predictions

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Wed, 15 Apr 2009 00:11:56 +0200

On Monday 06 April 2009 Jorge.DUARTE@xxxxxxxxxxxx wrote:
> Following Bastien's advice that I've seen on this list previously,
> I have used very strict parameters to assemble 454 data in order to detect
> potential SNPs with good confidence.
> The problem is that, with these parameters, the share of reads falling into
> the debris list reaches 60%,
> and of the remaining 40% of reads used during assembly only 25% end up in
> contigs with good enough coverage for SNP detection.

Hi Jorge,

sorry for the long delay and the shortness of this reply. I've been on a short
vacation and now have a pretty tight work schedule until the weekend.

The numbers you mention above seem way too high. Could you tell me which
parameters you used?
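
For reference, the quoted percentages compound as follows. This is only a minimal arithmetic sketch; the function name is made up for illustration and is not part of MIRA:

```python
def usable_read_fraction(debris_fraction, good_contig_fraction):
    """Fraction of all input reads that end up in well-covered contigs.

    debris_fraction: share of reads discarded into the debris list.
    good_contig_fraction: share of the *assembled* reads that land in
    contigs with enough coverage for SNP detection.
    """
    assembled = 1.0 - debris_fraction
    return assembled * good_contig_fraction


# Figures quoted above: 60% debris, 25% of the rest in usable contigs.
usable = usable_read_fraction(0.60, 0.25)
print(f"usable: {usable:.0%}, lost: {1 - usable:.0%}")
```

So only about a tenth of the input reads contribute to SNP detection, which is where the 90% loss mentioned further down comes from.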

> Does anyone else have similar results? Or is it that my reads are really
> of low quality?

I gave up on finding good (read: better than the ones from 454) quality 
clipping parameters for 454 reads. In genome assemblies, I attain a better 
clipping using a few other tricks, but these are potentially harmful in EST 
assemblies.

> Is it really worth it to lose 90% of reads in order to gain confidence
> in the SNPs discovered?
> Has anyone tried different parameter settings in order to evaluate the
> sensitivity vs specificity of SNP detection with mira?

No, I'd say be less stringent in the quality inclusion and, if there are too
many false positives, more stringent in the SNP calling parameters (-CO:mr and
the parameters below). I haven't worked with 454 ESTs for quite a long time and
never really went through a round of parameter optimisation for these beasts.

> The species i'm working with is a polyploid eukaryote, and the sequences
> are PCR amplicons which were ligated before sonication
> and 454 titanium sequencing on 4 different cultivars.
>
> I've developed a script to detect and split potential chimeras,
> and although it worked quite well,
> I'm pretty sure I didn't detect all chimeric sequences. So if someone
> knows of a tool which does this kind of clipping,
> I'd also like to hear from them!

The next version of MIRA has automatic chimera downsizing. It really finds all
potential chimeras. It doesn't do splitting, but downsizing may also help.
Please give me a few days, I hope to be able to prepare ... "something" this
weekend. If not, then next week.

Regards,
  Bastien

