[mira_talk] assembling variants

  • From: Fleur Darré <fleur.darre@xxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Mon, 01 Mar 2010 22:01:54 +0100


Hi,

I've been going around in circles on a problem without finding an obvious solution. I hope you'll be able to help me with it, and thank you in advance.

I'm currently assembling Solexa reads, mapping them against a given virome. I hope to see some variability even INSIDE my sample and would like to assess it. My overall coverage is around 300x, so even if I have 10 to 20 haplo-viromes (which I expect), it should be OK. (In another assembly, I have 25x coverage for a single expected strain.)

Now, the (directed) assembly goes well and I end up with a single contig. Some of the .exp files correspond to several reads merged together (how many? how deep?). When I edit my output in Gap4 (Staden package), the qualities of these long .exp files are all set to 1, which gives unfairly high weight to the reads that were kept "alone". Even if I use Base Frequency as the consensus algorithm (avoiding the weight problem), each isolated read carries as much weight as a "long .exp", which is a strong bias when I want to assess the frequency of a given variant/allele in my sample. (For this purpose, I've been generating different consensus sequences in Gap4 with different consensus thresholds.)

Am I missing some option? Some step?
Is there a way to get the SNPs and their frequency (among reads) without prior knowledge of the strains?
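
To make the goal concrete, the per-position count in question could be sketched as below. This is only an illustration in Python, not something MIRA or Gap4 provides: the function name `variant_frequencies`, the `(start, sequence, n_reads)` tuple layout and the `min_freq` cutoff are all assumptions, and it presumes one already knows how many original reads each merged entry represents and that all sequences are gap-free and aligned to the same reference coordinates.

```python
from collections import Counter, defaultdict


def variant_frequencies(reads, min_freq=0.05):
    """Return {position: {base: frequency}} for polymorphic positions.

    reads: iterable of (start, sequence, n_reads) tuples (hypothetical layout),
           where n_reads is the number of original reads behind the entry,
           so a merged "long" entry counts n_reads times, not once.
    min_freq: assumed cutoff; a position is reported only if at least two
              bases each reach this frequency.
    """
    columns = defaultdict(Counter)
    for start, seq, n_reads in reads:
        for offset, base in enumerate(seq.upper()):
            if base in "ACGT":  # ignore N, pads and other symbols
                columns[start + offset][base] += n_reads

    variants = {}
    for pos in sorted(columns):
        counts = columns[pos]
        depth = sum(counts.values())
        freqs = {base: count / depth for base, count in counts.items()}
        if sum(1 for f in freqs.values() if f >= min_freq) > 1:
            variants[pos] = freqs
    return variants


# Toy input: one merged entry standing for 3 reads, plus two single reads.
example_reads = [(0, "ACGT", 3), (0, "ACGT", 1), (1, "CTT", 1)]
example_variants = variant_frequencies(example_reads)
```

With the toy input, only position 2 is reported, with G backed by 4 reads and T by 1. Counting a merged entry as a single read instead would shift those frequencies, which is the bias described above.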

Thanks again,

Fleur Darré


