[mira_talk] Re: assembling variants

  • From: Fleur Darré <fleur.darre@xxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 02 Mar 2010 21:17:53 +0100




Bastien Chevreux a écrit :
On Montag 01 März 2010 Fleur Darré wrote:
I've been moving around my problem without finding an obvious solution.
I hope you'll be able to help me for this and thank you for this in
 advance.

Hi Fleur,

so do I :-)

I'm currently assembling solexa reads, mapping them against a given
virome. I hope to see some variability even INSIDE my sample and would
like to assess it.
My overall coverage is around 300, so, even if I have 10 to 20
haplo-virome (which I expect), it should be ok. (In another assembly, I
have a 25x coverage for a single expected strain)
Now, the (directed) assembly goes well, I end up with a single contig.

Ummm, there you lost me. Directed assembly? Do you mean you produced an assembly with MIRA which you are importing via gap4 directed assembly?
No, sorry, I miexplained myself : the "directed" stands here for mapping.
At this stage, no staden!
Some of the .exp files do correspond to several reads together (how
many? how deeply?).

I suppose you mean coverage equivalent reads (CER).
ok, this is it.
When I edit my output in Gap4 (staden package),
these long .exp files's quality are all set to 1, which provides unfair
excessive weight to the reads that were kept "alone". Even if I used the
Base Frequency as consensus algorithm (avoiding the weigth problem),
each isolated read has as an as heavy weight as a "long .exp"... which
is a strong bias when I want to assess the frequency of a given
variant/allele in my sample (for this purpose, I've been using different
consensus sequence out of gap4, with different cons threshold).
Am I missing some otpion? some step?

Yes. And no. You're missing the option to convert the gap4 database back to a CAF file and then let MIRA redo the consensus calculation. As described in the MIRA manual, gap4 does not know anthing about 454 and Solexa data, so this is the only viable way to get things going after editing a project in gap4.
ok, fine, this is a first point.
BUT, given that I may have as many as 20 (or more) different viromes (from the same virus, still) in my sample, I am interested in seeing discrepancies even if only 5% of the reads harbor it. Is MIRA able to provide this? Would it provide an "allele frequency" estimate, then?
Also have a look at using the SOLEXA_SETTINGS -CO:msr=no to prevent creation of CERs. Beware, data volumes will explode.
ok, I may choose this option if MIRA is not suitable for my specific task.
Data volume should not be problematic.... Finger crossed...

Thanks,

Fleur

Regards,
  Bastien



--
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html

Other related posts: