Hello Fleur my two cents: One option would be to map your reads to the reference consensus (created by MIRA) using a program like maq/bowtie (one strain at a time). That should give you the strain specific SNPs but also (with some parsing of the output) you could get the strain-specific weight for the haplotype (allele frequency). You can also see if the SNP is polymorphic or fixed within the strain. Then you can make a table (e.g. with strains in columns and position rows) to see and sort which sites of the genome are most differentiated... Personally, I would discard reads that would be 'striking' in more than one place (i.e. repeats, domains etc). The foremost advantage would be speed... a On Tue, 2010-03-02 at 21:17 +0100, Fleur Darré wrote: > > > Bastien Chevreux a écrit : > > On Montag 01 März 2010 Fleur Darré wrote: > > > >> I've been moving around my problem without finding an obvious solution. > >> I hope you'll be able to help me for this and thank you for this in > >> advance. > >> > > > > Hi Fleur, > > > > so do I :-) > > > > > >> I'm currently assembling solexa reads, mapping them against a given > >> virome. I hope to see some variability even INSIDE my sample and would > >> like to assess it. > >> My overall coverage is around 300, so, even if I have 10 to 20 > >> haplo-virome (which I expect), it should be ok. (In another assembly, I > >> have a 25x coverage for a single expected strain) > >> Now, the (directed) assembly goes well, I end up with a single contig. > >> > > > > Ummm, there you lost me. Directed assembly? Do you mean you produced an > > assembly with MIRA which you are importing via gap4 directed assembly? > > > No, sorry, I miexplained myself : the "directed" stands here for mapping. > At this stage, no staden! > > > >> Some of the .exp files do correspond to several reads together (how > >> many? how deeply?). > >> > > > > I suppose you mean coverage equivalent reads (CER). > > > ok, this is it. > > > >> When I edit my output in Gap4 (staden package), > >> these long .exp files's quality are all set to 1, which provides unfair > >> excessive weight to the reads that were kept "alone". Even if I used the > >> Base Frequency as consensus algorithm (avoiding the weigth problem), > >> each isolated read has as an as heavy weight as a "long .exp"... which > >> is a strong bias when I want to assess the frequency of a given > >> variant/allele in my sample (for this purpose, I've been using different > >> consensus sequence out of gap4, with different cons threshold). > >> Am I missing some otpion? some step? > >> > > > > Yes. And no. You're missing the option to convert the gap4 database back to > > a > > CAF file and then let MIRA redo the consensus calculation. As described in > > the > > MIRA manual, gap4 does not know anthing about 454 and Solexa data, so this > > is > > the only viable way to get things going after editing a project in gap4. > > > ok, fine, this is a first point. > BUT, given that I may have as many as 20 (or more) different viromes > (from the same virus, still) in my sample, I am interested in seeing > discrepancies even if only 5% of the reads harbor it. Is MIRA able to > provide this? Would it provide an "allele frequency" estimate, then? > > Also have a look at using the SOLEXA_SETTINGS -CO:msr=no to prevent > > creation > > of CERs. Beware, data volumes will explode. > > > ok, I may choose this option if MIRA is not suitable for my specific task. > Data volume should not be problematic.... Finger crossed... > > Thanks, > > Fleur > > > Regards, > > Bastien > > > > > >