[mira_talk] Re: big file in the log directory

From: Stephanie Pearl <pearlsa110@xxxxxxxxx>
To: mira_talk@xxxxxxxxxxxxx
Date: Mon, 28 Mar 2011 13:50:24 -0400

Hi again,

It looks like MiraSearchESTSNPs can be useful to me, but I have a few
questions for you about it:

1. Is Gap4 the recommended viewer for the assembly and its tags?

2. On p. 74 of the Definitive Guide to Mira handbook, you mention something
about needing to invert single contigs by hand. Under what circumstances
would I need to invert contigs by hand and how would I know that they should
be inverted?

3. On p. 96 of the Definitive Guide under the "Where are the SNPs?" section,
you indicate that you don't recommend assembling sequences of more than one
strain to ID SNPs. If I'm interpreting this correctly, then under which
circumstances should I use MiraSearchESTSNPs? (I have Sanger contigs, plus
the individual reads that comprise these but no quality scores, and 2 sets
of 454 reads from 2 more closely related species, both with quality scores.)

Thanks!

Stephanie

On Wed, Mar 23, 2011 at 8:23 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:

> On Wednesday 23 March 2011 22:29:09 Stephanie Pearl wrote:
>
> > So, it also has come to my attention that my "strains" aren't closely
> > related enough to be assembled in the manner in which I am trying to --
> > they are actually closely related species, ~4000 years diverged. So I
> > guess the messy command line is now a moot point.
>
> There's worse. 4 M, 40 M and 400M years come to mind. In case you tell me 4
> billion years I'd start to really worry :-)
>
> > The goal for my project is to assemble 3 different closely related species
> > (1 of which has already been assembled by someone else -- this is the one
> > with the Sanger reads) for further analysis. I had thought that the mixed
> > assembly would use information from each set of ESTs and produce 3
> > differently assembled outputs for each set of reads, but perhaps that's not
> > the case?
>
> D'oh. Hit me, I'm dumb and can't read. I overlooked the "est" thing in your
> command line and went on thinking it to be a genome assembly. Oh well.
>
> In that case, using "-SB:lsd" is even more important; keep it in mind if
> you assemble all reads together.
>
> If you put all reads together in one assembly, MIRA will not make three
> separate contigs out of that, but will mix the reads together whenever
> possible. When it has strain data, it will then also mark SNPs in the
> contigs (otherwise it will build different contigs even if there is only a
> one-nucleotide difference).
>
> Please have a look at
>
> http://mira-assembler.sourceforge.net/docs/DefinitiveGuideToMIRA.html#sect1_est_difference_assembly_clustering
>
> and
>
> http://mira-assembler.sourceforge.net/docs/DefinitiveGuideToMIRA.html#sect3_input:_two_strains_454_with_xml_ancillary_data_polya_already_removed
>
> and other sections in that chapter to get a feeling for how this might
> affect your data.
>
> > Would you just recommend a de novo assembly for each of the three sets of
> > reads?
>
> Depends on what you are looking for:
>
> - a maximum of clean transcripts? Then each data set on its own with "mira
> --job=est".
>
> - a set of contigs, all strains mixed together, with SNPs marked? Then all
> sets together with "mira --job=est ... -SB:lsd=yes" and straindata.
>
> - a set of contigs (one for each strain) where there are differences known
> between the strains, as well as a light, clustering-like assembly of contigs
> with SNPs? Use the miraSearchESTSNPs pipeline.
>
> B.
>
>
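
For the second option in the quoted reply (all sets together, with
straindata), each read has to be tied to the strain it came from. Below is a
minimal sketch of building such a mapping, assuming the strain data is a
plain two-column text file with a read name and a strain name per line; the
exact file name and layout MIRA expects should be checked in the Definitive
Guide. The function name, strain labels, and read names are hypothetical.

```python
from typing import Dict, Iterable, List


def strain_table(reads_by_strain: Dict[str, Iterable[str]]) -> List[str]:
    """Return one 'read_name<TAB>strain_name' line per read.

    Assumption: a two-column, tab-separated layout. This is an illustrative
    format, not a confirmed description of what MIRA reads.
    """
    lines: List[str] = []
    seen = set()
    for strain, reads in reads_by_strain.items():
        for read in reads:
            # A read can only belong to one strain; fail loudly otherwise.
            if read in seen:
                raise ValueError(f"read {read!r} assigned to more than one strain")
            seen.add(read)
            lines.append(f"{read}\t{strain}")
    return lines


if __name__ == "__main__":
    # Hypothetical read names for the three data sets described in the thread.
    example = {
        "speciesA": ["sanger_read_001", "sanger_read_002"],
        "speciesB": ["454_B_000001"],
        "speciesC": ["454_C_000001"],
    }
    print("\n".join(strain_table(example)))
```

The printed lines could be redirected into a text file and supplied alongside
the reads when running the combined assembly with "-SB:lsd=yes".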

