[mira_talk] Re: big file in the log directory

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Mon, 28 Mar 2011 21:28:38 +0200

On Monday 28 March 2011 19:50:24 Stephanie Pearl wrote:
> [...]
> 1. Is Gap4 the recommended viewer for viewing the assembly and its tags?

I like gap4, so if you do not have millions of contigs/reads, it still works 
OK.

That said, James now has a pretty stable beta version of gap5 out. It's 
definitely the way to go for large projects. And integration with MIRA has 
never been easier than with gap5:

To import: "tg_index -C input.caf"
To export: use the CAF export function of gap5
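
If you want to script the import step, a minimal sketch could look like the
following. It only wraps the "tg_index -C" call given above; the function name
import_caf and the two sanity checks are my own additions for illustration,
not something shipped with MIRA or gap5.

```shell
# Hypothetical helper: import a CAF file into gap5 via tg_index.
# Only the "tg_index -C <file>" invocation comes from the text above;
# the checks around it are assumptions added for convenience.
import_caf() {
    caf="$1"

    # Refuse to continue if the CAF file does not exist.
    if [ ! -f "$caf" ]; then
        echo "no such CAF file: $caf" >&2
        return 1
    fi

    # Refuse to continue if tg_index is not installed or not on PATH.
    if ! command -v tg_index >/dev/null 2>&1; then
        echo "tg_index not found on PATH" >&2
        return 127
    fi

    tg_index -C "$caf"
}

# Usage: import_caf input.caf
```

The function returns 1 when the file is missing and 127 when tg_index cannot
be found, so a calling script can tell the two cases apart.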

> 2. On p. 74 of the Definitive Guide to Mira handbook, you mention something
> about needing to invert single contigs by hand. Under what circumstances
> would I need to invert contigs by hand and how would I know that they
> should be inverted?

Hmm, a section number would have helped me find it quickly in the HTML ... the 
PDF is a bulky beast and, can you believe it, I had to download the version 
from the net (I didn't want to check out an old document version just for that).

Anyway ... p.74 reads for me:

------
4.6.2 Reverse GenBank features are in forward direction in a gap4 project

caf2gap has currently (as of version 2.0.2) a bug that turns around all 
features in reverse direction during the conversion from
CAF to a gap4 project. There is a fix available, please contact me for further 
information (until I find time to describe it here).
------

This is only of interest if your input data contained sequences having 
annotations in GenBank format (either as backbone reference sequence or as a 
sequence to be assembled, be it in GBF/GBK files or in CAF/MAF files which 
were created with such annotations).

Do you have that?

> 3. On p. 96 of the Definitive Guide under the "Where are the SNPs?"
> section, you indicate that you don't recommend assembling sequences of
> more than one strain to ID SNPs.

You are in a section regarding mapping assembly.

Chapter 6  Solexa sequence assembly with MIRA3
6.4 Mapping assemblies
6.4.6 Places of interest in a mapping assembly
6.4.6.1 Where are SNPs?

There I do indeed recommend mapping your strains one by one to reference 
sequences to get the best results.

> If I'm interpreting this correctly and
> this is the case, then under which circumstances should I use
> MiraSearchESTSNPs? (I have Sanger contigs (plus the individual reads that
> comprise these, but no quality scores), then 2 sets of 454 reads of 2 more
> closely related species (both have quality scores).

miraSearchESTSNPs is not a mapping tool, but a de-novo assembly tool. You have 
to decide what you want to do:

- map the 454 reads against the Sanger contigs? Use "mira --job=mapping,est,..." 
(for each strain data set separately, after that perhaps together to see 
whether the result suits you). If you give strain information to those mapping 
assemblies, MIRA will happily point you toward SNPs which are more or less 
good.

- assemble all data de-novo, separating contigs with no SNP from contigs with 
SNPs and have MIRA point at differences? Use "miraSearchESTSNPs" with all data 
sets at once (and use strain info there as well, for all reads!)


Hope that helps,
  B.
