[mira_talk] Re: Reference assembly issues...

  • From: Shankar Manoharan <shankarmanostar@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 3 Apr 2012 19:46:06 +0530

Thank you, professor. :) It helped a LOT.

I've now run both a de novo and a mapping assembly with MIRA.

From the de novo assembly I get around 56 contigs longer than 500 bp,
with about one third of the overall coverage.
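A contig count like this can be double-checked directly from the contig FASTA written by the de novo run. The following is a minimal stdlib-only Python sketch, assuming a plain (possibly line-wrapped) FASTA file; the function names and the command-line usage are hypothetical and not part of MIRA.

```python
import sys
from typing import Dict, Iterable, List


def read_fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record name (first token after '>') to its sequence length."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            lengths[name] = 0
        elif name is not None:
            # Sequence lines may be wrapped, so add up every line of the record.
            lengths[name] += len(line)
    return lengths


def contigs_longer_than(lengths: Dict[str, int], min_len: int = 500) -> List[str]:
    """Names of contigs strictly longer than min_len bases, in file order."""
    return [name for name, length in lengths.items() if length > min_len]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_contigs.py contigs.fasta
    with open(sys.argv[1]) as fh:
        lengths = read_fasta_lengths(fh)
    keep = contigs_longer_than(lengths, 500)
    total = sum(lengths[n] for n in keep)
    print(f"{len(keep)} contigs > 500 bp, {total} bp in total")
```

If the FASTA you feed it contains pad or gap characters, they are counted as bases, so use a file without them.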

The mapping assembly generated a single contig with a few regions where
only the template provides coverage.
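Regions where only the template provides coverage are runs of zero read depth along the reference. Assuming you already have per-position depth of the mapped reads (not counting the backbone itself), for example exported from a viewer or computed separately, a minimal Python sketch like this one can list those runs so you know where to look. The input format, the function name and the `min_len` cutoff are assumptions for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def uncovered_regions(coverage: Sequence[int], min_len: int = 1) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals where read depth is 0
    for at least min_len consecutive reference positions."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for pos, depth in enumerate(coverage):
        if depth == 0:
            if start is None:
                start = pos  # a zero-coverage run begins here
        elif start is not None:
            # The run ended at the previous position; keep it if long enough.
            if pos - start >= min_len:
                regions.append((start, pos))
            start = None
    # A run may extend to the very end of the reference.
    if start is not None and len(coverage) - start >= min_len:
        regions.append((start, len(coverage)))
    return regions
```

For example, a depth profile of `[3, 3, 0, 0, 0, 5, 0, 2, 0, 0]` with `min_len=2` yields `[(2, 5), (8, 10)]`; the single uncovered position 6 is dropped by the cutoff.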

My next step is to recover the roughly 40k reads that ended up in the
debris of the reference assembly, assemble them de novo, and try to fit
the result into the de novo assembly.

I'd like your opinion on that, professor. Also, how can I extract the debris
reads from the SFF file based on the headers that MIRA provides in the info
directory? Is there a script for that, or should I write my own? I'm a
rather lousy scripter :(
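For the extraction question, one way to do it without special tools is the stdlib-only Python sketch below. It assumes (a) the debris list in the info directory is a plain-text file with one read name in the first column of each line, and (b) the reads are available as unwrapped four-line FASTQ, for example after extracting them from the SFF. Both are assumptions to check against your own files, and the function and script names are hypothetical. To keep the output in SFF format instead, Biopython's `Bio.SeqIO` can read and write `"sff"` files, and the same name-set filter applies.

```python
import sys
from typing import Iterable, Iterator, Set, Tuple


def read_name_list(lines: Iterable[str]) -> Set[str]:
    """Collect read names from the first column of each non-empty, non-comment line."""
    names: Set[str] = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            names.add(line.split()[0])
    return names


def fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (header, sequence, plus, quality) from unwrapped 4-line FASTQ."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected FASTQ header, got: {header!r}")
        rest = [next(it, None) for _ in range(3)]
        if any(line is None for line in rest):
            raise ValueError(f"truncated FASTQ record: {header!r}")
        seq, plus, qual = (line.rstrip("\n") for line in rest)
        yield header, seq, plus, qual


def filter_fastq(lines: Iterable[str], wanted: Set[str]) -> Iterator[str]:
    """Yield FASTQ records (as text) whose read name is in `wanted`."""
    for header, seq, plus, qual in fastq_records(lines):
        # Compare only the first token of the header (the read name).
        tokens = header[1:].split()
        name = tokens[0] if tokens else ""
        if name in wanted:
            yield f"{header}\n{seq}\n{plus}\n{qual}\n"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python extract_debris.py debrislist.txt reads.fastq > debris.fastq
    with open(sys.argv[1]) as fh:
        wanted = read_name_list(fh)
    with open(sys.argv[2]) as fh:
        sys.stdout.writelines(filter_fastq(fh, wanted))
```

Read names must match exactly between the list and the FASTQ headers; a set lookup keeps the filtering fast even for tens of thousands of names.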

Many thanks in advance.

Shankar


Shankar Manoharan
Graduate Student
Department of Genetics
Madurai Kamaraj University
Ph. +919790167534

I strongly believe in doing my best and leaving the rest to God



On Tue, Mar 20, 2012 at 1:07 AM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:

> On Mar 19, 2012, at 14:12 , Shankar Manoharan wrote:
>
> I made a reference assembly of my 454 bacterial data with a closely
> related strain as the backbone.
>
>
>
> There is a slight semantic difference between a reference-guided
> assembly and a mapping assembly. MIRA does de novo & mapping assemblies,
> but not reference-guided assemblies.
>
>
>    • When visualizing the reference assembly with Tablet, I see that
>    there are regions where there aren't really any reads spanning the region
>    except the template. How is this acceptable?
>
>
> Totally acceptable, if the backbone (reference) genome contains sequence
> that is not present in the genome you sequenced, or if parts of the
> reference genome are vastly different from the corresponding parts of your
> genome. Depending on the parameters you used, "vastly" can mean anything
> down to a single SNP, though standard mapping parameters are far more
> lenient than that.
>
>
>    • It appears as though MIRA replaces the assembly with the template
>    sequence, which may or may not be present in the sequenced genome.
>
>
> Yes and no. When MIRA writes out the result, the CAF, MAF and ACE files
> contain the complete alignment and give you the full picture of what is
> present (and what is not). The FASTA file indeed contains a kind of
> mixture of both sequences. Some people need that, others don't. In case you
> gave MIRA information about the strain of the reference and the strain of
> the mapped reads (you did do that, right?), you can also see in a bit more
> detail, in FASTA format, what is there and what is not by running:
>
>   convert_project -f MAF -t FASTA miraresult_out.maf somename
>
> which will create several FASTA files where each strain gets its separate
> file.
>
>
>    • So how far can this assembly be trusted?
>
>
> As far as you keep in mind that this is not comparable to a de novo
> assembly. It really is a mapping assembly. That is: you basically tell the
> assembler to treat all reads as if they came from the same organism as the
> reference. Whether or not that is true, that's how the reads are
> treated.
>
>
>    • Secondly, wasn't the reference assembly feature of MIRA developed
>    to identify SNPs and other genomic changes in pre-sequenced genomes?
>
>
> It was developed to find differences between a reference sequence and
> reads mapped to it.
>
>
>    • So, is it technically right to assemble based on closely related
>    organisms?
>
>
> As long as the organisms are closely related, yes. As you have 454 reads,
> even small insertions or deletions can be resolved correctly, though you
> might need to do a bit of manual correction here and there (but MIRA
> usually tells you where to look).
>
> As soon as your organism starts to differ quite a bit, e.g., through genome
> reorganisations or stretches with larger differences at the nucleotide
> level, mapping assemblies will give you an idea of where to turn your
> attention. Which you should then really do.
>
>
>    • Third, if I were to accept the reference assembly that MIRA has
>    output, what kind of validation tests are essential before annotation?
>
>
> The way you described it, there are some bigger differences between the
> reference and what you sequenced. You should try to resolve these. Always
> keep in mind what you want to do with that sequence: depending on what
> questions you want to answer, you may need more or less work.
>
> B.
>
>
