[mira_talk] Re: Sanger and 454 ESTs assembly with mapping

(Note: the original mail apparently did not make it to the list)

> Date: Wed, 07 Jan 2009 13:47:45 +0100
> From: Emmanuelle Morin <emmanuelle.morin@xxxxxxxxxxxxx>
> Subject: Sanger and 454 ESTs assembly with mapping
> Hello,
> I'm currently trying to assemble ESTs from 2 sequencing technologies:
> Sanger and 454. I've already assembled them de novo, but 43% of the
> contigs didn't match any predicted genes.

I am not really surprised by this. The ENCODE paper in Nature from 
some 18 months ago 
(http://www.nature.com/nature/journal/v447/n7146/full/nature05874.html) 
stated, and I quote, that "The human genome is pervasively transcribed, [...]".

I think they also gave a number (60%? 80%?) somewhere, but I don't recall it 
off the top of my head. Perhaps it's in the special issue of Genome Research 
(http://genome.cshlp.org/content/17/6.toc) which contains some results from 
the pilot study.

> So now, I want to assemble them with the genome sequence as a backbone.
> Still there are so many arguments you can play with, I'm a little bit
> confused.
>
> Here is my command line :
> mira -fasta -project=tmel_hyb -job=mapping,est,normal,sanger,454

Oops, I never thought of combining "mapping" and "est". It could be that this 
leads to some unexpected results; I need to think about it.

But there is a problem with going that way (mapping ESTs directly to a 
eukaryotic backbone): it will not work, at least not with MIRA. The reason is 
that the mapping would need to account for post-transcriptional modification 
of the mRNA (splicing etc.), and this is not really something a sequence 
assembler can do.
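
To illustrate the splicing problem, here is a minimal toy sketch in Python. 
Everything in it is made up for illustration (the sequences, the names 
exon1/intron/exon2 and the helper find_two_block_hit); it has nothing to do 
with how MIRA or any real spliced aligner works internally. It only shows 
that a spliced transcript is not one contiguous stretch of the genome, and 
that one has to allow a large gap (the intron) to place it.

```python
# Toy illustration only: sequences are invented, matching is exact, and
# only a single intron is handled. Real spliced aligners are far more
# elaborate (mismatches, several introns, splice-site models).

exon1 = "ATGGCCAAC"
intron = "GTAAGTCTCTTTTCAG"   # starts with GT, ends with AG
exon2 = "TTTGAATAA"

genome = "TT" + exon1 + intron + exon2 + "CC"
est = exon1 + exon2           # what the mature mRNA looks like


def find_two_block_hit(read, reference, min_block=4):
    """Try to place `read` on `reference` as two exact blocks.

    Returns (split, left_pos, right_pos) for the first split point that
    works, or None. `split` is the length of the left block.
    """
    for split in range(min_block, len(read) - min_block + 1):
        left, right = read[:split], read[split:]
        left_pos = reference.find(left)
        if left_pos < 0:
            continue
        # The right block must start at or after the end of the left block.
        right_pos = reference.find(right, left_pos + split)
        if right_pos >= 0:
            return split, left_pos, right_pos
    return None


# A contiguous (assembler-style) placement fails ...
contiguous_hit = est in genome

# ... while a placement that tolerates one intron-sized gap succeeds.
hit = find_two_block_hit(est, genome)

print("contiguous match:", contiguous_hit)
if hit is not None:
    split, left_pos, right_pos = hit
    print("left block at", left_pos, "right block at", right_pos,
          "gap length", right_pos - (left_pos + split))
```

In this toy case the gap length that comes out is simply the length of the 
intron, which is the part of the genome that is no longer present in the EST.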

I'd continue with the results of the de-novo assembly and then use some 
specialised software that can map sequences on eukaryotic genomes.

> We know that the genome we are working with contains a lot of
> transposable elements, should it be relevant to add the
> -highlyrepetitive option or to use the masked genome sequence ?

-highlyrepetitive will probably help for the de-novo assembly of ESTs, but far 
less for a mapping assembly, where one already has a pretty good guide.

> In the case, would it be better to use miraEST ?

No, at least not if the ESTs you have all come from a single strain or 
organism. The normal mira (using --job=...est...) does a pretty good job 
there.

Hope that helps,
  Bastien
