[mira_talk] Re: Sanger and 454 ESTs assembly with mapping
- From: Bastien Chevreux <bach@xxxxxxxxxxxx>
- To: mira_talk@xxxxxxxxxxxxx
- Date: Sat, 10 Jan 2009 17:58:17 +0100
(Note: the original mail apparently did not make it to the list)
> Date: Wed, 07 Jan 2009 13:47:45 +0100
> From: Emmanuelle Morin <emmanuelle.morin@xxxxxxxxxxxxx>
> Subject: Sanger and 454 ESTs assembly with mapping
> Hello,
> I'm currently trying to assemble ESTs from 2 sequencing technologies:
> Sanger and 454. I've already assembled them de novo, but 43% of the
> contigs didn't match any predicted genes.
I am actually not really surprised by this. The ENCODE paper published in
Nature some 18 months ago
(http://www.nature.com/nature/journal/v447/n7146/full/nature05874.html)
stated, and I quote, that "The human genome is pervasively transcribed, [...]".
I think they also gave a number (60%? 80%?) somewhere, but I don't recall it
off the top of my head. Perhaps it's in the special issue of Genome Research
(http://genome.cshlp.org/content/17/6.toc), which contains some results from
the pilot study.
> So now, I want to assemble them with the genome sequence as a backbone.
> Still there are so many arguments you can play with, I'm a little bit
> confused.
>
> Here is my command line :
> mira -fasta -project=tmel_hyb -job=mapping,est,normal,sanger,454
Oooops, I never thought of combining "mapping" and "est". This could lead to
some unexpected results; I need to think about it.
But there is a problem with going that way (mapping ESTs directly to a
eukaryotic backbone): it will not work, at least not with MIRA. The reason is
that the mapping would need to account for post-transcriptional modification
of the mRNA (splicing etc.), and that is not really something a sequence
assembler can do.
I'd continue with the results of the de-novo assembly and then use some
specialised software that can map sequences onto eukaryotic genomes.
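
To make the splicing problem concrete, here is a toy sketch in Python. It is
an illustration only and not how MIRA or any real spliced aligner works: the
sequences, the names (EXON1, INTRON, EXON2, GENOME, MRNA) and both helper
functions are made up for this example. It shows that a spliced mRNA has no
gap-free placement on the genome, while a search that allows one long gap
(the intron) does place it.

```python
# Toy illustration only: all sequences and names here are hypothetical.
EXON1 = "ATGGCCAAGT"
INTRON = "GTAAGTTTTTCTCTTTACAG"
EXON2 = "CCGGATTGA"

# Genomic DNA still contains the intron; the mature mRNA does not.
GENOME = "TTTT" + EXON1 + INTRON + EXON2 + "AAAA"
MRNA = EXON1 + EXON2


def contiguous_hit(read, reference):
    """Return the start of an exact, gap-free match, or -1 if there is none."""
    return reference.find(read)


def two_block_hit(read, reference):
    """Naive spliced search: cut the read once and place both parts in order.

    Returns (start1, end1, start2, end2) in reference coordinates
    (end positions exclusive), or None if no cut point works.
    """
    for cut in range(1, len(read)):
        left, right = read[:cut], read[cut:]
        start1 = reference.find(left)
        if start1 < 0:
            continue
        end1 = start1 + len(left)
        # The second block must lie downstream of the first one.
        start2 = reference.find(right, end1)
        if start2 >= 0:
            return start1, end1, start2, start2 + len(right)
    return None


if __name__ == "__main__":
    print("gap-free placement:", contiguous_hit(MRNA, GENOME))
    print("two-block placement:", two_block_hit(MRNA, GENOME))
```

A real spliced mapper additionally has to cope with sequencing errors,
several introns per transcript and splice-site signals, which is why a
dedicated tool is the better choice for that step.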
> We know that the genome we are working with contains a lot of
> transposable elements, should it be relevant to add the
> -highlyrepetitive option or to use the masked genome sequence ?
-highlyrepetitive will probably help for the de-novo assembly of ESTs, but far
less for a mapping assembly (where one already has a pretty good guide).
> In the case, would it be better to use miraEST ?
No, at least not if the ESTs you have are all from a single strain or
organism. The normal mira (using --job=...est...) does a pretty good job
there.
Hope that helps,
Bastien