[mira_talk] Re: Sanger and 454 assembly de novo
- From: Bastien Chevreux <bach@xxxxxxxxxxxx>
- To: mira_talk@xxxxxxxxxxxxx
- Date: Wed, 14 Jan 2009 21:35:21 +0100
On Tuesday 13 January 2009 09:52, Emmanuelle Morin wrote:
> my command is : mira -fasta -project=hyb
> -job=denovo,est,normal,sanger,454 -highlyrepetitive -GE:not=4
Looks good to me.
> I have quality files for both methods
Perfect.
> If I compare the lrc sequence and the c sequence, I see a big difference
> at the start. The lrc sequence starts with a lot of M,W,Ks .... but
> still I found some in c sequences.
ESTs and hybrid assembly. Hmmm. That means we'll have a lot of low-coverage
regions (the ends of ESTs), and where there are conflicting bases that cannot
be resolved further, MIRA tells you so by using IUPAC codes.
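For reference, the M, W and K characters seen in the consensus are standard
IUPAC nucleotide ambiguity codes, each standing for a set of possible bases.
A minimal Python sketch of that table follows; the helper name expand_iupac
is only illustrative and is not part of MIRA.

```python
# Standard IUPAC nucleotide ambiguity codes (general convention, not MIRA-specific).
IUPAC_AMBIGUITY = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_iupac(base):
    """Return the set of plain bases a consensus character may stand for.

    Hypothetical helper for illustration; characters that are not ambiguity
    codes (for example A, C, G, T) are returned unchanged as a one-element set.
    """
    base = base.upper()
    return set(IUPAC_AMBIGUITY.get(base, base))
```

So an M in the consensus means the reads disagree between A and C at that
position, a W between A and T, and a K between G and T.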
First, make sure that the contigs with IUPAC codes are completely resolved
(i.e., that they do not contain any SRMc tags; see the info file for
consensus tags in the info directory).
All contigs that contain SRMc tags still contain assembly errors. You'd need
to rerun the assembly with more passes (-AS:nop) (sorry).
For those contigs with no SRMc tag but still IUPAC codes, you can choose to:
- use -CO:fnicpst to force a non-IUPAC consensus (I do not recommend that),
- go through the contigs with IUPAC codes by hand to see what you can resolve,
  or
- filter out all contigs with IUPAC codes.
I know, it's not perfect, but EST data are tricky.
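The last option (filtering out contigs with IUPAC codes) could be done with a
small script along these lines. This is only a sketch under assumptions: it
works on FASTA-formatted lines, the function names are made up for
illustration, and by default only A, C, G and T count as "plain" bases. You
may want to allow N or gap characters too, depending on which result file
you filter.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def contigs_without_iupac(lines, allowed="ACGT"):
    """Yield only the contigs whose sequence uses characters from `allowed`.

    `allowed` is an assumption of this sketch; the comparison ignores case.
    """
    allowed_set = set(allowed.upper())
    for name, seq in read_fasta(lines):
        if set(seq.upper()) <= allowed_set:
            yield name, seq


# Tiny in-memory example with made-up contig names.
demo = [">c1", "ACGTAC", "GGTA", ">c2", "ACMWKT"]
kept = [name for name, seq in contigs_without_iupac(demo)]
```

With a real file you would pass the open file object instead of the demo
list. Keep in mind that this throws away whole contigs, so check how much of
your data is lost before relying on it.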
Regards,
Bastien
--
You have received this mail because you are subscribed to the mira_talk mailing
list. For information on how to subscribe or unsubscribe, please visit
http://www.chevreux.org/mira_mailinglists.html