[mira_talk] Re: Sanger and 454 assembly de novo

On Tuesday 13 January 2009 09:52, Emmanuelle Morin wrote:
> my command is : mira -fasta -project=hyb
> -job=denovo,est,normal,sanger,454 -highlyrepetitive -GE:not=4

Looks good to me.

> I have quality files for both methods

Perfect.

> If I compare the lrc sequence and the c sequence, I see a big difference
> at the start. The lrc sequence starts with a lot of M,W,Ks .... but
> still I found some in c sequences.

ESTs and a hybrid assembly. Hmmm. That means there will be a lot of 
low-coverage regions (the ends of ESTs), and wherever there are conflicting 
bases that cannot be resolved further, MIRA tells you so by using IUPAC 
ambiguity codes (e.g., M = A or C, W = A or T, K = G or T).
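To see which positions of a consensus carry ambiguity codes, and which bases each code stands for, a minimal Python sketch may help. The code table is the standard IUPAC nucleotide table; the function name `ambiguous_positions` is hypothetical and not part of MIRA.

```python
# Standard IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC_CODES = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def ambiguous_positions(consensus):
    """Return (0-based position, code, possible bases) for every IUPAC
    ambiguity code in a consensus sequence (case-insensitive)."""
    hits = []
    for pos, base in enumerate(consensus.upper()):
        if base in IUPAC_CODES:
            hits.append((pos, base, IUPAC_CODES[base]))
    return hits


if __name__ == "__main__":
    # Hypothetical consensus snippet, for illustration only.
    seq = "ACGTMWKACGTN"
    for pos, code, bases in ambiguous_positions(seq):
        print(f"position {pos}: {code} = {' or '.join(bases)}")
```

Run as a script, it prints one line per ambiguous position, e.g. `position 4: M = A or C`. Counting the hits per contig is a quick way to compare the lrc and c sequences.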

First, make sure that the contigs with IUPAC codes are completely resolved 
(i.e., that they do not contain any SRMc tags; see the info file for 
consensus tags in the info directory).

All contigs that still contain SRMc tags also still contain assembly errors. 
You would need to rerun the assembly with more passes (-AS:nop), sorry.
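The SRMc check above boils down to a per-contig test. A minimal Python sketch, assuming you have already collected, for each contig, the set of consensus tag types listed in the info directory (that parsing step is not shown, since the file layout is not covered here). The function `split_by_srmc` and the contig names in the example are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def split_by_srmc(tags_per_contig: Dict[str, Set[str]]) -> Tuple[List[str], List[str]]:
    """Split contig names into (needs_rerun, no_srmc).

    tags_per_contig maps each contig name to the set of consensus tag
    types found for it (hypothetical input, extracted beforehand from
    the info directory). Contigs carrying an SRMc tag still contain
    unresolved assembly problems and call for a rerun with more passes.
    """
    needs_rerun: List[str] = []
    no_srmc: List[str] = []
    # Sort by name so the output order is deterministic.
    for contig, tags in sorted(tags_per_contig.items()):
        (needs_rerun if "SRMc" in tags else no_srmc).append(contig)
    return needs_rerun, no_srmc


if __name__ == "__main__":
    example = {"hyb_c2": {"SRMc"}, "hyb_c1": set(), "hyb_c3": set()}
    rerun, ok = split_by_srmc(example)
    print("rerun needed:", rerun)
    print("no SRMc:", ok)
```

If the rerun list is not empty, the whole assembly has to be repeated with a higher number of passes; the remaining contigs can go on to the IUPAC check.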

For contigs that have no SRMc tags but still contain IUPAC codes, you can:
- use -CO:fnicpst to produce non-IUPAC consensus bases (I do not recommend 
  that),
- go through the contigs with IUPAC codes by hand to see what you can 
  resolve, or
- filter out all contigs with IUPAC codes.
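The last option, filtering out every contig with IUPAC codes, can be done on the FASTA consensus output. A minimal Python sketch, assuming a plain FASTA file of contigs; the script and function names are hypothetical. Note that N counts as an ambiguity code here; remove it from the set if contigs with N should be kept.

```python
import sys
from typing import Iterable, Iterator, TextIO, Tuple

# IUPAC ambiguity codes, i.e. anything beyond A, C, G, T.
IUPAC_AMBIGUITY = set("RYSWKMBDHVN")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def has_iupac(seq: str) -> bool:
    """True if the sequence contains at least one IUPAC ambiguity code."""
    return any(base in IUPAC_AMBIGUITY for base in seq.upper())


def write_without_iupac(inp: Iterable[str], out: TextIO, width: int = 60) -> int:
    """Copy only contigs without IUPAC codes to out; return how many were dropped."""
    dropped = 0
    for header, seq in read_fasta(inp):
        if has_iupac(seq):
            dropped += 1
            continue
        out.write(f">{header}\n")
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")
    return dropped


if __name__ == "__main__":
    # Usage: python filter_iupac.py < contigs.fasta > contigs_no_iupac.fasta
    n = write_without_iupac(sys.stdin, sys.stdout)
    print(f"dropped {n} contigs with IUPAC codes", file=sys.stderr)
```

Keeping the dropped count on stderr makes it easy to see how much of the assembly this option throws away before deciding between it and resolving contigs by hand.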

I know, it's not perfect, but EST data are tricky.

Regards,
  Bastien

-- 
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html
