[mira_talk] Re: de novo plant genome

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Mon, 21 Dec 2009 12:45:16 +0100

On Wednesday 16 December 2009 Tom wrote:
> [...]
> With the first pass done I'd like to improve/extend the contigs with
> 140,000 cowpea ESTs from HarvEST and with 51,000 BAC end sequences
> (BES) from the Legume Information System.

'Improving' is a pretty vague term :-) Please remember that MIRA is an 
assembler and not a clustering program, so as soon as it detects enough valid 
information for allelic SNPs, it will create two or more contigs. Rightly so, 
as these are truly different mRNAs in the cell. The downside: throwing 
together ESTs from different sources is bound to create separate contigs as 
soon as a SNP is detected.
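
To make the "enough valid information" point concrete, here is a deliberately
simplified Python sketch. It is not MIRA's algorithm (which also weighs base
qualities and other evidence); it assumes each alignment column is given as a
string of the bases the reads carry at that position, and flags a column as a
potential allelic SNP when at least two different bases are each supported by
a minimum number of reads. The function name snp_columns and the min_reads
threshold are hypothetical.

```python
from collections import Counter


def snp_columns(columns, min_reads=2):
    """Return indices of alignment columns where at least two different
    bases are each supported by >= min_reads reads.

    Simplified illustration only (hypothetical helper, not MIRA code):
    gaps ('*'), N and any other non-ACGT characters are ignored.
    """
    hits = []
    for i, col in enumerate(columns):
        counts = Counter(b for b in col.upper() if b in "ACGT")
        supported = [b for b, n in counts.items() if n >= min_reads]
        if len(supported) >= 2:
            hits.append(i)
    return hits


# Each string is one alignment column: the bases of four reads at that position.
columns = ["AAAA", "CCCT", "GGAA", "TTTT"]
print(snp_columns(columns))  # -> [2]
```

Column 1 has a single deviating read (more likely a sequencing error), while
column 2 is split 2:2; evidence of that second kind is what makes an
assembler keep the reads in separate contigs instead of clustering them.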

You can mitigate these problems a bit by using -CO:mr=no together with 
very stringent alignment values (90% and upwards for -AS:mrs), but my 
personal experience with this is mixed: sometimes it works, sometimes it doesn't.

A better way would be to use strain information.

> 1) Throwing the GSRs and ESTs into one big file, then run MIRA as
> "genome,denovo".

That's what I'd try first. Remember to use strain information; it may allow 
MIRA to put together sequences from different strains that differ at only a 
small number of positions.
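
For option 1), the reads from all sources have to end up in one input file,
with a strain label attached to each read. Below is a minimal Python sketch
that merges several FASTA inputs and writes one "read name<TAB>strain name"
line per read. That two-column layout is an assumption: the exact file name
and format MIRA expects for strain information depend on your version, so
check the manual. The function names and the labels cowpea_gsr / harvest_est
are hypothetical, and whether each data source should count as its own
"strain" is a judgement call for your data.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA text stream.
    The read name is the first word of the header line."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def merge_with_strains(sources, fasta_out, strain_out):
    """sources: list of (strain_label, fasta_handle) pairs.
    Writes every read to fasta_out and a 'read<TAB>strain' line to
    strain_out (assumed layout). Read names must be unique across inputs."""
    seen = set()
    for strain, handle in sources:
        for name, seq in read_fasta(handle):
            if name in seen:
                raise ValueError(f"duplicate read name: {name}")
            seen.add(name)
            fasta_out.write(f">{name}\n{seq}\n")
            strain_out.write(f"{name}\t{strain}\n")


# In-memory stand-ins for the real input files.
gsr = io.StringIO(">gsr1\nACGT\nACGT\n")
est = io.StringIO(">est1 some description\nTTGCA\n")
fa, st = io.StringIO(), io.StringIO()
merge_with_strains([("cowpea_gsr", gsr), ("harvest_est", est)], fa, st)
print(st.getvalue(), end="")
```

With real data you would pass open file handles instead of StringIO objects;
the duplicate-name check matters because reads from different databases can
share identifiers.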

> 2) Two steps: (a) contig the GSRs, (b) then map the ESTs on using the
> unpadded.fasta from (a) as the backbone.

Also possible, but 1) would be better if it works.

> My third question is basically what to do about repeats in the BES. When
> I tried throwing the GSR contigs into a big fasta file with the BES,
> MIRA complained about 1 megahub. I'm still adjusting nrr to see if I can
> clear that up. 

Remember that everything masked as a nasty repeat will not contribute to 
finding alignments. As you only have 275k sequences, you might want to try 
simply ignoring the megahubs (-SK:mmhr=1 or similar).
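
A toy Python illustration of why masking costs alignments: it counts k-mers
over all reads and lower-cases every base covered by a k-mer that occurs at
least `ratio` times more often than the average k-mer. This only mimics the
general idea behind nasty-repeat masking (roughly the kind of ratio that
-SK:nrr adjusts, as assumed here); it is not MIRA's implementation, and the
function name, k and ratio defaults are hypothetical.

```python
from collections import Counter


def mask_high_frequency_kmers(reads, k=4, ratio=2.0):
    """Lower-case every base covered by a k-mer whose count is at least
    `ratio` times the average count over all distinct k-mers.
    Toy illustration (hypothetical helper), not MIRA's masking code."""
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    if not counts:
        return list(reads)
    average = sum(counts.values()) / len(counts)
    masked = []
    for r in reads:
        flags = [False] * len(r)
        for i in range(len(r) - k + 1):
            if counts[r[i:i + k]] >= ratio * average:
                for j in range(i, i + k):
                    flags[j] = True
        masked.append("".join(c.lower() if f else c for c, f in zip(r, flags)))
    return masked


reads = ["AAAAAAA", "CGTACGG", "TTGCAAA"]
print(mask_high_frequency_kmers(reads, k=3))
# -> ['aaaaaaa', 'CGTACGG', 'TTGCaaa']
```

The first read ends up completely masked, so nothing in it is left to seed an
overlap, while the third read keeps its unique prefix; that is the trade-off
between aggressive repeat masking and simply tolerating a megahub.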

Regards,
  Bastien

-- 
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html
