[mira_talk] Re: regd Mapping 454 titanium with Solexa paired reads

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Mon, 28 Mar 2011 21:07:03 +0200

On Monday 28 March 2011 19:44:45 Arun Rawat wrote:
> I assembled bacterial genome from 454 Titanium that resulted in around 500
> contigs overall.

That is a fairly high number of contigs. Way to high to my likings. What's the 
coverage of these?

> Now I am trying to use these sets of contigs to map paired reads to
> generate larger contigs. To test the results, I ran with default
> parameters: mira --project=mapSX_454 --job=mapping,genome,accurate,solexa
> -GE:not=16 -AS:nop=1 -SB:bft=caf >&log_assembly.txt
> 
> The results came pretty good with higher N50, lesser number of contigs
> (~300) etc as mentioned in info_assembly.txt

Ahmmm ... something's not right here. If 500 contigs came in, 500 must come 
out of a mapping assembly in the "All contigs" category: MIRA does not join 
contigs in mapping. Can you please do a count on your input CAF for the number 
of contigs with

  grep -c Is_contig mapSX_454_backbone_in.caf

and tell what that gives you?

> However when I try to access these contigs from *.padded.fasta, I get
> around 120 contigs that are lot shorter rather than the ones I expect as
> mentioned in the info_assembly.

Which number in the assembly are you refewrring to? There are quite a few ... 
:-)

> Can you please help me how/where can I
> access the (solexa + 454 mapped) contigs that are mentioned in the
> info_assembly file.

All data is in the *_out.* files. If you want to filter your results for 
values, use convert_project on either a MAF (recommended, fast) or a CAF file 
(less recommended, slow). Like this to get all contigs which are at least 
500bp long and have at least an average coverage of 10:

 convert_project -f maf -t caf -t fasta -x 500 -y 10 input.maf filteredproject

Hope that helps,
  Bastien

Other related posts: