[mira_talk] Re: Backbone assembly

  • From: John Nash <john.he.nash@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Fri, 15 Apr 2011 09:13:02 -0400

On 2011-04-15, at 8:59 AM, Andrei Tudor wrote:

> Hello,
> I have just finished a backbone assembly. It is the first time I have done 
> such an assembly, and I am wondering what to do with the resulting files.
> I saw that instead of a multifasta file with contigs, MIRA made one whole 
> chromosome from the reads. Does this mean that I do not have to create a 
> pseudomolecule?
> If not what should I do next?

What I usually do next (using gap5):

1. Use tg_index to convert the CAF file to a gap5 database

2. "Top and tail" the assembly, i.e. make sure that you have true circularity 
if your genome is a circular one.  Then using gap5, trim the start and end of 
the genome to make sure the coordinates match up as a circular genome.

3. Next I use the assembly view in gap5 (or use Tablet), to look for:
        a. Holes - where there is no coverage of the corresponding region in 
the scaffold
        b. Areas of very low coverage - indicating possible misassembly. I use 
the data and MIRA's tags to look for flanking regions which can be closed by 
fresh PCR
        c. Regions of extremely high coverage - indicating repeats. I usually 
PCR the regions from HIGH coverage (usually in a repeat) to normal or low 
coverage (indicating the flanking non-repeated sequence), to make sure that 
scaffold-bias has not influenced the assembly.
        d. I pay special attention to what I call "cliffs" - regions of very 
high coverage next to regions of very low coverage.

4. I browse through MIRA's tags using gap5 to look for areas that MIRA wants me 
to check. The manual has good coverage of how to do that.

5. Then I scan the sequence to proofread it using gap5. I don't care about pads, 
but I use gap5's "find next" search parameter (using the consensus quality 
selection) to scan for and fix obvious miscalls - where there is obviously NO 
pad but MIRA has put a base there. There are not many of those.

Then I am done.
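
To illustrate the "top and tail" idea from step 2 outside of gap5, here is a minimal Python sketch. It assumes the consensus has been exported as a plain string and that the duplicated stretch at the two ends matches exactly. Real assemblies usually need an alignment that tolerates mismatches and pads, so treat this as an illustration only. The function names and the min_len/max_len defaults are hypothetical, not part of gap5 or MIRA.

```python
def find_end_overlap(seq, min_len=20, max_len=5000):
    """Return the length of the longest suffix of seq that exactly equals
    a prefix of seq (0 if none of at least min_len bases is found)."""
    seq = seq.upper()
    # Never let the prefix and suffix overlap each other.
    limit = min(max_len, len(seq) // 2)
    for k in range(limit, min_len - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def trim_circular(seq, min_len=20, max_len=5000):
    """Drop the duplicated tail so the last base joins up with the first."""
    k = find_end_overlap(seq, min_len, max_len)
    return seq[:len(seq) - k] if k else seq


if __name__ == "__main__":
    overlap = "ACGTTGCAAC"                  # toy duplicated end
    core = "GATTACAGGCTTAACCGTAGCATCGA"     # toy unique sequence
    linear = overlap + core + overlap
    print(find_end_overlap(linear, min_len=5))
    print(trim_circular(linear, min_len=5))
```

If no overlap is found, the sequence is returned unchanged, which is a hint that the ends still need closing by other means.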

The ORF-calling software should find bases that could be indels causing 
frameshifts - let that software remove that worry!
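
The coverage checks in step 3 (holes, very low coverage, very high coverage, and "cliffs") can also be roughed out in a script before opening the assembly viewer. The sketch below assumes you already have a per-position list of read depths; how you export that is not covered here. The thresholds (a quarter of the median for "low", twice the median for "high") and the function names are hypothetical choices for illustration, and a real scan would more likely work on windows than on single positions.

```python
from statistics import median


def classify_coverage(depth, low_frac=0.25, high_frac=2.0):
    """Label each position 'hole', 'low', 'normal' or 'high'
    relative to the median depth of the covered positions."""
    covered = [d for d in depth if d > 0]
    med = median(covered) if covered else 0
    labels = []
    for d in depth:
        if d == 0:
            labels.append("hole")
        elif d < low_frac * med:
            labels.append("low")
        elif d > high_frac * med:
            labels.append("high")
        else:
            labels.append("normal")
    return labels


def find_cliffs(labels):
    """Return each index i where a 'high' position sits directly
    next to a 'low' or 'hole' position (at i and i + 1)."""
    cliffs = []
    for i in range(len(labels) - 1):
        pair = {labels[i], labels[i + 1]}
        if "high" in pair and pair & {"low", "hole"}:
            cliffs.append(i)
    return cliffs


if __name__ == "__main__":
    depth = [30, 30, 30, 0, 30, 5, 30, 90, 4, 30, 30]  # toy data
    labels = classify_coverage(depth)
    print(labels)
    print(find_cliffs(labels))
```

The positions this flags are only candidates; they still need to be looked at in gap5 or Tablet and, where necessary, checked by PCR as described above.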

