[mira_talk] Reference vs. De novo assembly.

  • From: Andrzej N <andrzej.k.n@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Wed, 2 Dec 2009 16:52:13 -0600

Dear Bastien,

I need some help. I did a *de novo* assembly of several plant mitochondrial
genome sequences (454 Titanium, single-end reads). About 200,000 reads were
used for the assembly, which should give me about 100x coverage. Yes, I know,
overkill, but... MIRA created about 160 contigs with a quality score of
around 78 (what is that exactly?). The total number of contigs is about
5,000, but that includes smaller ones that don't help much, i.e., "junk".
These contigs don't join together into one big consensus contig.
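
As a rough check on the coverage figure: expected coverage is the number of
reads times the mean read length, divided by the genome size. A minimal
sketch in Python follows. The mean read length (200 bp) and the genome size
(400,000 bp) are assumptions chosen for illustration, not measured values,
and the function name is hypothetical.

```python
def expected_coverage(num_reads, mean_read_length, genome_size):
    """Average depth of coverage: total sequenced bases / genome size."""
    return num_reads * mean_read_length / genome_size


# Assumed values for illustration only: 200,000 reads, a mean read length
# of 200 bp, and a genome of about 400,000 bp.
print(expected_coverage(200_000, 200, 400_000))  # prints 100.0
```

With longer reads or a smaller genome the estimate goes up proportionally.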

I also did a reference assembly against an already finished and assembled
sequence. MIRA covers all of this reference sequence with only one small
break (so I get two huge contigs of about 200,000 bp each).

Now comes the interesting part. When I take the contigs from the *de novo*
assembly and BLAST them against the ones generated by the reference
assembly, they cover the entire sequence very nicely. So my question is: why
is MIRA not creating larger contigs during *de novo* assembly? These contigs
are adjacent to each other and show a certain amount of sequence overlap (I
set up BLAST on my computer to blast them against each other), but MIRA is
not seeing this and joining them.
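
The overlap check described above can be sketched roughly as follows. This
is a minimal sketch, not the script actually used: it assumes BLAST tabular
output with the standard 12 columns (qseqid, sseqid, pident, length,
mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), and the
contig names, lengths, thresholds, and the function name end_overlaps are
all hypothetical. It only reports same-strand hits where the alignment
reaches the end of one contig and the start of another.

```python
import csv
import io


def end_overlaps(blast_tsv, lengths, slack=10, min_len=40):
    """Return (query, subject, alignment_length) for end-to-start overlaps."""
    found = []
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        aln_len = int(row[3])
        qstart, qend, sstart, send = (int(x) for x in row[6:10])
        if query == subject or aln_len < min_len:
            continue  # ignore self-hits and very short alignments
        if sstart > send:
            continue  # reverse-strand hits are not handled in this sketch
        reaches_query_end = lengths[query] - qend <= slack
        starts_at_subject_start = sstart - 1 <= slack
        if reaches_query_end and starts_at_subject_start:
            found.append((query, subject, aln_len))
    return found


# Made-up example data: one end-to-start overlap and one internal hit.
example = (
    "contigA\tcontigB\t98.50\t120\t2\t0\t881\t1000\t1\t120\t1e-50\t200\n"
    "contigA\tcontigC\t95.00\t200\t10\t0\t300\t499\t100\t299\t1e-60\t250\n"
)
lengths = {"contigA": 1000, "contigB": 800, "contigC": 900}
print(end_overlaps(example, lengths))  # prints [('contigA', 'contigB', 120)]
```

Hits that pass this filter are candidates for contigs that could in
principle be joined; internal hits such as the second row are more likely
repeats than true overlaps.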

Technically I have all the data needed to create contigs of about 200,000 bp,
but MIRA gives me a maximum of only 60,000 bp :(.

Which MIRA parameters need to be changed to help build larger contigs? My
adjustments to date have not done much more than your default settings for
"fast assembly".

My computer runs the whole assembly in about 9 hours, so time is not an
issue. I can try anything.

Please feel free to ask any additional questions if I have not explained
things well.

Andrzej
