[mira_talk] Problem with convert_project

  • From: Steven Sijmons <steven.sijmons@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Fri, 26 Oct 2012 21:14:39 +0200

Dear Bastien et al.,


First of all, a massive thanks for all your efforts on MIRA. It is really
awesome to see a free sequence assembly suite that is so well curated. I
have been using MIRA for a while now, but today I encountered some strange
behavior that I need some advice on.

I am using MIRA to assemble human cytomegalovirus genomes. My viral DNA is
extracted from cell culture, so it contains some contaminating cellular
DNA. At the moment, I am using MIRA to map all my 454 and Illumina reads
onto the final consensus sequence I derived from these datasets through de
novo assembly and contig mapping. I do this to estimate the purity of my
samples and to compare it with the estimates I made before sequencing via
quantitative PCR. For 3 of 14 samples, the purity was much lower than
predicted (far fewer reads mapped than expected).
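As a rough illustration of the purity estimate described above, the sketch
below treats purity simply as the fraction of reads that map to the
consensus. That definition is an assumption, not a MIRA feature; the
function names read_purity and shortfall are hypothetical, and the numbers
in the example are made up.

```python
def read_purity(mapped_reads: int, total_reads: int) -> float:
    """Fraction of reads mapping to the viral consensus (0.0 to 1.0).

    Assumption: purity is estimated as mapped / total reads. This is an
    illustration only, not MIRA's own calculation.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mapped_reads <= total_reads:
        raise ValueError("mapped_reads must lie between 0 and total_reads")
    return mapped_reads / total_reads


def shortfall(observed: float, predicted: float) -> float:
    """How far the read-based purity falls below a qPCR-based prediction."""
    return predicted - observed


# Hypothetical numbers, for illustration only.
observed = read_purity(mapped_reads=600_000, total_reads=1_000_000)
print(f"observed purity: {observed:.2%}")
print(f"shortfall vs. a qPCR estimate of 90%: {shortfall(observed, 0.90):.2%}")
```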

Following this, I wanted to do a de novo assembly of the reads that ended
up in the debris list, to check what makes up this contamination (I expect
it to be mostly human DNA). So far so good. I extracted these reads from
the original fastq file with convert_project, using the debris_list.
However, the de novo assembly gave 116 contigs, which I mapped with nucmer
against the same consensus sequence I had used for the initial mapping. To
my surprise, 114 of these mapped perfectly to this sequence. The same
happened for another sample: 3707 of 19049 contigs from the debris mapped
perfectly to the reference.
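To cross-check the convert_project extraction independently, the reads
named in a debris list can be pulled out of the original fastq with a few
lines of Python. This is a minimal sketch, not MIRA's own method. It
assumes unwrapped fastq (exactly four lines per record), that a read's name
is the header text after '@' up to the first whitespace, and that the read
name is the first whitespace-separated field on each line of the list. The
script and function names are hypothetical.

```python
import sys
from typing import Iterator, Set, TextIO, Tuple


def load_names(handle: TextIO) -> Set[str]:
    """Read names from a list file: first whitespace-separated field per line."""
    names = set()
    for line in handle:
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def fastq_records(handle: TextIO) -> Iterator[Tuple[str, ...]]:
    """Yield (header, sequence, '+' line, qualities) tuples.

    Assumes unwrapped fastq, i.e. exactly four lines per record.
    """
    while True:
        header = handle.readline()
        if not header:
            return
        rest = tuple(handle.readline() for _ in range(3))
        if not header.startswith("@") or not rest[1].startswith("+"):
            raise ValueError(f"malformed fastq record at {header!r}")
        yield (header,) + rest


def read_name(header: str) -> str:
    """Name of a read: header text after '@' up to the first whitespace."""
    return header[1:].split()[0]


def filter_fastq(fastq: TextIO, names: Set[str], out: TextIO) -> int:
    """Write records whose names are in `names`; return how many were written."""
    written = 0
    for record in fastq_records(fastq):
        if read_name(record[0]) in names:
            out.writelines(record)
            written += 1
    return written


# Usage: python filter_fastq.py debris_list.txt reads.fastq > debris.fastq
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as list_handle:
        wanted = load_names(list_handle)
    with open(sys.argv[2]) as fq:
        n = filter_fastq(fq, wanted, sys.stdout)
    print(f"{len(wanted)} names in list, {n} reads written", file=sys.stderr)
```

Because each name is stored once in a set, the number of reads written can
only exceed the number of distinct names in the list if the same name
occurs more than once in the input fastq.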

When I then compared the number of reads in the fastq files I created with
convert_project, I noticed that they contained more reads than there are
lines in the corresponding debris_list. The debris_lists contained 232,628,
2,288,887 and 42,826,127 reads, while the convert_project outputs contained
437,106, 2,392,084 and 43,632,082 reads, respectively. Could these 'extra'
reads be causing this? And where do these reads come from if they aren't in
the debris_list? I hope someone can suggest some possible explanations,
because I really don't understand what's happening here.
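One way to narrow down where the extra reads come from is to compare the
names in each convert_project output with the names in the corresponding
debris_list. The sketch below counts names on both sides and reports reads
that are not in the list, names that occur more than once in the fastq, and
listed names missing from the output. It assumes unwrapped fastq (four
lines per record), that a read's name is the header text after '@' up to
the first whitespace, and that the name is the first field on each list
line; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, TextIO


def list_names(handle: TextIO) -> Counter:
    """Count read names in a list file (first whitespace-separated field)."""
    return Counter(line.split()[0] for line in handle if line.strip())


def fastq_names(handle: TextIO) -> Counter:
    """Count read names in an unwrapped, four-lines-per-record fastq file."""
    names = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 0:  # only header lines; quality lines may start with '@'
            if not line.startswith("@"):
                raise ValueError(f"expected '@' header on line {i + 1}: {line!r}")
            names[line[1:].split()[0]] += 1
    return names


def compare(listed: Counter, extracted: Counter) -> Dict[str, object]:
    """Summarise where a fastq extract and its name list disagree."""
    return {
        "lines_in_list": sum(listed.values()),
        "reads_in_fastq": sum(extracted.values()),
        "not_in_list": set(extracted) - set(listed),
        "duplicated_in_fastq": {n for n, c in extracted.items() if c > 1},
        "missing_from_fastq": set(listed) - set(extracted),
    }
```

For example, open the debris_list and the convert_project output (file
names are placeholders), pass them to list_names and fastq_names, and print
the result of compare; a non-empty "not_in_list" or "duplicated_in_fastq"
set would show which reads account for the difference in counts.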


Kind regards,
Steven
