[mira_talk] mixing real and simulated 454 data

  • From: Ariel Amadio <arielfamadio@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 17 Apr 2012 14:21:39 -0300

Hi all
I'm trying to mix real 454 data from a bacterial genome with simulated
paired-end reads (also 454).
I used MetaSim for the simulation, since the idea is to mix several genomes
in one 454 run without tags.
Looking through the list archives I found a few suggestions, but I'm still
running into a few problems.
I have three files: a fastq and an xml file for the real 454 data, and a
fasta file for the simulated reads.
The command looks like:
mira --project=PROJECT --job=genome,denovo,accurate,sanger,454
--noqualities -GE:not=4 SANGER_SETTINGS -AS:bdq=30 >&log_assembly.txt

I can only set the fixed quality value of 30 for one technology, so I'm
treating the simulated reads as Sanger.
First question:
Since --noqualities is needed so that MIRA does not look for a qual file,
is MIRA still using the quality values of the real data in the fastq file?
Second question:
I suspect that MIRA is not using the pairing information for the simulated
reads. Should I create an XML TRACEINFO file so that this information is
taken into account?
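
To make the second question concrete, here is a minimal sketch of how such
a file might be generated. It rests on assumptions: the simulated reads are
named <template>.f and <template>.r (a naming convention invented for this
illustration; MetaSim's real read names may differ), build_traceinfo is a
hypothetical helper, and the field names (trace_name, template_id,
trace_end, insert_size, insert_stdev) follow the NCBI TraceInfo style,
which I believe is what MIRA reads but have not verified.

```python
import xml.etree.ElementTree as ET


def build_traceinfo(read_names, insert_size, insert_stdev):
    """Build a TRACEINFO-style XML string for paired reads.

    Assumption (hypothetical): paired reads are named '<template>.f' and
    '<template>.r'. Reads without such a suffix are treated as unpaired
    and get no entry.
    """
    volume = ET.Element("trace_volume")
    for name in read_names:
        template, _, end = name.rpartition(".")
        if not template or end not in ("f", "r"):
            continue  # unpaired read: nothing to record
        trace = ET.SubElement(volume, "trace")
        ET.SubElement(trace, "trace_name").text = name
        # Both mates share the same template_id, which links them as a pair.
        ET.SubElement(trace, "template_id").text = template
        ET.SubElement(trace, "trace_end").text = "F" if end == "f" else "R"
        ET.SubElement(trace, "insert_size").text = str(insert_size)
        ET.SubElement(trace, "insert_stdev").text = str(insert_stdev)
    return ET.tostring(volume, encoding="unicode")


if __name__ == "__main__":
    # Insert size and standard deviation are placeholder values.
    print(build_traceinfo(["sim1.f", "sim1.r", "single"], 3000, 300))
```

The insert size and standard deviation would have to match whatever was
used for the simulation; the values in the example are placeholders.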

A few more details:
The reads were simulated from a chimera with contigs from an assembly with
real data.
So, what I want to find out is how many paired reads I need to obtain a
"good" assembly (compared with the chimera).
If someone has recommendations on how to do this simulation, they would be
much appreciated.

Many thanks in advance
Ariel
