[mira_talk] Singlet output from MIRA

  • From: "Kvist, Sebastian" <skvist@xxxxxxxxxxxxxxx>
  • To: "mira_talk@xxxxxxxxxxxxx" <mira_talk@xxxxxxxxxxxxx>
  • Date: Fri, 25 Jan 2013 16:42:26 -0500

Dear Bastien,

Quick question (I hope) about the assembly process. I want to run four 
iterations of MIRA3 on my 454 data and when I do so, I'm encountering some 
issues:

The data set is cDNA and I used the following command to call MIRA:

    mira --project=HT8UG4B01 --job=denovo,est,accurate,454 &

I started off with about 120,000 fragments (trimmed and quality controlled).
The first pass with MIRA gives me ~12,000 contigs, and the "info_assembly.txt"
file tells me that NO singlets remained after assembly. After the second pass,
I am left with ~600 contigs and, again, the info file tells me no singlets are
present. After the third and fourth passes, I'm left with 16 contigs (again, no
singlets). The N50 increases with each pass, which I'm happy about, but when I
look at the 16 contigs, a lot of data must have been discarded somewhere. At
least I hope that we haven't effectively sequenced only 16 highly expressed
genes.

I was wondering whether the singlets are stored in another file and the info
file is simply telling me the wrong thing. I checked the
"info_debrislist_pass.1 […] 5" and it seems as though the singlet names are in
there but not the actual sequences. I could "grep -f" from the original file
using the debris file as a guide, but is there an easier way to get to the
singlet sequences?
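
To make the "grep -f" idea concrete, here is a minimal sketch of that lookup in
Python. It rests on assumptions that are not confirmed above: that each
non-empty line of the debris list starts with the read name as its first
whitespace-separated field, and that the original reads are in FASTA format.
The function names (read_names, filter_fasta) are made up for illustration.
Unlike a plain "grep -f", which returns only the matching header lines, this
keeps every sequence line of each matching record.

```python
import sys


def read_names(lines):
    """Collect read names from a debris list.

    Assumption: the read name is the first whitespace-separated field
    on each non-empty line; lines starting with '#' are skipped.
    """
    names = set()
    for line in lines:
        fields = line.split()
        if fields and not fields[0].startswith("#"):
            names.add(fields[0])
    return names


def filter_fasta(fasta_lines, names):
    """Yield the lines of every FASTA record whose ID is in `names`.

    The record ID is the first token after '>' on the header line.
    """
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].split()
            keep = bool(header) and header[0] in names
        if keep:
            yield line


# Only runs when called with exactly two arguments:
#   <debris list> <reads FASTA>
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as debris:
        wanted = read_names(debris)
    with open(sys.argv[2]) as fasta:
        sys.stdout.writelines(filter_fasta(fasta, wanted))
```

Saved under a hypothetical name such as extract_debris.py, it would be called
with the debris list and the original FASTA file as its two arguments, with
the output redirected to a new FASTA file.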

Also, I was wondering if you have any opinion on performing several iterations
of MIRA on the same data without losing any potential singlets (because the
"singlets" in the second pass are effectively contigs from the first pass), and
if you have any ideas on how to do it most effectively?

Sorry about the length of the e-mail for such a trivial question but I thought 
that it might be best to give the full picture.

Thanks a lot.
Best,
Sebastian Kvist

