[mira_talk] Re: How does Mira determine quality scores?

  • From: Davide Sassera <davide.sassera@xxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 04 Aug 2009 09:16:48 +0200

Ah, I see!
This is a reminder to everybody, and to me first of all, to give full details of a problem so as not to waste Bastien's time.

Thank you so much
D.



On Monday 03 August 2009 Davide Sassera wrote:
To answer your question on where I see this problem: it's in gap4, which
is my choice for checking the assembly.
I attach a screenshot. You can also see in the screenshot that the
quality of the highlighted gap is 1 (the same for all gaps). Due to the
high coverage you cannot see all the reads, but take my word for it: gaps
are the vast majority.
Ready as I am to help solve this problem, I have to let you know that I'm
a newbie in bioinformatics, so maybe I'm just doing something wrong... I
just do not know what.

Ah ... now I understand the problem. Nice trap. It's related to gap4, not MIRA.

In fact, gap4 does not know anything about different sequencing technologies. I think it can distinguish between different Sanger sequencing machines, but not between entirely different sequencing technologies. I suppose this is something that James will implement in gap5.

There's an easy workaround, though: once you have finished a genome in gap4, convert the gap4 database back to CAF (with gap2caf), then use "convert_project" (from the MIRA package) with the '-r' option to re-analyse and re-export it to other formats.

E.g.:
  gap2caf -project DEMO_EDITED -version 3 >demo_edited3.caf
  convert_project -f caf -t caf -t fasta -r c demo_edited3.caf final_result

The '-r c' for convert_project is important: gap4 does not import, work with, or export the consensus originally computed by MIRA; it computes its own instead. As gap4 knows nothing about 454 and Solexa technologies, that usually results in total mayhem.

'-r c' tells convert_project to discard any consensus present in the CAF file and recompute it.

With that, you can concentrate your finishing work on the major problem sites only (misassemblies, or sites where MIRA set a problem tag) and still let MIRA recalculate the correct consensus at the end.
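The two-step round trip above can be wrapped in a small shell function. This is a minimal sketch, not part of the MIRA package: the function name finish_with_mira and the DRY_RUN switch are hypothetical, and the gap2caf and convert_project options are exactly those from the example, nothing more. The CAF file name is passed in explicitly rather than derived from the project name.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with MIRA or gap4) chaining the two
# commands from the example above. Only the options shown in the post
# are used. Set DRY_RUN=1 to print the commands instead of running them.

finish_with_mira() {
    project="$1"   # gap4 database name, e.g. DEMO_EDITED
    version="$2"   # gap4 database version, e.g. 3
    caf="$3"       # intermediate CAF file, e.g. demo_edited3.caf
    outbase="$4"   # output base name for convert_project, e.g. final_result

    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "gap2caf -project $project -version $version >$caf"
        echo "convert_project -f caf -t caf -t fasta -r c $caf $outbase"
        return 0
    fi

    # Step 1: export the edited gap4 database back to CAF.
    # Stop here if gap2caf exits with a non-zero status.
    gap2caf -project "$project" -version "$version" >"$caf" || return 1

    # Step 2: '-r c' discards the consensus gap4 computed and lets MIRA
    # recompute it; results are written as CAF and FASTA.
    convert_project -f caf -t caf -t fasta -r c "$caf" "$outbase"
}
```

For example, `finish_with_mira DEMO_EDITED 3 demo_edited3.caf final_result` runs the same two commands as the example above; prefixing it with `DRY_RUN=1` only prints them, which is a cheap way to check the arguments before touching a real project.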

I'll add a short note about that to the MIRA manual; it may surprise more people than just you.

Regards,
  Bastien




--
Davide Sassera
Sezione di Patologia Generale e Parassitologia
Dipartimento di Patologia Animale, Igiene e Sanità Pubblica Veterinaria, Facoltà di Veterinaria
Università degli Studi di Milano
Via Celoria 10, 20133, Milano, ITALY
Tel: +39 0250318094
Fax: +39 0250318095
