[mira_talk] Re: hybrid assembly crashes - file not found

Bastien Chevreux wrote:
On Friday 06 March 2009 Andreas Petzold wrote:
I am trying to build a hybrid assembly with 454 and Sanger sequences, and mira
(2.9.39) crashes before the alignment phase:
[...]
The missing Nf_genomic_int_posmatchc_preassembly.0.lst contains all
complement skim hits, right? The funny thing is that the file for the
"forward" hits, Nf_genomic_int_posmatchf_preassembly.0.lst, exists...

Hi Andreas,

Interesting error, one from the category "should not happen". Indeed, these files contain the hits from skim... and both are created empty at the same time, one right after the other. That is: they always exist (or should), even if empty.

Which makes me wonder what could cause the symptoms you are seeing.

I think I have at least partly cornered the problem: I tried to assemble
the 454 and Sanger sequences separately, and 454 worked but Sanger did not.

The Sanger sequences are in EXP format and have trace data. Could that be
the problem? I will check this...

Care to explain in a bit more detail what you tried? EXP was the first data format implemented in MIRA (even before FASTA), so it should work. The exception is EXP files that come from an existing assembly and contain edits: I never implemented loading that type, as CAF was a far better alternative.

Regards,
  Bastien



Hi,

this problem seems to be related to the Sanger reads...

At the FLI (formerly IMB) we still have an old pipeline called Converge that processes all our raw Sanger reads (base calling, vector clipping, quality control, ...). Unfortunately, it stores every read in a separate experiment file. That's really inconvenient for large projects: my project has about 200000 Sanger reads, each in its own file :-). But none of these experiment files come from an existing assembly; they contain only the sequence, quality values, quality clipping, vector clipping, trace data, and so on. I keep all Sanger reads in a separate directory, but that should not be a problem, should it? (My fofn contains the correct paths.)
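Since a single bad entry in a fofn listing ~200000 files is easy to miss, here is a minimal Python sketch for checking one before starting an assembly. It assumes one path per line, skips blank lines, and resolves relative paths against a directory you pass in (defaulting to the current working directory). Whether MIRA resolves relative fofn paths the same way is an assumption, and the function name `missing_fofn_entries` is hypothetical.

```python
import os


def missing_fofn_entries(fofn_path, base_dir=None):
    """Return the fofn entries that do not point to an existing file.

    Assumptions (not taken from MIRA): one path per line, blank lines are
    ignored, and relative paths are resolved against base_dir, which
    defaults to the current working directory.
    """
    base_dir = base_dir or os.getcwd()
    missing = []
    with open(fofn_path) as fofn:
        for line in fofn:
            entry = line.strip()
            if not entry:
                continue  # skip blank lines
            # Absolute paths are used as-is; relative ones are joined to base_dir.
            path = entry if os.path.isabs(entry) else os.path.join(base_dir, entry)
            if not os.path.isfile(path):
                missing.append(entry)
    return missing
```

For example, `missing_fofn_entries("sanger.fofn")` (a hypothetical file name) returns an empty list when every listed EXP file is present, and otherwise lists exactly the entries that could not be found.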


Next, I tried assembling different numbers of these Sanger reads, and
unfortunately mira crashes whenever I use more than 835 reads:

Aligning possible forward matches:

Fatal Error: "Nf_genomic_d_log/Nf_genomic_int_posmatchf_pass.6.lst.reduced"
: File not found. This should have been written earlier by MIRA!
->Thrown: void Assembly::makeAlignmentsFromPosMatchFile(const string & filename)
->Caught: void Assembly::makeAlignments()
Program aborted.
Program aborted.
CWD: /misc/vulpix/data/andpet/Nothobranchius/genomic
CWD: /misc/vulpix/data/andpet/Nothobranchius/genomic
Abort


This time the funny thing is that the file
Nf_genomic_int_posmatchf_preassembly2.0.lst.reduced IS in the directory. This
is strange, but one thought is that mira already tries to read the file before
the last write operation has finished. Could that be a problem with the thread
implementation? I do not know much about threads, but when I fork some
parallel processes I always have to synchronize them (wait, waitpid).
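The kind of synchronization meant here can be sketched in Python as follows. This is a generic POSIX illustration, not MIRA's code, and nothing in this thread confirms that MIRA writes these files from a separate process or thread. A child process writes a file, and the parent reads it only after `os.waitpid` reports that the child has exited successfully. The child also writes to a temporary name and renames it with `os.replace`, so the final name never refers to a half-written file. The function name `write_then_read` and the `.tmp` suffix are hypothetical.

```python
import os


def write_then_read(path, lines):
    """Fork a writer child; the parent reads `path` only after the child exits."""
    pid = os.fork()
    if pid == 0:
        # Child process: exit code stays 1 unless the write fully succeeds.
        code = 1
        try:
            tmp = path + ".tmp"  # hypothetical temporary name
            with open(tmp, "w") as out:
                out.writelines(line + "\n" for line in lines)
                out.flush()
                os.fsync(out.fileno())
            # Atomic rename: readers see either no file or the complete file.
            os.replace(tmp, path)
            code = 0
        finally:
            # Never fall through into the parent's code path.
            os._exit(code)
    # Parent process: block until the writer has finished (the waitpid step).
    _, status = os.waitpid(pid, 0)
    if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
        raise RuntimeError("writer process failed")
    with open(path) as inp:
        return [line.rstrip("\n") for line in inp]
```

For example, `write_then_read("/tmp/example.lst", ["hit1", "hit2"])` (a hypothetical path) returns `["hit1", "hit2"]`. Without the `waitpid` call, the parent could try to open the file before the child had created or finished it, which is the kind of "file not found" race suspected above.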

Regards,

Andreas
--

Andreas Petzold
Genome Analysis
Fritz Lipmann Institute
Beutenbergstrasse 11, D-07745 Jena
voice : ++49-3641-656038
fax   : ++49-3641-656038
email : andpet@xxxxxxxxxxxxxx
