On Thursday 13 January 2011 15:08:16 Sandrine Moreira wrote:
> From Jonathan:
> My Name is Jonathan and I am an Scientific computing analyst at RQCHP. My
> role is to help scientist to do science using hpc platforms in Canada. So
> I will be responsible of helping Sandrine with her project. I can provide
> help to implement either parallelization or checkpoint restart, but in the
> present situation, it makes more sense to go with checkpoint restart.
>
> In order to help me catchup on that, could you resume where your CR
> implementation stand, and what still need to be done? What kind of things
> are missing, and what can I do to help?

Hello Jonathan,

MIRA talk is subscription only to keep spam out as much as possible. If you would rather not subscribe, we can also take the discussion off-list.

MIRA effectively works in several passes, and the plan is to write a checkpoint after each pass. Checkpoint restart is in its infancy and stuck at a crucial point.

What has been done so far:
- saving the parameters with which MIRA was started (simply by logging the parameters given on the command line)
- saving the current reads, including all changes made to them up to that point

What needs to be done:
- writing some structure dump and load functions to also save information which MIRA amasses during the run but which is not directly related to the reads themselves
- changing the main assembly function to allow loading/saving the above data and jumping to the right point in the assembly
- or hooking the restart into the main mira program, making it load the restart data and jump to the above-mentioned assembly function

The latter two especially are a bit tricky, as the function in question has grown over the past 10 years and has become somewhat unreadable. It has been scheduled for cleanup and rewrite for quite some time, but I never got around to doing it. I'm not quite sure what the best way would be; if I were, I'd be further along in the implementation.
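
To make the missing pieces a bit more concrete, here is a minimal sketch of the general pattern: dump and load functions for state that is not part of the reads, plus a pass loop that resumes at the right pass. This is not MIRA code. AssemblyState, dump_state(), load_state(), run_passes() and the text file format are all hypothetical names and choices made up for illustration, and the "work" done in each pass is only a placeholder counter.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for information gathered during a run that is not
// part of the reads themselves. Not a MIRA data structure.
struct AssemblyState {
    std::uint32_t next_pass = 1;                    // first pass still to run
    std::map<std::string, std::uint64_t> counters;  // keys must not contain whitespace
};

// Dump the state as plain text: two header lines, then one "key value" line
// per counter.
void dump_state(const AssemblyState& s, const std::string& path) {
    std::ofstream out(path, std::ios::trunc);
    if (!out) throw std::runtime_error("cannot open " + path + " for writing");
    out << "next_pass " << s.next_pass << '\n';
    out << "counters " << s.counters.size() << '\n';
    for (const auto& kv : s.counters) out << kv.first << ' ' << kv.second << '\n';
    out.flush();
    if (!out) throw std::runtime_error("error while writing " + path);
}

// Load a previously dumped state. Returns false (leaving s untouched) if no
// checkpoint file exists; throws if the file exists but cannot be parsed.
bool load_state(AssemblyState& s, const std::string& path) {
    std::ifstream in(path);
    if (!in) return false;
    AssemblyState tmp;
    std::string tag;
    std::size_t n = 0;
    if (!(in >> tag >> tmp.next_pass) || tag != "next_pass")
        throw std::runtime_error("bad checkpoint header in " + path);
    if (!(in >> tag >> n) || tag != "counters")
        throw std::runtime_error("bad counter header in " + path);
    for (std::size_t i = 0; i < n; ++i) {
        std::string key;
        std::uint64_t value = 0;
        if (!(in >> key >> value))
            throw std::runtime_error("truncated checkpoint " + path);
        tmp.counters[key] = value;
    }
    s = tmp;
    return true;
}

// Run passes next_pass..num_passes, writing a checkpoint after each pass.
// Returns how many passes were executed in this invocation.
std::uint32_t run_passes(std::uint32_t num_passes, const std::string& ckpt_path) {
    AssemblyState state;
    load_state(state, ckpt_path);  // keeps the defaults if there is no checkpoint
    std::uint32_t executed = 0;
    for (std::uint32_t pass = state.next_pass; pass <= num_passes; ++pass) {
        state.counters["passes_done"] += 1;  // placeholder for the real work
        state.next_pass = pass + 1;          // record where a restart should resume
        dump_state(state, ckpt_path);
        ++executed;
    }
    return executed;
}
```

With this layout, a run that is interrupted after pass 2 and then started again with the same checkpoint path picks up at pass 3 instead of pass 1. In a real implementation the reads checkpoint and this state file would have to be written consistently with each other, which the sketch does not address.
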
If you want to have a look: src/progs/mira.C contains the main caller function, where I suppose the loading should be done, and src/mira/assembly.C contains the assemble() function, which should be changed to know about restart. You might see some comments and the beginnings of stubs regarding this, but they are not functional yet (apart from saving reads as checkpoint data).

Best,
  Bastien

-- 
You have received this mail because you are subscribed to the mira_talk mailing list. For information on how to subscribe or unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html