On Thursday 10 December 2009 Filip Van Nieuwerburgh wrote:
> Thanks for your encouraging insights ;-(

Sorry, I've acquired the reputation of having changed from "diplomatic" to "a bit direct" :-)

But you are aware that the Human Genome Project employed perhaps hundreds of people and had 35 million reads (Sanger, a bit longer but still) to assemble the human genome, right? Celera also had around 35 million reads, fewer people, but a huge server farm. And they made almost nightly downloads of the public HGP data to perform comparisons and reconciliations (some called this 'cheating', but hey, the data *was* public after all).

Coming to workload: the last I read is that the Beijing Center has 7 bioinformaticians per Illumina GA to get the analysis work done ... and they have at least 30(!) of these babies. Source:
http://www.genomeweb.com/informatics/bioinformatics-job-market-tug-war-heavy-demand-data-analysis-vs-tightening-budge

So, excuse me if I'm being blunt, but ... I wouldn't do this assembly all by myself, and certainly not in a month or two :-)

> I am curious: If MIRA could
> run on multiple processors (I also have access to a 128-core system),
> would it be able to manage this project?

Yes and no. Let me start with answering this question:

> I then of course have a second
> question: Are there any concrete plans to develop MIRA so that it can be
> run on multiple processors?

MIRA already uses multiple processors in part (in the SKIM part, the first all-vs-all comparison). I've had plans to implement multi-threading in the Smith-Waterman part as well for quite some time now, but never really got around to it due to lack of time (if anyone's willing to implement it, I'll coach :-)

But that's not the biggest problem. It's afterwards, during contig pathfinding in the overlap graph and contig building. This cannot be easily parallelised except by taking repeats out of the assembly process.
And keeping good track of repeats is actually what makes MIRA pretty competitive against other assemblers, so it's at the moment a no-go for me.

Last but not least: memory. MIRA keeps tons of stuff as info in memory to get things assembled right, but this has been killing me ever since Solexas came on the market. I still have ideas on how to bring down memory requirements further, but this takes time to implement. At the moment, you'd need at least ~250 GB RAM to even think of running MIRA with 100m reads.

Regards,
  Bastien

-- 
You have received this mail because you are subscribed to the mira_talk mailing list. For information on how to subscribe or unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html