On Apr 23, 2012, at 14:06, Adam Witney wrote:
> I am having trouble assembling a bacterial sequence without running out of
> disk space on my non-NFS drive (it fills up the 48Gb of available space).

Hi Adam,

you'll hate me for this, but I simply have to make that joke: "get more
disk space?"

But read on for more practical help ;-)

> I have 2934687 reads using the 200bp IonTorrent kit, for a ~3Mb bacterial
> genome. I have cut the Hash statistics out of the log_assembly (this file is
> about 450Mb)

The first 5k lines of the log_assembly would have been nice.

> and put them here:
> I think it is probably related to the error profile of the reads, the quality
> scores drop off quite quickly along the read. I have put the fastqc report
> here:

First things first: which hash statistics is that? The first one, before the
first clipping, the one before the second clipping, or the one after that?

In any case, I don't like that hash statistics file at all. Either some
(unknown?) adaptors are still present or (maybe) there are homopolymer
sequencing artifacts.

I've never worked with FASTQC, but
  http://bugs.sgul.ac.uk/temp/66493_in.iontor_fastqc/fastqc_report.html#M3
makes me think that every one of your sequences starts with "tcag", and if I
am not mistaken that's the last part of the adaptor.

Did you make sure you either used "sff_extract" with the "-c" option to clip
sequences or (preferred) made sure that MIRA has been reading the
accompanying XML file? Because if not, this would partly explain the
behaviour you are seeing.

If you are sure MIRA got the clips from XML (or got clipped reads from the
start), on to possible solutions for you:

1) use MIRA 3.9.0. I also had problems with large hash statistics files and
   rewrote the code. The hash statistics there use more memory and are a tad
   slower, but need substantially less disk space. In case you do not want
   to use 3.9.0 for the assembly, you can still use it for preprocessing
   only and use the then clipped reads in 3.4.x ... maybe that would be
   enough

2) as last resort only: clip the reads yourself at 200bp ... at least that
   is what I would try when seeing
     http://bugs.sgul.ac.uk/temp/66493_in.iontor_fastqc/fastqc_report.html#M3
   Maybe even trim somewhere between 150 and 200bp, this is what
     http://bugs.sgul.ac.uk/temp/66493_in.iontor_fastqc/fastqc_report.html#M10
   tells me.

Hope this helps and please do tell how it works out for you (or not).

Best,
  Bastien

-- 
You have received this mail because you are subscribed to the mira_talk mailing list.
For information on how to subscribe or unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html
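
For illustration only, the last-resort clip from option 2 could look roughly like the sketch below. It is not part of MIRA or sff_extract. It assumes the reads have already been exported as plain four-line FASTQ records, that the leading "tcag" is the key to drop, and that the 200bp limit is counted after removing that key. The function names (read_fastq, clip_read, clip_fastq) are made up for this example.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def clip_read(seq, qual, key="tcag", max_len=200):
    """Drop a leading key (case-insensitive) if present, then truncate to max_len.

    Assumption: max_len is applied to what remains after the key is removed.
    """
    if seq.lower().startswith(key):
        seq, qual = seq[len(key):], qual[len(key):]
    return seq[:max_len], qual[:max_len]


def clip_fastq(src, dst, key="tcag", max_len=200):
    """Write clipped records from src to dst; return how many reads were kept."""
    kept = 0
    for header, seq, qual in read_fastq(src):
        seq, qual = clip_read(seq, qual, key, max_len)
        if seq:  # skip reads that are empty after clipping
            dst.write(f"{header}\n{seq}\n+\n{qual}\n")
            kept += 1
    return kept


# Small in-memory demonstration with two made-up reads.
demo_in = io.StringIO("@r1\nTCAGACGTACGT\n+\nIIIIIIIIIIII\n@r2\nTCAG\n+\nIIII\n")
demo_out = io.StringIO()
n_kept = clip_fastq(demo_in, demo_out, max_len=5)
print(n_kept)
print(demo_out.getvalue(), end="")
```

For the 150 to 200bp trimming mentioned above, only max_len would change. Whether a fixed-length clip is good enough depends on the data; clips taken from the XML file remain the preferred route.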