Hi Bastien,

Thanks for your email.

> you'll hate me for this, but I simply have to make that joke: "get more disk
> space?" But read on for more practical help ;-)

Yes .. if only! :-) Sadly our big box has only a small amount of local disk and uses large NFS disks for data.

>> I have 2934687 reads using the 200bp IonTorrent kit, for a ~3Mb bacterial
>> genome. I have cut the Hash statistics out of the log_assembly (this file is
>> about 450Mb)
>
> The first 5k lines of the log_assembly would have been nice.

OK, I have put the first 5k lines there now. I have also added the same from a second sequencing run (of a different but similar strain) for comparison:

http://bugs.sgul.ac.uk/temp/hash_stats.txt
http://bugs.sgul.ac.uk/temp/hash_stats2.txt

> lets me think that everyone of your sequences starts with "tcag", and if I am
> not mistaken that's the last part of the adaptor. Did you make sure you used
> "sff_extract" either with the "-c" option to clip sequences or (preferred)
> made sure that MIRA has been reading the accompanying XML file? Because if
> not, this would partly explain the behaviour you are seeing.

My general approach is this:

sff_extract -Q -s 66493_in.iontor.fastq -x 66493_traceinfo_in.iontor.xml *.sff
mira -project=66493 -job=denovo,genome,accurate,iontor -GE:not=8 -DI:trt=/tmp/mira_temp >& log_assembly.txt &

Incidentally, looking back at previous IonTorrent data which assembled with no problems in MIRA, those reads also have 'tcag' at the start. It does look like MIRA is reading the XML file, though (as shown by the first 5k lines in the hash_stats.txt file above).

> If you are sure MIRA got the clips from XML (or clipped reads from the
> start), on to possible solutions for you:
> 1) use MIRA 3.9.0. I also had problems with large hash statistics files and
> rewrote the code. The hash statistics there uses more memory, is a tad
> slower, but substantially slashes the amount of needed disk space.
> In case
> you do not want to use 3.9.0 for the assembly, you can still use it for
> preprocessing only and use the then clipped reads in 3.4.x ... maybe that
> would be enough
> 2) as last resort only: perform yourself a clip at 200bp ... at least that is
> what I would try when seeing
> http://bugs.sgul.ac.uk/temp/66493_in.iontor_fastqc/fastqc_report.html#M3
> Maybe even trim somewhere between 150 and 200bp, this is what
> http://bugs.sgul.ac.uk/temp/66493_in.iontor_fastqc/fastqc_report.html#M10
> tells me.

Can MIRA do this clipping? If so, how do I tell it to? I can't see how to tell it to do just the preprocessing without also trying to assemble the dataset.

Thanks again,
Adam
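
For reference, the fixed-length clip suggested in point 2 could also be done outside MIRA. Below is a minimal sketch in Python, assuming plain 4-line FASTQ records such as the 66493_in.iontor.fastq written by sff_extract. The function names and the output file name are made up for illustration and are not part of MIRA or sff_extract. Note that it leaves the clip positions in the XML untouched; how MIRA treats clip points that lie beyond the new read length is not something verified here.

```python
from typing import Iterable, Iterator


def clip_fastq(lines: Iterable[str], max_len: int = 200) -> Iterator[str]:
    """Yield FASTQ lines with sequence and quality cut to at most max_len.

    Assumption: every record is exactly 4 lines (header, sequence, '+',
    qualities), i.e. sequences are not wrapped over several lines.
    """
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"malformed FASTQ record near {header!r}")
            if len(seq) != len(qual):
                raise ValueError(f"sequence/quality length mismatch in {header!r}")
            yield header
            yield seq[:max_len]   # reads shorter than max_len pass unchanged
            yield plus
            yield qual[:max_len]  # keep qualities in step with the sequence
            record = []
    if record:
        raise ValueError("truncated FASTQ record at end of input")


def clip_fastq_file(src: str, dst: str, max_len: int = 200) -> None:
    """Stream src to dst so the whole file never has to fit in memory."""
    with open(src) as fin, open(dst, "w") as fout:
        for out_line in clip_fastq(fin, max_len):
            fout.write(out_line + "\n")
```

Usage would be something like clip_fastq_file("66493_in.iontor.fastq", "66493_clipped.fastq", max_len=200), with max_len lowered towards 150 if the FastQC plots suggest it.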