[mira_talk] Re: assembly runs out of temp space

  • From: Adam Witney <awitney@xxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 24 Apr 2012 12:49:43 +0100

Hi Bastien,

Thanks for your email.

> you'll hate me for this, but I simply have to make that joke: "get more disk 
> space?" But read on for more practical help ;-)

Yes, if only! :-) Sadly, our big box has only a small amount of local disk and 
uses large NFS disks for data.

>> I have 2934687 reads using the 200bp IonTorrent kit, for a ~3Mb bacterial 
>> genome. I have cut the Hash statistics out of the log_assembly (this file is 
>> about 450Mb)
> 
> The first 5k lines of the log_assembly would have been nice.

OK, I have put the first 5k lines there now. I have also added the same from a 
second sequencing run (of a different but similar strain) for comparison:

http://bugs.sgul.ac.uk/temp/hash_stats.txt
http://bugs.sgul.ac.uk/temp/hash_stats2.txt

> lets me think that everyone of your sequences starts with "tcag", and if I am 
> not mistaken that's the last part of the adaptor. Did you make sure you used 
> "sff_extract" either with the "-c" option to clip sequences or (preferred) 
> made sure that MIRA has been reading the accompanying XML file? Because if 
> not, this would partly explain the behaviour you are seeing.

My general approach is this:

sff_extract -Q -s 66493_in.iontor.fastq -x 66493_traceinfo_in.iontor.xml *.sff
mira -project=66493 -job=denovo,genome,accurate,iontor -GE:not=8 \
  -DI:trt=/tmp/mira_temp >& log_assembly.txt &

Incidentally, previous IonTorrent data that assembled with MIRA without any 
problems also have 'tcag' at the start of the reads. MIRA does appear to be 
reading the XML file, though (see the first 5k lines in the hash_stats.txt file 
above).
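
For anyone wanting to check the 'tcag' prefix on their own data, here is a 
rough sketch that tallies the first four bases of every read. It assumes the 
FASTQ has exactly four lines per record (no wrapped sequences), and 
count_read_prefixes is just a made-up helper name, not part of MIRA or 
sff_extract:

```shell
# Tally the first four bases of every read in a FASTQ file.
# Assumption: four lines per record, sequence on the second line.
# count_read_prefixes is a hypothetical helper name.
count_read_prefixes() {
    # Lowercase the prefix so 'TCAG' and 'tcag' are counted together,
    # then sort the tallies with the most frequent prefix first.
    awk 'NR % 4 == 2 { print tolower(substr($0, 1, 4)) }' "$1" \
        | sort | uniq -c | sort -rn
}

# Example (uncomment to run on the real file):
# count_read_prefixes 66493_in.iontor.fastq | head
```

If nearly all reads share one prefix, that prefix sits at the top of the list 
with a count close to the total number of reads.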

> If you are sure MIRA got the clips from XML (or clipped reads from the 
> start), on to possible solutions for you:
> 1) use MIRA 3.9.0. I also had problems with large hash statistics files and 
> rewrote the code. The hash statistics there uses more memory, is a tad 
> slower, but substantially slashes the amount of needed disk space. In case 
> you do not want to use 3.9.0 for the assembly, you can still use it for 
> preprocessing only and use the then clipped reads in 3.4.x ... maybe that 
> would be enough
> 2) as last resort only: perform yourself a clip at 200bp ... at least that is 
> what I would try when seeing
>   http://bugs.sgul.ac.uk/temp/66493_in.iontor_fastqc/fastqc_report.html#M3
>  Maybe even trim somewhere between 150 and 200bp, this is what 
>   http://bugs.sgul.ac.uk/temp/66493_in.iontor_fastqc/fastqc_report.html#M10
>  tells me.

Can MIRA do this clipping? If so, how do I tell it to? I can't see how to make 
it run only the preprocessing without going on to assemble the dataset.
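
Failing that, a hard clip at a fixed length could presumably be done outside 
MIRA. A minimal sketch, assuming a FASTQ with exactly four lines per record; 
clip_fastq and the output file name are made up for illustration. One caveat I 
have not checked: the clip positions in the traceinfo XML refer to the 
unclipped reads, so right clips beyond the new length would no longer match.

```shell
# Hard-clip every read in a FASTQ file to at most max_len bases.
# Assumption: four lines per record (header, sequence, '+', quality).
# clip_fastq is a hypothetical helper name, not a MIRA or sff_extract tool.
clip_fastq() {
    max_len="$1"
    in_fastq="$2"
    awk -v n="$max_len" '
        # Lines 2 and 4 of each record hold sequence and quality;
        # cut both to the same length so they stay in sync.
        NR % 4 == 2 || NR % 4 == 0 { print substr($0, 1, n); next }
        # Header and separator lines pass through unchanged.
        { print }
    ' "$in_fastq"
}

# Example (uncomment to run; the output name is a placeholder):
# clip_fastq 200 66493_in.iontor.fastq > 66493_clipped_in.iontor.fastq
```

Reads shorter than the limit are left as they are.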

Thanks again

Adam
--
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html
