Hi Camila,

If you have your SFF file, you could use sff_extract to obtain the FASTA, quality and XML files, with all the clipping information you need.

If you do not have access to your SFF file, you could use a small script to convert your FASTA and qual files to FASTQ and then use SSAHA2 to trim your primers. That way you would not need to assemble sequences without their qualities. If you do not have such a script, a quick web search will turn up at least one. You can download SSAHA2 binaries from Sanger.

As for your memory problem, you could reduce the number of sequences you try to assemble. I believe there is also a way to manage memory needs from the command line, but I have not used it so far.

Regards,

Juan Montenegro

2011/7/18 Mazzoni, Camila <mazzoni@xxxxxxxxxxxxx>

> Hello,
>
> I tried to read about others' problems with memory but it doesn't seem to
> apply to mine. I'm trying to run a cDNA assembly from 454 using a fasta file
> only because I had to trim the SMART primers and couldn't do it in the
> quality files. After trimming, many reads became empty, I hope it's not a
> problem. There's a warning, but it runs anyway.
> The assembly ran for a couple of days until I got the problem. Would really
> appreciate some help.
>
> G4PJSDO01AVIZX: unable to load or other reason for invalid data.
> G4PJSDO01D1A0V: unable to load or other reason for invalid data.
> G4PJSDO01DQ1WX: unable to load or other reason for invalid data.
> G4PJSDO01CP10J: unable to load or other reason for invalid data.
> G4PJSDO01ENJN9: unable to load or other reason for invalid data.
> G4PJSDO01DY3E3: unable to load or other reason for invalid data.
> G4PJSDO01A5O3K: unable to load or other reason for invalid data.
> G4PJSDO01A0QYE: unable to load or other reason for invalid data.
> G4PJSDO01DM9DJ: unable to load or other reason for invalid data.
> G4PJSDO01ED04F: unable to load or other reason for invalid data.
> G4PJSDO01DOW9Z: unable to load or other reason for invalid data.
> G4PJSDO01CVKTS: unable to load or other reason for invalid data.
> G4PJSDO01DC53K: unable to load or other reason for invalid data.
>
> ===========================================================================
> Pool statistics:
> Backbones: 0 Backbone rails: 0
>
>                 Sanger  454     PacBio  Solexa  SOLiD
> ----------------------------------------
> Total reads     0       486969  0       0       0
> Reads wo qual   0       486969  0       0       0
> Used reads      0       126111  0       0       0
> Avg tot rlen    0       136     0       0       0
> Avg rlen used   0       382     0       0       0
>
> With strain     0       0       0       0       0
> W/o clips       0       366774  0       0       0
> ===========================================================================
>
>
> Localtime: Sun Jul 17 15:27:04 2011
> Writing temporary hstat files:
> [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%]
> ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|....
> [90%] ....|.... [100%] done
> Localtime: Sun Jul 17 15:27:18 2011
>
> Analysing hstat files:
> [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%]
> ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|....
> [90%] ....|.... [100%]
> Localtime: Sun Jul 17 15:28:18 2011
> Hash statistics:
> =========================================================
> Measured avg.
frequency coverage: 38 > > Deduced thresholds: > ------------------- > Min normal cov: 15 > Max normal cov: 61 > Repeat cov: 72 > Heavy cov: 304 > Crazy cov: 760 > Mask cov: 3800 > > Repeat ratio histogram: > ----------------------- > 0 1092396 > 1 176514 > 2 67544 > 3 34124 > 4 19688 > 5 13174 > 6 7984 > 7 5068 > 8 3056 > 9 2316 > 10 2190 > 11 1740 > 12 1302 > 13 1258 > 14 1050 > 15 930 > 16 1144 > 17 916 > 18 558 > 19 464 > 20 472 > 21 572 > 22 592 > 23 550 > 24 396 > 25 474 > 26 492 > 27 326 > 28 374 > 29 466 > 30 580 > 31 500 > 32 448 > 33 342 > 34 436 > 35 330 > 36 174 > 37 254 > 38 232 > 39 150 > 40 138 > 41 120 > 42 132 > 43 208 > 44 224 > 45 122 > 46 168 > 47 224 > 48 228 > 49 304 > 50 178 > 51 126 > 52 152 > 53 104 > 54 130 > 55 154 > 56 112 > 57 94 > 58 52 > 59 108 > 60 114 > 61 100 > 62 118 > 63 144 > 64 94 > 65 166 > 66 108 > 67 76 > 68 70 > 69 110 > 70 138 > 71 96 > 72 80 > 73 74 > 74 68 > 75 72 > 76 76 > 77 78 > 78 106 > 79 96 > 80 102 > 81 106 > 82 136 > 83 170 > 84 208 > 85 220 > 86 162 > 87 134 > 88 128 > 89 74 > 90 114 > 91 108 > 92 50 > 93 82 > 94 92 > 95 46 > 96 66 > 97 54 > 98 88 > 99 80 > 100 74 > 101 74 > 102 74 > 103 74 > 104 80 > 105 70 > 106 72 > 107 62 > 108 74 > 109 64 > 110 64 > 111 104 > 112 98 > 113 92 > 114 76 > 115 96 > 116 102 > 117 98 > 118 62 > 119 46 > 120 36 > 121 56 > 122 80 > 123 70 > 124 118 > 125 62 > 126 64 > 127 76 > 128 84 > 129 98 > 130 110 > 131 86 > 132 66 > 133 52 > 134 36 > 135 54 > 136 142 > 137 46 > 138 44 > 139 52 > 140 66 > 141 92 > 142 56 > 143 36 > 144 46 > 145 98 > 146 36 > 147 48 > 148 20 > 149 20 > 150 48 > 151 50 > 152 22 > 153 42 > 154 30 > 155 22 > 156 22 > 157 32 > 158 40 > 159 24 > 160 22 > 161 26 > 162 26 > 163 22 > 164 34 > 165 22 > 166 4 > 167 14 > 168 24 > 169 18 > 170 48 > 171 28 > 172 34 > 173 22 > 174 18 > 175 6 > 176 10 > 177 8 > 178 16 > 179 4 > 180 12 > 181 10 > 182 8 > 183 4 > 184 2 > 185 2 > 186 4 > 188 4 > 189 4 > 190 12 > 191 10 > 192 8 > 193 12 > 194 18 > 195 18 > 196 12 > 197 18 > 198 
12 > 199 12 > 200 6 > 201 6 > 202 2 > 203 12 > 204 6 > 205 12 > 206 4 > 207 4 > 208 8 > 209 2 > 211 6 > 212 8 > 213 12 > 214 16 > 215 20 > 216 14 > 217 18 > 218 18 > 219 28 > 220 6 > 221 4 > 222 8 > 223 12 > 224 2 > 225 4 > 226 18 > 227 42 > 228 12 > 229 26 > 230 36 > 231 24 > 232 16 > 233 8 > 234 18 > 235 14 > 236 18 > 237 12 > 238 24 > 239 38 > 240 18 > 241 10 > 317 2 > ========================================================= > > Assigning statistics values: > Localtime: Sun Jul 17 15:28:24 2011 > [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] > ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... > [90%] ....|.... [100%] > Localtime: Sun Jul 17 15:28:39 2011 > clean up temporary stat files...Localtime: Sun Jul 17 15:28:39 2011 > Writing read repeat info to: > G4PJSDO01_20110627_output_gact_assembly/G4PJSDO01_20110627_output_gact_d_info/G4PJSDO01_20110627_output_gact_info_readrepeats.lst > ... 70344 sequences with 290480 masked stretches. > Localtime: Sun Jul 17 15:28:41 2011 > > > Searching for possible overlaps: > Localtime: Sun Jul 17 15:28:44 2011 > Now running threaded and partitioned skimmer with 1 partitions in 2 > threads: > Ouch, out of memory detected. > > > ========================== Memory self assessment > ============================== > Running in 64 bit mode. 
> > Dump from /proc/meminfo > > -------------------------------------------------------------------------------- > MemTotal: 16471540 kB > MemFree: 235044 kB > Buffers: 1136 kB > Cached: 12848 kB > SwapCached: 55960 kB > Active: 13934608 kB > Inactive: 2154392 kB > Active(anon): 13930900 kB > Inactive(anon): 2145880 kB > Active(file): 3708 kB > Inactive(file): 8512 kB > Unevictable: 16 kB > Mlocked: 16 kB > SwapTotal: 1052216 kB > SwapFree: 236 kB > Dirty: 12 kB > Writeback: 312 kB > AnonPages: 16020528 kB > Mapped: 5376 kB > Shmem: 972 kB > Slab: 29332 kB > SReclaimable: 11716 kB > SUnreclaim: 17616 kB > KernelStack: 2160 kB > PageTables: 48628 kB > NFS_Unstable: 0 kB > Bounce: 0 kB > WritebackTmp: 0 kB > CommitLimit: 9287984 kB > Committed_AS: 17754680 kB > VmallocTotal: 34359738367 kB > VmallocUsed: 305136 kB > VmallocChunk: 34350540532 kB > HardwareCorrupted: 0 kB > HugePages_Total: 0 > HugePages_Free: 0 > HugePages_Rsvd: 0 > HugePages_Surp: 0 > Hugepagesize: 2048 kB > DirectMap4k: 9984 kB > DirectMap2M: 3135488 kB > DirectMap1G: 13631488 kB > > -------------------------------------------------------------------------------- > > Dump from /proc/self/status > > -------------------------------------------------------------------------------- > Name: mira > State: R (running) > Tgid: 19465 > Pid: 19465 > PPid: 1 > TracerPid: 0 > Uid: 1010 1010 1010 1010 > Gid: 100 100 100 100 > FDSize: 256 > Groups: 100 > VmPeak: 17188460 kB > VmSize: 17188460 kB > VmLck: 0 kB > VmHWM: 16172456 kB > VmRSS: 15973200 kB > VmData: 17183588 kB > VmStk: 108 kB > VmExe: 4728 kB > VmLib: 0 kB > VmPTE: 33384 kB > Threads: 1 > SigQ: 0/128618 > SigPnd: 0000000000000000 > ShdPnd: 0000000000000000 > SigBlk: 0000000000000000 > SigIgn: 0000000000000001 > SigCgt: 0000000180000000 > CapInh: 0000000000000000 > CapPrm: 0000000000000000 > CapEff: 0000000000000000 > CapBnd: ffffffffffffffff > Cpus_allowed: fff > Cpus_allowed_list: 0-11 > Mems_allowed: 00000000,00000003 > Mems_allowed_list: 0-1 > 
voluntary_ctxt_switches: 44914 > nonvoluntary_ctxt_switches: 232867 > Stack usage: 104 kB > > -------------------------------------------------------------------------------- > > Information on current assembly object: > > AS_readpool: 486969 reads. > AS_contigs: 0 contigs. > AS_bbcontigs: 0 contigs. > Mem used for reads: 753421136 (719 MiB) > > Memory used in assembly structures: > Eff. Size Free cap. > LostByAlign > AS_writtenskimhitsperid: 486969 2 MiB 0 B 4 > B > AS_skim_edges: 0 7.7 GiB 7.7 GiB 0 > B > AS_adsfacts: 0 133 MiB 133 MiB 4 > B > AS_confirmed_edges: 0 267 MiB 267 MiB 4 > B > AS_permanent_overlap_bans: 155345426 5.8 GiB 0 B 0 > B > AS_readhitmiss: 0 24 B 0 B 0 > B > AS_readhmcovered: 0 24 B 0 B 0 > B > AS_count_rhm: 0 24 B 0 B 0 > B > AS_clipleft: 486969 2 MiB 0 B 4 > B > AS_clipright: 486969 2 MiB 0 B 4 > B > AS_used_ids: 0 476 KiB 476 KiB 7 > B > AS_multicopies: 486969 476 KiB 0 B 7 > B > AS_hasmcoverlaps: 486969 476 KiB 0 B 7 > B > AS_maxcoveragereached: 486969 2 MiB 0 B 4 > B > AS_coverageperseqtype: 0 24 B 0 B 0 > B > AS_istroublemaker: 486969 476 KiB 0 B 7 > B > AS_isdebris: 486969 476 KiB 0 B 7 > B > AS_needalloverlaps: 486969 476 KiB 7 B 0 > B > AS_readsforrepeatresolve: 0 40 B 0 B 0 > B > AS_allrmbsok: 0 24 B 0 B 0 > B > AS_probablermbsnotok: 0 24 B 0 B 0 > B > AS_weakrmbsnotok: 0 24 B 0 B 0 > B > AS_readmaytakeskim: 0 40 B 0 B 0 > B > AS_skimstaken: 0 40 B 0 B 0 > B > AS_numskimoverlaps: 0 24 B 0 B 0 > B > AS_numleftextendskims: 0 24 B 0 B 0 > B > AS_rightextendskims: 0 24 B 0 B 0 > B > AS_skimleftextendratio: 0 24 B 0 B 0 > B > AS_skimrightextendratio: 0 24 B 0 B 0 > B > AS_usedlogfiles: 32 1 KiB 0 B 0 > B > Total: 15645945424 (14.6 GiB) > > > ================================================================================ > > > ========================== Memory self assessment > ============================== > Running in 64 bit mode. 
> > Dump from /proc/meminfo > > -------------------------------------------------------------------------------- > MemTotal: 16471540 kB > MemFree: 121808 kB > Buffers: 5004 kB > Cached: 48048 kB > SwapCached: 64824 kB > Active: 13986316 kB > Inactive: 2217332 kB > Active(anon): 13971012 kB > Inactive(anon): 2180684 kB > Active(file): 15304 kB > Inactive(file): 36648 kB > Unevictable: 16 kB > Mlocked: 16 kB > SwapTotal: 1052216 kB > SwapFree: 31128 kB > Dirty: 780 kB > Writeback: 0 kB > AnonPages: 16086008 kB > Mapped: 16436 kB > Shmem: 1100 kB > Slab: 29312 kB > SReclaimable: 11732 kB > SUnreclaim: 17580 kB > KernelStack: 2600 kB > PageTables: 49312 kB > NFS_Unstable: 0 kB > Bounce: 0 kB > WritebackTmp: 0 kB > CommitLimit: 9287984 kB > Committed_AS: 18208056 kB > VmallocTotal: 34359738367 kB > VmallocUsed: 305136 kB > VmallocChunk: 34350540532 kB > HardwareCorrupted: 0 kB > HugePages_Total: 0 > HugePages_Free: 0 > HugePages_Rsvd: 0 > HugePages_Surp: 0 > Hugepagesize: 2048 kB > DirectMap4k: 9984 kB > DirectMap2M: 3135488 kB > DirectMap1G: 13631488 kB > > -------------------------------------------------------------------------------- > > Dump from /proc/self/status > > -------------------------------------------------------------------------------- > Name: mira > State: R (running) > Tgid: 19465 > Pid: 19465 > PPid: 1 > TracerPid: 0 > Uid: 1010 1010 1010 1010 > Gid: 100 100 100 100 > FDSize: 256 > Groups: 100 > VmPeak: 17188460 kB > VmSize: 17188460 kB > VmLck: 0 kB > VmHWM: 16172456 kB > VmRSS: 15972808 kB > VmData: 17183588 kB > VmStk: 108 kB > VmExe: 4728 kB > VmLib: 0 kB > VmPTE: 33384 kB > Threads: 1 > SigQ: 0/128618 > SigPnd: 0000000000000000 > ShdPnd: 0000000000000000 > SigBlk: 0000000000000000 > SigIgn: 0000000000000001 > SigCgt: 0000000180000000 > CapInh: 0000000000000000 > CapPrm: 0000000000000000 > CapEff: 0000000000000000 > CapBnd: ffffffffffffffff > Cpus_allowed: fff > Cpus_allowed_list: 0-11 > Mems_allowed: 00000000,00000003 > Mems_allowed_list: 0-1 > 
voluntary_ctxt_switches: 44925 > nonvoluntary_ctxt_switches: 232898 > Stack usage: 104 kB > > -------------------------------------------------------------------------------- > > Information on current assembly object: > > AS_readpool: 486969 reads. > AS_contigs: 0 contigs. > AS_bbcontigs: 0 contigs. > Mem used for reads: 753421136 (719. MiB) > > Memory used in assembly structures: > Eff. Size Free cap. > LostByAlign > AS_writtenskimhitsperid: 486969 2 MiB 0 B 4 > B > AS_skim_edges: 0 7.7 GiB 7.7 GiB 0 > B > AS_adsfacts: 0 133 MiB 133 MiB 4 > B > AS_confirmed_edges: 0 267 MiB 267 MiB 4 > B > AS_permanent_overlap_bans: 155345426 5.8 GiB 0 B 0 > B > AS_readhitmiss: 0 24 B 0 B 0 > B > AS_readhmcovered: 0 24 B 0 B 0 > B > AS_count_rhm: 0 24 B 0 B 0 > B > AS_clipleft: 486969 2 MiB 0 B 4 > B > AS_clipright: 486969 2 MiB 0 B 4 > B > AS_used_ids: 0 476 KiB 476 KiB 7 > B > AS_multicopies: 486969 476 KiB 0 B 7 > B > AS_hasmcoverlaps: 486969 476 KiB 0 B 7 > B > AS_maxcoveragereached: 486969 2 MiB 0 B 4 > B > AS_coverageperseqtype: 0 24 B 0 B 0 > B > AS_istroublemaker: 486969 476 KiB 0 B 7 > B > AS_isdebris: 486969 476 KiB 0 B 7 > B > AS_needalloverlaps: 486969 476 KiB 7 B 0 > B > AS_readsforrepeatresolve: 0 40 B 0 B 0 > B > AS_allrmbsok: 0 24 B 0 B 0 > B > AS_probablermbsnotok: 0 24 B 0 B 0 > B > AS_weakrmbsnotok: 0 24 B 0 B 0 > B > AS_readmaytakeskim: 0 40 B 0 B 0 > B > AS_skimstaken: 0 40 B 0 B 0 > B > AS_numskimoverlaps: 0 24 B 0 B 0 > B > AS_numleftextendskims: 0 24 B 0 B 0 > B > AS_rightextendskims: 0 24 B 0 B 0 > B > AS_skimleftextendratio: 0 24 B 0 B 0 > B > AS_skimrightextendratio: 0 24 B 0 B 0 > B > AS_usedlogfiles: 32 1 KiB 0 B 0 > B > Total: 15645945424 (14.6 GiB) > > > ================================================================================ > Dynamic allocs: 28 > Align allocs: 618 > Out of memory detected, exception message is: std::bad_alloc > > > If you have questions on why this happened, please send the last 1000 > lines of the output log (or better: 
the complete file) to the author
> together with a short summary of your assembly project.
>
>
>
> For general help, you will probably get a quicker response on the
> MIRA talk mailing list
> than if you mailed the author directly.
>
> To report bugs or ask for features, please use the new ticketing system at:
> http://sourceforge.net/apps/trac/mira-assembler/
> This ensures that requests don't get lost.
>
>
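
P.S. For the FASTA + qual to FASTQ conversion mentioned in the reply, here is a minimal sketch of what such a small script could look like. It is not the script referred to above, only an illustration under a few assumptions: both files list the reads in the same order, the qual file holds space-separated Phred scores, and the output uses the Sanger (Phred+33) encoding. The function names and the file names in the usage comment are hypothetical.

```python
from itertools import zip_longest


def read_records(lines):
    """Yield (name, payload) from FASTA-style lines; payload lines are joined by a space."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, " ".join(chunks)
            # Keep only the read name, drop any description after it.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, " ".join(chunks)


def fasta_qual_to_fastq(fasta_lines, qual_lines):
    """Pair FASTA and qual records (same order assumed) and yield FASTQ records."""
    pairs = zip_longest(read_records(fasta_lines), read_records(qual_lines))
    for seq_rec, qual_rec in pairs:
        if seq_rec is None or qual_rec is None:
            raise ValueError("FASTA and qual files hold different numbers of reads")
        name, seq = seq_rec[0], seq_rec[1].replace(" ", "")
        scores = [int(q) for q in qual_rec[1].split()]
        if qual_rec[0] != name or len(scores) != len(seq):
            raise ValueError("sequence/quality mismatch for read %s" % name)
        # Sanger FASTQ: one character per base, chr(score + 33), capped at 93.
        encoded = "".join(chr(min(q, 93) + 33) for q in scores)
        yield "@%s\n%s\n+\n%s\n" % (name, seq, encoded)


# Hypothetical usage with files on disk:
# with open("reads.fasta") as f, open("reads.qual") as q, open("reads.fastq", "w") as out:
#     out.writelines(fasta_qual_to_fastq(f, q))
```

Reads that became empty after trimming pass through as empty FASTQ records, so it may be worth filtering those out before trimming or assembling.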