Yep, that's the plan. I got a box with 512G of memory, which hopefully
should be enough. I'm only putting in 1/7 of the data though.

On Thu, Sep 15, 2011 at 2:58 PM, Robert Bruccoleri <bruc@xxxxxxxxxxxxxxxxxxxxx> wrote:

> Dear Artemus,
>
> Is this a plant genome? Are you sequencing the whole thing?
>
> Cheers,
> Bob
>
> Bastien Chevreux wrote:
>
> > On Sep 15, 2011, at 23:02 , Robert Bruccoleri wrote:
> >
> > > For the benefit of all mira users, could you explain these one
> > > letter codes in more detail? Specifically, what do they all mean
> > > and what can be done about them?
> >
> > Probably, but not atm, I'm a bit short on time.
> >
> > > I've looked in the source code, and I understand some of them
> > > (like 'G' which means repetitive sequence), but I don't understand
> > > what 'a' really means.
> >
> > 'a' == Align problem
> >
> > Specifically, there was an align overlap in pairwise comparison
> > between reads r1 and r2 which could be computed during the
> > Smith-Waterman screening. But during contig building, one of the
> > reads (say, r1) got inserted in the contig, and when the pathfinder
> > told the contig to use the align overlap of r1 & r2 as template to
> > insert read r2, the contig suddenly did not find any overlap
> > anymore. This often happens at repetitive sites or when reads
> > inserted in-between bring in too much noise through sequencing
> > errors.
> >
> > But the somewhat larger amount of 'a' Artemus posted isn't really
> > what made me gasp ... it was more the x / y / z numbers at the end
> > of each line: it's a timing MIRA keeps track of, which shows how
> > much time it spends where. The 'x' component is the one for the
> > pathfinder and is generally in the single or two-digit range.
> > Repetitive areas spike it up to higher numbers (three, very rarely
> > four digits), but these normally then go back down more or less
> > quickly.
> >
> > The numbers posted are 6-digit! Meaning that for considerable
> > stretches it takes 10,000 times longer than it should. I'd now like
> > to find out what triggers this.
> >
> > B.

--
Artemus Harper
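
To see where the pathfinder timing blows up, the check described above (the 'x' value of the x / y / z triple reaching six digits) can be sketched as a small filter. This is only a rough sketch and not part of MIRA: the function name, the threshold, and the assumption that each log line simply ends in three integers separated by slashes are all hypothetical, so the pattern would need adjusting to the real output.

```python
import re

# ASSUMPTION: each relevant log line ends with "x / y / z" as three integers.
# The real MIRA output may look different; adjust the pattern accordingly.
TIMING_RE = re.compile(r"(\d+)\s*/\s*(\d+)\s*/\s*(\d+)\s*$")


def slow_pathfinder_lines(lines, threshold=100_000):
    """Return (line_number, x) for lines whose 'x' timing is >= threshold.

    'x' is the first number of the trailing triple, i.e. the pathfinder
    component. The default threshold marks the six-digit range.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = TIMING_RE.search(line)
        if match is None:
            continue  # line carries no timing triple
        x = int(match.group(1))
        if x >= threshold:
            hits.append((number, x))
    return hits
```

Calling it on the lines of a log file (for example `slow_pathfinder_lines(open(path))`, with `path` pointing at your own log) gives the line numbers where the stretch of slow pathfinder calls starts and ends, which may help narrow down which reads or regions trigger it.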