Hi,
I pushed a bunch of patches to next [1] that should simplify overall
usage of PaStA's mailbox analysis. Those patches contain some
fundamental and invasive changes:
- Changes to UPSTREAM_MIN and UPSTREAM_MAX are automatically
recognised. No need to run pasta-prepare any longer. The upstream
commit hash file now contains a header that indicates the range. [2]
- Cumulative Unix mailbox files should now be placed at
'PaStA-resources/your-project/resources/mbox'. Symlinks are fine.
Rationale: instead of parsing huge mailbox files, PaStA now rips up
those bulky files and creates one single file per mail. Those files
are automatically created at
'PaStA-resources/your-project/mbox/year/month/day/MD5' where MD5 is
the MD5 hash of the Message-ID (Message-IDs may contain ~!</, which
are either not escapable or make your filesystem feel dizzy).
Requires procmail. [3] Together with an index file, the directory
scheme allows for a performant selection of time windows.
Mails that cannot be parsed or that do not contain parsable patches
are automatically dropped and will not be considered in future runs of
PaStA. This additionally increases overall performance when running
'pasta cache'.
- Semantics of 'pasta cache' changed. E.g., 'pasta cache -mbox'
changed to 'pasta cache -create mbox'
Why? There's a counterpart: 'pasta cache -clear mbox' will, e.g.,
remove the mailbox cache. [4]
- PaStA's mbox cache now always caches _all_ parsable mails,
independent of MBOX_{MIN,MAX}DATE. The downside is that this results
in huge cache files for mailboxes. On the other hand, the cache does
not need to be recreated when changing the time window of the actual
analysis.
'pasta analyse' now supports the -mindate and -maxdate parameters.
Mails that are not within this range are temporarily evicted from
the cache during the analysis to save memory. [5]
- Many minor fixes.
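
To illustrate the per-mail directory scheme described above, here is a
minimal Python sketch. It is not PaStA's actual code: the function name
mail_path, the use of the Date header, the zero-padded month and day,
and the example Message-ID are assumptions made for illustration only.

```python
import hashlib
from email.utils import parsedate_to_datetime
from pathlib import PurePosixPath


def mail_path(message_id: str, date_header: str) -> PurePosixPath:
    """Return a year/month/day/MD5 path for one mail (illustrative only)."""
    # Hash the Message-ID so that characters such as '/', '<' or '~'
    # never end up in a file name.
    digest = hashlib.md5(message_id.encode("utf-8")).hexdigest()
    # Assumption: the mail is filed under the date of its Date header.
    date = parsedate_to_datetime(date_header)
    # Assumption: zero-padded month and day; the real naming may differ.
    return PurePosixPath(
        "mbox",
        f"{date.year:04d}",
        f"{date.month:02d}",
        f"{date.day:02d}",
        digest,
    )


# Hypothetical Message-ID containing characters that are awkward in paths.
path = mail_path("<20170301.1234~x/y@example.org>",
                 "Wed, 01 Mar 2017 10:15:00 +0100")
print(path)
```

Because the date is encoded in the directory names, selecting a time
window only requires visiting the matching year/month/day directories
instead of scanning every mail.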
Don't forget to update to the latest PaStA-resources.
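
As an illustration of the -mindate/-maxdate idea mentioned in the list
above, here is a minimal Python sketch of restricting an in-memory cache
to a time window. It is not PaStA's implementation: the cache layout,
the name select_window, and the inclusive bounds are assumptions.

```python
from datetime import datetime


def select_window(cache, mindate, maxdate):
    """Return only the entries dated within [mindate, maxdate]."""
    # The full cache is left untouched; only a filtered view is built,
    # so changing the window never requires recreating the cache.
    return {
        msg_id: entry
        for msg_id, entry in cache.items()
        if mindate <= entry["date"] <= maxdate
    }


# Hypothetical cache: Message-ID -> parsed mail data.
cache = {
    "<a@example.org>": {"date": datetime(2016, 12, 30), "subject": "[PATCH] foo"},
    "<b@example.org>": {"date": datetime(2017, 2, 14), "subject": "[PATCH] bar"},
    "<c@example.org>": {"date": datetime(2017, 6, 1), "subject": "[PATCH] baz"},
}

window = select_window(cache, datetime(2017, 1, 1), datetime(2017, 3, 31))
print(sorted(window))
```

Only the mail dated within the window remains in the filtered view,
while the complete cache stays available for later runs with a
different window.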
I tested those patches extensively, but I'm sure I successfully hid some
bugs.
Cheers
Ralf
[1] https://github.com/lfd/PaStA/tree/next
[2] https://github.com/lfd/PaStA/commit/81401302ef3e172f2ef0d6cae5c9cc0dab8a9e3b
[3] https://github.com/lfd/PaStA/commit/c45ecb400cbbe63aae8bdd1dcf1aeabd866eb53a
[4] https://github.com/lfd/PaStA/commit/e1d2e58e8ae013495d6b9072d14faaa514508780
[5] https://github.com/lfd/PaStA/commit/3c65ac605c0484c0c31f20ca3059a8b959db79d8