[codeface] [PATCH] Change default email analysis behavior to load mbox file

  • From: Mitchell Joblin <mitchell.joblin.ext@xxxxxxxxxxx>
  • To: codeface@xxxxxxxxxxxxx
  • Date: Fri, 20 Nov 2015 16:58:22 +0100

- The automatic switching between cached data and the mbox
files can lead to the unintentional reuse of stale data

- For debugging purposes we retain a function parameter for
loading a cached corpus

Signed-off-by: Mitchell Joblin <mitchell.joblin.ext@xxxxxxxxxxx>
---
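Note: the control flow this patch introduces can be sketched in isolation as
below. This is a minimal illustration, not the codeface implementation:
load.corpus.sketch and its build.corpus argument are hypothetical names, with
build.corpus standing in for the gen.corpus() call, and the cache path is a
temporary file rather than the corp.base file under resdir.

```r
## Hypothetical stand-in for the patched gen.forest() control flow.
## build.corpus is a placeholder for the gen.corpus() call; it is not
## part of codeface.
load.corpus.sketch <- function(corp.file, build.corpus, use.mbox=TRUE) {
  if (use.mbox) {
    ## Default: always rebuild from the mbox source and refresh the cache
    corp.base <- build.corpus()
    save(file=corp.file, corp.base)
  } else if (file.exists(corp.file)) {
    ## Debugging path: reuse the cached corpus only on explicit request
    load(file=corp.file)
  } else {
    ## No mbox requested and no cache available: fail loudly
    stop("Corpus file not found: ", corp.file)
  }

  return(corp.base)
}

## Usage: the first call builds and caches, the second reads the cache
corp.file <- tempfile("corp.base.")
fresh <- load.corpus.sketch(corp.file, function() c("mail 1", "mail 2"))
cached <- load.corpus.sketch(corp.file, function() stop("not called"),
                             use.mbox=FALSE)
```

Because use.mbox defaults to TRUE, an existing cache file no longer changes
behavior silently; the cached corpus is only read when the caller asks for it.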
codeface/R/ml/analysis.r | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/codeface/R/ml/analysis.r b/codeface/R/ml/analysis.r
index cec28e9..24b78f8 100644
--- a/codeface/R/ml/analysis.r
+++ b/codeface/R/ml/analysis.r
@@ -29,13 +29,12 @@ source("../mc_helpers.r")
source("project.spec.r")
source("ml_utils.r")

-gen.forest <- function(conf, repo.path, resdir) {
+gen.forest <- function(conf, repo.path, resdir, use.mbox=TRUE) {
## TODO: Use apt ML specific preprocessing functions, not always the
## lkml variant
corp.file <- file.path(resdir, paste("corp.base", conf$listname, sep="."))
- doCompute <- !(file.exists(corp.file))

- if (doCompute) {
+ if (use.mbox) {
corp.base <- gen.corpus(conf$listname, repo.path, suffix=".mbox",
marks=c("^_{10,}", "^-{10,}", "^[*]{10,},",
# Also remove inline diffs. TODO: Better
@@ -51,9 +50,12 @@ gen.forest <- function(conf, repo.path, resdir) {
encoding="UTF-8",
preprocess=linux.kernel.preprocess)
save(file=corp.file, corp.base)
- } else {
+ } else if (!use.mbox && file.exists(corp.file)) {
loginfo("Loading mail data from precomputed corpus instead of mbox file")
load(file=corp.file)
+ } else {
+ logerror("Corpus file not found")
+ stop()
}

return(corp.base)
--
2.1.4

