[codeface] Re: [PATCH] Change default email analysis behavior to load mbox file

  • From: Mitchell Joblin <joblin.m@xxxxxxxxx>
  • To: Wolfgang Mauerer <wolfgang.mauerer@xxxxxxxxxxxxxxxxx>, codeface@xxxxxxxxxxxxx
  • Date: Fri, 20 Nov 2015 23:08:24 +0000

On Sat, Nov 21, 2015, 12:07 AM Wolfgang Mauerer
<wolfgang.mauerer@xxxxxxxxxxxxxxxxx> wrote:

On 21/11/2015 at 00:04, Mitchell Joblin wrote:

Hi Wolfgang,

On Fri, Nov 20, 2015, 11:59 PM Wolfgang Mauerer
<wolfgang.mauerer@xxxxxxxxxxxxxxxxx> wrote:

Hi Mitchell,

On 20/11/2015 at 16:58, Mitchell Joblin wrote:
> - The automatic switching between cached data and the mbox
> files can lead to the unintentional reuse of stale data
>
> - For debugging purposes we retain a function parameter for
> loading a cached corpus

thanks for the patch -- this basically implements what we
discussed, but I realised that we need to deal with more
cached files than just the corpus. We also load the communication
net, terms for content and subject and the term-document matrix
from cached files when they are available. The new mechanism
should be extended to these cases.

Alright, thanks for the review. I didn't know that other files were also
loaded elsewhere. I will have a look for where that's happening and make
changes.

just search for doCompute in ml/analysis.r -- the same pattern is used
for all cached files.

Thanks for the tip!

--Mitchell

Cheers, Wolfgang

Thanks,

Mitchell

So far,
Reviewed-by: Wolfgang Mauerer <wolfgang.mauerer@xxxxxxxxxxxxxxxxx>

Thanks & best regards, Wolfgang
>
> Signed-off-by: Mitchell Joblin <mitchell.joblin.ext@xxxxxxxxxxx>
> ---
> codeface/R/ml/analysis.r | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/codeface/R/ml/analysis.r b/codeface/R/ml/analysis.r
> index cec28e9..24b78f8 100644
> --- a/codeface/R/ml/analysis.r
> +++ b/codeface/R/ml/analysis.r
> @@ -29,13 +29,12 @@ source("../mc_helpers.r")
> source("project.spec.r")
> source("ml_utils.r")
>
> -gen.forest <- function(conf, repo.path, resdir) {
> +gen.forest <- function(conf, repo.path, resdir, use.mbox=TRUE) {
> ## TODO: Use apt ML specific preprocessing functions, not always the
> ## lkml variant
> corp.file <- file.path(resdir, paste("corp.base", conf$listname, sep="."))
> - doCompute <- !(file.exists(corp.file))
>
> - if (doCompute) {
> + if (use.mbox) {
> corp.base <- gen.corpus(conf$listname, repo.path, suffix=".mbox",
> marks=c("^_{10,}", "^-{10,}", "^[*]{10,},",
> # Also remove inline diffs. TODO: Better
> @@ -51,9 +50,12 @@ gen.forest <- function(conf, repo.path, resdir) {
> encoding="UTF-8",
> preprocess=linux.kernel.preprocess)
> save(file=corp.file, corp.base)
> - } else {
> + } else if (!use.mbox & file.exists(corp.file)) {
> loginfo("Loading mail data from precomputed corpus instead of mbox file")
> load(file=corp.file)
> + } else {
> + logerror("Corpus file not found")
> + stop()
> }
>
> return(corp.base)
>
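
The review above asks for the same explicit switch to cover the other cached
files that follow the doCompute pattern. A minimal sketch of how that could be
factored into one helper is shown below. The name compute.or.load and its
parameters are hypothetical and are not part of codeface. The sketch also
assumes saveRDS()/readRDS() in place of the save()/load() calls used in the
patch, so that the helper can return the loaded object directly.

```r
## Hypothetical helper, not part of codeface: recompute an object and refresh
## its cache by default; read the cache only when explicitly asked to.
compute.or.load <- function(cache.file, compute.fn, use.cache=FALSE) {
  if (!use.cache) {
    ## Default path: always recompute, so stale cached data is never reused
    res <- compute.fn()
    saveRDS(res, file=cache.file)
    return(res)
  }

  ## Debugging path: the cached file must exist, otherwise fail loudly
  if (!file.exists(cache.file)) {
    stop("Cached file not found: ", cache.file)
  }
  return(readRDS(cache.file))
}

## Usage with placeholder data standing in for a corpus
cache.file <- file.path(tempdir(), "corp.base.example")
corp <- compute.or.load(cache.file, function() c("mail 1", "mail 2"))
cached <- compute.or.load(cache.file, NULL, use.cache=TRUE)
```

Each cached artefact mentioned in the review (communication network, terms,
term-document matrix) could then go through the same call with its own cache
file and compute function, instead of repeating the if/else block per file.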

