[codeface] Re: [PATCH 5/5] Add test for global mail analysis

  • From: Wolfgang Mauerer <wolfgang.mauerer@xxxxxxxxxxxxxxxxx>
  • To: <codeface@xxxxxxxxxxxxx>
  • Date: Thu, 5 Nov 2015 00:30:02 +0100

On 29/10/15 at 11:57, Mitchell Joblin wrote:

- Check the dispatch.all function works correctly and produces the
correct network

Signed-off-by: Mitchell Joblin <mitchell.joblin.ext@xxxxxxxxxxx>
---
codeface/R/ml/test_analysis.r | 42 ++++++++++++++++++++++++++++++++++++++++--
1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/codeface/R/ml/test_analysis.r b/codeface/R/ml/test_analysis.r
index 02333d3..be0b480 100644
--- a/codeface/R/ml/test_analysis.r
+++ b/codeface/R/ml/test_analysis.r
@@ -1,8 +1,11 @@
library(testthat)

+source("../db.r", chdir=T)
source("analysis.r")

-conf <- list(listname="gmane.comp.emulators.qemu")
+conf <- connect.db("../../../codeface_testing.conf")
+conf$listname <- "gmane.comp.emulators.qemu"
+
path <- "test_data"
res.dir <- "test_data"

@@ -21,10 +24,45 @@ test.check.corpus.precon <- function() {
return(all(check.eq))
}

+test.global.analysis <- function () {
+ project.name <- "test_mail"
+ analysis.method <- "mail"

Introducing a new analysis method just for testing purposes is basically
fine, but it does not make much sense in production use. If there is
ever more than one analysis mode for emails, we will need to introduce
a corresponding new parameter; if we stay with a single method, then
analysis.method will represent the VCS analysis mode.

To keep confusion low, I'd propose we define one VCS analysis mode as
the default and use that one here, or specify "none" to indicate that
no VCS analysis has been done (yet).

+ conf$pid <- gen.clear.project.id.con(conf$con, project.name,
analysis.method)
+
+ ## Run analysis
+ dispatch.all(conf, path, res.dir)
+
+ ## Query for edgelist
+ start.date <- "2000-01-01"
+ end.date <- "2020-01-01"
+ edgelist <- query.mail.edgelist(conf$con, conf$pid, start.date, end.date)
+
+ ## Get author id to name mapping
+ id.to.name <- sapply(unique(unlist(edgelist[,c(1,2)])),

Is unique(unlist(edgelist[,c(1,2)])) guaranteed to return a contiguous
range of IDs without gaps for real-world data sets (i.e., is it
guaranteed that no number is missing for some reason)? If yes, this
guarantee should be documented; something like 1:max(edgelist[,c(1,2)])
might be a bit easier to understand (I had to think for a few moments
to see what you are trying to do here). Alternatively, a comment would
also help.
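
To illustrate the concern: sapply() over a numeric vector returns an
unnamed vector, so id.to.name[id] indexes by position and only picks the
right name if the IDs happen to be exactly 1..n in that order. A minimal
sketch of a mapping keyed by ID instead, assuming a hypothetical
lookup.name() as a stand-in for query.person.name() (no database is
involved) and made-up person IDs:

```r
## Hypothetical stand-in for query.person.name(); the IDs are made up
## and deliberately non-contiguous
person.names <- c("3"="chatty kathy", "7"="bossy bill",
                  "12"="nasty nate", "20"="sneaky sam")
lookup.name <- function(id) unname(person.names[as.character(id)])

edgelist <- data.frame(from=c(3, 12), to=c(7, 20), weight=1)

## Collect every person ID that occurs as an edge endpoint
ids <- unique(unlist(edgelist[, c(1, 2)]))

## Key the mapping by ID (as a string) rather than by position
id.to.name <- setNames(sapply(ids, lookup.name), as.character(ids))

## Replace IDs by names through a name-based lookup
edgelist.named <- edgelist
edgelist.named$from <- unname(id.to.name[as.character(edgelist$from)])
edgelist.named$to <- unname(id.to.name[as.character(edgelist$to)])
```

With the mapping keyed by as.character(id), gaps or reordering in the ID
sequence no longer affect the result.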

+ function(id) query.person.name(conf$con, id))
+ edgelist[,c(1,2)] <- sapply(edgelist[,c(1,2)], function(id) id.to.name[id])
+
+ ## Generate graph from database data
+ g.db <- graph.data.frame(edgelist)
+
+ ## Generate target graph
+ edgelist.target <- data.frame(from=c("chatty kathy", "nasty nate"),
+ to=c("bossy bill", "sneaky sam"),

Haha, glad you follow Johannes Ebke's test case naming conventions ;)

+ weight=1)
+ g.target <- graph.data.frame(edgelist.target)
+
+ ## Test for edge agreement
+ res <- all(E(g.target) == E(g.db))

What's not clear to me: g.db is constructed from a subset of the
real-world qemu mailing list. Why should its edge list agree with the
artificial one constructed above?
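
Independent of where the data comes from, the agreement check itself can
be made explicit by comparing endpoint names instead of edge sequences.
A minimal sketch in base R, assuming both edge lists are data frames
with from, to and weight columns; edge.keys() is a hypothetical helper
and edgelist.db merely stands in for the database result:

```r
## Hypothetical helper: canonical, order-independent representation
## of an edge list as sorted "from|to|weight" strings
edge.keys <- function(edgelist) {
  sort(paste(edgelist$from, edgelist$to, edgelist$weight, sep="|"))
}

edgelist.target <- data.frame(from=c("chatty kathy", "nasty nate"),
                              to=c("bossy bill", "sneaky sam"),
                              weight=1)

## Stand-in for the database result: same edges, different row order
edgelist.db <- edgelist.target[c(2, 1), ]

## Counter-example: same persons, but the targets are swapped
edgelist.other <- data.frame(from=c("chatty kathy", "nasty nate"),
                             to=c("sneaky sam", "bossy bill"),
                             weight=1)

same.edges <- identical(edge.keys(edgelist.target), edge.keys(edgelist.db))
other.edges <- identical(edge.keys(edgelist.target), edge.keys(edgelist.other))
```

Comparing the canonical strings makes the test fail when the endpoints
differ, regardless of the order in which the rows are returned.
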
+}
+
test_that("Forest generation functions correctly", {
expect_true(test.genforest())
})

test_that("Corpus precondiction checks works", {
expect_true(test.check.corpus.precon())
- })
\ No newline at end of file
+ })
+
+test_that("Global analysis returns correct network", {

Please call this "Global _mailing list_ analysis" so that it can be
distinguished from the VCS analysis tests.

Reviewed-by: Wolfgang Mauerer <wolfgang.mauerer@xxxxxxxxxxx>

Thanks for the series, in particular for introducing new test cases!

Best regards, Wolfgang

+ expect_true(test.global.analysis())
+ })
