[codeface] Re: [PATCH] Add function to compute developer classification based on centrality

  • From: Wolfgang Mauerer <wolfgang.mauerer@xxxxxxxxxxxxxxxxx>
  • To: <codeface@xxxxxxxxxxxxx>
  • Date: Thu, 8 Oct 2015 19:19:23 +0200

Hi Mitchell,

On 08/10/2015 at 17:48, Mitchell Joblin wrote:

Signed-off-by: Mitchell Joblin <mitchell.joblin.ext@xxxxxxxxxxx>

A few words of commit description would do no harm, so that people know
why this is useful.
---
codeface/R/developer_classification.r | 18 ++++++++++++++++++
codeface/R/test_developer_classification.r | 16 +++++++++++++++-
2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/codeface/R/developer_classification.r b/codeface/R/developer_classification.r
index dff8c92..75c25b4 100644
--- a/codeface/R/developer_classification.r
+++ b/codeface/R/developer_classification.r
@@ -1,3 +1,5 @@
+suppressMessages(library(igraph))
+
source("db.r")
source("query.r")

@@ -28,3 +30,19 @@ get.developer.class <- function(commit.df, threshold=0.8) {
return(res)
}

+## Determine developer class based on vertex centrality
+get.developer.class.centrality <- function(edgelist, vertex.ids, threshold=0.8,
+ FUN=igraph::degree) {
+ graph <- graph.data.frame(edgelist, directed=TRUE,
+ vertices=data.frame(vertex.ids))
+ centrality.vec <- sort(FUN(graph), decreasing=T)
+ centrality.df <- data.frame(author=names(centrality.vec),
+ centrality=as.vector(centrality.vec))
+ centrality.threshold <- 0.8 * sum(centrality.vec)

I suppose that should be threshold*sum(...)?
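As a rough sketch of what the corrected computation could look like (the helper name and the sample values are made up for illustration, and it works on a plain named vector instead of an igraph object, so it is not the code from the patch):

```r
## Hypothetical helper, not part of the patch: classify developers by
## cumulative centrality, using the threshold argument instead of a
## hard-coded 0.8.
classify.by.centrality <- function(centrality.vec, threshold=0.8) {
  centrality.vec <- sort(centrality.vec, decreasing=TRUE)
  centrality.df <- data.frame(author=names(centrality.vec),
                              centrality=as.vector(centrality.vec))
  ## The caller-supplied threshold now takes effect
  centrality.threshold <- threshold * sum(centrality.vec)
  core.test <- cumsum(centrality.vec) < centrality.threshold
  list(core=centrality.df[core.test,],
       peripheral=centrality.df[!core.test,])
}

## Made-up centrality values for five developers
centrality <- c(a=10, b=5, c=3, d=1, e=1)
res.default <- classify.by.centrality(centrality)
res.custom <- classify.by.centrality(centrality, threshold=0.6)
```

With these values the default threshold puts a and b into the core group, while threshold=0.6 leaves only a there. With the hard-coded 0.8, both calls would return the same result.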

+ core.test <- cumsum(centrality.vec) < centrality.threshold
+ core.developers <- centrality.df[core.test,]
+ peripheral.developers <- centrality.df[!core.test,]
+ res <- list(core=core.developers, peripheral=peripheral.developers)
+
+ return(res)
+}
diff --git a/codeface/R/test_developer_classification.r b/codeface/R/test_developer_classification.r
index 4a9640d..4430780 100644
--- a/codeface/R/test_developer_classification.r
+++ b/codeface/R/test_developer_classification.r
@@ -7,11 +7,25 @@ get.developer.class.test <- function() {

commit.df <- data.frame(author=sample(1:50, size=sample.size, replace=T))
developer.class <- get.developer.class(commit.df, threshold)
-
res <- sum(developer.class$core$freq) < threshold*sample.size
return(res)
}

+get.developer.class.centrality.test <- function() {
+ threshold <- 0.8

To catch errors like the one above, it would be better to choose a
value different from the default here.
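To illustrate why the default value hides the problem, here is a small base R sketch. The two stand-in functions, the check helper and the sample values are all hypothetical, they only mimic the logic of the patch on a plain named vector:

```r
## Hypothetical stand-ins for the function under test; both return the
## centrality values of the developers classified as core.
core.fixed <- function(centrality.vec, threshold=0.8) {
  centrality.vec <- sort(centrality.vec, decreasing=TRUE)
  centrality.vec[cumsum(centrality.vec) < threshold * sum(centrality.vec)]
}
core.hardcoded <- function(centrality.vec, threshold=0.8) {
  centrality.vec <- sort(centrality.vec, decreasing=TRUE)
  ## Ignores the threshold argument, like the line flagged above
  centrality.vec[cumsum(centrality.vec) < 0.8 * sum(centrality.vec)]
}

## Same style of check as in the test from the patch
check.core <- function(core.fun, centrality.vec, threshold) {
  core <- core.fun(centrality.vec, threshold)
  sum(core) < threshold * sum(centrality.vec)
}

centrality <- c(a=10, b=5, c=3, d=1, e=1)  # made-up values

default.fixed <- check.core(core.fixed, centrality, 0.8)
default.hardcoded <- check.core(core.hardcoded, centrality, 0.8)
custom.fixed <- check.core(core.fixed, centrality, 0.6)
custom.hardcoded <- check.core(core.hardcoded, centrality, 0.6)
```

At 0.8 both variants pass the check, so the test cannot tell them apart. At 0.6 only the variant that honours the argument passes.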

+ g <- barabasi.game(300)
+ edgelist <- get.data.frame(g)
+ vertex.ids <- c(as.vector(V(g)), 301:305)
+ developer.class <- get.developer.class.centrality(edgelist, vertex.ids,
+ threshold, degree)
+ res <- sum(developer.class$core$centrality) < threshold*sum(degree(g))
+ return(res)
+}
+
test_that("get.developer.class returns expected values", {
expect_true(get.developer.class.test())
})
+
+test_that("get.developer.class.centrality returns expected values", {
+ expect_true(get.developer.class.centrality.test())
+ })
\ No newline at end of file

And the usual missing newline at the end of the file ;) This is not much
of an issue, but it can cause CR/CRLF problems when dealing with
different platforms, and it makes one single line special: all lines
end with a newline, except for the last one.
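If one wants to check for this before sending a patch, a minimal base R sketch could look like the following. The function name is made up, and the temporary files only serve as a demonstration:

```r
## Hypothetical helper: TRUE if the last byte of the file is a line feed
ends.with.newline <- function(path) {
  bytes <- readBin(path, what="raw", n=file.info(path)$size)
  length(bytes) > 0 && bytes[length(bytes)] == as.raw(0x0a)
}

## Two throwaway files, written byte for byte to avoid platform-specific
## newline translation
without.nl <- tempfile()
with.nl <- tempfile()
writeBin(charToRaw("a\nb"), without.nl)
writeBin(charToRaw("a\nb\n"), with.nl)

missing.case <- ends.with.newline(without.nl)
present.case <- ends.with.newline(with.nl)
```

The first file lacks the final newline and the second one has it, so only the second call returns TRUE.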

Reviewed-by: Wolfgang Mauerer <wolfgang.mauerer@xxxxxxxxxxxxxxxxx>

Thanks, Wolfgang



