[codeface] Re: [PATCH] Add function to compute developer classification based on centrality

  • From: Mitchell Joblin <joblin.m@xxxxxxxxx>
  • To: codeface@xxxxxxxxxxxxx
  • Date: Fri, 9 Oct 2015 10:34:05 +0200

On Thu, Oct 8, 2015 at 7:19 PM, Wolfgang Mauerer
<wolfgang.mauerer@xxxxxxxxxxxxxxxxx> wrote:

Hi Mitchell,

On 08/10/2015 at 17:48, Mitchell Joblin wrote:
Signed-off-by: Mitchell Joblin <mitchell.joblin.ext@xxxxxxxxxxx>

a few words of commit description could do no harm, so people know
why this is useful.

Sure. This is mostly for the research experiments ATM, so it has no
practical application in its present form. Shall I still post
additions which are experimental, or only once they are in a mature form?

---
codeface/R/developer_classification.r | 18 ++++++++++++++++++
codeface/R/test_developer_classification.r | 16 +++++++++++++++-
2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/codeface/R/developer_classification.r
b/codeface/R/developer_classification.r
index dff8c92..75c25b4 100644
--- a/codeface/R/developer_classification.r
+++ b/codeface/R/developer_classification.r
@@ -1,3 +1,5 @@
+suppressMessages(library(igraph))
+
source("db.r")
source("query.r")

@@ -28,3 +30,19 @@ get.developer.class <- function(commit.df, threshold=0.8)
{
return(res)
}

+## Determine developer class based on vertex centrality
+get.developer.class.centrality <- function(edgelist, vertex.ids,
threshold=0.8,
+ FUN=igraph::degree) {
+ graph <- graph.data.frame(edgelist, directed=TRUE,
+ vertices=data.frame(vertex.ids))
+ centrality.vec <- sort(FUN(graph), decreasing=T)
+ centrality.df <- data.frame(author=names(centrality.vec),
+ centrality=as.vector(centrality.vec))
+ centrality.threshold <- 0.8 * sum(centrality.vec)

I suppose that should be threshold*sum(...)?

Yes, that's right. Thanks!
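
To illustrate the fix agreed on above, here is a minimal sketch in base R.
The helper classify.by.centrality and the degree values are made up for
this example and are not part of the patch; it takes a named centrality
vector directly so that it runs without igraph. The only point is that the
cutoff is scaled by the threshold argument instead of a hard-coded 0.8.

```r
## Hypothetical helper (not part of the patch): classify developers from a
## named centrality vector, scaling the cutoff by the `threshold` argument.
classify.by.centrality <- function(centrality.vec, threshold=0.8) {
  centrality.vec <- sort(centrality.vec, decreasing=TRUE)
  centrality.df <- data.frame(author=names(centrality.vec),
                              centrality=as.vector(centrality.vec))
  ## Use the argument rather than the literal 0.8
  centrality.threshold <- threshold * sum(centrality.vec)
  core.test <- cumsum(centrality.vec) < centrality.threshold
  list(core=centrality.df[core.test,],
       peripheral=centrality.df[!core.test,])
}

## Made-up degree values for four developers
degrees <- c(alice=8, bob=6, carol=4, dave=2)
res.default <- classify.by.centrality(degrees)
res.half <- classify.by.centrality(degrees, threshold=0.5)
```

With these values the default threshold puts alice and bob in the core,
while threshold=0.5 leaves only alice there, so the argument visibly
changes the result.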


+ core.test <- cumsum(centrality.vec) < centrality.threshold
+ core.developers <- centrality.df[core.test,]
+ peripheral.developers <- centrality.df[!core.test,]
+ res <- list(core=core.developers, peripheral=peripheral.developers)
+
+ return(res)
+}
diff --git a/codeface/R/test_developer_classification.r
b/codeface/R/test_developer_classification.r
index 4a9640d..4430780 100644
--- a/codeface/R/test_developer_classification.r
+++ b/codeface/R/test_developer_classification.r
@@ -7,11 +7,25 @@ get.developer.class.test <- function() {

commit.df <- data.frame(author=sample(1:50, size=sample.size, replace=T))
developer.class <- get.developer.class(commit.df, threshold)
-
res <- sum(developer.class$core$freq) < threshold*sample.size
return(res)
}

+get.developer.class.centrality.test <- function() {
+ threshold <- 0.8

to catch errors like the above, it would be better to choose a
value different from the default here.
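
A small sketch of why a non-default value helps, under the assumption that
the test checks the same invariant as the one in the patch (the core's
centrality sum stays below threshold times the total). Both classifiers
and the degree values are hypothetical stand-ins; they work on a named
vector so that only base R is needed.

```r
## Hypothetical stand-in with the bug under discussion: the `threshold`
## argument is ignored and 0.8 is always used.
classify.hardcoded <- function(centrality.vec, threshold=0.8) {
  centrality.vec <- sort(centrality.vec, decreasing=TRUE)
  core.test <- cumsum(centrality.vec) < 0.8 * sum(centrality.vec)
  centrality.vec[core.test]
}

## Hypothetical stand-in that honours the argument.
classify.fixed <- function(centrality.vec, threshold=0.8) {
  centrality.vec <- sort(centrality.vec, decreasing=TRUE)
  core.test <- cumsum(centrality.vec) < threshold * sum(centrality.vec)
  centrality.vec[core.test]
}

## Invariant: the core's centrality sum must stay below
## threshold times the total centrality.
core.below.threshold <- function(classify, centrality.vec, threshold) {
  core <- classify(centrality.vec, threshold)
  sum(core) < threshold * sum(centrality.vec)
}

degrees <- c(a=8, b=6, c=4, d=2)   # made-up values
default.hides.bug <- core.below.threshold(classify.hardcoded, degrees, 0.8)
other.exposes.bug <- core.below.threshold(classify.hardcoded, degrees, 0.5)
fixed.passes <- core.below.threshold(classify.fixed, degrees, 0.5)
```

With threshold 0.8 the buggy variant passes the check, so the test cannot
tell it apart from the fixed one; with 0.5 the buggy variant fails and the
fixed one passes.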

+ g <- barabasi.game(300)
+ edgelist <- get.data.frame(g)
+ vertex.ids <- c(as.vector(V(g)), 301:305)
+ developer.class <- get.developer.class.centrality(edgelist, vertex.ids,
+ threshold, degree)
+ res <- sum(developer.class$core$centrality) < threshold*sum(degree(g))
+ return(res)
+}
+
test_that("get.developer.class returns expected values", {
expect_true(get.developer.class.test())
})
+
+test_that("get.developer.class.centrality returns expected values", {
+ expect_true(get.developer.class.centrality.test())
+ })
\ No newline at end of file

and the usual missing newline ;) (this is not much of an issue, but
it can cause CR/CRLF problems when dealing with different platforms,
and makes one single line special -- all lines have a newline, except
for the last one).

Sure, I will fix that.
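
One way to catch a missing final newline before posting is a small check
like the sketch below. The helper has.trailing.newline is hypothetical and
not part of Codeface; it reads the file as raw bytes and inspects the last
one.

```r
## Hypothetical helper: TRUE if the file at `path` ends with a newline byte.
has.trailing.newline <- function(path) {
  size <- file.info(path)$size
  if (is.na(size) || size == 0) return(FALSE)
  bytes <- readBin(path, what="raw", n=size)
  bytes[length(bytes)] == as.raw(0x0a)
}

## Demonstration on two temporary files
with.newline <- tempfile()
without.newline <- tempfile()
writeBin(charToRaw("expect_true(TRUE)\n"), with.newline)
writeBin(charToRaw("expect_true(TRUE)"), without.newline)
ok <- has.trailing.newline(with.newline)
flagged <- !has.trailing.newline(without.newline)
```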

Thanks for the review.

--Mitchell


Reviewed-by: Wolfgang Mauerer <wolfgang.mauerer@xxxxxxxxxxxxxxxxx>

Thanks, Wolfgang




