[codeface] Re: [PATCH 2/2] Add functions to compute developer classification

  • From: Mitchell Joblin <joblin.m@xxxxxxxxx>
  • To: codeface@xxxxxxxxxxxxx
  • Date: Thu, 8 Oct 2015 11:16:12 +0200

On Wed, Oct 7, 2015 at 4:54 PM, Wolfgang Mauerer <wm@xxxxxxxxxxxxxxxx> wrote:

Am 07/10/2015 um 16:30 schrieb Mitchell Joblin:
- This implements the standard notion of core and peripheral
developers based on participation in the version control system
Ref: Terceiro A, Rios LR, Chavez C (2010) An empirical study on
the structural complexity introduced by core and peripheral
developers in free software projects.

Signed-off-by: Mitchell Joblin <mitchell.joblin.ext@xxxxxxxxxxx>
---
codeface/R/developer_classification.R | 26 ++++++++++++++++++++++++++
codeface/R/test_developer_classification.R | 17 +++++++++++++++++
2 files changed, 43 insertions(+)
create mode 100644 codeface/R/developer_classification.R
create mode 100644 codeface/R/test_developer_classification.R

diff --git a/codeface/R/developer_classification.R b/codeface/R/developer_classification.R
new file mode 100644
index 0000000..251099c
--- /dev/null
+++ b/codeface/R/developer_classification.R
That would be the only file with the suffix "R" (as opposed to "r")
in this directory. Any reason for this? If not, please change to
the standard suffix, since this simplifies grepping through the files.

This is the convention specified in Google's style guide, and I think
it is also the one used in most R packages. I got into the habit of
using it in other work and forgot that we do not follow it in
Codeface. We do want to use the Google style guide, though, right?


@@ -0,0 +1,26 @@
+source("db.r")
+source("query.r")
+
+## Classify a set of developers based on the number of commits made withing a
withing -> within

+## time range using the standard participation based notion
I'm not sure if everybody knows what the "standard participation based
notion" is.

I'll explain that a bit better.

Kind regards,

Mitchell


+get.developer.class.con <- function(con, project.id, start.date, end.date) {
+ commit.df <- get.commits.by.date.con(con, project.id, start.date, end.date)
+ developer.class <- get.developer.class(commit.df)
+
+ return(developer.class)
+}
+
+## Low-level function to compute classification
+get.developer.class <- function(commit.df, threshold=0.8) {
A sentence that describes what you are doing would be helpful.
(It's clear that you classify into core and peripheral developers,
but figuring out how you do that takes a moment.)

+ author.commit.count <- count(commit.df, "author")
+ author.commit.count <- author.commit.count[order(-author.commit.count$freq),]
+ num.commits <- nrow(commit.df)
+ commit.threshold <- round(threshold * num.commits)
+ core.test <- cumsum(author.commit.count$freq) < commit.threshold
+ core.developers <- author.commit.count[core.test,]
+ peripheral.developers <- author.commit.count[!core.test,]
+ res <- list(core=core.developers, peripheral=peripheral.developers)

+
+ return(res)
+}
+
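
For readers following the review, the classification step in get.developer.class can be illustrated on a small hand-checkable input. The following is only a sketch of the same logic in base R, not the code from the patch: the function name classify.by.commit.share and the author names are hypothetical, and table() stands in for the count() call used in the patch. Authors are sorted by commit count, and an author counts as core as long as the running total of commits stays strictly below round(threshold * total commits).

```r
## Hypothetical stand-alone sketch; classify.by.commit.share is not part of
## Codeface and uses base R's table() instead of the count() in the patch.
classify.by.commit.share <- function(authors, threshold=0.8) {
  ## Commits per author, most active author first
  commit.count <- sort(table(authors), decreasing=TRUE)
  ## Number of commits the core group has to stay below
  commit.threshold <- round(threshold * length(authors))
  ## An author is core while the running commit total is below the threshold
  is.core <- cumsum(as.vector(commit.count)) < commit.threshold
  list(core=names(commit.count)[is.core],
       peripheral=names(commit.count)[!is.core])
}

## Made-up sample: 100 commits spread over four authors
authors <- rep(c("alice", "bob", "carol", "dave"), times=c(50, 30, 15, 5))
res <- classify.by.commit.share(authors)
print(res)
```

With this input the running totals are 50, 80, 95 and 100 against a threshold of 80, so only alice is classified as core. bob, whose commits bring the total to exactly 80, ends up peripheral because the comparison is strict.
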
diff --git a/codeface/R/test_developer_classification.R
b/codeface/R/test_developer_classification.R
new file mode 100644
index 0000000..1fbc6cd
--- /dev/null
+++ b/codeface/R/test_developer_classification.R
@@ -0,0 +1,17 @@
+library(testthat)
+source("developer_classification.R")
+
+get.developer.class.test <- function() {
+ threshold <- 0.8
+ sample.size <- 1000
+
+ commit.df <- data.frame(author=sample(1:50, size=sample.size, replace=T))
+ developer.class <- get.developer.class(commit.df, threshold)
+
+ res <- sum(developer.class$core$freq) < threshold*sample.size
+ return(res)
+}
+
+test_that("get.developer.class returns expected values", {
+ expect_true(get.developer.class.test())
+ })
\ No newline at end of file


Looks good except for the missing newline at the end of the file.

Acked-by: Wolfgang Mauerer <wolfgang.mauerer@xxxxxxxxxxxxxxxxx>

Best regards, Wolfgang

