On Wed, Oct 7, 2015 at 4:54 PM, Wolfgang Mauerer <wm@xxxxxxxxxxxxxxxx> wrote:
On 07/10/2015 at 16:30, Mitchell Joblin wrote:
- This implements the standard notion of core and peripheral
developer based on participation in the version control system
Ref: Terceiro A, Rios LR, Chavez C (2010) An empirical study on
the structural complexity introduced by core and peripheral
developers in free software projects.
Signed-off-by: Mitchell Joblin <mitchell.joblin.ext@xxxxxxxxxxx>
---
codeface/R/developer_classification.R | 26 ++++++++++++++++++++++++++
codeface/R/test_developer_classification.R | 17 +++++++++++++++++
2 files changed, 43 insertions(+)
create mode 100644 codeface/R/developer_classification.R
create mode 100644 codeface/R/test_developer_classification.R
diff --git a/codeface/R/developer_classification.R b/codeface/R/developer_classification.R
new file mode 100644
index 0000000..251099c
--- /dev/null
+++ b/codeface/R/developer_classification.R
That would be the only file with suffix "R" (as opposed to "r")
in this directory. Any reason for this? If not, please change to
the standard suffix since this simplifies grepping through the files.
@@ -0,0 +1,26 @@
+source("db.r")
+source("query.r")
+
+## Classify a set of developers based on the number of commits made withing a
withing -> within
+## time range using the standard participation based notion
I'm not sure if everybody knows what the "standard participation based
notion" is.
+get.developer.class.con <- function(con, project.id, start.date, end.date) {
+ commit.df <- get.commits.by.date.con(con, project.id, start.date, end.date)
+ developer.class <- get.developer.class(commit.df)
+
+ return(developer.class)
+}
+
+## Low-level function to compute classification
+get.developer.class <- function(commit.df, threshold=0.8) {
A sentence that describes what you are doing would be helpful
(it's clear that you classify into core and peripheral developers,
but figuring out how you do that takes a moment).
+ author.commit.count <- count(commit.df, "author")
+ author.commit.count <- author.commit.count[order(-author.commit.count$freq),]
+ num.commits <- nrow(commit.df)
+ commit.threshold <- round(threshold * num.commits)
+ core.test <- cumsum(author.commit.count$freq) < commit.threshold
+ core.developers <- author.commit.count[core.test,]
+ peripheral.developers <- author.commit.count[!core.test,]
+ res <- list(core=core.developers, peripheral=peripheral.developers)
+
+ return(res)
+}
+
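For readers unfamiliar with the participation-based notion, the following is a minimal base-R sketch of what get.developer.class appears to compute, written against the patch as posted rather than any reference implementation. Authors are ranked by commit count, and those whose cumulative commit count stays strictly below round(threshold * total commits) are labelled core; everyone else is peripheral. The function name classify.developers.sketch and the toy author data are hypothetical. The sketch uses table() in place of the patch's count() call, which is assumed here to come from the plyr package.

```r
## Hypothetical sketch mirroring get.developer.class from the patch:
## rank authors by commit count, then mark as "core" those whose
## cumulative commit count stays strictly below round(threshold * total).
## Uses base R table() instead of count() (assumed to be plyr::count).
classify.developers.sketch <- function(authors, threshold = 0.8) {
  ## Commits per author, most active author first
  counts <- sort(table(authors), decreasing = TRUE)
  commit.threshold <- round(threshold * length(authors))
  ## Same strict comparison as the patch: the author whose commits push
  ## the cumulative count to or past the threshold is peripheral
  core.test <- cumsum(as.vector(counts)) < commit.threshold
  list(core = names(counts)[core.test],
       peripheral = names(counts)[!core.test])
}

## Toy data (hypothetical): 20 commits by four authors
authors <- c(rep("alice", 10), rep("bob", 5), rep("carol", 3), rep("dave", 2))
res <- classify.developers.sketch(authors)
print(res$core)        # "alice" "bob"
print(res$peripheral)  # "carol" "dave"
```

With 20 commits and threshold 0.8 the cut-off is 16; the cumulative counts are 10, 15, 18 and 20, so only alice and bob fall below it.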
diff --git a/codeface/R/test_developer_classification.R b/codeface/R/test_developer_classification.R
new file mode 100644
index 0000000..1fbc6cd
--- /dev/null
+++ b/codeface/R/test_developer_classification.R
@@ -0,0 +1,17 @@
+library(testthat)
+source("developer_classification.R")
+
+get.developer.class.test <- function() {
+ threshold <- 0.8
+ sample.size <- 1000
+
+ commit.df <- data.frame(author=sample(1:50, size=sample.size, replace=T))
+ developer.class <- get.developer.class(commit.df, threshold)
+
+ res <- sum(developer.class$core$freq) < threshold*sample.size
+ return(res)
+}
+
+test_that("get.developer.class returns expected values", {
+ expect_true(get.developer.class.test())
+ })
\ No newline at end of file
Looks good except for the missing newline at the end of the file.
Acked-by: Wolfgang Mauerer <wolfgang.mauerer@xxxxxxxxxxxxxxxxx>
Best regards, Wolfgang