[codeface] Re: [PATCH 2/2] Add functions to compute developer classification

  • From: Wolfgang Mauerer <wolfgang.mauerer@xxxxxxxxxxxxxxxxx>
  • To: <codeface@xxxxxxxxxxxxx>
  • Date: Thu, 8 Oct 2015 11:19:40 +0200



On 08/10/2015 at 11:16, Mitchell Joblin wrote:

On Wed, Oct 7, 2015 at 4:54 PM, Wolfgang Mauerer <wm@xxxxxxxxxxxxxxxx> wrote:
On 07/10/2015 at 16:30, Mitchell Joblin wrote:
- This implements the standard notion of core and peripheral
developer based on participation in the version control system
Ref: Terceiro A, Rios LR, Chavez C (2010) An empirical study on
the structural complexity introduced by core and peripheral
developers in free software projects.

Signed-off-by: Mitchell Joblin <mitchell.joblin.ext@xxxxxxxxxxx>
---
codeface/R/developer_classification.R | 26 ++++++++++++++++++++++++++
codeface/R/test_developer_classification.R | 17 +++++++++++++++++
2 files changed, 43 insertions(+)
create mode 100644 codeface/R/developer_classification.R
create mode 100644 codeface/R/test_developer_classification.R

diff --git a/codeface/R/developer_classification.R b/codeface/R/developer_classification.R
new file mode 100644
index 0000000..251099c
--- /dev/null
+++ b/codeface/R/developer_classification.R
That would be the only file with the suffix "R" (as opposed to "r")
in this directory. Any reason for this? If not, please change it to
the standard suffix, since this simplifies grepping through the files.

This is what Google's style guide specifies, and I think it is
also the convention used in most R packages. I got into the habit
of using that convention in other work but forgot that we do not
follow it in Codeface. We do want to use the Google style guide,
though, right?

Good catch; we do want to use the Google style guide, indeed.
However, instead of renaming all files except one, let's keep the
lowercase suffix and make this explicit in our conventions. The main
thing is not upper or lower case, but consistency.

I'll update the wiki accordingly.

Thanks, Wolfgang



@@ -0,0 +1,26 @@
+source("db.r")
+source("query.r")
+
+## Classify a set of developers based on the number of commits made withing a
withing -> within

+## time range using the standard participation based notion
I'm not sure if everybody knows what the "standard participation based
notion" is.

I'll explain that a bit better.

Kind regards,

Mitchell


+get.developer.class.con <- function(con, project.id, start.date, end.date) {
+ commit.df <- get.commits.by.date.con(con, project.id, start.date, end.date)
+ developer.class <- get.developer.class(commit.df)
+
+ return(developer.class)
+}
+
+## Low-level function to compute classification
+get.developer.class <- function(commit.df, threshold=0.8) {
A sentence that describes what you are doing would be helpful
(it's clear that you classify into core and peripheral developers,
but figuring out how you do that takes a moment).

+ author.commit.count <- count(commit.df, "author")
+ author.commit.count <- author.commit.count[order(-author.commit.count$freq),]
+ num.commits <- nrow(commit.df)
+ commit.threshold <- round(threshold * num.commits)
+ core.test <- cumsum(author.commit.count$freq) < commit.threshold
+ core.developers <- author.commit.count[core.test,]
+ peripheral.developers <- author.commit.count[!core.test,]
+ res <- list(core=core.developers, peripheral=peripheral.developers)

+
+ return(res)
+}
+
diff --git a/codeface/R/test_developer_classification.R b/codeface/R/test_developer_classification.R
new file mode 100644
index 0000000..1fbc6cd
--- /dev/null
+++ b/codeface/R/test_developer_classification.R
@@ -0,0 +1,17 @@
+library(testthat)
+source("developer_classification.R")
+
+get.developer.class.test <- function() {
+ threshold <- 0.8
+ sample.size <- 1000
+
+ commit.df <- data.frame(author=sample(1:50, size=sample.size, replace=T))
+ developer.class <- get.developer.class(commit.df, threshold)
+
+ res <- sum(developer.class$core$freq) < threshold*sample.size
+ return(res)
+}
+
+test_that("get.developer.class returns expected values", {
+ expect_true(get.developer.class.test())
+ })
\ No newline at end of file


Looks good, except for the missing newline at the end of the test file.

Acked-by: Wolfgang Mauerer <wolfgang.mauerer@xxxxxxxxxxxxxxxxx>

Best regards, Wolfgang
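On the question above of what the "standard participation based notion" is: as implemented in the patch, authors are sorted by commit count, and the top authors are called core as long as their running commit total stays below threshold * (number of commits), with threshold defaulting to 0.8. Below is a minimal, self-contained sketch of that rule in base R. It assumes a plain vector of author names as input. The function name classify.developers is hypothetical, and it uses table() instead of plyr's count(), so it is an illustration of the rule, not the code from the patch.

```r
# Hypothetical re-sketch of the participation-based core/peripheral split.
# Input: one entry per commit, naming the commit's author.
classify.developers <- function(authors, threshold = 0.8) {
  # Count commits per author, largest contributors first
  counts <- sort(table(authors), decreasing = TRUE)
  freq <- as.integer(counts)
  names(freq) <- names(counts)

  # Number of commits the core group may account for
  commit.threshold <- round(threshold * length(authors))

  # Same strict comparison as the patch: an author is core only while
  # the running total (including that author) stays below the threshold
  core.test <- cumsum(freq) < commit.threshold

  list(core = freq[core.test], peripheral = freq[!core.test])
}

# Hypothetical example: 100 commits by four authors (50/30/15/5)
authors <- c(rep("a", 50), rep("b", 30), rep("c", 15), rep("d", 5))
res <- classify.developers(authors)
print(res$core)        # core: author a only
print(res$peripheral)  # peripheral: authors b, c and d
```

In this example commit.threshold is 80. Author a is core (running total 50 < 80), but b brings the total to exactly 80, which fails the strict "<" test, so b ends up peripheral and the core group covers only 50% of the commits. Whether the author who crosses the threshold should count as core is a design choice that may be worth stating explicitly in the function's comment.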


