[codeface] [PATCH 2/2] Change developer classification to use new db query

  • From: Mitchell Joblin <mitchell.joblin.ext@xxxxxxxxxxx>
  • To: codeface@xxxxxxxxxxxxx
  • Date: Thu, 15 Oct 2015 10:48:20 +0200

- For the project evolution analysis we need to make many
large queries regarding the commits in parallel, and it is
far more efficient to do the aggregation in the database.

Signed-off-by: Mitchell Joblin <mitchell.joblin.ext@xxxxxxxxxxx>
---
codeface/R/developer_classification.r | 10 +++++-----
codeface/R/test_developer_classification.r | 3 ++-
2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/codeface/R/developer_classification.r b/codeface/R/developer_classification.r
index 67cf035..84a24b5 100644
--- a/codeface/R/developer_classification.r
+++ b/codeface/R/developer_classification.r
@@ -10,17 +10,17 @@ source("query.r")
## the structural complexity introduced by core and peripheral
## developers in free software projects.
get.developer.class.con <- function(con, project.id, start.date, end.date) {
- commit.df <- get.commits.by.date.con(con, project.id, start.date, end.date)
- developer.class <- get.developer.class(commit.df)
+ commit.count.df <- get.commits.by.date.con(con, project.id, start.date, end.date,
+ commit.count=TRUE)
+ developer.class <- get.developer.class(commit.count.df)

return(developer.class)
}

## Low-level function to compute classification
-get.developer.class <- function(commit.df, threshold=0.8) {
- author.commit.count <- count(commit.df, "author")
+get.developer.class <- function(author.commit.count, threshold=0.8) {
author.commit.count <- author.commit.count[order(-author.commit.count$freq),]
- num.commits <- nrow(commit.df)
+ num.commits <- sum(author.commit.count$freq)
commit.threshold <- round(threshold * num.commits)
core.test <- cumsum(author.commit.count$freq) < commit.threshold
core.developers <- author.commit.count[core.test,]
diff --git a/codeface/R/test_developer_classification.r b/codeface/R/test_developer_classification.r
index a4b9071..a3903bb 100644
--- a/codeface/R/test_developer_classification.r
+++ b/codeface/R/test_developer_classification.r
@@ -6,7 +6,8 @@ get.developer.class.test <- function() {
sample.size <- 1000

commit.df <- data.frame(author=sample(1:50, size=sample.size, replace=T))
- developer.class <- get.developer.class(commit.df, threshold)
+ author.commit.count <- count(commit.df, "author")
+ developer.class <- get.developer.class(author.commit.count, threshold)
res <- sum(developer.class$core$freq) < threshold*sample.size
return(res)
}
--
2.1.4
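
Note on the change above: the classification only needs per-author commit counts, so the total number of commits can be taken from the aggregate instead of from the row count of the raw commit table. The following is a minimal sketch of that idea, not the Codeface implementation. It assumes the aggregated data frame has the columns "author" and "freq" (the layout plyr's count() produces in the test file), and it uses base R's table() as a stand-in for both count() and the database-side aggregation. The sample data and the seed are made up for illustration.

```r
## Hypothetical sample data: one row per commit, as in the test file.
set.seed(42)
commit.df <- data.frame(author=sample(1:50, size=1000, replace=TRUE))

## Base-R stand-in for count(commit.df, "author"): one row per author,
## with the number of commits in column "freq" (assumed column layout).
author.commit.count <- as.data.frame(table(author=commit.df$author),
                                     responseName="freq")

## The total number of commits is recoverable from the aggregate alone.
num.commits <- sum(author.commit.count$freq)
stopifnot(num.commits == nrow(commit.df))

## Same steps as the visible part of the patched get.developer.class.
threshold <- 0.8
author.commit.count <- author.commit.count[order(-author.commit.count$freq),]
commit.threshold <- round(threshold * num.commits)
core.test <- cumsum(author.commit.count$freq) < commit.threshold
core.developers <- author.commit.count[core.test,]
```

Because only the counts are needed, the raw commit rows never have to be transferred to R, which is the saving the commit message refers to.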

