[codeface] Re: [PATCH 1/3] Add query to compute dev-dev edgelist based on mailing list communication

  • From: Mitchell Joblin <joblin.m@xxxxxxxxx>
  • To: codeface@xxxxxxxxxxxxx
  • Date: Mon, 26 Oct 2015 13:53:28 +0100

On Mon, Oct 26, 2015 at 1:08 PM, Andreas Ringlstetter
<andreas.ringlstetter@xxxxxxxxxxxxxxxxxxxx> wrote:



On 26.10.2015 at 11:54, Mitchell Joblin wrote:
Signed-off-by: Mitchell Joblin <mitchell.joblin.ext@xxxxxxxxxxx>
---
codeface/R/query.r | 15 +++++++++++++++
1 file changed, 15 insertions(+)

diff --git a/codeface/R/query.r b/codeface/R/query.r
index fa295cf..f087b0d 100644
--- a/codeface/R/query.r
+++ b/codeface/R/query.r
@@ -542,6 +542,21 @@ query.top.contributors.changes <- function(con, range.id, limit=20) {
return(dat)
}

+## Compute edgelist for mailing list communication

Is this description accurate? It only yields edges for the OP of
each thread and omits all edges occurring inside each thread.

Why should it only yield edges for the OP when it is joined with
thread_responses?


+query.mail.edgelist <- function(con, pid, start.date, end.date) {
+ query <- str_c("SELECT who AS `from`, createdBy AS `to`, COUNT(*) AS `weight`",
+ "FROM mail_thread, thread_responses",
+ "WHERE mail_thread.mailThreadId=thread_responses.mailThreadId",
+ "AND projectId=", pid,
+ "AND mailDate >=", sq(start.date),
+ "AND mailDate <", sq(end.date),

Does it make sense to partition a thread like that or should this match
on the thread creation date instead?

Each thread is assigned a unique Id, the creation date is not
necessarily unique. What if two threads are created at the exact same
moment in time?


+ "GROUP BY mail_thread.mailThreadId",

I don't think this is doing what you expected it to do. You would need
to group both by mail_thread.mailThreadId and thread_responses.who,
otherwise you will get just a single edge per thread with a random value
in the "from" field.

Right, that is totally wrong. Thanks for catching it.
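
For illustration, the grouping problem can be reproduced outside of Codeface. The following is only a sketch under stated assumptions: it uses Python's sqlite3 module instead of the project's R/MySQL stack, the two tables are hypothetical cut-down stand-ins containing just the columns named in the patch, and the sample rows are invented. It relies on SQLite, like MySQL without ONLY_FULL_GROUP_BY, accepting a non-aggregated column ("who") that is not listed in GROUP BY.

```python
import sqlite3

# Hypothetical, minimal stand-ins for the two tables named in the patch;
# the real Codeface schema has more columns than shown here.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE mail_thread (mailThreadId INTEGER, projectId INTEGER, createdBy INTEGER);
CREATE TABLE thread_responses (mailThreadId INTEGER, who INTEGER, mailDate TEXT);
INSERT INTO mail_thread VALUES (1, 7, 100), (2, 7, 200);
INSERT INTO thread_responses VALUES
  (1, 101, '2015-10-01'), (1, 101, '2015-10-02'), (1, 102, '2015-10-03'),
  (2, 101, '2015-10-04');
""")

# Same SELECT/FROM/WHERE as in the patch, with bound parameters.
BASE = """
SELECT who AS `from`, createdBy AS `to`, COUNT(*) AS `weight`
FROM mail_thread, thread_responses
WHERE mail_thread.mailThreadId = thread_responses.mailThreadId
  AND projectId = ? AND mailDate >= ? AND mailDate < ?
"""
params = (7, "2015-10-01", "2015-11-01")

# Grouping as in the patch: one row per thread, "from" is an arbitrary responder.
per_thread = con.execute(
    BASE + "GROUP BY mail_thread.mailThreadId", params
).fetchall()

# Grouping as suggested in the review: one row per (thread, responder).
per_pair = con.execute(
    BASE
    + "GROUP BY mail_thread.mailThreadId, thread_responses.who "
    + "ORDER BY mail_thread.mailThreadId, thread_responses.who",
    params,
).fetchall()

print(len(per_thread))  # 2 rows, one per thread
print(per_pair)         # [(101, 100, 2), (102, 100, 1), (101, 200, 1)]
```

With the per-thread grouping, the three responses in thread 1 collapse into a single edge of weight 3 whose "from" value is whichever responder the database happens to pick. Note that grouping by thread and responder can still return the same from/to pair once per thread; whether those rows should be merged further is a separate design question not settled in this thread.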

--Mitchell


-- Andreas

+ sep=" ")
+ dat <- dbGetQuery(con, query)
+
+ return(dat)
+}
+
## Distributions for commit statistics
query.contributions.stats.range <- function(con, range.id, include.id=FALSE) {
if (include.id) {


