[codeface] [PATCH] Fix double-counting of commits

  • From: Wolfgang Mauerer <wolfgang.mauerer@xxxxxxxxxxxxxxxxx>
  • To: <codeface@xxxxxxxxxxxxx>
  • Date: Fri, 23 Sep 2016 09:37:53 -1000

Commit 1460004ce (attached) in branch for-yedemin fixes double-counting
of commits, which occurs in most analysis modes (the exception being
tagging). While the problem does not affect any of the community or
cluster analyses, in many cases it causes the contributions of
individual authors, measured in lines of code (LoC) and number of
patches, to be overestimated.

If there are no objections, I'll merge early next week.

Best regards, Wolfgang

From 1460004ce49af305b5af34be67cb5b63cd54d227 Mon Sep 17 00:00:00 2001
From: Wolfgang Mauerer <wolfgang.mauerer@xxxxxxxxxxxxxxxxx>
Date: Thu, 22 Sep 2016 15:43:55 -1000
Subject: [PATCH 1/1] Fix double-counting of commits
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When committer and author are distinguished (as is the
case for the proximity, committer2author, and file-based
analysis modes), both persons are associated with a given
commit. However, one person can assume both roles, in
which case the current implementation counts the commit
information twice. Fix this.

Signed-off-by: Wolfgang Mauerer <wolfgang.mauerer@xxxxxxxxxxxxxxxxx>
Reported-by: 叶德铭 <ydm14@xxxxxxxxxxxxxxxxxxxxx>
Tested-by: 李明杰 <li-mj14@xxxxxxxxxxxxxxxxxxxxx>
---
 codeface/cluster/cluster.py | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/codeface/cluster/cluster.py b/codeface/cluster/cluster.py
index d363ff3..04db423 100755
--- a/codeface/cluster/cluster.py
+++ b/codeface/cluster/cluster.py
@@ -1380,10 +1380,14 @@ def populatePersonDB(cmtlist, id_mgr, link_type=None):
                 (LinkType.proximity, LinkType.committer2author,
                  LinkType.file, LinkType.feature, LinkType.feature_file):
             #create person for committer
-            ID = id_mgr.getPersonID(cmt.getCommitterName())
-            pi = id_mgr.getPI(ID)
-            cmt.setCommitterPI(pi)
-            pi.addCommit(cmt)
+            ID_c = id_mgr.getPersonID(cmt.getCommitterName())
+            pi_c = id_mgr.getPI(ID_c)
+            cmt.setCommitterPI(pi_c)
+            if ID_c != ID:
+                # Only add the commit to the committer's person instance
+                # if committer and author differ, otherwise contributions
+                # will be counted twice.
+                pi_c.addCommit(cmt)
 
     return None
 
-- 
2.8.3
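The effect of the `ID_c != ID` guard in the hunk above can be illustrated with a small, self-contained Python sketch. It does not use codeface's real classes: `Commit`, `PersonInstance`, `IdManager` and `populate` are hypothetical stand-ins for the objects in `cluster.py`, reduced to what is needed to show how a self-committed commit inflates one person's patch and LoC counts when the guard is missing.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    # Hypothetical, simplified commit: only names and a LoC figure.
    author: str
    committer: str
    loc: int


@dataclass
class PersonInstance:
    # Hypothetical stand-in for codeface's person instance (pi).
    name: str
    commits: list = field(default_factory=list)

    def add_commit(self, cmt):
        self.commits.append(cmt)

    def patch_count(self):
        return len(self.commits)

    def total_loc(self):
        return sum(c.loc for c in self.commits)


class IdManager:
    """Hypothetical stand-in for id_mgr: one ID per distinct name."""

    def __init__(self):
        self._ids = {}
        self._persons = {}

    def get_person_id(self, name):
        if name not in self._ids:
            new_id = len(self._ids)
            self._ids[name] = new_id
            self._persons[new_id] = PersonInstance(name)
        return self._ids[name]

    def get_pi(self, person_id):
        return self._persons[person_id]


def populate(cmtlist, id_mgr, fixed=True):
    """Attach each commit to its author and, if different, its committer.

    fixed=False reproduces the old behaviour (no ID comparison), so a
    person who both authored and committed a change gets it twice.
    """
    for cmt in cmtlist:
        author_id = id_mgr.get_person_id(cmt.author)
        id_mgr.get_pi(author_id).add_commit(cmt)

        committer_id = id_mgr.get_person_id(cmt.committer)
        if not fixed or committer_id != author_id:
            id_mgr.get_pi(committer_id).add_commit(cmt)


commits = [Commit("alice", "alice", 10), Commit("bob", "alice", 5)]

buggy = IdManager()
populate(commits, buggy, fixed=False)
alice_buggy = buggy.get_pi(buggy.get_person_id("alice"))

patched = IdManager()
populate(commits, patched, fixed=True)
alice_fixed = patched.get_pi(patched.get_person_id("alice"))

print(alice_buggy.patch_count(), alice_buggy.total_loc())  # 3 25
print(alice_fixed.patch_count(), alice_fixed.total_loc())  # 2 15
```

In this sketch, alice's self-committed change is counted once instead of twice after the fix, while the change she merely committed for bob is still attached to her as committer, mirroring what the patched `populatePersonDB` keeps doing when committer and author differ.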
