[codeface] Re: [PATCH 3/5] Allow mailing list analysis to run even if no vcs analysis

  • From: Mitchell Joblin <joblin.m@xxxxxxxxx>
  • To: codeface@xxxxxxxxxxxxx
  • Date: Thu, 5 Nov 2015 17:53:44 +0100

On Thu, Nov 5, 2015 at 5:36 PM, Wolfgang Mauerer
<wolfgang.mauerer@xxxxxxxxxxxxxxxxx> wrote:



Am 05/11/15 um 17:11 schrieb Mitchell Joblin:
On Thu, Nov 5, 2015 at 5:00 PM, Wolfgang Mauerer
<wolfgang.mauerer@xxxxxxxxxxxxxxxxx> wrote:


Am 05/11/15 um 16:56 schrieb Mitchell Joblin:
On Wed, Nov 4, 2015 at 11:50 PM, Wolfgang Mauerer
<wolfgang.mauerer@xxxxxxxxxxxxxxxxx> wrote:


Am 29/10/15 um 11:57 schrieb Mitchell Joblin:
- The global mail analysis does not really on release ranges
really _rely_?
so it should still be able to execute without them

Signed-off-by: Mitchell Joblin <mitchell.joblin.ext@xxxxxxxxxxx>
---
codeface/R/ml/analysis.r | 33 +++++++++++++++++++--------------
codeface/R/query.r | 2 +-
2 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/codeface/R/ml/analysis.r b/codeface/R/ml/analysis.r
index eb137b6..80b36d5 100644
--- a/codeface/R/ml/analysis.r
+++ b/codeface/R/ml/analysis.r
@@ -333,22 +333,27 @@ dispatch.all <- function(conf, repo.path, resdir) {
## Compute a list of intervals for the project release cycles
cycles <- get.cycles(conf)

- ## NOTE: We store the lubridate intervals in a list (instead of
- ## simply appending them to the cycles data frame) because they
- ## are coerced to numeric by the conversion to a data frame.
- release.intervals <- list(dim(cycles)[1])
- for (i in 1:(dim(cycles)[1])) {
- release.intervals[[i]] <- new_interval(cycles$date.start[[i]],
- cycles$date.end[[i]])
- }
+ if(nrow(cycles)!=0){
Coding style
+ ## NOTE: We store the lubridate intervals in a list (instead of
+ ## simply appending them to the cycles data frame) because they
+ ## are coerced to numeric by the conversion to a data frame.
+ release.intervals <- list(dim(cycles)[1])
+ for (i in 1:(dim(cycles)[1])) {
+ release.intervals[[i]] <- new_interval(cycles$date.start[[i]],
+ cycles$date.end[[i]])
+ }

- release.labels <- as.list(cycles$cycle)
+ release.labels <- as.list(cycles$cycle)

- ## The mailing list data may not cover the complete timeframe of
- ## the repository, so remove any empty intervals
- nonempty.release.intervals <- get.nonempty.intervals(dates,
release.intervals)
- release.intervals <- release.intervals[nonempty.release.intervals]
- release.labels <- release.labels[nonempty.release.intervals]
+ ## The mailing list data may not cover the complete timeframe of
+ ## the repository, so remove any empty intervals
+ nonempty.release.intervals <- get.nonempty.intervals(dates,
release.intervals)
+ release.intervals <- release.intervals[nonempty.release.intervals]
+ release.labels <- release.labels[nonempty.release.intervals]
+ }
+ else {
+ nonempty.release.intervals <- NULL
+ }

## Obtain a unique numerical ID for the mailing list (and clear
## any existing results on the way)
diff --git a/codeface/R/query.r b/codeface/R/query.r
index d30b749..720e9b9 100644
--- a/codeface/R/query.r
+++ b/codeface/R/query.r
@@ -129,7 +129,7 @@ get.cycles.con <- function(con, pid,
boundaries=FALSE) {
"FROM revisions_view ",
"WHERE projectId=", pid))
if (nrow(res) == 0) {
- stop(paste("No release range found for projectId=", pid))
+ logwarn(paste("No release range found for projectId=", pid))

please add some information that the mailing list analysis is run
nonetheless -- otherwise, users might not know what the point of
the warning is.

The warning is coming from the database query, but that query might be
used in multiple place other than for the mailing list analysis. We
don't send warning for any other queries that return empty data
frames. I think the original stop statement should have actually been
handles in the code calling the query, not the query itself. Shall I
just remove the warning?

originally, all analyses were required to have at least one release
range, so stop()ing centrally was reasonable. Now we've relaxed this
assumption, but I'd still like to have a clear indication of when things
are badly broken (VCS analysis without release ranges) instead of
running into hard to debug follow-up problems. So we could either
fix all callers and do an explicit check when we require a num-empty
list of ranges, or introduce an optional parameter like
allow.empty.ranges that defaults to false, and keep the stop()
conditionally.

Now I'm confused. You were originally were ok with removing the stop
statement in exchange for the warning. But you wanted to add that the
global email analysis would still run. My point was that it doesn't
make sense to mention the email analysis because code other than the
mailing list analysis makes calls to this query. But now the change
has nothing to do with enhancing the warning message, a warning
message is simply not sufficient here?

your remark about the check taking place in the db layer caused me to
indeed change my mind -- since the warning is not emitted in the ML
specific part, we should not weaken existing checks. Henceforth my
new suggestion. --Wolfgang

If we have preconditions for the state of the database tables after an
the vcs analysis that are imposed by the web frontend or whatever,
then all these types of checks should be collected in a central place
rather than have preconditions scattered throughout the query
implementations. We should run those checks immediately after
populating the data base because the query for cycles could occur ages
later in a totally different part of the code. I guess the default
parameter is just a quick band aid solution?

--Mitchell



Other related posts: