On 05/03/2015 at 14:19, Matthias Gemmer wrote:
>> From: codeface-bounce@xxxxxxxxxxxxx <codeface-bounce@xxxxxxxxxxxxx> on
>> behalf of Mitchell Joblin <joblin.m@xxxxxxxxx>
>> Sent: Thursday, 5 March 2015 13:14
>> To: Wolfgang Mauerer
>> Cc: codeface@xxxxxxxxxxxxx
>> Subject: [codeface] Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re:
>> Preparing time series data - sloccount analysis
>>
>> On Thu, Mar 5, 2015 at 11:12 AM, Wolfgang Mauerer
>> <wolfgang.mauerer@xxxxxxxxxxx> wrote:
>>> On 05.03.2015 12:04, Matthias Gemmer wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Browse[1]> print(plot.id)
>>>>>>>>>>>> numeric(0)
>>>>>>>>>>>
>>>>>>>>>>> so that's the culprit... There is no valid plot ID for the time
>>>>>>>>>>> series in the database. Can you please check that an appropriate
>>>>>>>>>>> table is available in the database?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> There is a table called timeseries with the column plotId.
>>>>>>>>>> mysql> DESCRIBE timeseries;
>>>>>>>>>> +--------------+------------+------+-----+---------+-------+
>>>>>>>>>> | Field        | Type       | Null | Key | Default | Extra |
>>>>>>>>>> +--------------+------------+------+-----+---------+-------+
>>>>>>>>>> | plotId       | bigint(20) | NO   | MUL | NULL    |       |
>>>>>>>>>> | time         | datetime   | NO   |     | NULL    |       |
>>>>>>>>>> | value        | double     | NO   |     | NULL    |       |
>>>>>>>>>> | value_scaled | double     | YES  |     | NULL    |       |
>>>>>>>>>> +--------------+------------+------+-----+---------+-------+
>>>>>>>>>> 4 rows in set (0.02 sec)
>>>>>>>>>>
>>>>>>>>>> The table is also filled with data. The table contains datasets for
>>>>>>>>>> plotId=5, plotId=6, plotId=7 and plotId=8.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Which values do sloccount.plot.id (and understand.plot.id) have
>>>>>>>>>>> in do.complexity.analysis (Frame 3/4)?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The values for sloccount.plot.id and understand.plot.id are
>>>>>>>>>> obviously invalid.
>>>>>>>>>>
>>>>>>>>>> Browse[1]> print(sloccount.plot.id)
>>>>>>>>>> numeric(0)
>>>>>>>>>> Browse[1]> print(understand.plot.id)
>>>>>>>>>> numeric(0)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> it was not so obvious to me; I was trying to ensure that
>>>>>>>>> parallelisation did not introduce any issues here. But your
>>>>>>>>> observation clarified that this is not the case.
>>>>>>>>>
>>>>>>>>> Since the error seems to be deterministically reproducible at your
>>>>>>>>> site, can you debug around the creation of the index (for instance by
>>>>>>>>> printing out what's going on; alternatively, you could also use the
>>>>>>>>> built-in debugger)?
>>>>>>>>>
>>>>>>>>
>>>>>>>> In the file codeface/R/complexity.r:
>>>>>>>>
>>>>>>>> Assignment of sloccount.plot.id and understand.plot.id:
>>>>>>>> ## Obtain a plot IDs for the sloccount and understand raw time series before
>>>>>>>> ## parallel processing commences to avoid race conditions
>>>>>>>> sloccount.plot.id <- get.or.create.plot.id(conf, "sloccount")
>>>>>>>> understand.plot.id <- get.or.create.plot.id(conf, "understand_raw")
>>>>>>>> -> sloccount.plot.id and understand.plot.id have the value "x".
>>>>>>>> Are these values feasible? Or shall I have a closer look
>>>>>>>> at the function 'get.or.create.plot.id'?
>>>>>>>
>>>>>>>
>>>>>>> since the SQL specification for the plot ID is
>>>>>>>
>>>>>>> `id` BIGINT NOT NULL AUTO_INCREMENT
>>>>>>>
>>>>>>> the value "x" seems quite impossible. Can you please query your
>>>>>>> database to see what value is stored there?
>>>>>>>
>>>>>>
>>>>>> The table is empty.
>>>>>> mysql> select * from plots;
>>>>>> Empty set (0.01 sec)
>>>>>
>>>>>
>>>>> please try to run the other SQL statements produced by the code to see
>>>>> why no entry is created. get.or.create.plot.id() inserts a new entry
>>>>> into the table if no ID for a desired plot is available.
>>>>
>>>>
>>>> The branch which creates a plot ID is not entered.
>>>> The condition 'length(res) < 1' is not satisfied in either case
>>>> (sloccount.plot.id and understand.plot.id).
>>>>
>>>> For sloccount.plot.id <- get.or.create.plot.id(conf, "sloccount"):
>>>> res <- dbGetQuery(con, str_c(query, ";"))
>>>> # str_c(query, ";"): SELECT id FROM plots WHERE name='sloccount' AND projectId=2;
>>>> # res: "id"
>>>> # length(res): 1
>>>> if (length(res) < 1) {
>>>>   ## Plot ID is not assigned yet, create one
>>>>   res <- get.clear.plot.id.con(con, pid, plot.name, range.id)
>>>> } else {
>>>>   res <- res$id
>>>> }
>>>> # res: "x"
>>>
>>>
>>> @Mitchell, could you try to reproduce this? I don't see why a result
>>> with non-zero length should be returned from the SQL query if the
>>> database is empty.
>>
>> The SQL query probably returns a data frame and length(..) called on a
>> data frame does not return the number of rows. To get the number of
>> rows of a data frame you should be using nrow(..) instead of
>> length(..).
>>
>> --Mitchell
>>
>
> That worked for me.
> After replacing 'length' with 'nrow' a new plot ID is created!
The following patch should fix this for good then:

> diff --git a/codeface/R/db.r b/codeface/R/db.r
> index db53811..32da240 100644
> --- a/codeface/R/db.r
> +++ b/codeface/R/db.r
> @@ -59,10 +59,10 @@ get.clear.plot.id.con <- function(con, pid, plot.name, range.id=NULL,
>
>    res <- dbGetQuery(con, str_c("SELECT id", query))
>
> -  if (length(res) != 1) {
> +  if (nrow(res) != 1) {
>      stop("Internal error: Plot ", plot.name, " appears multiple times in DB",
>           "for project ID ", pid)
> -  }
> +}
>
>    return(res$id)
>  }
> @@ -81,7 +81,7 @@ get.plot.id.con <- function(con, pid, plot.name, range.id=NULL) {
>      query <- str_c(query, " AND releaseRangeId=", range.id)
>    }
>    res <- dbGetQuery(con, str_c("SELECT id", query))
> -  if (length(res) < 1) {
> +  if (nrow(res) < 1) {
>      stop("Internal error: Plot ", plot.name, " not found in DB",
>           " for project ID ", pid)
>    }
> @@ -104,7 +104,7 @@ get.or.create.plot.id.con <- function(con, pid, plot.name, range.id=NULL) {
>    }
>    res <- dbGetQuery(con, str_c(query, ";"))
>
> -  if (length(res) < 1) {
> +  if (nrow(res) < 1) {
>      ## Plot ID is not assigned yet, create one
>      res <- get.clear.plot.id.con(con, pid, plot.name, range.id)
>    } else {
> @@ -125,7 +125,7 @@ get.revision.id <- function(conf, tag) {
>                      str_c("SELECT id FROM release_timeline WHERE projectId=",
>                            conf$pid, " AND tag=", sq(tag), " AND type='release'"))
>
> -  if (length(res) > 1) {
> +  if (nrow(res) > 1) {
>      stop("Internal error: Revision if for tag ", tag, " (project ", conf$project,
>           ") appears multiple times in DB!")
>    }

However, I don't really understand why it worked before in this case...
@Mitchell: I'll push the patch to master unless you object, but can you
please try to understand why we did not run into problems earlier?

Thanks, Wolfgang
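
The length(..) versus nrow(..) difference Mitchell describes can be reproduced in a plain R session without a database. This is only a minimal sketch: it assumes the query result is a data frame with a single id column and zero rows, and res is built by hand here rather than returned by dbGetQuery(). The names n.cols, n.rows, plot.id, no.cols and n.cols.empty are made up for illustration.

```r
## Hand-built stand-in for the result of "SELECT id FROM plots WHERE ..."
## on an empty table (assumption: one column, zero rows).
res <- data.frame(id = numeric(0))

## length() on a data frame counts columns, so "length(res) < 1" is FALSE
n.cols <- length(res)

## nrow() counts rows, which is what the checks in db.r are meant to test
n.rows <- nrow(res)

## The else-branch then hands back an empty vector as the plot ID
plot.id <- res$id

## For comparison: a data frame without any columns has length 0
no.cols <- data.frame()
n.cols.empty <- length(no.cols)

print(n.cols)        # [1] 1
print(n.rows)        # [1] 0
print(plot.id)       # numeric(0)
print(n.cols.empty)  # [1] 0
```

The numeric(0) printed for plot.id matches the debugger output quoted earlier in the thread. The last case shows that length(..) would only have behaved like nrow(..) if an empty result had come back without any columns at all; whether that ever happened in this setup is not established in the thread.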