On Thu, Mar 5, 2015 at 1:50 PM, Wolfgang Mauerer <wm@xxxxxxxxxxxxxxxx> wrote:
>
>
> On 05/03/2015 at 14:19, Matthias Gemmer wrote:
>>> From: codeface-bounce@xxxxxxxxxxxxx <codeface-bounce@xxxxxxxxxxxxx> on
>>> behalf of Mitchell Joblin <joblin.m@xxxxxxxxx>
>>> Sent: Thursday, 5 March 2015 13:14
>>> To: Wolfgang Mauerer
>>> Cc: codeface@xxxxxxxxxxxxx
>>> Subject: [codeface] Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re: AW: Re:
>>> Preparing time series data - sloccount analysis
>>>
>>> On Thu, Mar 5, 2015 at 11:12 AM, Wolfgang Mauerer
>>> <wolfgang.mauerer@xxxxxxxxxxx> wrote:
>>>> On 05.03.2015 12:04, Matthias Gemmer wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Browse[1]> print(plot.id)
>>>>>>>>>>>>> numeric(0)
>>>>>>>>>>>>
>>>>>>>>>>>> so that's the culprit... There is no valid plot ID for the time
>>>>>>>>>>>> series in the database. Can you please check that an appropriate
>>>>>>>>>>>> table is available in the database?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> There is a table called timeseries with the column plotId.
>>>>>>>>>>> mysql> DESCRIBE timeseries;
>>>>>>>>>>> +--------------+------------+------+-----+---------+-------+
>>>>>>>>>>> | Field        | Type       | Null | Key | Default | Extra |
>>>>>>>>>>> +--------------+------------+------+-----+---------+-------+
>>>>>>>>>>> | plotId       | bigint(20) | NO   | MUL | NULL    |       |
>>>>>>>>>>> | time         | datetime   | NO   |     | NULL    |       |
>>>>>>>>>>> | value        | double     | NO   |     | NULL    |       |
>>>>>>>>>>> | value_scaled | double     | YES  |     | NULL    |       |
>>>>>>>>>>> +--------------+------------+------+-----+---------+-------+
>>>>>>>>>>> 4 rows in set (0.02 sec)
>>>>>>>>>>>
>>>>>>>>>>> The table is also filled with data. The table contains datasets for
>>>>>>>>>>> plotId=5, plotId=6, plotId=7 and plotId=8.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Which values do sloccount.plot.id (and understand.plot.id) have
>>>>>>>>>>>> in do.complexity.analysis (Frame 3/4)?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The values for sloccount.plot.id and understand.plot.id are obviously
>>>>>>>>>>> invalid.
>>>>>>>>>>>
>>>>>>>>>>> Browse[1]> print(sloccount.plot.id)
>>>>>>>>>>> numeric(0)
>>>>>>>>>>> Browse[1]> print(understand.plot.id)
>>>>>>>>>>> numeric(0)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> it was not so obvious to me; I was trying to ensure that
>>>>>>>>>> parallelisation did not introduce any issues here. But your
>>>>>>>>>> observation clarified that this is not the case.
>>>>>>>>>>
>>>>>>>>>> Since the error seems to be deterministically reproducible at your
>>>>>>>>>> site, can you debug around the creation of the index (for instance by
>>>>>>>>>> printing out what's going on; alternatively, you could also use the
>>>>>>>>>> built-in debugger)?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> In the file codeface/R/complexity.r:
>>>>>>>>>
>>>>>>>>> Assignment of sloccount.plot.id and understand.plot.id:
>>>>>>>>> ## Obtain a plot IDs for the sloccount and understand raw time series before
>>>>>>>>> ## parallel processing commences to avoid race conditions
>>>>>>>>> sloccount.plot.id <- get.or.create.plot.id(conf, "sloccount")
>>>>>>>>> understand.plot.id <- get.or.create.plot.id(conf, "understand_raw")
>>>>>>>>> -> sloccount.plot.id and understand.plot.id have the value "x".
>>>>>>>>> Are these values feasible? Or shall I have a closer look
>>>>>>>>> at the function 'get.or.create.plot.id'?
>>>>>>>>
>>>>>>>>
>>>>>>>> since the SQL specification for the plot ID is
>>>>>>>>
>>>>>>>>   `id` BIGINT NOT NULL AUTO_INCREMENT
>>>>>>>>
>>>>>>>> the value "x" seems quite impossible. Can you please query your
>>>>>>>> database to see what value is stored there?
>>>>>>>>
>>>>>>>
>>>>>>> The table is empty.
>>>>>>> mysql> select * from plots;
>>>>>>> Empty set (0.01 sec)
>>>>>>
>>>>>>
>>>>>> please try to run the other SQL statements produced by the code to see
>>>>>> why no entry is created.
>>>>>> get.or.create.plot.id() inserts a new entry
>>>>>> into the table if no ID for a desired plot is available.
>>>>>
>>>>>
>>>>> The branch which creates a plot ID is not entered. The condition
>>>>> 'length(res) < 1' is not satisfied in either case (sloccount.plot.id
>>>>> and understand.plot.id).
>>>>>
>>>>> For sloccount.plot.id <- get.or.create.plot.id(conf, "sloccount"):
>>>>> res <- dbGetQuery(con, str_c(query, ";"))
>>>>> # str_c(query, ";"): SELECT id FROM plots WHERE name='sloccount' AND projectId=2;
>>>>> # res: "id"
>>>>> # length(res): 1
>>>>> if (length(res) < 1) {
>>>>>   ## Plot ID is not assigned yet, create one
>>>>>   res <- get.clear.plot.id.con(con, pid, plot.name, range.id)
>>>>> } else {
>>>>>   res <- res$id
>>>>> }
>>>>> # res: "x"
>>>>
>>>>
>>>> @Mitchell, could you try to reproduce this? I don't see why a result
>>>> with non-zero length should be returned from the SQL query if the
>>>> database is empty.
>>>
>>> The SQL query probably returns a data frame and length(..) called on a
>>> data frame does not return the number of rows. To get the number of
>>> rows of a data frame you should be using nrow(..) instead of
>>> length(..).
>>>
>>> --Mitchell
>>>
>>
>> That worked for me.
>> After replacing 'length' with 'nrow' a new plot ID is created!
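To make the length()/nrow() diagnosis concrete: an empty query result still has one column (`id`), and `length()` on a data frame counts columns, so it returns 1 and `length(res) < 1` is never true. `nrow()` returns 0, and `res$id` on such a result is `numeric(0)`, which matches the `print(plot.id)` output earlier in the thread. The minimal base-R sketch below assumes, as the debugging output `res: "id"` suggests, that the query on the empty `plots` table yields a zero-row data frame with an `id` column; it simulates that result with `data.frame()` instead of a real database connection.

```r
## Simulated result of "SELECT id FROM plots WHERE ..." on an empty table.
## Assumption: a zero-row data frame with a single `id` column; no DB is used.
res <- data.frame(id = numeric(0))

n.cols <- length(res)  # 1: length() counts the columns of a data frame
n.rows <- nrow(res)    # 0: nrow() counts the rows
ids    <- res$id       # numeric(0), the value observed for plot.id

## The original guard is FALSE here, so the "create a plot ID" branch is
## skipped; the patched guard is TRUE, so a new ID would be created.
old.guard <- length(res) < 1
new.guard <- nrow(res) < 1

print(c(columns = n.cols, rows = n.rows))
print(ids)
print(c(length.check = old.guard, nrow.check = new.guard))
```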
>
> The following patch should fix this for good then:
>
>> diff --git a/codeface/R/db.r b/codeface/R/db.r
>> index db53811..32da240 100644
>> --- a/codeface/R/db.r
>> +++ b/codeface/R/db.r
>> @@ -59,10 +59,10 @@ get.clear.plot.id.con <- function(con, pid, plot.name, range.id=NULL,
>>
>>    res <- dbGetQuery(con, str_c("SELECT id", query))
>>
>> -  if (length(res) != 1) {
>> +  if (nrow(res) != 1) {
>>      stop("Internal error: Plot ", plot.name, " appears multiple times in DB",
>>           "for project ID ", pid)
>> -  }
>> +}
>>
>>    return(res$id)
>>  }
>> @@ -81,7 +81,7 @@ get.plot.id.con <- function(con, pid, plot.name, range.id=NULL) {
>>      query <- str_c(query, " AND releaseRangeId=", range.id)
>>    }
>>    res <- dbGetQuery(con, str_c("SELECT id", query))
>> -  if (length(res) < 1) {
>> +  if (nrow(res) < 1) {
>>      stop("Internal error: Plot ", plot.name, " not found in DB",
>>           " for project ID ", pid)
>>    }
>> @@ -104,7 +104,7 @@ get.or.create.plot.id.con <- function(con, pid, plot.name, range.id=NULL) {
>>    }
>>    res <- dbGetQuery(con, str_c(query, ";"))
>>
>> -  if (length(res) < 1) {
>> +  if (nrow(res) < 1) {
>>      ## Plot ID is not assigned yet, create one
>>      res <- get.clear.plot.id.con(con, pid, plot.name, range.id)
>>    } else {
>> @@ -125,7 +125,7 @@ get.revision.id <- function(conf, tag) {
>>                      str_c("SELECT id FROM release_timeline WHERE projectId=",
>>                            conf$pid, " AND tag=", sq(tag), " AND type='release'"))
>>
>> -  if (length(res) > 1) {
>> +  if (nrow(res) > 1) {
>>      stop("Internal error: Revision if for tag ", tag, " (project ", conf$project,
>>           ") appears multiple times in DB!")
>>    }
>
> However, I don't really understand why it worked before in this case...
> @Mitchell: I'll push the patch to master unless you object, but can you
> please try to understand why we did not run into problems earlier?

Please push to master after running the test suite, otherwise please make
a pull request and then I will test it before merging.

Thanks,

Mitchell

>
> Thanks, Wolfgang
>
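For readers following the patch, here is a database-free sketch of the get-or-create pattern that get.or.create.plot.id.con implements, using the patched nrow() checks. Everything in it is hypothetical: the `db` environment and its `plots` data frame stand in for the MySQL `plots` table, `get.or.create.id()` is not part of codeface, and the ID allocation only loosely mimics `AUTO_INCREMENT`.

```r
## Hypothetical in-memory stand-in for the `plots` table (not codeface's API).
db <- new.env()
db$plots <- data.frame(id = numeric(0), name = character(0),
                       projectId = numeric(0), stringsAsFactors = FALSE)

## Return the ID of plot `name` for project `pid`, creating it if absent.
get.or.create.id <- function(name, pid) {
  ## Rough equivalent of: SELECT id FROM plots WHERE name=... AND projectId=...
  hit <- db$plots$name == name & db$plots$projectId == pid
  res <- db$plots[hit, "id", drop = FALSE]  # always a data frame, maybe 0 rows

  if (nrow(res) > 1) {
    stop("Plot ", name, " appears multiple times for project ID ", pid)
  }
  if (nrow(res) < 1) {
    ## Not assigned yet: allocate the next ID (loose AUTO_INCREMENT imitation)
    new.id <- if (nrow(db$plots) == 0) 1 else max(db$plots$id) + 1
    ## db is an environment, so this updates the shared table in place
    db$plots <- rbind(db$plots,
                      data.frame(id = new.id, name = name, projectId = pid,
                                 stringsAsFactors = FALSE))
    return(new.id)
  }
  res$id
}
```

Calling `get.or.create.id("sloccount", 2)` twice returns the same ID, while a new name such as `"understand_raw"` gets the next one. The sketch tests `nrow(res) > 1` and `nrow(res) < 1` separately so that a duplicate and a missing entry produce different outcomes; the patched get.clear.plot.id.con instead uses a single `nrow(res) != 1` test whose message only mentions duplicates.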